Driver Development Kit (DDK)

Analog out DMA performance problems

I'm working on an open-source driver for m-series and e-series boards (http://www.comedi.org). I've discovered some performance problems doing dma to analog outputs that I can't resolve. In summary, dma transfers to the analog output of a PXI-6281 in a pxi crate being controlled through a mxi-4 connection (pxi-pci8336) are VERY slow. I'm talking 250k samples/sec slow. That's the maximum speed at which the dma controller can fill the board's analog output fifo from host memory.

I've also got an older PXI-6713 in the same crate, and dma transfers to it are about 15 times faster (about 3.5M samples/sec). I did notice that clearing the dma burst enable bit in the mite chip's channel control register caused the 6713 to slow way down to something comparable to the 6281 (about 500k samples/sec). Setting or clearing the burst enable bit had no effect on the speed of the 6289. Is there some special mojo that needs to be done to enable burst transfers on the 6289?

Also, even the relatively speedy 6713 does dma transfers much slower than it should, since the pxi-pci8336 advertises 80MB/sec sustained transfer rates over mxi4. Can you provide any insight into this matter? I've already looked through the ddk, a register level document describing the mite chip, and example code which had chipobjects for the mite and an analog input example.

By the way, dma transfers for analog input on the 6281 weren't as bad. I didn't measure the transfer time, but I was at least able to do input at 500k samples/sec without fifo overruns.

I'll post more detailed performance measurements in a subsequent post, and include measurements for a couple of other similar pci boards (a pci-6289 and pci-6711). In case you're wondering, neither of the pci boards gets anywhere close to the bandwidth provided by the pci bus, but they're not as spectacularly bad as the pxi-6281.
Message 1 of 13
Here are my measurements:

PCI-6711, tested on 1.4GHz Pentium 4:
5.2 to 5.3 milliseconds to load fifo to half-full using dma. 0.9 to 1.0 microseconds to write to a 16-bit register. 1.9 to 2.1 microseconds to read from a 16-bit register. The mite's burst enable bit has no effect.

PXI-6713, tested on 3.2GHz Pentium D:
2.2 to 2.4 milliseconds to load fifo to half-full using dma. 0.5 to 0.7 microseconds to write to a 16-bit register. 5 to 7 microseconds to read from a 16-bit register. Turning off the mite's burst enable bit causes the dma fifo load time to increase to 16 to 17 milliseconds.

PCI-6289, tested on 3GHz Pentium 4:
2.0 to 2.2 milliseconds to load fifo to half-full using dma. 0.4 to 0.6 microseconds to write to a 16-bit register. About 1.2 microseconds to read from a 16-bit register. The mite's burst enable bit has no effect. I could do streaming analog output on 1 channel with an update rate of about 2.1MHz before the board's fifo started to underrun.

PXI-6281, tested on 3.2GHz Pentium D:
18 to 19 milliseconds to load fifo to half-full using dma. 0.3 to 0.4 microseconds to write to a 16-bit register. 4 to 6 microseconds to read from a 16-bit register. The mite's burst enable bit has no effect. I could do streaming analog output on 1 channel with an update rate of about 250kHz before the board's fifo started to underrun.

Notes: the 671x boards have a 16k sample ao fifo, the 628x boards have 8k.

The 4 to 7 microsecond register read times on the PXI boards seem large too. Is that normal overhead for going over the mxi-4 connection?

I wasn't doing anything else intensive on the pci bus during these tests. For what it's worth, according to pci specs the two pci boards should be able to dma their analog output fifos to half full in less than 150 microseconds.
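
To put the load times above on a common scale, here is a rough sketch that converts them to samples/sec and compares them against the theoretical PCI peak. It assumes "16k" and "8k" mean 16384 and 8192 samples, that samples are 16 bits wide, that each load time is the midpoint of the measured range, and that the bus is plain 32-bit/33MHz PCI with a peak of about 132 MB/sec. The function names are made up for this illustration.

```c
#include <stdio.h>

/* Throughput in samples/sec when `samples` samples move in `seconds`. */
double samples_per_sec(double samples, double seconds)
{
    return samples / seconds;
}

/* Seconds needed to move `samples` 16-bit samples at `bytes_per_sec`. */
double ideal_fill_time(double samples, double bytes_per_sec)
{
    return samples * 2.0 / bytes_per_sec;
}

/* Print the throughput implied by each half-fifo load time above. */
void print_dma_rates(void)
{
    /* ASSUMPTIONS: half-fifo sizes are 8192 samples (671x) and 4096
     * samples (628x); load times are midpoints of the measured ranges. */
    const struct {
        const char *board;
        double half_fifo_samples;
        double load_seconds;
    } m[] = {
        { "PCI-6711", 8192.0, 5.25e-3 },
        { "PXI-6713", 8192.0, 2.3e-3 },
        { "PCI-6289", 4096.0, 2.1e-3 },
        { "PXI-6281", 4096.0, 18.5e-3 },
    };
    /* ASSUMPTION: 32-bit, 33MHz PCI, 4 bytes per clock at peak. */
    const double pci_peak_bytes_per_sec = 132.0e6;
    unsigned i;

    for (i = 0; i < sizeof(m) / sizeof(m[0]); i++) {
        printf("%s: %.0f samples/sec measured, %.1f us at PCI peak\n",
               m[i].board,
               samples_per_sec(m[i].half_fifo_samples, m[i].load_seconds),
               1e6 * ideal_fill_time(m[i].half_fifo_samples,
                                     pci_peak_bytes_per_sec));
    }
}
```

Under those assumptions the PXI-6713 comes out at roughly 3.6M samples/sec and the PXI-6281 at roughly 0.22M samples/sec, which lines up with the streaming rates where the fifo started to underrun. A half-full 671x fifo at PCI peak would take about 124 microseconds, consistent with the "less than 150 microseconds" figure.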
Message 2 of 13
It appears the standout bad performance of the pxi-6281 was due to a bad pci-8336 mxi4 board. I swapped in a different pci-8336 board and the analog output dma speed improved by a factor of 4. Prior to swapping out the pci8336, I had also tried using a different model crate, and tried moving the old pci8336 to a different computer with no effect.
Message 3 of 13
Blah, scratch that about the pxi-6281 being fixed. Changing the mxi4 board had no effect. I must have fooled myself by accidentally writing to the 6713 when I thought I was using the 6281. I'm giving up on the pxi-6281.
Message 4 of 13
I did a few tests with DAQmx on Windows, MXI-3 and PXI-6289.  Not exactly the setup you are using, but it should give some clues if it's a hardware configuration problem.

The one parameter I noticed that really degraded performance is the request condition.  Whenever it's set to "Onboard Memory Empty" I get even lower generation rates, independent of the DMA buffer size. When set to "Onboard Memory Half Full or Less" or "Onboard Memory Less Than Full", generation rates go all the way up to 2.8 MS/s.

At the register level the request condition is set using the AO FIFO Mode (Bitfield AO_FIFO_Mode in AO_Mode_2 register).  Valid values are:
   
   kAO_FIFO_ModeEmpty                       = 0,
   kAO_FIFO_ModeLess_Than_Half_Full         = 1,
   kAO_FIFO_ModeLess_Than_Full              = 2,
   kAO_FIFO_ModeLess_Than_Half_Full_to_Full = 3,
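
As a rough sketch of what changing the request condition could look like in driver code, the helper below replaces the AO_FIFO_Mode bitfield in a software copy of AO_Mode_2. The bit position (a 2-bit field at bits 15:14) is an assumption here and should be checked against the register map in the DDK; the helper name and the macros are made up for this illustration.

```c
#include <stdint.h>

/* Request-condition values quoted above. */
enum ao_fifo_mode {
    kAO_FIFO_ModeEmpty                       = 0,
    kAO_FIFO_ModeLess_Than_Half_Full         = 1,
    kAO_FIFO_ModeLess_Than_Full              = 2,
    kAO_FIFO_ModeLess_Than_Half_Full_to_Full = 3
};

/* ASSUMPTION: AO_FIFO_Mode is a 2-bit field at bits 15:14 of the
 * 16-bit AO_Mode_2 register. Verify against the DDK register map. */
#define AO_FIFO_MODE_SHIFT 14u
#define AO_FIFO_MODE_MASK  (0x3u << AO_FIFO_MODE_SHIFT)

/* Return `ao_mode_2` with only the AO_FIFO_Mode bitfield replaced. */
uint16_t ao_mode_2_set_fifo_mode(uint16_t ao_mode_2, enum ao_fifo_mode mode)
{
    unsigned cleared = ao_mode_2 & ~AO_FIFO_MODE_MASK;
    unsigned field = ((unsigned)mode << AO_FIFO_MODE_SHIFT) & AO_FIFO_MODE_MASK;

    return (uint16_t)(cleared | field);
}
```

The returned value would then be written to AO_Mode_2 with whatever register write routine the driver already uses. Working on a software copy keeps the other bits of the register intact without needing to read the register back.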

Let me know if changing the request condition makes any difference.

Diego
Message 5 of 13
Hi Diego, thanks for looking into this.

I've already tried changing the fifo request condition but it didn't help. I use half-full to full, and I tried less than full (which is what the windows analog output test panel was using), but it didn't help the transfer rate. It did have the expected effect of making dma transfers happen constantly instead of waiting for the fifo to empty half-way. The fifo empty mode is expected to limit the maximum analog output rate, but that's a latency thing, since it causes the dma controller to only try to keep one sample in the analog output fifo. My maximum analog output speed isn't limited by latency, but by total throughput. I measured the time it takes for the dma controller to fill the analog output fifo to half-full at the beginning of the waveform, before output has actually started. The dma controller should be going full-bore to fill the fifo.

One thing I noticed when reading the mite registers while a windows computer was doing analog out, is that the bits 0x8f0 (among others) are set in the channel control register. The bits 0x8f0 aren't mentioned in any of the mite docs I have. I tried setting them in my driver, but they didn't seem to have any effect.
Message 6 of 13
I'll take a closer look at the mite configuration.  As far as I can tell, there should not be any performance difference between MXI-3 and MXI-4, so I think I'm ok testing with the MXI-3 setup.  I'll let you know as soon as I have something you can try out.

Could you post the code (or part of it) showing the current mite configuration? Thanks.

Diego
Message 7 of 13
Here you go. Probably the mite_prep_dma() function is what you are interested in. We call it with num_memory_bits and num_device_bits equal to 16 for the m-series analog out. The windows driver seems to use 32 bit transfers to memory for analog output, which allows two 16 bit samples per 32 bit pci transfer. However, it also puts the samples into the analog output fifo in the wrong order, so I didn't bother.

If you want to look at any other files in comedi, instructions for checking it out of cvs are here:

http://www.comedi.org/download.html
Message 8 of 13
Here's another bit of information. When I read the mite status register while it is doing the initial load of the analog output fifo, only the "mrdy" and "drq" bits are set. "drdy" is not set. Also, the fifo count register usually reads as empty. Does this provide any hint as to what is slowing down the dma transfer? Do you have any insight into what causes the "drdy" bit to get set?
Message 9 of 13


@fmhess wrote:
One thing I noticed when reading the mite registers while a windows computer was doing analog out, is that the bits 0x8f0 (among others) are set in the channel control register. The bits 0x8f0 aren't mentioned in any of the mite docs I have. I tried setting them in my driver, but they didn't seem to have any effect.

Well, I figured out what two of the mystery bits do. Bit 4 does 32 bit byte swapping, and bit 6 does 16 bit byte swapping. They can both be set in combination to allow 32 bit memory to 16 bit device transfers (without them, the 16 bit samples get put into the fifo in the wrong order). I still have no idea what bits 5 and 11 are for though.
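
To illustrate why the two swaps fix the sample order when combined, here is a small software model. It is only a sketch of the behavior described above, not a description of how the mite implements it, and the function names are made up. Reversing all four bytes of a 32-bit word and then swapping the bytes inside each 16-bit half leaves each 16-bit sample intact but exchanges the two halves.

```c
#include <stdint.h>

/* Model of the 32 bit byte swap (bit 4): reverse all four bytes. */
uint32_t swap_bytes_32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Model of the 16 bit byte swap (bit 6): swap the two bytes inside
 * each 16-bit half of the word. */
uint32_t swap_bytes_16(uint32_t w)
{
    return ((w & 0x00FF00FFu) << 8) | ((w >> 8) & 0x00FF00FFu);
}

/* Both swaps together: the two 16-bit samples trade places while the
 * bytes inside each sample keep their original order. */
uint32_t swap_both(uint32_t w)
{
    return swap_bytes_16(swap_bytes_32(w));
}
```

For example, a word holding the samples 0x1122 (high half) and 0x3344 (low half) becomes 0x44332211 after the 32 bit swap alone, and 0x33441122 after both swaps, so the samples come out in the opposite order with their contents unchanged.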
Message 10 of 13