High-Speed Digitizers

NI 5112 DMA Performance

Hi,
I am currently doing continuous data acquisition with the NI 5112 at a sample rate of 2*20 MSamples/s (using fetchBinary8) for software-defined radio. I transfer the data from the 5112 into main memory in real time, and it works fine.

However, this continuous data transfer uses nearly 100% of one 3 GHz P4 (I have a dual-processor system), and I need that CPU to process the data as well. I don't understand this, since a DMA transfer should not require CPU cycles. Does anyone have an explanation?

thanks, thomas
Message 1 of 5
(6,748 Views)
What is the "chunk size" when you are continuously fetching data? If
you're using LabVIEW and you leave the number of points to fetch as -1
in the read/fetch VI, the VI will return as soon as it finds a non-zero
number of points to fetch. Depending on how fast your computer is and
your sample rate, you may be getting only a few points per iteration in
the loop. Setting the number of points to a larger value, say 1M for
example, may improve your CPU usage. Also, be sure to set a non-zero
timeout value because if the timeout is zero, the available points are
returned regardless of the number of points requested. Also, is this system in PCI or PXI?
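
To put numbers on the chunk-size advice above, here is a minimal sketch in C. The function names are made up for illustration and are not part of NI-SCOPE; the code only does the arithmetic relating sample rate, chunk size, and how often the fetch call has to return.

```c
#include <assert.h>
#include <stddef.h>

/* Fetch calls per second needed to keep up with a continuous acquisition,
   given the per-channel sample rate and the number of samples per channel
   requested in each fetch. Hypothetical helper, not an NI-SCOPE function. */
double fetch_calls_per_second(double sample_rate_hz, size_t chunk_samples)
{
    assert(chunk_samples > 0);
    return sample_rate_hz / (double)chunk_samples;
}

/* Acquisition time, in milliseconds, covered by one chunk of samples. */
double chunk_duration_ms(double sample_rate_hz, size_t chunk_samples)
{
    assert(sample_rate_hz > 0.0);
    return 1000.0 * (double)chunk_samples / sample_rate_hz;
}
```

At 20 MS/s per channel, a chunk of 1,000,000 samples covers 50 ms, so the loop runs about 20 times per second. If each fetch returned only 100 samples, the same stream would need about 200,000 calls per second, and the per-call overhead would dominate.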

The CPU usage may still be higher than you expect because we DMA the
data from the board to a temporary buffer before copying it to your
buffer. We do this because we can only transfer a minimum of 256 bytes
at a time with DMA. If an acquisition is less than that size or a
non-multiple of 256 bytes, some of the points at the beginning and end
of the acquisition will be invalid. Since we don't want to make the
user have to think about all that stuff, we return a buffer that is
exactly what was asked for. Unfortunately, that requires an extra copy
of the data. We may add direct DMA to the user buffer in a future
release of NI-SCOPE.
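
The temporary-buffer scheme described above can be sketched roughly as follows. This is not NI-SCOPE's actual implementation: the names are hypothetical, and the DMA transfer is simulated with a plain memcpy from an array standing in for board memory. It only shows why a request that is not aligned to 256 bytes leads to a second, CPU-driven copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DMA_BLOCK 256u  /* minimum DMA transfer size quoted in the post */

/* Round a byte offset down to the start of its DMA block. */
size_t dma_align_down(size_t offset)
{
    return offset - (offset % DMA_BLOCK);
}

/* Round a byte offset up to the next DMA block boundary. */
size_t dma_align_up(size_t offset)
{
    return dma_align_down(offset + DMA_BLOCK - 1);
}

/* Deliver exactly `len` bytes starting at `offset` into `user`.
   `device` stands in for board memory and must extend to the aligned end.
   Returns the number of bytes moved by the simulated DMA transfer. */
size_t fetch_via_bounce(const unsigned char *device, size_t offset, size_t len,
                        unsigned char *bounce, size_t bounce_cap,
                        unsigned char *user)
{
    size_t start = dma_align_down(offset);
    size_t end = dma_align_up(offset + len);
    size_t span = end - start;

    assert(span <= bounce_cap);
    memcpy(bounce, device + start, span);          /* stands in for the DMA transfer */
    memcpy(user, bounce + (offset - start), len);  /* the extra copy done by the CPU */
    return span;
}
```

For a 300-byte request at offset 100, the simulated DMA moves the aligned span of 512 bytes and the CPU then copies the 300 requested bytes out of it. In a continuous acquisition that second copy runs over every sample, which is where the CPU time goes.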

I'd be interested to hear how well your application still runs after you
add your processing algorithms into your program. You may find that the
CPU yields time and it still works ok. If not, check back with us and
we can look at alternative ways to improve the performance.
Message 2 of 5
1M may be too high for a chunk size. As with all things of this nature, you should run a series of benchmarks and find the optimum chunk size for your computer and your application. I typically use 300k, but I have a three-year-old Pentium III 650. Your value will probably differ. I have been able to change my performance by an order of magnitude or more just by changing the chunk size. Good luck! Let us know how you do and what values you end up with.
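
A chunk-size benchmark along these lines could be structured as in the sketch below. Everything here is an assumption for illustration: fetch_fn is a hypothetical callback that would wrap the real driver fetch call, and fake_fetch is a stand-in that only fills memory so the harness can run without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical fetch callback: fills `buf` with `n` samples, returns 0 on
   success. A real program would wrap the driver's fetch call here. */
typedef int (*fetch_fn)(signed char *buf, size_t n, void *ctx);

/* Stand-in fetch for trying the harness without a digitizer. */
int fake_fetch(signed char *buf, size_t n, void *ctx)
{
    (void)ctx;
    memset(buf, 0, n);
    return 0;
}

/* Time reported by clock() per fetched sample for one chunk size, or -1.0
   if a fetch fails. clock() reports processor time on POSIX systems;
   Microsoft's C runtime reports elapsed wall-clock time instead. */
double cost_per_sample(fetch_fn fetch, void *ctx, signed char *buf,
                       size_t chunk, size_t total_samples)
{
    size_t done = 0;
    clock_t t0;

    assert(chunk > 0 && total_samples > 0);
    t0 = clock();
    while (done < total_samples) {
        if (fetch(buf, chunk, ctx) != 0)
            return -1.0;
        done += chunk;
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC / (double)done;
}

/* Index of the lowest non-negative cost; returns n if every run failed. */
size_t best_chunk_index(const double *cost, size_t n)
{
    size_t best = n;
    for (size_t i = 0; i < n; i++)
        if (cost[i] >= 0.0 && (best == n || cost[i] < cost[best]))
            best = i;
    return best;
}
```

Run cost_per_sample once for each candidate chunk size (for example 64k, 128k, 256k, 512k, 1M), store the results in an array, and pick the winner with best_chunk_index. Because a real fetch blocks while waiting for data, a processor-time measurement is more meaningful here than elapsed time.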
Message 3 of 5
First, thank you Josh and DFGray for your interesting answers. To clarify, I use Visual C and I have a PCI system.

I played around with the chunk size, and I find optimal performance from about 256k to 512k for a 2*16.6 MHz sample rate. I also changed to a non-zero timeout value. However, at 2*20 MHz, 100% of one CPU (P4, 3 GHz, HT disabled) was always occupied. If I use a 'bad' chunk size there, the old data gets overwritten before it has been transferred into main memory, and the routine stops.

It is very interesting that at 2*10 MHz only 10% of one CPU (P4, 3 GHz, HT disabled) was used, and at 2*2 MHz the CPU was virtually idle. There seems to be a strongly non-linear increase as the data-transfer rate approaches 40 MSamples/s. By the way, this was the maximum transfer rate I could sustain continuously.

A major improvement came when I activated Hyper-Threading Technology (HT) on the Pentium 4. After that, 2*20 MHz used only 50% of one CPU, i.e. the DMA transfer blocks one of the two virtual CPUs of one P4. Is there an explanation for that?

Regarding the direct DMA transfer: NI-SCOPE is now very easy to use, and I highly appreciate that I do not have to care about the details you mentioned. However, for our application (GPS signal processing) we need as much processing power as we can get (basically to increase the number of channels). So if there is some more or less easy way to reduce the processing load of the DMA transfer and also to increase the continuous transfer rate to 2*33.3 MHz, I would be very interested.

thanks again, thomas
Message 4 of 5
The theoretical maximum transfer rate for the 5112 is 66.6 MBytes/sec, so it is unlikely you will be able to achieve 2x33.3 MHz with the setup you have. There is a possible solution, but you would have to spend a lot more cash to achieve it. You would need a separate PCI bus for each of two 5112s, taking data from a single channel of each. You could theoretically get to 50 MBytes/sec on each (limited by the highest available sample rate below 66.6 MBytes/sec), provided the chipset on the motherboard can handle this (it should; servers usually have multiple PCI buses and are designed for such things). I would check your current setup with a single-channel acquisition at 50 MBytes/sec before doing this, as memory copies and other problems may prevent you from getting there. You should be able to do 33.3 MBytes/sec on one channel, given your current performance.
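
The bandwidth argument above comes down to simple arithmetic, sketched here in C. The helper names are hypothetical, and the code assumes one byte per sample, consistent with the 8-bit binary fetch mentioned earlier in the thread.

```c
#include <assert.h>

/* Sustained byte rate produced by a continuous acquisition. */
double stream_bytes_per_second(int channels, double sample_rate_hz,
                               int bytes_per_sample)
{
    assert(channels > 0 && bytes_per_sample > 0);
    return (double)channels * sample_rate_hz * (double)bytes_per_sample;
}

/* Fraction of the transfer limit that a stream would occupy;
   values at or near 1.0 leave no headroom. */
double bus_utilization(double stream_bps, double bus_limit_bps)
{
    assert(bus_limit_bps > 0.0);
    return stream_bps / bus_limit_bps;
}
```

With these helpers, 2 channels at 20 MS/s give 40 MBytes/sec, about 60% of the quoted 66.6 MBytes/sec limit. Two channels at 33.3 MS/s give 66.6 MBytes/sec, exactly the theoretical maximum, which is why that rate is unlikely to be sustained in practice.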
Message 5 of 5