High-Speed Digitizers


Fetching many records all at once is no faster than fetching one at a time

Hello,

I am having a problem getting NI-Scope to perform adequately for my application.  I am sorry for the long post, but I have been going around and around with an NI engineer through email and I need some other input.

I have the following software and equipment:
LabVIEW 8.5
NI-Scope 3.4
PXI-1033 chassis
PXI-5105 digitizer card
DELL Latitude D830 notebook computer with 4 GB RAM.

I tested the transfer speed of my connection to the PXI-1033 chassis using the niScope Stream to Memory Maximum Transfer Rate.vi found here:
http://zone.ni.com/devzone/cda/epd/p/id/5273.  The result was 101 MB/s.

I am trying to set up a system where I can press the start button and acquire short, individually triggered waveforms.  I wish to acquire these individually triggered waveforms indefinitely, and I wish to maximize the rate at which the triggers occur.  In the limiting case where I acquire records of one sample, the record size in memory is 512 bytes (using the formula to calculate 'Allocated Onboard Memory per Record' found in the NI PXI/PCI-5105 Specifications under the heading 'Waveform Specifications', pg. 16).  The PXI-5105 trigger re-arms in about 2 microseconds (500 kHz), so to trigger at that rate indefinitely I would need a transfer speed of at least 256 MB/s.  So clearly, in this case the limiting factor on how fast I can trigger while still acquiring indefinitely is the rate at which I transfer records from onboard memory to my PC.
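To spell that out, here is the arithmetic (plain Python, using the 512-byte allocated record size and the 500 kHz re-arm rate from above):

# Required sustained transfer rate to keep up with the trigger re-arm rate indefinitely.
allocated_bytes_per_record = 512   # 'Allocated Onboard Memory per Record' for a 1-sample record
trigger_rate_hz = 500e3            # ~2 microsecond trigger re-arm time on the PXI-5105

required_bytes_per_s = allocated_bytes_per_record * trigger_rate_hz
print(required_bytes_per_s / 1e6, "MB/s required")               # 256.0 MB/s

# Conversely, the fastest indefinite trigger rate my measured 101 MB/s link could sustain:
print(101e6 / allocated_bytes_per_record, "records/s maximum")   # about 197,000 records/s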

To maximize my record transfer rate, I should transfer many records at once using the Multi Fetch VI, as opposed to the theoretically slower method of transferring one at a time.  To compare the transfer rates of the all-at-once and one-at-a-time methods, I modified the niScope EX Timestamps.vi so I can choose between them by changing the constant wired to the Fetch Number of Records property node to -1 or 1, respectively.  I also added a loop that ensures all records are acquired before I begin the transfer, so that the acquisition and trigger rates do not interfere with measuring the record transfer rate.  This modified VI is attached to this post.
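In text form, the comparison the modified VI performs amounts to this (a rough Python sketch only; fetch_records is a hypothetical stand-in for the niScope Multi Fetch call, since my real code is the attached LabVIEW VI):

import time

def time_fetches(fetch_records, total_records, records_per_fetch):
    """Pull total_records in chunks of records_per_fetch and report the cost.

    fetch_records(n) is a placeholder for whatever call actually moves n records
    off the digitizer (the niScope Multi Fetch VI in my case).
    """
    start = time.perf_counter()
    fetched = 0
    while fetched < total_records:
        n = min(records_per_fetch, total_records - fetched)
        fetch_records(n)
        fetched += n
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / total_records   # total seconds, seconds per record

# records_per_fetch = 1 corresponds to wiring 1 to 'Fetch Number of Records';
# records_per_fetch = total_records corresponds to wiring -1 (fetch everything at once).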

I have the following results for acquiring 10k records.  My measurements are done using the Profile Performance and Memory Tool.
I am using a 250 kHz analog pulse source.
Fetching 10000 records one record at a time, the niScope Multi Fetch Cluster takes a total of 1546.9 milliseconds, or 155 microseconds per record.
Fetching all 10000 records at once, the niScope Multi Fetch Cluster takes a total of 1703.1 milliseconds, or 170 microseconds per record.


I have tried this for larger and smaller total numbers of records, and the transfer time is always around 170 microseconds per record, regardless of whether I transfer one at a time or all at once.  But with a 100 MB/s link and a 512-byte record size, the fetch time should approach 5 microseconds per record as the number of records fetched at once increases.
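The arithmetic behind that expectation, and behind the trigger-rate limit I mention below:

record_bytes = 512          # allocated size of a minimal record
link_bytes_per_s = 100e6    # measured throughput for large transfers over the MXI link

print(record_bytes / link_bytes_per_s * 1e6, "microseconds/record if the link were the only cost")   # ~5.1
print(1 / 170e-6, "records/s sustainable at the measured 170 microseconds/record")                   # ~5900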

With this, my application will be limited to a trigger rate of about 5 kHz when running indefinitely, when it should be capable of something closer to a 200 kHz trigger rate for extended periods of time.  I have a feeling that I am missing something simple or am just confused about how the Fetch functions should work.  Please enlighten me.





Message 1 of 10
Hi ESD

Your numbers for testing the PXI bandwidth look good.  A value of approximately 100 MB/s is reasonable when pulling data across the PXI bus continuously in larger chunks.  This may decrease a little when working with MXI compared to using an embedded PXI controller.  I expect you were using the streaming example "niScope Stream to Memory Maximum Transfer Rate.vi" found here: http://zone.ni.com/devzone/cda/epd/p/id/5273.

Acquiring multiple triggered records is a little different.  There are a few techniques that will help make sure you can fetch your data fast enough to keep up with the acquired data or the desired reference trigger rate.  You are certainly correct that it is more efficient to transfer larger amounts of data at once, instead of small amounts of data more frequently, as the per-transfer DMA overhead becomes significant.
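To see why, here is a toy cost model (illustration only, with an assumed fixed overhead per fetch call rather than measured driver numbers): each fetch pays a roughly constant setup/DMA overhead plus per-byte transfer time, so with small records the overhead dominates unless many records are pooled into each fetch.

# Toy model: fixed per-fetch overhead + per-byte transfer time at the bus rate.
OVERHEAD_PER_FETCH_S = 150e-6     # assumed per-call overhead, for illustration only
LINK_BYTES_PER_S = 100e6          # ~100 MB/s bus throughput
RECORD_BYTES = 512                # minimal allocated record size

def modeled_fetch_time(total_records, records_per_fetch):
    fetches = -(-total_records // records_per_fetch)   # ceiling division
    transfer_time = total_records * RECORD_BYTES / LINK_BYTES_PER_S
    return fetches * OVERHEAD_PER_FETCH_S + transfer_time

for rpf in (1, 10, 100, 1000, 10000):
    t = modeled_fetch_time(10000, rpf)
    print(f"{rpf:>5} records/fetch: {t * 1e3:7.1f} ms total, {t / 10000 * 1e6:6.2f} us/record")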

The trend you saw, that fetching fewer records was more efficient, sounded odd, so I ran your example and tracked down what was causing it.  I believe it is actually the for loop you had in your acquisition loop.  I made a few modifications to the application to display the total fetch time for acquiring 10000 records.  The best fetch time is when all records are pulled in at once.  I left your code in the application but temporarily disabled the for loop to show the fetch performance.  I also added a loop to ramp the fetch number up and graph the fetch times.  I will attach the modified application as well as the fetch results I saw on my system for reference.  When the for loop is enabled, the performance is worst for 1-record fetches; the fetch time dips around 500 records/fetch and begins to ramp up again as the records/fetch increases to 10000.


Note that I am using the 2D I16 fetch, as it is more efficient to keep the data unscaled.  I have also added an option to use immediate triggering; this is just because I was not near my hardware to physically connect a signal, so I used the trigger holdoff property to simulate a given trigger rate.

Hope this helps.  I was working in LabVIEW 8.5, if you are working with an earlier version let me know.


Message Edited by Jennifer O on 04-12-2008 09:30 PM
Message 2 of 10
Jennifer O,

Thank you for your reply.  Yes, I was using the "niScope Stream to Memory Maximum Transfer Rate.vi" to measure my PXI connection transfer rate, and the results look good for that.

I am using LabVIEW 8.5 with NI-Scope 3.4, and I ran your VI.  But I get much different results for the "Records per Fetch" graph.  My results are nearly the same whether I disable that for loop or not.  I can't save my graphs in a format that the forums will accept; they look like what you describe seeing when the for loop was enabled, only I see that regardless of whether I enable or disable the for loop.

I've also attached another VI that lets you compare the transfer rate and records per second for fetching all at once versus one at a time.  I get the best results when fetching 100 records at once, but this is still only about 6000 records per second.  The records are 100 samples long.  My transfer speed with this arrangement is about 4.5 MB/s.
If I make the records really long and fetch just a few at a time, then my transfer speed is very close to 100 MB/s, but this won't work for my application.

In the end I want to have short records, around one to two hundred samples, and to transfer many more records per second than I can now.  I don't see why I can't transfer multiple records at the PCI bus speed, when I can transfer one large record at that speed.
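To put numbers on that (derived only from the figures above; the bytes-per-record figure is simply what the reported rates imply):

records_per_s = 6000          # best case I measured, fetching 100 records at a time
reported_mb_per_s = 4.5       # transfer rate reported for that case

bytes_moved_per_record = reported_mb_per_s * 1e6 / records_per_s
print(bytes_moved_per_record, "bytes moved per record")     # ~750 bytes

# If that same per-record footprint moved at the ~100 MB/s the bus manages for one
# long record, the record rate would be:
print(100e6 / bytes_moved_per_record, "records/s")          # ~133,000 records/s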

Thanks for your help.
Message 3 of 10
The 5122 stores each record in a separate segment of memory.  For each record, data from the digitizer streams into a circular buffer until a trigger arrives.  After a trigger arrives, the digitizer streams the data for the next record into the next segment of memory.  By default the digitizer is set up to allocate records as large as possible in onboard memory.  For instance, if you have 256 MB of onboard memory per channel and configure 100 records, then the onboard memory is split into 100 equal segments with 2.56 MB for each record (even if you configure a record length of 100 samples).  This method of allocation is the default because it lowers the possibility of getting data overwrite errors.
 
There are two things to note here.  First, in the default case the driver will not fetch all records at the same time if doing so would cause slower performance (in the example above, 1 fetch of 256 MB is slower than 100 fetches of 100 samples).  Second, because data is stored in circular buffers, the user can fetch data relative to "now" or "start" using the fetch relative attribute.  A consequence of this functionality is that the driver needs to figure out where the data the user wants is located in the memory segment.  As a result, transferring data off the device is not straightforward; there is overhead associated with each record.
 
There is a way to change the memory layout by setting the "enable records greater than memory" attribute to true.  This tells the driver to allocate records very compactly.  So for the example above, we would allocate segments of 100 samples plus padding, and as many of those segments as will fill the onboard memory (there are more than 100 segments).  Now when you fetch the data off the board, the driver will pool the records into one DMA operation.  The driver must still DMA all of the padding in each record, and it still needs to figure out where the user's data is inside each record, so the performance is not as good as single-record streaming.
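To make the two layouts concrete (numbers from the example above; the per-record padding in the compact case is device dependent, so the value below is only for illustration):

onboard_bytes_per_channel = int(256e6)   # 256 MB of onboard memory per channel
configured_records = 100
record_samples = 100
bytes_per_sample = 2                     # raw I16 data

# Default layout: onboard memory is split evenly across the configured records.
default_segment_bytes = onboard_bytes_per_channel / configured_records
print(default_segment_bytes / 1e6, "MB per segment in the default layout")   # 2.56 MB

# Compact layout ("enable records greater than memory" = TRUE): each segment holds just
# the record plus padding, so many more segments fit.  Padding size assumed here.
assumed_padding_bytes = 512
compact_segment_bytes = record_samples * bytes_per_sample + assumed_padding_bytes
print(onboard_bytes_per_channel // compact_segment_bytes, "compact segments fit in onboard memory")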
 
I modified your example.
 
As a side note, we are working on some documentation, which will be part of the driver help, to clarify all of these streaming use cases, because none of this is simple or obvious.
 
Hope this helps,
 
Kunal
 
Message 4 of 10
I just read your original post again and realized you are using the 5105.  This device doesn't currently support pooling multiple records into one fetch transfer.  We do plan on supporting this feature in the future (it is in development).  I'll post an update when I know the time frame for this support.
 
Kunal
Message 5 of 10
HSD,

I am anxious to hear when this will become available for the 5105 board.  In the meantime, can someone please show me where there is any indication in the PCI/PXI-5105 specifications or NI-Scope 3.4 documentation that the 5105 doesn't support multiple records in one transfer or the Records > Memory property?  The documentation suggests that all SMC devices support this, but dklipec in this thread says the 5105 is the only one that doesn't.

This should be in the documentation.  At the very least, the salespeople and applications engineers should know about it.
Message 6 of 10
As a follow-up: as of NI-Scope 3.5, the 5105 (and all SMC devices supported by NI-Scope 3.5) supports the multi-record optimization.
Message 7 of 10
In the 4th post, Kunal posted:
 As a side note we are working on some documentation that will be part of the driver help to clarify all these streaming usecases because none of this is simple or obvious.
Is this documentation available now?  Would you be able to point me to it?  These digitizers are hard to get to grips with.
Cheers
Message 8 of 10

Hi Steven,

 

Unfortunately this documentation is not yet available.  I will try to post back here with an update when it is, so you can check back periodically or subscribe to the thread to be notified when there is a new post.

 

In the meantime, perhaps I can help answer some of your questions or point you to some of the resources that are currently available.  To keep things organized, I will do this on the other thread that you started: Limitless number of records and Software Event/Interrupt.

Message 9 of 10

Sorry to dig up an old thread.  I have just upgraded to the newest NI-Scope driver, version 3.5.1.  I am using the PXI-5105 board, and I find that the multi-record optimization supported in this version only actually works when the property "Enable Record > Memory" is set to true.  As I mentioned at the start of this thread, I am acquiring many short waveforms (around 300 samples each), so this multi-record optimization makes a huge difference in my transfer speeds.  You can see the difference in transfer speeds when turning "Enable Record > Memory" on and off with the attached VI.
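For reference, the configuration in question boils down to something like the sketch below.  My real code is the attached LabVIEW VI; this is only a rough illustration written against NI's Python niscope API, and the exact property and method names are my assumptions, so check them against the NI-SCOPE help.

import niscope   # NI's Python API for NI-SCOPE; names below are best-effort assumptions

RECORD_SAMPLES = 300     # short records, as in my application
NUM_RECORDS = 10000

with niscope.Session(resource_name="PXI1Slot2") as session:   # resource name is a placeholder
    session.configure_vertical(range=1.0, coupling=niscope.VerticalCoupling.DC)
    session.configure_horizontal_timing(min_sample_rate=60e6,
                                        min_num_pts=RECORD_SAMPLES,
                                        ref_position=50.0,
                                        num_records=NUM_RECORDS,
                                        enforce_realtime=True)

    # The property at issue: the LabVIEW "Enable Records greater than Memory" setting.
    # With it TRUE the driver lays records out compactly and pools them into one DMA
    # transfer per fetch; with it FALSE my fetches fall back to per-record transfers.
    session.allow_more_records_than_memory = True

    with session.initiate():
        # Fetch every record in one call (the -1 / "all records" case in LabVIEW).
        waveforms = session.channels[0].fetch(num_samples=RECORD_SAMPLES,
                                              num_records=NUM_RECORDS)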

 

So why doesn't the optimization work when "Enable Record > Memory" is set to false?

Message 10 of 10