High-Speed Digitizers


PCI-5152 Varying processing time/lag/delay for different record lengths and number of records

Hello,

 

First of all, I would like you to look at the graphs below.

 

Delay Measurements

 

 

One record takes 53 us; this value is fixed. Records were triggered by a digital pulse train with a 63 us period (about 16 kHz). Since the rearm time of the PCI-5152 card is 8 us, this condition is very reasonable. Ideally, the time needed for one cycle should be linearly proportional to the number of records per cycle. However, the LabVIEW application or the hardware will need some time to transfer the measured data. The goal is to characterize the delay, or processing time, that the Multi Record Fetch VI needs to fetch different amounts of data from the PCI-5152 board to PC memory. I assumed that the larger the amount of data (total number of data points), the more time the application would need.
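As a quick sanity check on the trigger timing, a minimal sketch (Python used purely for illustration; all figures are the ones quoted above):

```python
# Sanity check: can the PCI-5152 finish one record and re-arm
# before the next trigger arrives?
record_duration_us = 53.0   # one record takes 53 us (fixed)
rearm_time_us = 8.0         # PCI-5152 re-arm time
trigger_period_us = 63.0    # digital pulse train period (~16 kHz)

busy_us = record_duration_us + rearm_time_us
slack_us = trigger_period_us - busy_us
print(f"busy per record: {busy_us} us, slack: {slack_us} us")
# -> busy per record: 61.0 us, slack: 2.0 us
assert busy_us < trigger_period_us, "the digitizer would miss triggers"
```

With 2 us of slack per trigger, the acquisition itself should keep up; any extra per-cycle time must come from the fetch or the software side.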

 

The results contradict my assumption. Looking at the graph in the top right corner, the delay (processing time, lag; I honestly don't know the cause of this time discrepancy) actually decreases as the total amount of data the Multi Record Fetch VI handles increases. This is the opposite of what I expected: I thought the delay would be proportional to the amount of data, reflecting the time needed to transfer the data over the PCI bus (133 MB/s).

 

When the number of records per cycle is below 500, the delay per cycle does increase. For the same number of records per cycle, the delay increases as the record length goes up (two graphs on the bottom). However, when the number of records per cycle is larger than 500, the delay behaves unexpectedly: the delay per cycle decreases as the number of records per cycle increases. Between 500 and 1000 records the delay increases, but from 1000 onward it keeps decreasing. I have no explanation for this observation.

 

I would like to ask why the delay is actually smaller when more data is stored in the PCI-5152's onboard memory. In addition, the delay is quite difficult to characterize, since it shows practically no consistent trend. I would like other high-speed digitizer users to help me figure out why this is happening. Thank you.

0 Kudos
Message 1 of 11
(7,423 Views)

Hello KaiKeem,

 

Could you tell me a little more about the setup you used to gather these results?

In order to eliminate any system-dependent outliers, how many times did you run the benchmark tests?

 

Looking at the graphs, the top-right plot does certainly seem very strange.

The graph next to it, relating the # of cycles in 10 s to record size, also looks like it has some anomalies, but not on the order one might expect from the data displayed in the graph on the right.

 

Overall, the number of iterations completed in 10 seconds does follow a decreasing polynomial curve.  This is consistent with an increasing per-cycle delay.  The only series in the # of Iterations for 10 Seconds graph that exhibits genuinely anomalous behavior is the #4077 series.  If we compare the Delay per Cycle with the # of Iterations for 10 Seconds for the #5300 series, we would expect a decrease in per-cycle delay to result in an increase in the number of cycles per second, allowing more iterations in a 10-second span.  However, a decreasing trend is observed in both plots.  The two graphs on the bottom display much more expected behavior.

 

I will do more research on this end, and speak with my colleagues about what could contribute to the unusual behavior you have observed.

 

Best regards.

Matthew H.
Applications Engineer
National Instruments
Message 2 of 11
(7,413 Views)

I ran the tests 10 times for each data point. In other words, each data point is the average of ten measurements. The standard deviation of the measurements was zero.

 

One interesting observation is that when the number of records is larger than 1000, the delay per cycle is usually very small, though it depends on the record length.

Since the period of the trigger pulse train is 63 us, assuming no delay, one cycle should take 63 us.

For example, when the number of records is 500, the number of iterations is expected to be about 317 with no delay (10 / (500 × 63e-6) ≈ 317). However, in the test, the number of iterations for 500 records is between 150 and 250 depending on the record length, as shown in the top left graph. Therefore, it is quite evident that there is a delay between iterations.

However, when the number of records is 2000, the number of iterations is expected to be about 79 with no delay (10 / (2000 × 63e-6) ≈ 79). In the test, the number of iterations for 2000 records is 78, which is nearly the ideal value. This can be interpreted to mean that the delay between iterations is insignificant.

Looking at the top right graph, for numbers of records over 2000, the delay is much smaller than for the same record lengths with fewer than 1500 records.
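The arithmetic above can be reproduced in a short sketch (Python used purely for illustration; the trigger period and measured iteration counts are the ones quoted in this thread):

```python
# Ideal iteration count in a 10 s window, given the 63 us trigger period,
# and the per-cycle delay implied by a measured iteration count.
TRIGGER_PERIOD_S = 63e-6

def ideal_iterations(num_records, window_s=10.0):
    """Iterations expected in window_s if there were no per-cycle delay."""
    return window_s / (num_records * TRIGGER_PERIOD_S)

def implied_delay_per_cycle(num_records, measured_iters, window_s=10.0):
    """Average extra time per cycle inferred from the measured count."""
    actual_cycle_s = window_s / measured_iters
    ideal_cycle_s = num_records * TRIGGER_PERIOD_S
    return actual_cycle_s - ideal_cycle_s

print(round(ideal_iterations(500)))    # -> 317, as in the text
print(round(ideal_iterations(2000)))   # -> 79, as in the text
# 2000 records, 78 measured iterations: the implied delay per cycle
print(f"{implied_delay_per_cycle(2000, 78) * 1e3:.2f} ms")  # -> 2.21 ms
```

So even the "nearly ideal" 78-of-79 case still hides roughly 2 ms of overhead per cycle; it is only small relative to the 126 ms cycle time.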

 

At the moment, I am having a hard time choosing measurement parameters for my VI because of this nonlinear, abnormal behavior.

Message 3 of 11
(7,410 Views)

Hello KaiKeem,

 

I've spoken with a couple of my coworkers about this, and we'd like to know a little more about your setup.

 

In the graphs, is "cycles" synonymous with "iterations"?

 

Tell me about the physical configuration of your setup.  How are signals being routed?

Some software-setup information will also be helpful.
Are you using Windows?  Or a RealTime operating system?
What mechanism are you using to measure the per-Cycle overhead?

How are you taking the other measurements?

Thank you for your patience. 

Matthew H.
Applications Engineer
National Instruments
Message 4 of 11
(7,406 Views)

Yes, cycles means iterations (one cycle of the while loop = one iteration of the while loop).

 

The physical configuration is an NI PCI-5152 and an NI PCIe-6353. The 6353 generates a trigger pulse train (~16 kHz) and sends it to the 5152 via RTSI.

I am using Windows 7 64-bit.

The following is how I calculate the overhead.

The period of the trigger pulse train is 63.2111 us. The number of records is set to N. Then, the time for one fetch (one cycle) is (N × 63.2111) us.

The 10 seconds are measured by placing one 'Get Date/Time In Seconds' function outside the loop and one inside the loop, and stopping the loop when the difference between the two values reaches 10 seconds.  The number of iterations is I. 10 s / (N × 63.2111 us) gives the ideal number of iterations. By comparing the ideal number with the actually measured number, I obtain the overhead.
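The timing scheme described above (one timestamp before the loop, one inside, stop at 10 s, then compare counts) can be sketched as follows. `do_fetch` is a hypothetical stand-in for the multi-record fetch call; this is an illustration of the measurement logic, not the LabVIEW code itself:

```python
import time

TRIGGER_PERIOD_S = 63.2111e-6  # measured trigger pulse train period

def benchmark(do_fetch, num_records, window_s=10.0):
    """Count fetch iterations in window_s and report per-cycle overhead.

    Mirrors the LabVIEW pattern: one timestamp taken before the loop,
    one taken each iteration, and the loop stops once their difference
    reaches window_s.
    """
    start = time.monotonic()          # 'Get Date/Time In Seconds' before the loop
    iterations = 0
    while time.monotonic() - start < window_s:
        do_fetch(num_records)         # one multi-record fetch = one cycle
        iterations += 1
    ideal = window_s / (num_records * TRIGGER_PERIOD_S)
    overhead_s = window_s / iterations - num_records * TRIGGER_PERIOD_S
    return iterations, ideal, overhead_s
```

One caveat of this scheme: a cycle that straddles the 10 s boundary is counted in full, which slightly biases the overhead estimate for small iteration counts.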

 

I attached the VI I used to measure the overhead.

Message 5 of 11
(7,389 Views)

Hello KaiKeem,

 

I would recommend using much simpler code for any kind of benchmarking application.  There is a lot going on in the code you attached.  IMAQ functions, queues, and notifiers all introduce variable loading on the processor and may interfere with the orderly execution of fetch commands.

 

I did not see any of the benchmarking tests you described, though.  Is the code you attached for a different project?

 

The physical setup and calculations you describe are sound.  After poking around with the data in the top-right graph, though, I noticed that there seems to be a transition region somewhere around 2.038-2.039 MS/cycle.  Have you tried running similar benchmarks with tighter granularity between 500 and 1500 records per cycle?  Unfortunately, any discussion that might cover specific DMA optimization routines would involve National Instruments IP, so this line of inquiry may be a dead end.
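A finer-grained sweep of that region could be structured like this sketch (Python for illustration; `measure_delay_per_cycle` is a hypothetical hook that would run the 10-second benchmark for one record-length/record-count pair):

```python
# Sweep the 500-1500 records/cycle region in small steps to locate the
# transition near ~2 MS/cycle seen in the top-right graph.
RECORD_LENGTH = 2000  # samples per record; an assumed test value

def sweep(measure_delay_per_cycle, step=50):
    """Map num_records -> (total samples per cycle, measured delay).

    measure_delay_per_cycle(record_length, num_records) is a
    hypothetical callback returning the per-cycle delay in seconds.
    """
    results = {}
    for num_records in range(500, 1501, step):
        samples_per_cycle = RECORD_LENGTH * num_records  # total samples fetched
        delay = measure_delay_per_cycle(RECORD_LENGTH, num_records)
        results[num_records] = (samples_per_cycle, delay)
    return results
```

Plotting delay against samples per cycle from such a sweep would show whether the transition tracks total data volume or the record count itself.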

 

So let's step back and start at the top.

You need to characterize the performance of the NI-SCOPE fetch.  Why is that?  There may be another way to meet the underlying need without quantifying the per-cycle overhead of the fetch command.

Matthew H.
Applications Engineer
National Instruments
Message 6 of 11
(7,379 Views)

Hello,

 

I wrote the code that way because I wanted to simulate the actual VI I am writing; the attached VI is a simpler version of the end goal. I also have an even simpler VI with only one loop containing the niScope fetch function, and unfortunately it gives pretty much the same results. So my conclusion is that the cause is not the heavy code; there is something else.

 

I ran benchmarks with a smaller increment between 100 and 500, but not between 500 and 1500 (the results from 100 to 500 are in the graphs above). It seems that from 100 to 1000 records the overhead increases linearly with the amount of data (record length × number of records). However, when the amount of data in the onboard memory is larger than a certain threshold, which is not very clear, the overhead or delay actually becomes smaller.

The PCI-5152 needs to fetch according to predetermined conditions: the record length and the number of records. The record length can be any value between 400 and 7000, and the number of records can be between 100 and 4000. This is one iteration (or cycle). Let's call the fetched data a chunk (record length × number of records). niScope has to deliver this chunk at a given frequency, punctually. Ideally there should be no delay or overhead between chunks, but there is. The punctuality is crucial; that's why I tested the overhead time for a given record length and number of records. Each chunk is sent to another loop via a queue, then processed, analyzed, and, if necessary, displayed as an image on the screen.

 

I would like to minimize the overhead time between chunks (iterations of the while loop). If the frequency of the record reference trigger is 16 kHz and the number of records is 400, the while loop should run 40 iterations per second; if the number of records is 4000, it should run 4 iterations per second. However, the benchmarks show some discrepancies.
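The expected loop rates quoted above follow directly from the trigger frequency; a quick check with the numbers in the text:

```python
TRIGGER_HZ = 16_000  # record reference trigger frequency (~16 kHz)

def expected_iterations_per_second(num_records):
    """While-loop iterations per second if every trigger is captured
    and there is no inter-chunk overhead."""
    return TRIGGER_HZ / num_records

print(expected_iterations_per_second(400))   # -> 40.0 iterations/s
print(expected_iterations_per_second(4000))  # -> 4.0 iterations/s
```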

Message 7 of 11
(7,375 Views)

Hi KaiKeem,

 

What results do you get when you enable the property Enable Records > Memory?  There is a corresponding LabVIEW shipping example for this.

-Andrew 

National Instruments
Message 8 of 11
(7,370 Views)

Every time I enable 'Enable Records > Memory', it gives me errors. I wonder why. Does the PCI-5152 support this?

Message 9 of 11
(7,368 Views)

What errors?  We'll need you to be more specific.

-Andrew

National Instruments
Message 10 of 11
(7,366 Views)