Real-Time Measurement and Control


Dynamic DMA FIFO size: Working formulas

We need your help here!

There are many forum threads about determining the DMA FIFO size, so first we would like to summarize what we have found here. Second, we would like to address the overflow problem we see despite our calculations.

 

We started from NI engineering's best-practice recommendation (http://goo.gl/oPAvTE) and moved on to a more elaborate solution that dumps the FIFO to disk (e.g. http://goo.gl/Am2DTc). The latter streaming approach is our actual interest.

 

Please check these formulas and let us know if they are wrong:

 

1. Total FIFO size = FPGA fifo + RT FIFO.

 

2. FPGA FIFO size set to Nef = 1024 elements.

 

3. RT side: Number of elements (Ner) = Sampling rate (SR) x Number of channels (Nc) x Reading time (Rt). E.g. with an NI-9222 I/O module, 5 channels, and Rt = 500 ms, we have:

Ner = 500 kS/s x 5 x 0.5 s = 1250k samples per half second.

 

4. Then, the dynamic depth on the RT is set as per recommendation to:

Depth = Ner x 5.

The latter multiplier is according to best practices (http://goo.gl/wDQq8u). From the above example we get:

Depth = 1250k x 5 = 6,250k samples (~6.3M samples).

 

5. Depth in bytes: NB = Depth x 4 bytes (U32 encoding) = 6,250k x 4 = 25 MB available per half second.

 

6. FPGA write speed onto the DMA FIFO (the actual transfer speed is dictated by the PCI bus rate, e.g. 132 MB/s for 32-bit machines) at a 40 MHz clock is:

Swf = 40 MHz / SR = 40 MHz / 500 kS/s = 80 ticks (80 x 0.025 us => 1 sample every 2 us). Thus the number of elements written from the FPGA side after 500 ms is:

500 ms / 2 us = 250k elements for one channel. In total we have 250k x 5 channels = 1250k elements, or 5 MB in half a second.

 

7. Thus, setting up a while loop (timed loop) on RT with a delay of 500 ms (dt = 500 ms), we should be able to read Ner = 1250 kS (or 5 MB) at once upon invoking the FIFO Read function, and still have 20 MB of headroom for the FPGA to write into while the RT read is happening. Right?
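For reference, the arithmetic of steps 1-7 can be sketched as a small calculator. This is purely illustrative; the variable names are ours and this is not NI/LabVIEW API:

```python
# Hypothetical sizing helper reproducing the arithmetic of steps 1-7.
SR = 500_000             # NI-9222 sampling rate, samples/s per channel
NC = 5                   # number of channels
RT_READ_S = 0.5          # RT read interval Rt, seconds
BYTES_PER_ELEM = 4       # U32 elements
MULTIPLIER = 5           # depth multiplier from the best-practices guide
FPGA_CLOCK = 40_000_000  # 40 MHz FPGA clock

ner = int(SR * NC * RT_READ_S)        # elements produced per read interval
depth = ner * MULTIPLIER              # requested host-side (RT) FIFO depth
depth_bytes = depth * BYTES_PER_ELEM  # depth in bytes
ticks = FPGA_CLOCK // SR              # FPGA loop period in clock ticks

print(ner)          # 1250000 elements per half second
print(depth)        # 6250000 elements (~6.3 M)
print(depth_bytes)  # 25000000 bytes (~25 MB)
print(ticks)        # 80 ticks -> one sample every 2 us per channel
```

Plugging in other module rates or read intervals only requires changing the constants at the top.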

 

Well, there is something we are missing, because we get a timeout flag of TRUE. To remedy that we have tried raising the educated-guess multiplier from 5 up to 13, and it does not help. More troubling, we've noticed that on the first iteration Ner is not the expected 1250k samples but the total FIFO size, i.e. 6,250k samples! After the first iteration the remaining elements are 6,250k - 1250k = 5M samples!!!

 

If you find any glitch in the above formulas, please let us know. Our code will follow after your feedback.

Message 1 of 8

Are you sure the FIFO is empty when you run the application?  I.e. make sure Elements Remaining == 0 on RT and Elements Available == Nef, or just clear it (FIFO.Stop, FIFO.Configure(new size), or FPGA.Reset).

 

If you get 6,250KSa on the first read, then the RAM buffer is full by the time RT reads it.  How are you synchronizing the two ends of your FIFO?  Maybe the producer starts a half second before the consumer.

 

Personally, I don't recommend Timed Loops for timing the reads of target-to-host FIFOs.  I recommend you let the FIFO.Read method throttle the loop by blocking (timeout = -1) or polling (timeout >= 0).
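As an analogy only (Python's `queue.Queue`, not the LabVIEW FIFO API), letting the read call itself block with a timeout both throttles the consumer loop and gives you the timeout condition for free:

```python
import queue
import threading

fifo = queue.Queue()   # stand-in for the RT side of the DMA FIFO
results = []

def producer():
    # Stand-in for the FPGA side writing elements into the FIFO.
    for i in range(5):
        fifo.put(i)

def consumer():
    # The read call throttles the loop: it blocks until data arrives
    # or the timeout elapses (the analogue of the FIFO timeout flag).
    while True:
        try:
            results.append(fifo.get(timeout=0.5))
        except queue.Empty:
            break  # no data within the timeout -> leave the loop

p = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
p.start(); c.start()
p.join(); c.join()
print(results)   # [0, 1, 2, 3, 4]
```

No explicit loop delay is needed; the consumer runs exactly as fast as data arrives.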

 

-Steve K

Message 2 of 8

Hi Steve,

 

Thank you for your input. Before replying to your questions point by point, I want to add that all the formulas above are correct and work perfectly for us. Now back to you:

 

>Are you sure the FIFO is empty when you run the application? 

Yes, we call the Reset method right after we load the FPGA module. That guarantees the FIFO is empty every time we run the RT side.

 

>How are you synchronizing the two ends of your FIFO?

This is another good point. We use an IRQ for the handshake: we write and read a zero to the FPGA every now and then.

 

>Maybe the producer starts a half second before the consumer.

That's correct. In fact, that is where the problem resides, though not between the producer and consumer loops. Digging into the problem, we found a MYSTERIOUS delay of about 1.5 seconds before RT latches its first iteration. This is a "time race" that was pointed out in another discussion (http://goo.gl/ql09eQ). That may not be a problem at low sampling rates, but it certainly is for high-rate streaming at 500 kS/s and up. We would like to see this delay properly documented by the NI engineers.

 

Knowing that, it was easy to deploy a solution: we trigger the streaming of data into the DMA FIFO with a predefined control.

 

>Personally, I don't recommend Timed Loops for execution timing target-to-host FIFOs.  I recommend you let the FIFO.Read method throttle the loop by blocking (-1) or polling (>=0).

 

That's correct as well. We defined a time-critical VI to handle the reading and use a while loop instead.

 

Thanks Steve!

 

Regards,

José 

 

Message 3 of 8

Hi José,

 

I suspect some latency on the first FIFO.Read call is by design, but 1.5 seconds seems like a very long time, so I suspect something else is going on.  For example, if a time-critical VI isn't scheduled before some other process releases, the TC VI is blocked until the first process sleeps (I've proven this in development mode; I'm not sure it applies to a built RTEXE).  Some screenshots or attachments might help us get to the bottom of the issue you're reporting.  If you want NI to weigh in, you'll probably want to start another thread and ask for their opinion up front.  If they see the post is marked as Solved, they might skip over it.

 

-Steve K

Message 4 of 8

Hi Steve

Thank you once again for your reply. We basically modified the code suggested by the NI engineers (http://goo.gl/Am2DTc) to serve our application. You can download the original code to see the FIFO performance.

 

We've made a few changes, as you can see in the attachments. The first image at the top is the main loop running on RT. We blew up the two main VIs: initialization and Streaming FIFO.

 

The main problem persists: the very first call of the FIFO.Read method returns all the elements defined on the RT side (i.e. the maximum FIFO depth). That means the Streaming FIFO.vi, which is a priority VI, is taking too long to execute its first iteration. Please see the attached code and let us know if you find any glitch, or if you see why this Streaming FIFO.vi takes so long to execute the first time. Note that synchronization happens really fast, and we have tried with and without the IRQ interrupt; the problem remains.

 

Thanks Steve!

 

Message 5 of 8

Hi José,

 

NI calls User Controlled I/O Sampling an "advanced FPGA interface".  One should start with the "NI 9222 User-Controlled I/O Sampling" project in the LabVIEW Example Finder to get 500KHz out of the 9222.

 

Focusing on the delay issue, I would like to see the rest of the RT code.  If you're using a single-core cRIO system, and if other process(es) run (AKA release) before the TC VI initially releases, the TC VI may just be waiting on the other process(es) to sleep.

 

For example, consider this code:

 

 

Picture1.png

 

If you put this code into subVIs of different priority, and place them in parallel like this:

 

Picture2.png

 

You might expect the Time Critical VI to run first.  On a single-core target, like most cRIOs, you can wind up with the following trace, which I obtained from the code shown above:

 

Picture3.png

 

This trace shows the Above Normal VI ran before the Time Critical VI and the High Priority VI.  This happens on the first run, resulting in an unintended delay before the Time Critical VI runs.  In my example, the unintended delay is 4ms long, because that's how long I programmed the Above Normal process to run (note the duration input to the subVI).

 

You can design-out this scheduling quirk by forcing each loop to sleep at the beginning, which you can implement with a Flat Sequence structure, a subVI with data flow, etc.  Timed Structures mitigate this as well, but IMHO they introduce their own set of caveats.  By the way, the "quirk" is actually the LabVIEW Clumper and Scheduler doing exactly what they were designed to do.  This is not a bug.

 

Without observing the RT trace from your application, there's no way to say for sure if the quirk I show above is the cause of the delay you're observing.  However, you could create a test VI, without the other processes, and see if you still observe a delay.    You could also instrument your code to isolate the delay.  For example, you could latch the system time of each process when it first releases.  The First Call? node may be useful.  That would immediately confirm the release order.  Please consider starting with the shipping example.
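The "latch the time on first release" idea can be sketched like this, using plain Python threads as a stand-in for the RT processes (thread priorities are not modeled; the names are ours for illustration):

```python
import threading
import time

first_release = {}  # process name -> seconds after start of first iteration
lock = threading.Lock()
t0 = time.monotonic()

def process(name, work_s, iterations):
    for i in range(iterations):
        if i == 0:  # analogue of LabVIEW's "First Call?" node
            with lock:
                first_release[name] = time.monotonic() - t0
        time.sleep(work_s)  # stand-in for the loop's real work

threads = [threading.Thread(target=process, args=(name, 0.001, 3))
           for name in ("time_critical", "high", "above_normal")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Print release order to confirm which process actually ran first.
for name, dt in sorted(first_release.items(), key=lambda kv: kv[1]):
    print(f"{name} first released after {dt * 1e3:.3f} ms")
```

Sorting the latched timestamps immediately reveals the release order, which is the fact in question here.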

 

-Steve K

 

 

Message 6 of 8

Hi Steve

No problem. See the RT code attached.

You are right: User-Controlled I/O Sampling allowed us to reach the 80-tick (2 us) mark with the NI-9222. That was a bit of a challenge that took us a while to nail down.

 

Yesterday we made some progress.

 

1. We were streaming 8333 samples every 17 ms. Now we control the streaming right from the FPGA side so that we stream only at regular intervals; instead of 17 ms we can stream every 1 ms, 2 ms, or any other multiple. We expected no overflow, but still... 😞

 

2. We dug deeper into how the DMA engine works (see pp. 84-87 here: http://goo.gl/zymYQh), and I quote here the part about the latency of the engine:

"   

FPGA- or host-side buffers. 
 
When transferring data from the FPGA to the host, the host-side buffer should be nearly empty in the 
steady state. Additionally, the FPGA-side buffer should be relatively small, so the latency is determined 
by how often the DMA engine transfers blocks of data to the host. The DMA engine transfers data to the 
host whenever any of the following conditions are met: 
(a) The FPGA-side buffer is one quarter full 
(b) The FPGA-side buffer has at least 512 bytes (a full PCI Express packet) 
 (c)The eviction timer of the DMA controller fires—this timer has a period of approximately one microsecond
"
For us, (a) means [1024 elements x (2 us / 5 channels) x 1/4 ≈ 0.1 ms]: the engine will start transferring the data from one FIFO to the other after about 0.1 ms. As expected, this process is really fast.
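A quick sanity check on condition (a) with the numbers above (our own arithmetic, assuming all five channels interleave into one 1024-element FPGA-side FIFO):

```python
# Time to quarter-fill the FPGA-side FIFO, which triggers a DMA transfer.
FPGA_FIFO_ELEMS = 1024   # FPGA-side FIFO depth, elements
SAMPLE_PERIOD_S = 2e-6   # one sample every 2 us per channel (80 ticks)
NC = 5                   # channels interleaved into the FIFO

element_rate = NC / SAMPLE_PERIOD_S         # elements/s entering the FIFO
quarter_fill_s = (FPGA_FIFO_ELEMS / 4) / element_rate

print(f"{quarter_fill_s * 1e3:.4f} ms")     # 0.1024 ms, i.e. about 0.1 ms
```

So at this rate the quarter-full condition fires roughly every tenth of a millisecond, well inside the 500 ms read interval.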
 
On the other hand, some people have suggested adding delays to ensure the DMA engine has started (see the last image in this discussion: http://goo.gl/x1uZlf).
 
We will try your suggestion today, because we know our problem resides on the reading side: we are not able to get those VIs to run within the timing they are set for. So your input comes right on target. Please bring anything weird you find in our code to our attention.
 
Thank you once again, Steve. We deeply appreciate your comments.
José 
Message 7 of 8

Hi José,

 

I don't think DMA controller latency is the issue, because the host buffer is already full on the first read.  If the controller were the issue, the FPGA FIFO would overflow but the RAM FIFO wouldn't be full.

 

- Steve K

Message 8 of 8