03-11-2025 09:08 PM
Hi there! (sorry for the long description and for the lack of attached VIs; I don't have permission to post them here)
I'm using LabVIEW 2019 (32-bit) and a cRIO-9049 to implement an XYZ motion control system that must follow trajectory references. Closed-loop control (feedback readings + control-law calculation + actuator commands) is implemented on the FPGA at 20 kHz (50 µs period). Trajectory references, discretized at 20 kHz, are read from a single CSV file by the RT side and streamed to the FPGA through a single Host-to-Target DMA FIFO channel.
The problem: sometimes I get a read timeout on the FPGA side, and I can neither explain it nor solve it. For the same trajectory file, this error occurs in roughly 4 out of 10 attempts, so most of the time it runs without problems, but it eventually fails at some random sample of the trajectory, with no regularity observed.
I can't just ignore this error by waiting until a new sample arrives on the FPGA side, because that would distort the motion profile.
A "trajectory sample" consists of 8 elements (SGL type) :
As suggested here and here, to make efficient use of the DMA FIFO, I write the data in blocks whose size is configurable before the streaming starts. I've mostly been working with a 200 ms block size, which corresponds to (200 ms/block) × (20 kHz) × (8 elements/sample) = 32000 elements/block, although I tested other sizes as well and that didn't solve the problem.
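For reference, a quick sanity check of that block sizing in plain Python (the constants and function name are mine, just to make the arithmetic explicit):

```python
SAMPLE_RATE_HZ = 20_000      # FPGA controller rate
ELEMENTS_PER_SAMPLE = 8      # SGL values per trajectory sample

def elements_per_block(block_ms: float) -> int:
    """Elements the RT side must write per block of the given duration."""
    samples = int(block_ms / 1000 * SAMPLE_RATE_HZ)
    return samples * ELEMENTS_PER_SAMPLE

print(elements_per_block(200))  # 32000, matching the figure above
```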
The FPGA consumes these blocks sample-wise (8 elements per read) at the 20 kHz controller rate. On the RT side, I implemented a loop that sends one block per iteration, at a rate equal to the block duration. This way I try to keep the "mean flow of writes" equal to the "mean flow of reads". For example, with a 200 ms block size, my RT loop runs with a 200 ms iteration period. To give the RT side some head start, I only begin reading blocks in the FPGA after two blocks have been written, i.e., I enable FPGA FIFO reading (via an FPGA front-panel boolean) only at the end of the second iteration of the RT loop.
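In text form, the pacing logic is roughly the following (Python-style pseudocode of my LabVIEW diagram; fifo_write and enable_fpga_read are stand-ins for the FIFO Write method and the front-panel boolean, not real APIs):

```python
import time

BLOCK_S = 0.200  # block duration, which is also the RT loop period

def stream_trajectory(blocks, fifo_write, enable_fpga_read):
    """Paced producer: one block per iteration, FPGA enabled after 2 blocks."""
    next_deadline = time.monotonic()
    for i, block in enumerate(blocks):
        fifo_write(block, timeout=0)   # should never block if the pacing holds
        if i == 1:
            enable_fpga_read(True)     # give RT a two-block head start
        next_deadline += BLOCK_S
        time.sleep(max(0.0, next_deadline - time.monotonic()))
```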
I've tried different implementations of this RT loop, and I'm fairly confident all of them can complete every iteration faster than the specified loop period (block duration).
First, I tried different loop structures and timing-monitoring approaches, such as:
For these options, I've tried:
For a 200 ms block size, the first approach resulted in iteration times of ~130 ms, and the second approach around 200 µs. The FIFO read timeout on the FPGA occurred with all combinations from the two previous lists.
I've also played with buffer sizes. On the RT side, the depth is set to 160768 elements, corresponding to ~1 second of trajectory (8 elements/sample × 20 kHz), and I never got even close to filling it. On the FPGA side, I've tried the following sizes:
Other things I've tried, without success:
Since all iterations of the RT loop seem to run on time (based on tick counts and Timed Loop left-node readings), and the RT side is always at least one block ahead of the FPGA, I started to suspect the FPGA side or the DMA FIFO controller itself.
I started monitoring both FIFOs using the "Empty Elements Remaining" (RT) and "Get Number of Elements to Read" (FPGA) Invoke Methods and noticed that, when the problem occurs, the RT-side buffer drains at roughly 2 kHz (in samples), as if the FPGA were reading at ~22 kHz instead of 20 kHz.
I've checked these 2 kHz and 22 kHz figures by periodically reading the number of elements in both FIFOs, using different methods in RT plus post-analysis, and it really does look as though the FPGA suddenly increases the rate at which it reads the FIFO. That could explain the timeout, but I have no explanation for how it could happen.
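The rate estimate itself is just a difference quotient over those element-count readings; a minimal sketch of the post-analysis (variable names are mine):

```python
ELEMENTS_PER_SAMPLE = 8

def apparent_read_rates(counts, t):
    """Estimate the FPGA's apparent consumption rate (samples/s) from
    periodic 'Elements to Read' readings taken at times t (seconds).
    Only meaningful between RT writes, while the count falls monotonically."""
    rates = []
    for k in range(1, len(counts)):
        d_elems = counts[k - 1] - counts[k]
        d_t = t[k] - t[k - 1]
        rates.append(d_elems / ELEMENTS_PER_SAMPLE / d_t)
    return rates
```

Note that any bias in the timestamps t (if they come from the RT clock) shows up one-for-one as a bias in the estimated rate.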
I double-checked (using, for example, a dedicated Timed Loop and monitoring its Iteration Durations) whether this sudden rate increase could just be an artifact of the RT implementation of the measurement itself, and I'm confident it's not a false conclusion.
The FPGA loop responsible for reading the FIFO at 20 kHz is implemented simply with a While Loop, a Flat Sequence and a Loop Timer with a constant input, as recommended in many examples. I tried configuring the Loop Timer in ticks and in µs, with no difference observed. And when I check the difference of its output between iterations, it stays constant and equal to the input value, so there is no indication that the loop rate increased.
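For readers without those examples at hand, the loop structure is roughly this (Python-style pseudocode of the FPGA diagram; a real FPGA VI uses the Loop Timer primitive, and I'm assuming the default 40 MHz clock, so 2000 ticks = 50 µs):

```python
TICKS_PER_ITER = 2_000  # 50 µs at a 40 MHz FPGA clock

def control_loop(loop_timer, fifo_read, apply_outputs):
    """20 kHz consumer: each iteration reads one 8-element sample."""
    while True:
        loop_timer(TICKS_PER_ITER)        # paces the loop to 50 µs
        sample, timed_out = fifo_read(8)  # FIFO Read returns a Timed Out? flag
        if timed_out:
            raise RuntimeError("FIFO read timeout")  # the error being debugged
        apply_outputs(sample)             # control laws + actuator commands
```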
Lastly, both the RT and the FPGA implementations have a lot going on besides this data streaming, and we considered that this could also explain the problem, although all loops have had their iteration durations checked, CPU load on the RT side is fine, and the most heavily used FPGA resources are slices (71%) and LUTs (52%).
In summary:
I've been stuck on this problem for at least two weeks now and need a solution as soon as possible. If anyone has any ideas about something I'm missing or new tests I could try, I would really appreciate it! Again, I wish I could post some VIs, but at the moment I have no permission.
Best Regards,
Gabriel O. Brunheira
Mechatronics Engineer
Brazilian Synchrotron Light Source
03-12-2025 07:34 AM
Update: I created a version of the RT and FPGA application containing only the code responsible for the data streaming, and the timeout error still occurred, so the parallel execution of the other things mentioned in the previous post seems to have nothing to do with it.
03-12-2025 03:44 PM
Gabriel,
This post may not directly match your problem, but it contains a lot of useful detail about DMA FIFO setup, some of it fairly obscure, and I know our LV developer used some of the insights from it to solve a problem we were having with losing data (rather than timeouts):
https://forums.ni.com/t5/LabVIEW/DMA-FIFO-switching-beteen-channels-after-FPGA-sends/td-p/2556251
Good luck,
Andy
03-14-2025 02:08 PM - edited 03-14-2025 02:08 PM
Hi Andy!
Thank you very much for sending this information; I'll take a look!
Since my last message, I've been able to get new information about the problem. I modified the main VIs in RT and FPGA to toggle two digital outputs: one toggled by the RT write loop (DIO1) and the other by the FPGA read loop (DIO2). Then I monitored them with an oscilloscope (see below). The red curve is an analog output that reproduces the trajectory reference:
As expected, the first signal (RT) toggles with a 200 ms period and the second with a 50 µs period (the 20 kHz FPGA controller rate). But when the problem occurs, the RT period simply increases to around 220 ms (an effective write rate of ~18 kHz) and stays there. The "mean flow of writes" therefore decreases, the number of elements in the RT buffer slowly drops until it's empty, and at that point the FPGA buffer starts to drain until an underflow occurs. This 18 kHz explains the "2 kHz question" I mentioned in my original post.
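The numbers are all self-consistent; checking them in plain Python (my own arithmetic, using only the figures quoted in this thread):

```python
SAMPLES_PER_BLOCK = 4_000   # 200 ms of samples at 20 kHz
FPGA_RATE = 20_000          # true FPGA read rate (samples/s)

real_period = 0.220                           # observed RT write period (s)
write_rate = SAMPLES_PER_BLOCK / real_period  # ~18,182 samples/s: the "18 kHz"
net_drain = FPGA_RATE - write_rate            # ~1,818 samples/s: the "2 kHz"

# Seen through the slowed RT clock (which believes 220 ms is 200 ms),
# the FPGA's 20 kHz consumption looks inflated:
apparent_fpga_rate = FPGA_RATE * real_period / 0.200  # 22,000: the "22 kHz"
print(write_rate, net_drain, apparent_fpga_rate)
```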
I was monitoring the RT loop rate with different "software indicators", including tick-count differences, and Iteration Duration and Finished Late? from the Timed Loop version, and none of them ever indicated that RT was taking more time than expected. Actually, it was even stranger: when the problem occurs, the measured Iteration Duration decreases by a couple of milliseconds!
To test these software indicators, I induced a delay in a specific iteration of the RT write loop by calling a Wait VI whenever the iteration number equalled a chosen value. In that case I was indeed able to see the RT digital output taking longer to toggle on that iteration (see below), and the software indicators (tick counts, Iteration Duration and Finished Late?) caught it as well:
The only explanation I've come up with is that the clock RT uses as a timebase somehow reduces its frequency (maybe due to overheating?), so RT gets slower relative to the FPGA without noticing it. I believe this could even explain why the Iteration Duration decreases (as mentioned above) when the problem occurs.
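That hypothesis would also fit the shrinking Iteration Duration: if each RT tick stretches, a fixed amount of wall-clock work spans fewer ticks, so a duration computed from tick counts goes down. A back-of-envelope check (the ~10% slowdown follows from 220 ms vs 200 ms; the 20 ms of loop work is an assumed placeholder):

```python
slowdown = 0.220 / 0.200   # each RT tick lasts ~10% longer than nominal
work_wall_ms = 20.0        # assumed fixed wall-clock time of the loop body
measured_ms = work_wall_ms / slowdown
print(measured_ms)         # ~18.2 ms: a couple of ms "faster", as observed
```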
Anyway, knowing the problem is on the RT side, I've changed the streaming strategy: I no longer write one block per "block duration". Instead, I write blocks as soon as there is space available in the FIFO, by setting a non-zero timeout on the Write method and removing the timing from the write loop. I'm still analyzing the impact this has on CPU load; I may have to add some strategy to avoid CPU starvation.
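In pseudocode, the new strategy just lets the FIFO's own flow control do the pacing (again Python-style pseudocode; fifo_write stands in for the FIFO Write method):

```python
def stream_trajectory_flow_controlled(blocks, fifo_write):
    """Unpaced producer: the Write call itself waits for free space."""
    for block in blocks:
        # Non-zero timeout: the call returns only once the host buffer can
        # take the whole block, so the FPGA's 20 kHz consumption, not the
        # RT clock, sets the streaming rate.
        fifo_write(block, timeout_ms=500)   # timeout value is illustrative
```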
This seems to solve the problem, but I still have no explanation for why RT suddenly takes longer to run its loop without being able to notice it (via the so-called "software indicators").
What do you think?
Thanks again!
03-14-2025 05:38 PM
Great to hear you have identified the problem and developed a workaround.