Real-Time Measurement and Control


DMA FIFO read high CPU usage

Solved!

Hi all, this is a long one!

 

I have noticed some strange behaviour when I read from a DMA FIFO between the FPGA and RT host on my cRIO (9014 controller and 9104 backplane). The FPGA is writing 2 data points to the FIFO every millisecond and the RT loop is reading 500 data points every 250 ms. The RT loop period is controlled using the Wait Until Next ms Multiple function.

 

Because Wait Until Next ms Multiple does not wait a full period on the first iteration, I gave the DMA FIFO read method a 750 ms timeout to allow the data points to accumulate on that first iteration.

 

I then ran the VI; it read data as expected and there were always 0 elements remaining in the FIFO. Then, using the system monitor, I observed the CPU usage on the RIO and was surprised to see it was around 30%!

 

After much head scratching and many other attempts, I decided to set the timeout to zero, so the read simply times out until there are enough data points and then runs a little behind, so that there is always a constant (non-zero) number of data points in the FIFO. Now the VI runs and uses only around 3% CPU.

 

I then suspected that, in the first case, because I was reading exactly the number of points in the FIFO, I was tripping some sort of polling behaviour in the read function which was hogging the CPU.

 

So then I ran a case with a zero timeout where I was reading with 0 data points left in the FIFO, as in the first case, and expected to see the read function time out, but it never does and the CPU usage is normal.

 

So what is happening? I am stumped!

 

Thanks,

Steve.

 

 

Message 1 of 10

Hi Steve,

 

I think this KnowledgeBase article may explain the behavior that you have noticed:

 

KB #4X1GBJDK: Why is My Real-Time CPU at 100% When Reading from a DMA FIFO (FPGA)?

 

Regards,

 

Casey Weltzin

Product Manager, LabVIEW Real-Time

National Instruments

 

Message 2 of 10

Hi Casey,

 

That is what I thought was happening, which is why I checked the case with a zero timeout where the FPGA was writing the data at the same speed (2 points per ms), the RT was reading 500 points every 250 ms, and there were constantly no data points remaining in the FIFO.

 

I expected to see the function time out, as I assumed that with a greater-than-zero timeout it was polling briefly for the last couple of data points whenever the RT loop executed 1 ms ahead of schedule. However, this does not happen; a timeout never occurs, indicating that the data is always present.

 

I also thought that the high CPU usage I was seeing was in reality only spikes caused by brief polling events, and that the slow update of the remote system monitor's CPU graph was causing some aliasing, making the CPU usage appear constantly high. So I tried the same two cases at a much slower RT loop period (1500 ms, reading 3000 data points per iteration) and observed exactly the same behaviour, i.e. high CPU usage with a timeout > 0, and low CPU usage with a timeout = 0 and no timeout occurring.

 

I am convinced that when a timeout is enabled and the function is reading exactly in time with the FPGA writes (no data points remaining in the FIFO), it somehow triggers some unnecessary polling action which results in the high CPU usage.

 

Many thanks,

Steve.

 

 

Message 3 of 10

Hi Casey,

 

I have attached a project with some sample code. If you run it with no timeout enabled (0 ms), and either start it in catch-up mode or subsequently use the catch-up to get to zero elements remaining in the FIFO, the cRIO CPU usage is low (~3%). If you then enable the timeout (500 ms) and run it again, CPU usage jumps to 30 - 80% on different runs.

 

Regards,

Steve.

Message 4 of 10

Hi Steve,

 

When you set a positive timeout value and there are zero elements in the DMA FIFO, the read method polls very quickly for data, as mentioned in the KnowledgeBase article. I believe this explains the behavior that you are seeing in this case (high CPU usage).

 

When you set a zero timeout value and there are zero elements in the DMA FIFO, I believe the read method should quickly time out, and therefore your loop will run at the rate specified by any timing VIs (unless processing in the loop takes longer than the timing VIs). What do you mean when you say that no timeout is occurring in your VI? How did you test this?

 

I am sure that we can work together to explain exactly what is going on here. Thank you for your detailed posts Steve!

 

Regards,

 

Casey Weltzin

Product Manager, LabVIEW Real-Time

National Instruments

Message 5 of 10

Hi again,

 

Yes, that is exactly what I expected to happen. So when observing the CPU usage you would expect a brief spike followed by low usage as the loop timing function puts the thread to sleep.

 

I realised that the resource monitor is a crude way to observe this, as it updates slowly (around 1 Hz), so I ran the DMA read in a loop with timing forcing it to wait 1500 ms between reads. At the rate the FPGA was writing to the FIFO, the data should always be present when the function tries to read it (only just, hence always running with zero elements remaining).

 

Supposing the read on the RT has executed 1 ms early, it should only have to poll for 1 ms for all the data to be present. So on the resource monitor we should see a brief spike in CPU activity followed by 1500 ms of CPU idle. In fact, we would be lucky to see that CPU spike at all at the update rate of the resource monitor! So the CPU usage should appear continually low, if anything.

 

As you say, if this brief polling behaviour is occurring, then a read function with a zero timeout should immediately time out (and return error code -50400). As you can see in the code I posted, I check for this error, clear it, and set a latch if it does occur.
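In text form (plain Python rather than the actual LabVIEW diagram; the function and names are just my illustration), the check amounts to:

```python
FIFO_TIMEOUT = -50400   # code returned when the zero-timeout read times out

def filter_timeout(error_code, latch):
    """Clear a FIFO-read timeout error, latching that one occurred."""
    if error_code == FIFO_TIMEOUT:
        return 0, True          # clear the error, set the latch
    return error_code, latch    # pass anything else through unchanged
```

The latch survives for the rest of the run, so even a single, brief timeout would be visible on the front panel.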

 

To summarise:

The expected behaviour of a loop with a timeout enabled is:

Iteration 0: Use of the Wait Until Next ms Multiple function results in a short 'wait' on the first iteration, so the read function polls for the data being written by the FPGA, does not time out, and reads all of the data as soon as it is present.

Iteration 1...N: As the data is being written as fast as it is being read, we expect to see zero elements remaining (we do). Brief polling may occur, but only for a maximum period equal to any jitter in the loop execution period.

 

Expected CPU usage: A brief initial spike followed by possible intermittent spikes caused by polling. This is unlikely to be visible at the resolution of the resource monitor, so overall a low CPU usage should be observed.

Observed CPU usage: Continually high.

 

 

The behaviour of the loop with zero timeout is as follows:

Iteration 0: Use of the Wait Until Next ms Multiple function results in a short 'wait' on the first iteration, so the read function times out (as expected) and sets the latch.

Iteration 1...N: The read function is now reading happily with x elements remaining in the FIFO due to the timeout in iteration 0, so I press the catch-up to flush the FIFO and simulate the behaviour seen when a timeout was enabled.

Iteration N...: The loop is now reading with zero elements remaining. We expect to see the function time out every so often due to small jitter in the loop period. However, it never does!

 

Expected CPU usage: Low

Expected timeout behaviour: Intermittent, if at all

Observed CPU usage: Low

Observed timeout behaviour: Never

 

 

It seems that when a timeout is enabled, on the first (short) iteration of the loop it polls and then never stops polling, even though we have shown it doesn't need to!

 

I am really just interested to explain what is happening, as I can repeatably make it occur. It may well be that I have made a mistake somewhere!

 

Regards,

Steve.

Message 6 of 10
Solution
Accepted by St3ve

Hi Steve,

 

I believe that I can explain the processor usage behavior that you are experiencing:

 

Case 1: Loop with Non-Zero Timeout

- Your code starts by waiting until a ms multiple: this means that a wait time of 0-250 ms occurs (avg 125 ms)

- Next, your code attempts to read 500 data points from the DMA FIFO (since 500 points come in every 250 ms). All of the data may not be present yet.

- If 500 points aren't ready yet, the read polls until they are; during this polling, CPU usage is high. The data is finally read.

- The next iteration of the loop begins.

- The code now waits until 250 ms after the last Wait Until Next ms Multiple function call. Roughly the same amount of data should be available at the end of this wait as during the last iteration! Therefore, the polling time on the next DMA FIFO read should be roughly the same, and the CPU usage will remain high.

 

Case 2: Loop with Zero Timeout

- In your code, immediately after the first Wait Until Next ms Multiple call, you "synchronize" by reading out all remaining FIFO data.

-  During the next iteration of the loop, we can expect the Wait Until Next ms Multiple call to wait the full 250 ms. This means that the DMA FIFO data (discounting jitter) should be ready. If jitter is low enough, this will result in very few timeouts of the DMA FIFO read, and a low CPU usage.
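To make this concrete, here is a quick numerical model of both cases (plain Python rather than LabVIEW; all of the names, and the offset_ms parameter standing in for the length of the short first wait, are my own):

```python
# Numerical model of the two cases (plain Python, not LabVIEW).
# The FPGA writes RATE points/ms; the RT loop wakes at PERIOD ms
# multiples and reads BATCH points per iteration.
RATE, PERIOD, BATCH = 2, 250, 500

def poll_times(offset_ms, iterations=8):
    """Case 1 (positive timeout): ms spent polling per iteration.
    offset_ms models how far the FPGA's start lags the RT clock's
    250 ms boundaries, i.e. the short first wait."""
    times, total_read = [], 0
    for k in range(1, iterations + 1):
        wake = k * PERIOD                     # Wait Until Next ms Multiple
        need = total_read + BATCH             # cumulative points required
        ready = offset_ms + need / RATE       # when the FPGA has written them
        times.append(max(0.0, ready - wake))  # read polls until then
        total_read += BATCH
    return times

def backlogs(offset_ms, iterations=8):
    """Case 2 (zero timeout): elements left in the FIFO each iteration.
    The first read times out; afterwards the loop runs one batch behind."""
    left, total_read = [], 0
    for k in range(1, iterations + 1):
        available = RATE * (k * PERIOD - offset_ms) - total_read
        if available >= BATCH:                # enough data: read succeeds
            total_read += BATCH
            available -= BATCH
        left.append(available)                # else: time out, read nothing
    return left

print(poll_times(100))   # constant 100 ms of polling every iteration
print(backlogs(100))     # one timeout, then a constant 300-element backlog
```

With an offset of 100 ms, the model predicts 100 ms of polling in every 250 ms iteration (roughly 40% CPU), varying from run to run with the initial offset, which is consistent with the 30 - 80% reported above; with a zero timeout the same offset gives a single initial timeout and a constant backlog, so polling never recurs.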

 

I would recommend either using a timed loop and reading the exact number of data points available (if your application can handle variable size data sets), or using a while loop like you are and making sure that loop priorities are set such that the high CPU usage doesn't affect critical functionality.

 

Thank you for the very good question; I really personally enjoy thinking through these topics! Please let me know if you have any questions that warrant additional discussion, and have a great day!

 

Regards,

 

Casey Weltzin

Product Manager, LabVIEW Real-Time

National Instruments

Message 7 of 10

Hi Casey,

 

I think that is correct! It would explain why the CPU usage varied depending on how long the initial wait was (0 - 250 ms).

 

Interesting that you mention timed loops, as I always thought they were a nuisance until I recently had a problem with a time-critical priority loop causing a normal priority loop using the Wait Until Next ms Multiple function to miss its 'slot'. You can probably tell I'm new to RT!

 

Many thanks for your helpful replies; it's very satisfying to find the solution. I knew it would be something I had done!

 

Regards,

Steve.

Message 8 of 10

I've written similar code. The key to making it work is to use two metronomes: one before the loop and the other inside it. The first metronome synchronizes the host VI with the system timer, ensuring that the second always waits the full period. This scheme relies on buffering in the FPGA FIFO to hold data for that first partial period.
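In simulated time (plain Python rather than a LabVIEW diagram; the helper name is my own), the two-metronome pattern looks like this:

```python
PERIOD_MS = 250

def next_boundary(now_ms, period_ms=PERIOD_MS):
    """Next multiple of period_ms strictly after now_ms (the role
    played by Wait Until Next ms Multiple)."""
    return (now_ms // period_ms + 1) * period_ms

t = next_boundary(137)     # metronome 1, before the loop: align to the timer
wakeups = []
for _ in range(4):         # metronome 2, inside the loop
    t = next_boundary(t)   # now always waits a full period...
    wakeups.append(t)      # ...then the FIFO read happens here
print(wakeups)             # [500, 750, 1000, 1250]
```

Because the alignment wait happens before the loop starts, every in-loop wait spans a full 250 ms, so the expected batch of data is already in the FIFO when the read executes.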

 

Keep in mind that this scheme doesn't keep the host and FPGA synchronized; they will drift over time. For a perpetual application, you will want to build an automatic "catch-up" mechanism using polling or interrupts.
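One way to sketch such a catch-up (my own illustration, not a prescribed API; the threshold is an arbitrary choice): use the "elements remaining" output of the FIFO read to decide when to flush the excess.

```python
BATCH = 500        # normal read size per iteration
THRESHOLD = 1000   # backlog at which we resynchronize (arbitrary choice)

def plan_read(elements_remaining):
    """Points to read this iteration: the usual batch, plus the whole
    backlog once drift has let it grow past the threshold."""
    if elements_remaining > THRESHOLD:
        return BATCH + elements_remaining   # flush the backlog and resync
    return BATCH
```

Reading the backlog in one oversized read keeps the loop timing intact, at the cost of one variable-size data set on the iteration that resynchronizes.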

 

Hope this helps!

 

 

NI Software Engineer - RIO
Message 9 of 10
That's an interesting idea. In the end I accepted that it would be running slightly out of sync and, as you said, just built in an 'auto catch-up'.
Message 10 of 10