Re: Most efficient data transfer between RT and FPGA

ScotiR_22 · 01-02-2024

I am working on a project in which I feed an array of U8s (11 bytes total) to a FIFO configured as Host to Target - DMA, with the FPGA code running off the 40 MHz system clock on an sbRIO-9637. When I capture the waveform on a scope, the baud rate I set is right on, but the best message period I can achieve is ~13 ms (about 75 Hz). It is a very simple program that takes a message composed of bytes and sends it at a variable message rate (200 Hz nominal).

As best I can tell, the bottleneck appears to be the DMA transfer to the FPGA, but no matter what I have done to streamline the RT code, I can't get it to run faster than stated above.
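For reference, the timing budget implied by these numbers can be sketched as below. The period, target rate, and message length come from the post. The baud rate (115200) and the UART-style framing of 10 bits per byte are assumptions made only for illustration, since the post does not state them.

```python
# Timing budget for the 11-byte message described above.
# ASSUMPTIONS (not from the post): UART-style framing with 10 bits per byte
# (1 start + 8 data + 1 stop) and a baud rate of 115200.

def rate_hz(period_s: float) -> float:
    """Convert a message period in seconds to a message rate in hertz."""
    return 1.0 / period_s

def wire_time_s(n_bytes: int, baud: float, bits_per_byte: int = 10) -> float:
    """Time the message occupies the line at the given baud rate."""
    return n_bytes * bits_per_byte / baud

MESSAGE_BYTES = 11          # from the post
OBSERVED_PERIOD_S = 13e-3   # from the post (~13 ms)
TARGET_RATE_HZ = 200.0      # from the post (200 Hz nominal)
ASSUMED_BAUD = 115_200      # hypothetical value, not stated in the post

observed_rate = rate_hz(OBSERVED_PERIOD_S)       # roughly 77 Hz
target_period = 1.0 / TARGET_RATE_HZ             # 5 ms between messages
t_wire = wire_time_s(MESSAGE_BYTES, ASSUMED_BAUD)

print(f"observed rate : {observed_rate:.1f} Hz")
print(f"target period : {target_period * 1e3:.1f} ms")
print(f"wire time     : {t_wire * 1e3:.3f} ms (assumed baud)")
```

Under those assumptions the message itself would occupy the line for under 1 ms, so the 5 ms target period would leave room for it; the real figure depends on the actual baud rate and framing.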

Any thoughts or ideas would be greatly appreciated. I'm using LabVIEW Professional 2023 Q4 with the Real-Time and FPGA modules.

raphschru · 01-02-2024

Hi ScotiR,

For us to help, you should post both your FPGA and host code (VI files), saved for a previous version (21.0 at most, so that most people can open it).

Also, we would need more details such as your DMA FIFO configuration (data type, number of elements, …), the execution mode of your FPGA (simulation or real target), and maybe the purpose of doing this in case we can give you a better way.

Regards,

Raphaël.

Carey3255 · 01-05-2024

The most efficient way to transfer data with a DMA FIFO (if you want the best performance) is in 64-bit chunks and large transactions. There is a point, though, where too large a DMA transaction between the FPGA and the RTOS will start to disrupt the RTOS. RAM is always the bottleneck on the RTOS side. You can use interrupts to tell the RTOS that data is ready, rather than polling or reading the number of elements in the DMA FIFO. I have found that the best way to get the data when it is available is to read the number of elements in the DMA FIFO, read them, and then check the number of elements again; if it is not 0, loop back, read them, and append them to an array. If you wait for another cycle to check for more elements, the system must set up the transfer again, which takes more time; re-reading the DMA FIFO right away takes less time because the setup for the transfer is still in place.
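The read-until-empty pattern described above can be sketched in text form as follows. This is only an illustration of the control flow: `MockFifo` is a hypothetical stand-in written for this example, not an NI API, and in LabVIEW the same loop would be built around the FIFO Read method and its elements-remaining output.

```python
from collections import deque

class MockFifo:
    """Hypothetical stand-in for a Target-to-Host DMA FIFO (not an NI API)."""

    def __init__(self, bursts):
        # Each inner list is a burst that "arrives" while the host is reading.
        self._pending = deque(bursts)
        self._buf = deque()
        self._arrive()

    def _arrive(self):
        # Move the next pending burst into the readable buffer, if any.
        if self._pending:
            self._buf.extend(self._pending.popleft())

    def elements_available(self) -> int:
        return len(self._buf)

    def read(self, n: int) -> list:
        out = [self._buf.popleft() for _ in range(n)]
        self._arrive()  # simulate more data landing during the read
        return out

def drain(fifo: MockFifo) -> list:
    """Read everything available, then re-check until the FIFO reports 0."""
    collected = []
    n = fifo.elements_available()
    while n > 0:
        collected.extend(fifo.read(n))
        n = fifo.elements_available()  # check again before leaving the loop
    return collected

fifo = MockFifo([[1, 2, 3], [4, 5], [6]])
data = drain(fifo)
print(data)
```

The point of the inner re-check is that data arriving during a read is picked up in the same pass instead of waiting for the next polling cycle.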

In the past I ran tests to find the best transfer speed, but I haven't done so in a while, so I'm not sure the results are still valid with LV2021. Back then, the maximum for 64-bit transfers was 100 MB/s with the RTOS only reading the DMA FIFO.
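As an illustration of the 64-bit chunk idea applied to the 11-byte message from the original question, here is a minimal sketch of packing bytes into U64 words. The big-endian byte order, the zero padding, and the function names are assumptions chosen for the example; the FPGA side would have to unpack with the same convention.

```python
import struct

def pack_u64_words(payload: bytes) -> list[int]:
    """Zero-pad the payload to a multiple of 8 bytes and split it into U64 words."""
    padded = payload + b"\x00" * (-len(payload) % 8)
    count = len(padded) // 8
    # ">" = big-endian (an assumption), "Q" = unsigned 64-bit integer
    return list(struct.unpack(f">{count}Q", padded))

def unpack_u64_words(words: list[int], n_bytes: int) -> bytes:
    """Inverse of pack_u64_words; n_bytes strips the padding again."""
    return struct.pack(f">{len(words)}Q", *words)[:n_bytes]

message = bytes(range(1, 12))        # 11 example bytes: 0x01 .. 0x0B
words = pack_u64_words(message)      # 11 bytes fit in two U64 words
restored = unpack_u64_words(words, len(message))

# Bandwidth needed at the 200 Hz nominal rate from the original question.
bytes_per_second = 200 * len(words) * 8

print([hex(w) for w in words])
print(bytes_per_second, "bytes/s")
```

At 200 Hz this comes to a few kilobytes per second, which is far below the throughput figure quoted above, so packing mainly reduces the element count per transfer rather than relieving a bandwidth limit.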

Intaris · 01-05-2024

@Carey3255 wrote:

The most efficient way to transfer data with a DMA FIFO (if you want the best performance) is in 64-bit chunks and large transactions. There is a point, though, where too large a DMA transaction between the FPGA and the RTOS will start to disrupt the RTOS. RAM is always the bottleneck on the RTOS side. You can use interrupts to tell the RTOS that data is ready, rather than polling or reading the number of elements in the DMA FIFO. I have found that the best way to get the data when it is available is to read the number of elements in the DMA FIFO, read them, and then check the number of elements again; if it is not 0, loop back, read them, and append them to an array. If you wait for another cycle to check for more elements, the system must set up the transfer again, which takes more time; re-reading the DMA FIFO right away takes less time because the setup for the transfer is still in place.

In the past I ran tests to find the best transfer speed, but I haven't done so in a while, so I'm not sure the results are still valid with LV2021. Back then, the maximum for 64-bit transfers was 100 MB/s with the RTOS only reading the DMA FIFO.

In my experience, interrupts are terrible.

Make your buffer on the RT side large enough to hold multiple datasets if possible. Then read entire datasets at a time.

We use a periodic fixed-size DMA transfer scheme and find it to be far and away the most efficient way of communicating.
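A rough sketch of what reading entire datasets at a time can look like on the host side follows. The `deque` stands in for the host-side DMA buffer, and `DATASET_SIZE` and `read_whole_datasets` are hypothetical names for this example; it is not the actual implementation referred to above.

```python
from collections import deque

DATASET_SIZE = 4  # hypothetical number of elements per dataset

def read_whole_datasets(buffer: deque, dataset_size: int) -> list[list[int]]:
    """Take only complete datasets; leave any partial dataset in the buffer."""
    n_sets = len(buffer) // dataset_size
    return [
        [buffer.popleft() for _ in range(dataset_size)]
        for _ in range(n_sets)
    ]

# Ten elements are waiting: two full datasets plus two leftover elements.
buffer = deque(range(10))
sets = read_whole_datasets(buffer, DATASET_SIZE)
leftover = list(buffer)

print(sets)
print(leftover)
```

Calling this once per period means every read has a fixed, known size, and a partially arrived dataset simply waits for the next period.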

Each DMA transfer has a minimum time overhead of somewhere between 5 and 15 microseconds depending on hardware.
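To put that overhead in perspective, here is a small back-of-the-envelope calculation. The 15 µs figure is the upper end of the range above and the 200 Hz rate comes from the original question. The 80 ns per-element cost is an assumption, derived from the 100 MB/s figure quoted earlier in the thread with 8-byte elements.

```python
def overhead_fraction(overhead_s: float, period_s: float) -> float:
    """Share of each period spent on the fixed per-transfer overhead."""
    return overhead_s / period_s

def effective_rate(elements: int, overhead_s: float, per_element_s: float) -> float:
    """Elements per second when each transfer pays a fixed overhead once."""
    return elements / (overhead_s + elements * per_element_s)

OVERHEAD_S = 15e-6       # upper end of the 5-15 microsecond range
PERIOD_S = 1.0 / 200.0   # 200 Hz nominal message rate
PER_ELEMENT_S = 80e-9    # ASSUMPTION: 8 bytes per element at 100 MB/s

frac = overhead_fraction(OVERHEAD_S, PERIOD_S)
rate_1 = effective_rate(1, OVERHEAD_S, PER_ELEMENT_S)
rate_1000 = effective_rate(1000, OVERHEAD_S, PER_ELEMENT_S)

print(f"overhead share at 200 Hz  : {frac:.2%}")
print(f"1 element per transfer    : {rate_1:,.0f} elements/s")
print(f"1000 elements per transfer: {rate_1000:,.0f} elements/s")
```

With these numbers the fixed overhead is a fraction of a percent of a 5 ms period, and batching many elements per transfer raises throughput by more than two orders of magnitude, which is the argument for fewer, larger, fixed-size transfers.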

Each PXIe and hardware version will have its own bandwidth limitations.

I've done tests in the past; you can find them HERE.