LabVIEW

High sampling rate, high channel count in cRIO FPGA [read and write] (how to?)

Hi,

I recently started restructuring a project in a more orthodox way on our cRIO (RT 9024, FPGA 9112).

 

The project is a HOST <--> RT <--> FPGA  structure.

Host to RT uses variables for control and a network stream for captured data transfer.

RT to FPGA uses a DMA FIFO for captured data transfer and regular read/write controls for control variables (sampling rate, etc.).

 

The FPGA chassis is fully populated: one 9411, three 9215, one 9234, one 9217, one 9269 and one 9375.

 

We (actually I'm alone on this issue :D) capture data from a 5 + 1 phase electrical motor, so the measurements are:

6 currents --> fastest capture loop (50 to 100kS/s)

6 voltages --> fastest capture loop (50 to 100kS/s)

1 encoder --> own loop (40MHz)

1 torque --> own loop (51.2 kS/s)

3 temperatures --> own loop (4 S/s)

 

And we exchange control data with the "load" drive (usually at 1 kHz) in the form of:

several digital inputs (as feedback)

several digital outputs (control)

2 analog outputs (setpoints)

 

The goal is synchronous sampling. To achieve it (see attached file), I build an array in which every input is converted from its 24/5-bit FXP word to a bitstream, padded with 8 bits, and then reinterpreted as an int32. The data has no meaning until the host removes those 8 bits and converts the bitstreams back to 24/5-bit FXP words.

This way:

- I can send all types of data: true int32, and different FXP formats in a single block.

- I only use one FIFO (recommended)

- The FIFO accepts the data as normal int32 without extra header info (recommended).

I haven't found this method in either the forums or the documentation (I'm rather new to LabVIEW), and I think there may be more proper ways to do it, but it ensures no data loss and delivers synchronized blocks of 17 variables. And it works.
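For readers who want the padding scheme above in text form, here is a minimal Python sketch of the round trip. It rests on assumptions the post does not state: the FXP words are signed two's complement, 24/5 means 24-bit word length with 5 integer bits (so 19 fractional bits), and the 8 padding bits are zeros placed above the 24-bit word. The function names are made up for illustration.

```python
WORD_BITS = 24                      # FXP word length (from the post)
INT_BITS = 5                        # FXP integer word length (from the post)
FRAC_BITS = WORD_BITS - INT_BITS    # 19 fractional bits (assumed layout)
MASK24 = (1 << WORD_BITS) - 1


def fxp_to_raw(value: float) -> int:
    """Quantize a float to a signed 24-bit word, returned as its raw bit pattern."""
    scaled = round(value * (1 << FRAC_BITS))
    lowest = -(1 << (WORD_BITS - 1))
    highest = (1 << (WORD_BITS - 1)) - 1
    scaled = max(lowest, min(highest, scaled))   # saturate instead of wrapping
    return scaled & MASK24                        # two's-complement bit pattern


def pad_to_int32(raw24: int) -> int:
    """Place the 24-bit pattern in the low bits; the top 8 bits stay zero (assumption)."""
    return raw24 & MASK24


def int32_to_float(element: int) -> float:
    """Host side: drop the 8 padding bits, restore the sign, rescale to a float."""
    raw = element & MASK24
    if raw & (1 << (WORD_BITS - 1)):              # sign bit of the 24-bit word
        raw -= 1 << WORD_BITS
    return raw / (1 << FRAC_BITS)


element = pad_to_int32(fxp_to_raw(-2.25))
restored = int32_to_float(element)
```

Because the FIFO element is only a container for the bit pattern, the host has to know which FXP format each position in the block uses before it can rescale.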

[Image: FPGA]

My problem is that the maximum capture rate I've been able to reach is 50 kS/s for the fastest loop. This loop also reads registers holding the "last" value from the other, slower loops and adds that data to the array.

The NI 9215 should be able to capture 100 kS/s with four channels per card without problems, and more if fewer channels are used per card (something I would like to try as well).

 

On the Real-Time side, at higher sampling rates the RT CPU reaches 100% usage and samples are lost.

Curiously, the RT only passes the bitstream on to the HOST PC; no manipulation is done.

FIFO to network stream is done with the so-called double FIFO read (the timeout implementation needs a lot of improvement). See picture:

 

[Image: RT side]

 

 

I'm updating this project to fit it into the HOST <--> RT <--> FPGA .vi template provided by LabVIEW, to make the project more secure and robust and to reduce data communication for the outputs (whose sampling rate is lower).

 

My questions are:

(1) What is the best way to transfer data from FPGA to RT with 16 to 20 channels of FXP at 100 kS/s?

(2) Is the problem on the RT side or the FPGA side?

(3) Does a specific typedef use less bandwidth (combining FXP of different types and INT32)?

(4) For RT: which is better, a bigger FIFO with longer loop times or a smaller FIFO with shorter loop times?

 

Footnote: data is stored as doubles in a TDMS file on the HOST side; that is where the INT32 to proper FXP to double conversions are done.

 

 

 

Message 1 of 5

Here are a couple of things I would do differently from what you show, or would at least try out.

 

1.  Instead of using a regular while loop and using the FPGA timed wait, I would use a timed loop with its cycle time set to what you need.

2.  You could set your data type to U64 instead of I32.  That would let you put two pieces of data into each element instead of one.  64 bit is a basic unit for FIFOs and is as fast as 32 bit.  You could pack it even tighter and not pad the 24-bit numbers with 8 empty bits.  17 elements x 24 bits is a total of 408 bits.  Break that across 64-bit elements and you can do it in 7 data elements instead of 17.  You control the packing and unpacking on each end, so you shouldn't get any data corruption when you reassemble the values that get split across a 64-bit boundary.

3.  Experiment by sending over only half your data.  See if you can get the loops to run any faster.  If you can, then you know the bottleneck is in the DMA FIFO.  If you can't, then the bottleneck is elsewhere.
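To make the tight packing from point 2 concrete, here is a rough Python sketch of the bit layout only. It is not how it would be wired on the FPGA, and the function names and the little-endian ordering (first word in the lowest bits) are assumptions chosen for illustration. It concatenates seventeen 24-bit words into one 408-bit value and slices that into 64-bit elements, so words that straddle an element boundary are split and rejoined automatically.

```python
WORD_BITS = 24
ELEM_BITS = 64


def pack_words(words, word_bits=WORD_BITS, elem_bits=ELEM_BITS):
    """Concatenate fixed-width words and slice the result into FIFO elements."""
    word_mask = (1 << word_bits) - 1
    big = 0
    for i, word in enumerate(words):
        big |= (word & word_mask) << (i * word_bits)   # word i starts at bit i*24
    total_bits = len(words) * word_bits
    n_elems = -(-total_bits // elem_bits)              # ceiling division
    elem_mask = (1 << elem_bits) - 1
    return [(big >> (k * elem_bits)) & elem_mask for k in range(n_elems)]


def unpack_words(elems, count, word_bits=WORD_BITS, elem_bits=ELEM_BITS):
    """Rebuild the long bit string from the elements and cut it back into words."""
    big = 0
    for k, elem in enumerate(elems):
        big |= elem << (k * elem_bits)
    word_mask = (1 << word_bits) - 1
    return [(big >> (i * word_bits)) & word_mask for i in range(count)]


# 17 arbitrary 24-bit test patterns standing in for one synchronized block
block = [(i * 0x0F1E2D + 1) & 0xFFFFFF for i in range(17)]
elements = pack_words(block)
recovered = unpack_words(elements, count=len(block))
```

The last element carries only 24 useful bits (408 - 6 x 64), so 40 bits per block remain padding.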

Message 2 of 5

Hi,

I moved to U64 as you suggested. The other method you propose requires bigger changes on both sides (FPGA and HOST), and this one already seems to cut the bandwidth in half. Thank you very much.
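As a quick sanity check on the numbers, here is a small Python calculation comparing the three layouts discussed so far. It assumes 17 words of 24 bits per block and the 100 kS/s target from the original question; the helper name is made up. Note that it reports both element count and byte count: going from 17 x I32 to 9 x U64 roughly halves the number of FIFO elements, while the byte count per block does not shrink. Which of the two actually limits throughput on this target is not established in this thread.

```python
import math

CHANNELS = 17          # words per synchronized block (from the post)
WORD_BITS = 24         # useful bits per word (from the post)
RATE_HZ = 100_000      # target block rate (from the original question)


def block_cost(elements, elem_bytes, rate_hz=RATE_HZ):
    """Elements and bytes per block, plus the same figures per second."""
    return {
        "elements_per_block": elements,
        "bytes_per_block": elements * elem_bytes,
        "elements_per_s": elements * rate_hz,
        "bytes_per_s": elements * elem_bytes * rate_hz,
    }


schemes = {
    "17 x I32, one word per element": block_cost(CHANNELS, 4),
    "9 x U64, two words per element": block_cost(math.ceil(CHANNELS / 2), 8),
    "7 x U64, tightly packed": block_cost(math.ceil(CHANNELS * WORD_BITS / 64), 8),
}

for name, cost in schemes.items():
    print(f"{name}: {cost['elements_per_block']} elements, "
          f"{cost['bytes_per_block']} bytes per block, "
          f"{cost['bytes_per_s'] / 1e6:.1f} MB/s")
```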

[Image: 2x24 to 1x64]

 

 

Now I'm trying to use the 'FPGA control' template to embed my current project in it, but it's the worst template I've ever seen. So many subVIs, references, protocols, etc. It seems odd.

Message 3 of 5

For the RT side,

From what I can see, you are polling the DMA FIFO and writing to the network stream in the same loop. A problem occurs when your buffer fills and you lose values. Basically, you aren't emptying your DMA FIFO as fast as the FPGA fills it! There are two solutions: 1) Larger buffer. 2) Faster polling on the RT.

 

A larger buffer is not ideal (it uses more resources and doesn't fix the slow RT polling rate) but can work, depending on the RT jitter. Plus, it's easy!

 

Faster polling on the RT is the other option. The problem is that you are doing a lot besides polling in every cycle of your RT loop (processing things, writing to the network stream, etc.). You need to decouple the polling from the other things. You need to be able to poll the buffer reliably, with minimal jitter in your polling loop.

 

Put the DMA FIFO read in a timed loop (high priority) and put the data into an RT FIFO. Only read from the DMA FIFO and write to the RT FIFO, minimal processing! Make the RT FIFO big! Then in a separate loop (low priority) read the RT FIFO and do any processing before sending it to the network stream.
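The two-loop structure just described can be sketched in Python as a rough analogy. Here `queue.Queue` stands in for the RT FIFO and plain threads stand in for the timed loops; Python threads have no priorities, so only the decoupling is shown, not the determinism. `read_dma` and `send` are hypothetical callables, not LabVIEW or NI API names.

```python
import queue
import threading


def acquisition_loop(read_dma, rt_fifo, n_reads):
    """High-priority role: read a block and hand it off, with no other work."""
    for _ in range(n_reads):
        block = read_dma()
        rt_fifo.put(block)       # blocks when full; a real RT FIFO would flag overflow
    rt_fifo.put(None)            # sentinel: tell the consumer to stop


def forwarding_loop(rt_fifo, send):
    """Low-priority role: drain the queue, process, and forward the data."""
    while True:
        block = rt_fifo.get()
        if block is None:
            break
        send(block)              # e.g. processing plus the network stream write


def run_pipeline(blocks, depth=1024):
    """Run both loops on a list of blocks and return what the consumer forwarded."""
    source = iter(blocks)
    forwarded = []
    rt_fifo = queue.Queue(maxsize=depth)      # "make the RT FIFO big"
    producer = threading.Thread(
        target=acquisition_loop, args=(lambda: next(source), rt_fifo, len(blocks)))
    consumer = threading.Thread(
        target=forwarding_loop, args=(rt_fifo, forwarded.append))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return forwarded


sample_blocks = [[i] * 7 for i in range(100)]   # 100 blocks of 7 elements each
result = run_pipeline(sample_blocks, depth=8)
```

The point of the split is that a slow `send` only makes the queue grow; it never delays the next DMA read, as long as the queue is deep enough to absorb the delay.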

 

A generally good tip when using LabVIEW is to decouple the functional components of your program and get them to run independently, especially when you want to run different functions at different rates. Another example of this might be sending control information down to the FPGA. This information only needs to update slowly, maybe a few times per second, so you can put it in a really low-priority loop where it will not waste resources on pointless updating.

CLA - Kudos is how we show our appreciation for comments that helped us!
Message 4 of 5

I'm working on that, thank you! In my tests, the best solution I found was to set a rather large DMA buffer (65536-1) and increase the RT capture loop period to about 50 ms. A shorter RT loop time overloads the CPU and timeouts occur. The same happens with a bigger DMA buffer and a longer RT loop time. (These numbers are with the old 17x32bit format.)
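A back-of-the-envelope check of those numbers, as a small Python sketch. It assumes the (65536-1) figure is the buffer depth in elements, that each block is 17 elements (the old format) or 7 elements (tight U64 packing), and that the RT loop empties the buffer on every iteration. The function name is made up.

```python
def buffer_fill_time_ms(depth_elements, elements_per_block, block_rate_hz):
    """Time for the FPGA to fill an empty buffer if nothing reads it, in ms."""
    elements_per_second = elements_per_block * block_rate_hz
    return 1000.0 * depth_elements / elements_per_second


DEPTH = 65536 - 1

old_50k = buffer_fill_time_ms(DEPTH, 17, 50_000)       # about 77.1 ms
old_100k = buffer_fill_time_ms(DEPTH, 17, 100_000)     # about 38.6 ms
packed_100k = buffer_fill_time_ms(DEPTH, 7, 100_000)   # about 93.6 ms

for label, t in [("17 elements @ 50 kS/s", old_50k),
                 ("17 elements @ 100 kS/s", old_100k),
                 ("7 elements @ 100 kS/s", packed_100k)]:
    print(f"{label}: buffer fills in {t:.1f} ms")
```

Under these assumptions, a 50 ms loop period fits inside the 77 ms fill time at 50 kS/s but exceeds the 39 ms fill time at 100 kS/s, so the same settings would be expected to overflow at the higher rate unless the block gets smaller or the buffer gets deeper.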

 

My plan is to follow the "idea" behind the template:

a loop for HOST communication (all commands will come from host)

a loop for FPGA set-reset and configuration

a loop for DMA FIFO reading

a loop for other IO to the FPGA

 

 

 

 

Message 5 of 5