Only 3 DMA FIFOs - DMA with index - efficiency problem

MathBoda · 06-11-2009

Hello,

I'm using a cRIO with several modules (analogue and digital).

I have a high-speed analogue module that uses one DMA FIFO.

Another DMA FIFO is used to send data from the RT to the FPGA.

I have 1 DMA FIFO left.

The idea is to use a 64-bit DMA FIFO with two indices encoded in the first 32 bits; the other 32 bits hold the data.
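A minimal Python sketch of the packing idea. It assumes (the post does not say) that "the first 32 bits" means the upper half of the 64-bit element and that each index is 16 bits wide; `pack_element` and `unpack_element` are hypothetical helper names, not part of any NI API.

```python
from __future__ import annotations

# Assumed layout (hypothetical, not confirmed in the thread):
#   bits 63..48  index A
#   bits 47..32  index B
#   bits 31..0   raw 32-bit data
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def pack_element(index_a: int, index_b: int, data: int) -> int:
    """Combine two 16-bit indices and a 32-bit data word into one U64."""
    if not (0 <= index_a <= MASK16 and 0 <= index_b <= MASK16):
        raise ValueError("indices must fit in 16 bits")
    return (index_a << 48) | (index_b << 32) | (data & MASK32)


def unpack_element(word: int) -> tuple[int, int, int]:
    """Split one U64 FIFO element back into (index_a, index_b, data)."""
    return (word >> 48) & MASK16, (word >> 32) & MASK16, word & MASK32


word = pack_element(3, 7, 0x12345678)
print(hex(word))             # 0x3000712345678
print(unpack_element(word))  # (3, 7, 305419896)
```

On the FPGA side the packing would be done once per sample; on RT the unpacking is what has to run for every element read from the FIFO.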

The following picture explains how it works (the idea: one or two indices).

The problem is decoding the array on the RT side so that the data is concatenated per channel and written to a queue.

The data will already be in order; I just need to select all the data with the same key.
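Because samples with the same key arrive contiguously, the selection can be a single pass that starts a new group whenever the key changes. A Python sketch using `itertools.groupby`, assuming the elements have already been decoded into `(key, value)` pairs where the key is the index pair; `group_runs` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def group_runs(samples):
    """Collapse consecutive (key, value) samples into (key, [values]) runs.

    Relies on samples sharing a key being contiguous; a key that
    reappears later in the array starts a new run.
    """
    return [(key, [value for _, value in run])
            for key, run in groupby(samples, key=itemgetter(0))]


samples = [((0, 1), 10), ((0, 1), 11), ((0, 2), 20), ((0, 2), 21), ((0, 2), 22)]
print(group_runs(samples))
# [((0, 1), [10, 11]), ((0, 2), [20, 21, 22])]
```

This avoids searching the array once per unique key: each element is visited exactly once.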

I wrote some code (attached) that describes the processing.

I am wondering whether anybody uses this kind of technique, and what the most efficient way to read and decode the DMA data would be.

Mathieu

Christian_L · 06-11-2009

Mathieu,

We probably need to know more about the basic requirements and operation of your application to make recommendations on improving the performance.

For the basic operation you have implemented in the attached VI there is not much I can recommend to improve performance, but I believe if we know more about the big picture we may be able to recommend a different method to accomplish the same result.

One recommendation is to collect more data on the FPGA and send it over DMA in larger chunks instead of transferring one data point at a time. This would let RT perform fewer operations and handle large sets of data at once. It will require some code on the FPGA to prevent multiple sections of code from writing data to the DMA FIFO at the same time. In addition, each data packet passed over DMA would need a packet header stating the amount of data contained in the packet. I would recommend using a 32-bit DMA FIFO. The first element in the packet is the number of data elements to follow in the packet, the second element would be your MSB and LSB index, and the remaining items in the packet would be your data for the given indices.
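A minimal sketch of that packet layout, with a Python list standing in for the U32 DMA FIFO. It assumes the count word excludes the index word and that the MSB index occupies the upper 16 bits of the index word; `encode_packet` is a hypothetical name.

```python
from __future__ import annotations

MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def encode_packet(msb_index: int, lsb_index: int, data: list[int]) -> list[int]:
    """Build one packet of U32 words: [count, index word, data...].

    Assumption: 'count' is the number of data words only, and the index
    word holds the MSB index in bits 31..16 and the LSB index in bits 15..0.
    """
    index_word = ((msb_index & MASK16) << 16) | (lsb_index & MASK16)
    return [len(data), index_word] + [d & MASK32 for d in data]


fifo: list[int] = []  # stands in for the U32 DMA FIFO
fifo += encode_packet(1, 2, [100, 101, 102])
fifo += encode_packet(1, 3, [200])
print(fifo)  # [3, 65538, 100, 101, 102, 1, 65539, 200]
```

The header costs two words per packet regardless of its size, so the overhead shrinks as packets get longer.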

To prevent multiple code sections on the FPGA from accessing the DMA at the same time, you can use the FPGA Semaphore I have published on DevZone.
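The FPGA semaphore itself is not reproduced here. As a host-side analogy only, this Python sketch uses `threading.Lock` to show the property it has to guarantee: each writer holds the lock for a whole packet, so header and payload words from two sources never interleave. The function names and the two index words are made up for illustration.

```python
from __future__ import annotations

import threading

fifo: list[int] = []          # stands in for the shared U32 DMA FIFO
fifo_lock = threading.Lock()  # stands in for the FPGA semaphore (analogy only)


def write_packet(index_word: int, data: list[int]) -> None:
    """Append header and payload while holding the lock, so no other
    writer can place its words in the middle of this packet."""
    with fifo_lock:
        fifo.append(len(data))
        fifo.append(index_word)
        fifo.extend(data)


def producer(index_word: int, n_packets: int) -> None:
    """One code section that emits n_packets two-sample packets."""
    for i in range(n_packets):
        write_packet(index_word, [i, i + 1])


def count_packets(stream: list[int]) -> int:
    """Walk the stream and raise if a header is not where it should be."""
    pos = packets = 0
    while pos < len(stream):
        count, index_word = stream[pos], stream[pos + 1]
        if index_word not in (0x10001, 0x10002):
            raise ValueError(f"corrupted packet at word {pos}")
        pos += 2 + count
        packets += 1
    return packets


threads = [threading.Thread(target=producer, args=(w, 1000)) for w in (0x10001, 0x10002)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count_packets(fifo))  # 2000
```

Without the lock, the three appends of one packet could be separated by another writer's words, and the reader would lose track of where each packet starts.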

How will the data be passed to queues in RT? Will you copy it from the 3D array to one or more queues? Can you eliminate the 3D array and write directly to queues when sorting it out from the DMA FIFO?
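One way to drop the 3D array is to route each decoded element straight into a per-channel queue. A Python sketch using `collections.deque` as the queue and a `defaultdict` keyed by the index pair; the bit layout (two 16-bit indices in the upper 32 bits of each U64) is an assumption, and `route_to_queues` is a hypothetical name.

```python
from collections import defaultdict, deque

MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF


def route_to_queues(words, queues):
    """Decode each U64 element and append its data to the queue for its
    (index_a, index_b) channel; no intermediate 3D array is built.

    Assumed layout: index_a in bits 63..48, index_b in bits 47..32,
    data in bits 31..0.
    """
    for w in words:
        key = ((w >> 48) & MASK16, (w >> 32) & MASK16)
        queues[key].append(w & MASK32)


queues = defaultdict(deque)  # one FIFO queue per channel, created on first use
words = [(1 << 48) | (2 << 32) | 10, (1 << 48) | (2 << 32) | 11, (1 << 48) | (5 << 32) | 7]
route_to_queues(words, queues)
print({k: list(v) for k, v in queues.items()})
# {(1, 2): [10, 11], (1, 5): [7]}
```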

BTW, the VI you attached is missing one subVI (MGI_Array Get Unique Elements (I32).vi) which is not part of LV.

Christian

authored by
Christian L, CLA
Systems Engineering Manager - Automotive and Transportation
NI - Austin, TX

MathBoda · 06-15-2009

Thanks Christian for your quick response.

The plan is to write flexible RT code that can be reused for different applications or different hardware (modules, FPGA size, etc.). I'm planning to use several high-speed RS232 channels and to process analogue data on the FPGA, so I need high bandwidth. I have a 1 Hz or 10 Hz loop reading the DMA. The ultimate goal is to write to a queue instead of a 3D array.

About DMA usage, I've read in the CompactRIO Developers Guide (p. 144) that “the PCI bus has very high bandwidth for sending DMA data.” On the same page, I learnt that the DMA FIFO size on the FPGA can be really small: “creating a large FPGA memory buffer typically does not have benefits.” So, in a way, using a 64-bit DMA or a 32-bit DMA (in that case with 2 records: first the index, then the data) should be equivalent in speed and CPU usage.

With a 64-bit DMA, I don't need a semaphore on the FPGA, and on the RT side I read half as many records.

Mathieu

Christian_L · 06-16-2009

The throughput over DMA will be the same whether you use 32-bit or 64-bit integers, as the DMA is only 32 bits wide and each 64-bit integer is transferred as two 32-bit values.
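The word count can be illustrated with a short Python sketch: N U64 elements become 2N 32-bit words, the same number of words as sending one index record plus one data record per sample through a U32 FIFO. The high-half-first order is an assumption made for illustration, not a statement about how the DMA engine orders the halves.

```python
MASK32 = 0xFFFF_FFFF


def u64_to_u32_words(values):
    """Split each U64 into two U32 words (high half first, an assumed order)."""
    words = []
    for v in values:
        words.append((v >> 32) & MASK32)
        words.append(v & MASK32)
    return words


def u32_words_to_u64(words):
    """Reassemble consecutive (high, low) U32 pairs into U64 values."""
    if len(words) % 2:
        raise ValueError("odd number of 32-bit words")
    return [(words[i] << 32) | words[i + 1] for i in range(0, len(words), 2)]


values = [0x3000712345678, 0xFFFFFFFF00000001]
words = u64_to_u32_words(values)
print(len(values), len(words))  # 2 4
assert u32_words_to_u64(words) == values
```

What the 64-bit element does save is bookkeeping on RT (one array element per sample), not bus traffic.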

The buffer you allocate on the FPGA side before the DMA does not need to be large, as the DMA throughput is usually fast enough that the data can be buffered in the DMA buffer in RT memory, where much more memory is available.

The key is to optimize the handling and processing of data in RT after you read it from the DMA buffer. In your example, you handle each data element in the array returned from the DMA buffer in order to extract the indices and know what the data in the element represents. If you transfer data as blocks, you do not need to access each element in the array; you only handle the one or two elements used for the header. The rest of the data in the block can be handled in one operation. By reducing the number of data operations in RT, you will be able to improve performance.
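A sketch of the block-oriented RT decode in Python, assuming the packet layout [count, index word, data...] with the count excluding the index word. Only the two header words are read individually; each payload is moved with one slice and `deque.extend`. It also assumes a DMA read can end partway through a packet, so it returns the position where the unparsed tail begins; `decode_blocks` is a hypothetical name.

```python
from collections import defaultdict, deque

MASK16 = 0xFFFF


def decode_blocks(stream, queues):
    """Move every complete packet in a U32 stream into per-channel queues.

    Packet layout (assumed): [count, index word, data_0 .. data_{count-1}],
    with the MSB index in the upper 16 bits of the index word.
    Returns the position where an incomplete trailing packet starts
    (len(stream) if the stream ended exactly on a packet boundary).
    """
    pos, n = 0, len(stream)
    while pos + 2 <= n:
        count = stream[pos]
        end = pos + 2 + count
        if end > n:
            break  # payload not fully read yet; keep it for the next read
        index_word = stream[pos + 1]
        key = ((index_word >> 16) & MASK16, index_word & MASK16)
        # One header lookup, then the whole payload is appended in one call.
        queues[key].extend(stream[pos + 2:end])
        pos = end
    return pos


queues = defaultdict(deque)
stream = [3, 65538, 100, 101, 102, 1, 65539, 200, 4, 65538, 1]
rest = decode_blocks(stream, queues)
print(rest, {k: list(v) for k, v in queues.items()})
# 8 {(1, 2): [100, 101, 102], (1, 3): [200]}
```

In a 1 Hz or 10 Hz read loop, `stream[rest:]` would be kept and prepended to the next block read from the FIFO.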


Real-Time Measurement and Control
