10-01-2021 02:21 PM
Hi,
To familiarize you with the application:
I am communicating with some feedback devices, and I have written code on the FPGA meeting each device's communication requirements. These are mostly serial devices. I send a command and capture the response, send it to the RT for processing, and eventually to the PC.
So far I have been dealing with a maximum of 32-bit data. I typically send an array of u32 elements to the FPGA (600+ u32 elements/commands), store this data in memory, then select an address as needed to read a specific u32 value; when it comes to sending this data out to a DIO pin, I convert it into an array and index the values.
I have tried several different ways to achieve the same result, and the above-mentioned way has been working perfectly for me and has been highly efficient as far as timing requirements go. But a certain device dealing with 128 bits of data has made me reconsider my approach.
Currently I am sending one array with 128 boolean elements to the FPGA and storing it into memory. It works great, but speed is the issue: it takes up to 2-3 minutes to go through all the commands/data.
New Idea:
So, exploring possibilities, one thing I am currently testing is:
1) Made a custom control of 128 boolean elements
2) Made a memory block using this custom control
The idea is to prepare a 2D array of all the required commands, send them to the FPGA using a FIFO, and store them in memory. My concern is the size of the arrays and the arrays themselves. I do not have good experience dealing with large arrays on the FPGA unless they are stored in memory.
I do not know if this will work, or if it will even compile. But I need to come up with something before Monday, so I thought of reaching out to you guys for possible ideas.
Any suggestions or recommendations?
Device: sbRIO-9638
10-01-2021 03:28 PM
10-02-2021 05:40 PM
Hi GerdW,
Thank you for your response. I should be able to handle those as 4 U32 values, and since it's your suggestion, this has to be the best way. Usually your solutions are the best :D
I will try and see how I can make it happen in my case.
Thanks
10-03-2021 05:27 AM
Is the speed issue between the host and the FPGA? Or is it internal to the FPGA?
10-03-2021 06:15 AM - edited 10-03-2021 06:25 AM
@Terry_ALE wrote:
Is the speed issue between the host and the FPGA? Or is it internal to the FPGA?
Considering the mentioned time of 2 to 3 minutes, I'm pretty sure it is host-related. And with massive boolean arrays that isn't that surprising. Boolean arrays are treated as arrays of uInt8 in LabVIEW, except on the FPGA itself, where they are treated as bits. If that delay were on the FPGA, I would guess you would be hard-pressed to fit enough logic even in the biggest FPGAs to make it take so long (or program massive loops that only do something 1 out of a million iterations).
On the host, 4 32-bit values or 2 64-bit values are orders of magnitude easier and more efficient to deal with than 128 boolean elements. On the FPGA it MAY seem more difficult, unless you learn boolean logic. An integer is also just a specific number of bits, and the LabVIEW nodes fully support boolean operations on integers on all platforms; the LabVIEW compiler cleanly and effectively translates that to bits for the FPGA compiler. Aside from the boolean AND, OR, XOR, and NOT, you also want to learn about the Logical and Arithmetic Shift operators to shift bits in and out of an integer. With these operations you can forget boolean arrays pretty much completely and simply operate on integers as boolean arrays. If you want to go higher than 64 bits, you will have to think about carry and borrow bits too, but it is fully manageable.
Unless your loop runs in a single-cycle loop at 100 MHz or more, you really will have lots of timing headroom to do quite a few boolean operations on numbers, and if timing gets too tight you can consider pipelining too. For the final FPGA code, boolean arrays versus integers may not make a huge difference, since the FPGA compiler works all in bits anyhow, and LabVIEW translates both boolean arrays and integers basically to bit vectors of a given size.
One extra tidbit: DMA FIFOs are always packed and aligned to 64 bits. So if you put, for instance, 24-bit fixed-point numbers into the FIFO, there will always be two packed into a 48-bit entity, with the remaining 16 bits unused in the FIFO element. Something to consider when doing high-speed FIFO transfers. I'm not sure how LabVIEW packs boolean arrays on the FIFO, though; I never considered putting booleans into a FIFO. The only booleans I use between host and FPGA are single booleans as front panel registers, but often I use integers combining multiple bits in one for this as well.
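To make the packing concrete, here is a rough sketch (Python, not LabVIEW; which of the pair LabVIEW actually puts in the low bits is an assumption on my part) of two 24-bit raw values sharing one 64-bit FIFO element, with the top 16 bits left unused:

```python
# Illustration of 64-bit FIFO element packing: two 24-bit raw values
# occupy the low 48 bits; bits 48..63 are unused padding.
# (The ordering of the pair within the element is assumed here.)

def pack_pair(a24, b24):
    """Pack two 24-bit raw values into one 64-bit FIFO element."""
    assert 0 <= a24 < 1 << 24 and 0 <= b24 < 1 << 24
    return a24 | (b24 << 24)          # bits 48..63 stay zero/unused

def unpack_pair(elem64):
    """Recover the two 24-bit raw values from a 64-bit element."""
    return elem64 & 0xFFFFFF, (elem64 >> 24) & 0xFFFFFF

elem = pack_pair(0x123456, 0xABCDEF)
print(unpack_pair(elem) == (0x123456, 0xABCDEF))  # True
```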
10-03-2021 02:26 PM
Hi Terry,
Yes, the speed issue is in the PC-RT-FPGA communication. I need to communicate with the device at 9600 baud, which is easily supported by the 40 MHz FPGA on the sbRIO-9638.
Currently, since data/commands are up to 128 bits, I was sending one command at a time to the FPGA via RT, so this flow itself is causing the delay. There would be around 20 such commands, ranging from 8 bits to 128, so I was using a boolean array to do so (it seemed simpler back then). Now that the requirement has grown, I need to cut the delays short by sending all commands at once and receiving all responses back at once. (I implemented a similar concept on another device, and it meets the expectations.)
Another factor in the delay, an uncontrollable one, is the response time of the device, which could be up to 30 ms.
10-03-2021 02:52 PM - edited 10-03-2021 02:52 PM
@rolfk wrote:
On the host, 4 32-bit values or 2 64-bit values are orders of magnitude easier and more efficient to deal with than 128 boolean elements. On the FPGA it MAY seem more difficult, unless you learn boolean logic. An integer is also just a specific number of bits, and the LabVIEW nodes fully support boolean operations on integers on all platforms; the LabVIEW compiler cleanly and effectively translates that to bits for the FPGA compiler. Aside from the boolean AND, OR, XOR, and NOT, you also want to learn about the Logical and Arithmetic Shift operators to shift bits in and out of an integer. With these operations you can forget boolean arrays pretty much completely and simply operate on integers as boolean arrays. If you want to go higher than 64 bits, you will have to think about carry and borrow bits too, but it is fully manageable.
Hi rolfk,
Thank you for such detailed information. Could you please shed some more light on some of the topics that you addressed:
1) If I understood it correctly, I have been using some of the "Data Manipulation" palette VIs (Logical Shift, Rotate Left/Right with Carry, Rotate, and Join/Split Numbers). I am not sure how I can benefit from these in this specific scenario of sending 128 bits of data. I can think of using a logical shift to bring a larger number down to a smaller one, but I may lose some data when attempting to revert it.
2) Carry and borrow: if it is not too much to ask, can you give me an example? I have some knowledge of this, but I am not sure how to link it with your proposal.
One extra tidbit: DMA FIFOs are always packed and aligned to 64 bits. So if you put, for instance, 24-bit fixed-point numbers into the FIFO, there will always be two packed into a 48-bit entity, with the remaining 16 bits unused in the FIFO element.
Are you proposing to use 64 bits instead of 32? That would be simpler for me. I read in one post (I cannot find it now) people suggesting the use of U32s instead of U64s, so I just wanted to confirm.
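To make sure I follow the carry idea, here is how I currently picture it, written as a Python sketch rather than LabVIEW (the function name is mine): shifting a 128-bit value held as four u32 words left by one bit means the top bit of each word "carries" into the bottom of the next word.

```python
# Sketch: left-shift a 128-bit value stored as 4 x u32 by one bit.
# words[0] is the least significant word; the MSB shifted out of each
# word becomes the carry into the next, more significant word.

MASK32 = 0xFFFFFFFF

def shl1_128(words):
    """Return the four u32 words shifted left by one bit, with carry."""
    out, carry = [], 0
    for w in words:                       # least-significant word first
        out.append(((w << 1) | carry) & MASK32)
        carry = (w >> 31) & 1             # bit that would otherwise be lost
    return out

print(shl1_128([0x80000000, 0, 0, 0]))    # [0, 1, 0, 0]
```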
10-07-2021 04:12 AM - edited 10-07-2021 04:15 AM
@rolfk wrote:
One extra tidbit. DMA FIFOs are always packed and aligned to 64-bit. So if you put for instance 24-bit Fixed point numbers into the FIFO, there will be always 2 packed into a 48-bit entity with the remaining 16-bit being unused in the FIFO element. Something to consider when doing high speed FIFO transfer. Not sure how LabVIEW packs boolean arrays on the FIFO though. I never considered putting booleans into a FIFO. The only booleans I use between host and FPGA are single booleans as Frontpanel registers but often I use integers combining multiple bits in one for this as well.
OK; I know I'm probably wrong, and I pretty much never (am able to) correct RolfK, but this doesn't match my experiences in benchmarking. It very nearly matches, but not quite. FXPs are always passed via DMA as 64-bit, i.e. a FIFO of N 24-bit FXPs will actually send N 64-bit packets to the FPGA. Even if the FXP is only 4 bits, LabVIEW will still send N 64-bit packets over DMA. I've documented this before and passed the information on to NI, because it's actually really inefficient. That said, it was when using PXIe FlexRIO in LabVIEW 2012 and 2015. It may be different now.
I8, U8, I16, U16, I32, and U32 ARE packed just as Rolf says, but at least in my testing it was painfully obvious that FXPs were not being packed into the DMA channel but sent as individual 64-bit packets.
10-07-2021 04:30 AM
I have to admit that I'm not 100% sure about the FXP packing. When I wrote my last FPGA design document, I was researching this, as I needed to pass 12-bit signed FXP from my 64 simultaneously sampled ADCs at a maximum speed of 70 kS/s. But for several other reasons I eventually decided to use 16-bit FXP on the FPGA and pass the values as normal I16 in the FIFO, so I unintentionally avoided the potential difference.
I'm working on this project in LabVIEW 2018, however, and used the published Python NI-CRIO bindings to investigate the issue, and there was clearly a 64-bit packing strategy, although I was under the impression that it applied to FXP just as well. But as I decided on I16 FIFO transfers before really starting programming, I have to admit that I never really got to see if there was a different packing strategy for FXP.
I anyhow prefer to pass native integers, or SGL packed as I32, between FPGA and host. It just makes things on the host side a lot easier.
10-07-2021 04:41 AM - edited 10-07-2021 04:45 AM
@rolfk wrote:
I have to admit that I'm not 100% sure about the FXP packing. When I wrote my last FPGA design document, I was researching this, as I needed to pass 12-bit signed FXP from my 64 simultaneously sampled ADCs at a maximum speed of 70 kS/s. But for several other reasons I eventually decided to use 16-bit FXP on the FPGA and pass the values as normal I16 in the FIFO, so I unintentionally avoided the potential difference.
I'm working on this project in LabVIEW 2018, however, and used the published Python NI-CRIO bindings to investigate the issue, and there was clearly a 64-bit packing strategy, although I was under the impression that it applied to FXP just as well. But as I decided on I16 FIFO transfers before really starting programming, I have to admit that I never really got to see if there was a different packing strategy for FXP.
I anyhow prefer to pass native integers, or SGL packed as I32, between FPGA and host. It just makes things on the host side a lot easier.
I was also very surprised to see this behaviour, but I had done some benchmarking of DMA channel speed for different datatypes and had just thrown in FXP on a whim. I spent some time trying to decipher why the data transfer rates for FXP8 and U8 were significantly different. NI gave an answer that it was faster this way than extracting the bits, but my tests showed this to be clearly not the case at all. I was able to do manual bit packing to massively increase the overall throughput of FXP values via DMA (sending as U64 and re-interpreting on the FPGA).
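The manual packing I mean looks roughly like this (Python sketch, function names are mine; the idea is eight 8-bit raw FXP words share one U64 DMA element instead of each riding in its own 64-bit packet):

```python
# Sketch of manual bit packing before a DMA write: eight 8-bit raw
# values packed into one u64 element, then unpacked on the other side.
# (Byte order within the element is my assumption for illustration.)

def pack8_u64(raw8s):
    """Pack eight 8-bit raw values into one u64 (first value = low byte)."""
    assert len(raw8s) == 8
    elem = 0
    for i, b in enumerate(raw8s):
        elem |= (b & 0xFF) << (8 * i)
    return elem

def unpack8_u64(elem):
    """Recover the eight 8-bit raw values from a u64 element."""
    return [(elem >> (8 * i)) & 0xFF for i in range(8)]

vals = [1, 2, 3, 4, 5, 6, 7, 8]
print(unpack8_u64(pack8_u64(vals)) == vals)  # True
```

On the FPGA side the equivalent re-interpretation is just wiring the u64 apart into eight byte-wide chunks, which costs essentially nothing in fabric.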