08-05-2015 12:07 PM
I am trying to achieve the specified maximum acquisition rate of 1 MHz for eight channels with the USB-7855R. I started with the AI loop found in the Balanced IO example code, but the fastest it would run was 700 kHz. So I stripped it down to its essential components, finally arriving at code that runs at 910 kHz. All this code contains is a Loop Timer, an FPGA I/O node that reads all eight AI channels, and a timed loop to write the values to a FIFO. How can I make this run faster?
A related problem is that the code occasionally seems to drop one of the measurements. Since I stripped out any error detection to make it run faster, I have no way to detect this, so my data are scrambled when I read them from the FIFO on the host side.
Daniel Lathrop
08-06-2015 08:31 AM
For an issue like this, it really is much more helpful to share your code. There could be a multitude of ways to increase your speed.
You shouldn't need to remove error handling to run faster...
It sounds to me like you might be trying to use software timing instead of hardware timing.
Cheers
--------, Unofficial Forum Rules and Guidelines ,--------
'--- >The shortest distance between two nodes is a straight wire> ---'
08-06-2015 03:12 PM
All right, here is the FPGA code.
08-06-2015 03:15 PM
As you can see, I am using hardware timing. The onboard clock is 40 MHz, so if I set AI Loop Timer to 40 I should be sampling at 1 MHz. The measured number of ticks in AI Loop Rate is 44, corresponding to 909 kHz.
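The tick arithmetic above can be sketched quickly. This is just a calculator check of the numbers in the post (Python here only for illustration; the function name is mine, not a control in the VI):

```python
# Values from the post, not measured here.
FPGA_CLOCK_HZ = 40_000_000  # USB-7855R onboard clock: 40 MHz


def loop_rate_hz(ticks_per_iteration: int) -> float:
    """Sample rate implied by a loop period measured in 40 MHz ticks."""
    return FPGA_CLOCK_HZ / ticks_per_iteration


print(loop_rate_hz(40))  # requested: 40 ticks -> 1,000,000 Hz (1 MHz)
print(loop_rate_hz(44))  # measured: 44 ticks -> ~909,091 Hz (~909 kHz)
```

So a 4-tick overshoot per iteration is exactly the gap between the 1 MHz target and the 909 kHz actually observed.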
08-06-2015 03:43 PM - edited 08-06-2015 03:48 PM
You are using software timing. You're hardware-timing your FIFO Write loop, which is unnecessary. My earlier statement was a mistake because I didn't look into the device. You can only do SAR acquisition on this device, not Delta-Sigma. Delta-Sigma would have allowed you to set a hardware-timed data rate, and then the Read node would have been the source of your timing.
So you're doing some weird stuff... Really, if you want to sample at 1 MHz continuously, your timed loop should surround your AI0-7 node. You have it, instead, around the FIFO output. You don't want to time a FIFO output; just replace that loop with a For Loop and write the data right away.
What you're trying to do is fit a loop that runs for 4 ticks inside your 1 MHz AI loop, which is impossible when you're using a Wait Until Next Multiple loop timer.
Cheers
08-06-2015 04:57 PM
Hi James,
Replacing the timed loop on the FIFO writes with a For Loop increases the AI Loop Rate reading from 44 ticks to 53.
Thanks,
Dan
08-06-2015 05:30 PM - last edited on 05-06-2025 03:36 PM by Content Cleaner
That doesn't make sense... going from a timed loop to a loop that runs as fast as possible should speed things up.
Did you run it continuously, or just a single iteration? Have you tried playing around with it in simulation mode? That would essentially run it on your 1 kHz clock, so you would see 40 ms periods.
Can you share a snippet of your new code, just to confirm everything?
Cheers
08-06-2015 09:48 PM
@James.M wrote:
You are using software timing. You're hardware-timing your FIFO Write loop, which is unnecessary. My earlier statement was a mistake because I didn't look into the device.
So you're doing some weird stuff... Really, if you want to sample at 1 MHz continuously, your timed loop should surround your AI0-7 node. You have it, instead, around the FIFO output. You don't want to time a FIFO output; just replace that loop with a For Loop and write the data right away.
What you're trying to do is fit a loop that runs for 4 ticks inside your 1 MHz AI loop, which is impossible when you're using a Wait Until Next Multiple loop timer.
How exactly would one use "software timing" in an implementation where software doesn't exist? He's working with the FPGA.
I'm getting the impression you aren't well versed in FPGA. The SCTL (single-cycle Timed Loop) is more efficient than the For Loop: it removes any forced waits at the end of each cycle and completes everything within a single cycle. It's expected that removing the SCTL increases computation time. In this case, you're running a SCTL 4 times, which will take approximately 100 ns to complete; your code will require more effort from there.

Have you considered bit-packing? You're currently using 4 iterations to send 4 data points. Why not pack them into a single U64, send that through the FIFO, then split it again on the other end? You're using rather large fixed-point words, but you can still cut your iterations at least in half that way.

If you're worried about maximizing your sample rate, you probably want to consider using additional FIFOs so you only have a single loop rather than several. If you add your I/O node to your SCTL, does that give an error? I'm pretty sure AI tasks make the SCTL feisty.
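The bit-packing idea is easiest to see on the host side. Here is a minimal sketch in Python, assuming each sample arrives as a raw 16-bit integer so four of them fit in one U64 (the actual fixed-point word length on this device may differ, in which case fewer samples fit per word; the function names are mine):

```python
def pack4_u16(samples):
    """Pack four 16-bit samples into one 64-bit word (first sample in the low bits)."""
    assert len(samples) == 4
    word = 0
    for i, s in enumerate(samples):
        word |= (s & 0xFFFF) << (16 * i)
    return word


def unpack4_u16(word):
    """Reverse of pack4_u16, run on the host after reading the FIFO."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


packed = pack4_u16([0x0102, 0x0304, 0x0506, 0x0708])
assert unpack4_u16(packed) == [0x0102, 0x0304, 0x0506, 0x0708]
```

On the FPGA side the equivalent would be a Join Numbers (or bit-shift and OR) operation feeding one FIFO Write instead of four, trading FIFO element width for fewer writes per sample period.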
08-06-2015 10:55 PM
natasftw wrote:
How exactly one would use "software timing" in an implementation where software doesn't exist? He's working with the FPGA.
I was referring to the Loop Timer rather than relying on the Read node for timing: SAR vs Delta-Sigma. You're right, it's still hardware timed either way; I guess I meant FPGA-clock timed vs module timed.
The SCTL (single cycle timed loop) is more efficient than the for loop.
Learning something new. I've only done a few FPGA projects, so I haven't worked with many of the timing optimization methods. Sounds like you're much better equipped to help dlathrop with this. Thanks for the tips!
Cheers
08-06-2015 11:59 PM
If you can pass the CLA, you can put the SCTL to good use in the FPGA projects you work with. Take a look at it. Essentially, it removes the forced pauses used to make compilation easier. If you have a series of calculations that takes 5 cycles to complete, a typical loop has logic in the background that forces the calculations to wait for the next cycle. In the SCTL, those forced pauses go away and the code runs as fast as the signals can propagate through the fabric.
The problem we're likely to run into is that there's no way the ADC on his card can keep up with the 40 MHz clock; I'd expect that to cause a timing error. I'd be curious to see if we could create a derived clock at a much slower rate that would accept the AI read. If we can run the acquisition in that loop and send the samples to a FIFO, we could use the 40 MHz clock domain to process the data elsewhere (or pull them out on the RT or host).
OP, it would help more if you shared your project rather than just the VI. That way, we don't have to set up all of the same FIFOs and guess what you're doing with them outside of the small picture we can see. Is this something you're able to do?