07-06-2010 10:28 PM
Greetings,
I have a line scan camera that is capable of line rates in excess of 300 kHz. It is hooked up to a PCIe-1429 and I'm using LV2009. The application is high-speed feedback for an optical system. When using normal/regular frame heights of 100 to 1000 or more, I am able to successfully acquire at the full rate. Also, with a frame height of 1, MAX indicates an acquisition rate of nearly 290 kfps, which is good enough for me. However, when it comes time to implement this in a VI, the frame rates I get are much lower. Using a high-level grab I can pull only 6.5 kfps, and using low-level ring buffering (as per the LL ring example provided) I'm limited to 9 kfps.
My questions are: what is MAX doing that I can't? Is there any way to increase the frame rate? Using more lines per frame is not an option for me unless I can read lines from the frame before it has finished acquiring. Specifically, I need to minimize the total time from the end of the line integration to when a decision can be made by my VI from that data. I've already checked that my processing is not the bottleneck. To be sure, I've put the acquisition in a bare-bones VI to obtain the speed measurements I've given here.
Any advice would be appreciated.
PW
07-07-2010 02:00 AM
Hi PW,
I don't have any benchmarks that are comparable, but I am doubtful that you will be able to achieve 300 kHz frame rates via conventional means. The reason is that each time you do a Grab or Extract of an image, there is some coordination with the hardware to protect that buffer from being overwritten. This is by design of the hardware and software. There is some amount of latency and overhead in this interaction that I think would keep you from ever individually extracting each line. MAX shows two measurements, "acquired" and "processed" rates. I am assuming you are comparing the "acquired" frame rate in MAX to the "processed" frame rate in your app. You should be able to modify the example VI to give you the same measurement MAX is.
However, I think you also need to step back and realize that the overall latency of the system is likely going to be higher than the ~3 µs you have between each line. The notification of a frame being done on the IMAQ framegrabbers (or pretty much anybody's framegrabber) is done by interrupts. Typical interrupt latency on a Windows system can be as high as 250-500 µs (what I've seen from third-party measurements), and the jitter in that number on Windows (being a non-real-time OS) absolutely dwarfs your 3 µs line interval (note that I do not know offhand what the latencies and jitter are on a LabVIEW Real-Time system).
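To put rough numbers on that (back-of-the-envelope arithmetic only, using the latency figures quoted above), here is how many line periods can elapse before Windows even services the interrupt:

```python
# Back-of-the-envelope arithmetic only, using the figures quoted above.
line_rate_hz = 300e3                  # camera line rate
line_interval_s = 1.0 / line_rate_hz  # ~3.3 us per line

for interrupt_latency_s in (250e-6, 500e-6):
    lines_old = interrupt_latency_s / line_interval_s
    print(f"{interrupt_latency_s * 1e6:.0f} us interrupt latency -> "
          f"the newest serviced line is ~{lines_old:.0f} lines old")
# 250 us interrupt latency -> the newest serviced line is ~75 lines old
# 500 us interrupt latency -> the newest serviced line is ~150 lines old
```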
Given that you are already going to be processing lines that could be hundreds of lines old, you really don't stand to lose much by combining multiple lines per frame to save on the interrupt and extraction overhead. However, here are some suggestions I can think of:
- Buffer extractions in IMAQ are like a roadblock to the hardware. Any buffers newer than the one extracted are protected as well while it is extracted. You can do things like extract one old buffer and then access all the ones between that and the latest buffer without actually extracting them. This would save on the extraction overhead, although the hardware will still be interrupting a lot (see the sketch after this list)
- You could try this on a LabVIEW RT desktop system. At the very least the jitter would be a lot less than on Windows and I imagine the latency would be much less as well due to the architecture of the OS. You could download the trial version and test this out
- You could use the NI-1483R which is a FlexRIO adapter module. If you can do all your processing on the FPGA then you can guarantee latency of no more than 1 line interval (or whatever your processing needs) with basically zero jitter. This type of high-speed feedback system is exactly the type of problem this solution is suited for. (https://www.ni.com/en-us/shop/model/ni-1483.html)
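To illustrate the first bullet, here is a rough sketch of the access pattern. It uses plain Python with a simulated ring; none of these names are actual NI-IMAQ calls, it just models the idea that one extraction protects everything newer:

```python
# Conceptual sketch only: a simulated ring, NOT the NI-IMAQ API.
# Idea from the first bullet: extract (lock) one older buffer so the
# hardware cannot overwrite it or anything newer, then read the buffers
# in between directly instead of paying extract/copy/release per line.

NUM_BUFFERS = 100  # however many buffers you configured in the ring

class SimulatedRing:
    """Stand-in for the driver-owned circular buffer list."""
    def __init__(self, num_buffers):
        self.buffers = [None] * num_buffers
        self.newest = -1      # cumulative buffer number written by "hardware"
        self.locked = None    # buffer number currently extracted, if any

    def hardware_writes_line(self, line):
        self.newest += 1
        self.buffers[self.newest % len(self.buffers)] = line

    def latest_buffer_number(self):
        return self.newest

    def extract(self, n):     # lock n; everything newer is protected too
        self.locked = n

    def release(self, n):
        self.locked = None

    def read_raw(self, n):    # direct access, no per-buffer extraction
        return self.buffers[n % len(self.buffers)]

def process_new_lines(ring, last_processed, handle_line):
    """Extract one old buffer, then walk every newer line directly."""
    newest = ring.latest_buffer_number()
    if newest <= last_processed:
        return last_processed
    anchor = last_processed + 1
    ring.extract(anchor)      # one lock protects anchor..newest
    try:
        for n in range(anchor, newest + 1):
            handle_line(ring.read_raw(n))
    finally:
        ring.release(anchor)
    return newest

# Example: "hardware" fills a few lines, then we process them in one pass.
ring = SimulatedRing(NUM_BUFFERS)
for i in range(10):
    ring.hardware_writes_line(f"line {i}")
last = process_new_lines(ring, -1, handle_line=print)
```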
Hope this helps,
Eric
07-07-2010 07:43 AM
Hi Eric,
Thanks for the reply. I'm going to give RT a shot to see what I can do. The interrupt latency argument is very believable, though since I'm able to get 9 kfps right now I'd say my latency has to be at least as low as ~100 µs. I agree that the NI-1483R and the associated FPGA modules would likely be the answer here, though not a cheap one. Do you know off the top of your head if they are capable of doing FFTs?
I just have a couple more questions about what you wrote:
1. With regards to buffer protection, I thought the whole point of putting 100 or so buffers in a ring is that you could write to one buffer while reading from another. Is that wrong, or is there another set of buffers that effectively imposes this restriction regardless of my ring? (This is related to the LL ring buffered acquisition demonstrated in \National Instruments\LabVIEW 2009\examples\IMAQ\IMAQ Low Level.llb)
2. This may be a case of semantics, but MAX displays "displayed/acquired" and not "processed/acquired". Yes, the "displayed" number is quite a bit lower (in fact it's much lower than even the rate I'm getting in my VI), and I had assumed that this involved a bottleneck associated with getting the pixels from system memory to the display. Since my framegrabber supposedly has no internal memory (it has a direct pipe to the system RAM via the PCIe bus), the "acquired" number (if it is real) must mean the rate of frames making it to system memory. Is there some low-level way to access that memory in a roundabout way without assigning any protection to the buffer?
Thanks again,
PW
07-07-2010 08:47 AM
A further question:
I'm thinking of buying an ETS license to go to RT. Since it's not included in my LV2009 academic site license (right?) and I'm going to have to lay out some cash, is there any way to confirm a performance increase before I buy? The RT development kit has a demo, but I don't think the ETS client does. Do any LabVIEW engineers here have a testbed (ETS client, PCIe-1429, high-speed line camera) that they can run some simple code on and see how many single-line frames per second they are able to pull?
PW
07-07-2010 08:53 AM
If you are trying to process 300k lines per second individually, that means your results are being produced at the same rate. What are you doing with these results? I can't imagine that you could do anything with the results and keep up with the acquisition. To me, it sounds like you are trying to do way more than could possibly be necessary. Can you explain why you need to process 300k lines/sec with no lag?
Even with RT, you are probably going to get behind because FFTs can only be calculated so fast. The processor will also need to spend some time doing something with the results.
FPGA is the only way I think you could really process this many lines and keep up. There are FFT routines available for the FPGA, but I haven't tried using them.
Bruce
07-07-2010 08:58 AM
The RT system is going to perform very closely to the equivalent Windows system. There will probably be a slight improvement due to less OS overhead, but other than that the CPU can only process so much data per second.
The advantage of RT is scheduling. You can make things happen exactly when you need them to, which is practically impossible with Windows.
Bruce
07-07-2010 09:42 AM
Hi Bruce,
Thanks for commenting on this. Does RT have the same degree of interrupt latency that Windows does? I don't expect that its processed pixels/second rate would be any higher than Windows, just that it might not get so gummed up dealing with the acquisition buffer. The FFT I mentioned is actually for a further project. My current processing is actually very simple. Tested in a parallel, decoupled loop with the image acquisition going on, I can process 120+ klines/s, which is probably fast enough. In this incarnation, I want the true/false result of the processing to drive a DAQ digital line high or low. There will undoubtedly be additional latency with that, but I wanted to settle this problem with the acquisition first, since it seemed that it "ought not to be so" based on the "Acquired" rate that MAX was showing for single-line frames.
To clarify, my current goal is to see just how fast I can go with the current setup. I agree that FPGAs seem to be the way to go ultimately, but I'd rather not go out and spend 10k+ and have to learn about FPGAs (are NI FPGAs a big change from regular LabVIEW G coding?) if my current problem is simply that I'm not being clever enough with my LabVIEW programming.
PW
07-07-2010 11:47 AM
The interrupt latency argument is very believable, though since I'm able to get 9 kfps right now I'd say my latency has to be at least as low as ~100 µs.
Throughput and latency are very different. For instance, you could be processing 9 kfps with 10 ms of latency from image capture to processing (your buffer list must be long enough, though, or else you would end up skipping frames from even being acquired). Since the board acquires asynchronously from your processing, both sides can achieve a high throughput even though there could be significant latency from one to the other.
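As a trivial illustration of that distinction (the 10 ms figure is made up purely for the example):

```python
# Throughput vs. latency, illustration only (numbers picked for the example):
# sustaining 9 kfps says nothing about how old each frame is when you touch it.
throughput_fps = 9_000
latency_s = 10e-3                       # hypothetical capture-to-processing lag
frames_in_flight = throughput_fps * latency_s
print(f"~{frames_in_flight:.0f} frames sitting between capture and processing")
# ~90 frames sitting between capture and processing
# ...so the ring would need at least ~90 buffers to avoid losing frames.
```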
I agree that the NI-1483R and the associated FPGA modules would likely be the answer here, though not a cheap one. Do you know off the top of your head if they are capable of doing FFTs?
Not sure, but it seems so: http://digital.ni.com/public.nsf/allkb/E8C29E834503EDE2862574DD0071B8A4
1. With regards to buffer protection, I thought the whole point of putting 100 or so buffers in a ring is that you could write to one buffer while reading from another. Is that wrong, or is there another set of buffers that effectively imposes this restriction regardless of my ring? (This is related to the LL ring buffered acquisition demonstrated in \National Instruments\LabVIEW 2009\examples\IMAQ\IMAQ Low Level.llb)
Yes, this is a correct interpretation. You put 100 buffers in the ring and the hardware fills them in a circular fashion. You can then wait for a particular buffer number to be ready and extract it, preventing it from being overwritten by the hardware. Note that the semantics of buffer extraction vary between IMAQ (framegrabbers) and IMAQdx (bus-based cameras).
2. This may be a case of semantics, but MAX displays "displayed/acquired" and not "processed/acquired". Yes, the "displayed" number is quite a bit lower (in fact it's much lower than even the rate I'm getting in my VI), and I had assumed that this involved a bottleneck associated with getting the pixels from system memory to the display. Since my framegrabber supposedly has no internal memory (it has a direct pipe to the system RAM via the PCIe bus), the "acquired" number (if it is real) must mean the rate of frames making it to system memory. Is there some low-level way to access that memory in a roundabout way without assigning any protection to the buffer?
Yes, MAX's "displayed" is essentially a Grab Image + Display. Since Grab is essentially Extract-Copy-Release, the whole sequence MAX does is "Extract latest image - Copy - Release - Display Copy". The rate at which this happens is the "displayed" rate. The rate at which the hardware puts the images into memory is the "acquired" rate. The way the IMAQ framegrabbers/drivers work is that the hardware has full write access to the entire circular buffer list unless the software puts a lock ("extract") on a buffer. The acquired rate will always equal the camera's frame rate unless the hardware loops around and hits an extracted buffer and can go no further. As long as you guarantee you keep a buffer extracted for less time than the time it would take to loop around the buffer list, the hardware will not drop frames in this manner (though your extraction/processing of them might not be able to keep up).
As for a low-level way to access the buffers without protection: since the IMAQ Ring VIs simply let you pass in the buffer list of Vision images when you configure the hardware, you essentially already have access to the buffer list and can access those images however you want. Buffer numbers in IMAQ translate to buffer indices via a simple modulus. Obviously if you use no protection at all you cannot guarantee the results of your processing. However, if you extract one buffer you know that all the buffers newer than it are protected as well, and you can simply access them directly.
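As a small sketch of those two rules (illustration only, not driver code): buffer numbers map to ring indices with a modulus, and one buffer can stay extracted only for about as long as it takes the hardware to wrap back around to it.

```python
# Illustration only, not driver code.
NUM_BUFFERS = 100        # buffers in the configured ring
FRAME_RATE_HZ = 300e3    # one line per frame at the full line rate

def buffer_index(buffer_number, num_buffers=NUM_BUFFERS):
    """Translate a cumulative buffer number into an index in the ring."""
    return buffer_number % num_buffers

# Longest a single buffer can stay extracted before the hardware wraps
# around and runs into it:
max_hold_s = (NUM_BUFFERS - 1) / FRAME_RATE_HZ

print(buffer_index(1234))                      # -> 34
print(f"max hold ~{max_hold_s * 1e6:.0f} us")  # -> max hold ~330 us
```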
Eric
07-07-2010 11:52 AM
@Bruce Ammons wrote:
The RT system is going to perform very closely to the equivalent Windows system. There will probably be a slight improvement due to less OS overhead, but other than that the CPU can only process so much data per second.
The advantage of RT is scheduling. You can make things happen exactly when you need them to, which is practically impossible with Windows.
Bruce
Hi Bruce,
Yes, I did not mean to imply that an RT system will ever process data faster than a Windows one. However, I am assuming that PW's problem here is strictly the driver overhead (assuming his processing is trivial) and the jitter/latency. I believe the latency from hardware interrupt to image processing will be significantly less on RT. The determinism of the ISR is much higher with lower latency and there is no kernel/user distinction on ETS. On Windows when you extract an image it requires one or more transitions between user and kernel mode which have some overhead (in the OS itself and on the CPU as the address spaces are swapped about). On ETS this transition is simply a function call.
Eric
07-07-2010 11:55 AM
@pwebster wrote:
A further question:
I'm thinking of buying an ETS license to go to RT. Since it's not included in my LV2009 academic site license (right?) and I'm going to have to lay out some cash, is there any way to confirm a performance increase before I buy? The RT development kit has a demo, but I don't think the ETS client does. Do any LabVIEW engineers here have a testbed (ETS client, PCIe-1429, high-speed line camera) that they can run some simple code on and see how many single-line frames per second they are able to pull?
PW
Sorry, I do not know the specifics of licensing. However, my impression was that the RT toolkit evaluation should let you create an RT boot disk, which you can use to turn a desktop PC into an RT target and use it for as long as the evaluation license lets you. Maybe you can talk to your local NI sales rep to clarify?
Eric