fastest loop computation speeds

altenbach · ‎06-18-2012

Sorry, I don't have the IMAQ stuff installed at the moment. Can you turn the three 2D arrays into diagram constants with real data* and re-attach the VI? Thanks. As has been mentioned, you could just use autoindexing instead of shift registers.

You are also doing the LabVIEW code in U8 while the MATLAB code uses DBL.

*Create three indicators for the three 2D input arrays, run the VI once, change the indicators to constants (right-click...change to constant), then delete all the IMAQ stuff.

LabVIEW Champion.

Joe_Szalko · ‎06-18-2012

Thank you for that. With that adjustment, not only do I get the correct output, but the new completion time is 0.049 seconds, which is right in the range I was looking for.

Joe_Szalko · ‎06-18-2012

Here you go, Altenbach. The program works as it is supposed to and, as I said above, it's within the computation time I'm looking for. If you can figure out a way to make it run even faster, I happily welcome the adjustments. I did not readjust the display size of the arrays because, since I'm working with images, they are quite large (the truncated image I'm using is 320x240).

Mads · ‎06-19-2012

Is there not a bug in the code as it is now? It seems to iterate over the rows based on the number of columns, and thus it also tries to replace non-existing elements. (Array Size outputs the number of rows as the first element, not the second.)
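To make the row/column concern concrete, here is a minimal Python sketch. It is not the original VI: `array_size` is a hypothetical stand-in for LabVIEW's Array Size on a 2D array, and the tiny 2x3 array is made up for illustration.

```python
def array_size(a):
    """Hypothetical stand-in for Array Size on a 2D array: (rows, columns)."""
    return len(a), len(a[0])

image = [[0] * 3 for _ in range(2)]   # 2 rows, 3 columns (made-up data)

rows, cols = array_size(image)

# Suspected bug: the column count is used as the limit of the row loop,
# so some row indices point past the end of the array.
bad_row_indices = [r for r in range(cols) if r >= rows]

# Intended: rows drive the outer loop, columns drive the inner loop.
visited = [(r, c) for r in range(rows) for c in range(cols)]
```

With more columns than rows, the swapped limit produces row indices that do not exist; with more rows than columns, it would silently skip the last rows instead.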

If the input arrays always have the same dimensions (?) you could simplify things by using auto-indexing:

or the ability of the primitives to handle arrays directly:

These examples might not reproduce the results you want 100% (especially if the row/column thing is not a bug), but they serve to illustrate other ways to handle the array processing. The first one should normally be faster than the shift register-based solution. The second wastes some memory by generating the boolean array and then converting it to a numeric array, but you could avoid that by changing the math a bit, I guess. Using primitives directly on arrays is slick and quick...
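Since the diagrams cannot be shown as text, here is a rough Python analogue of the difference between indexing by hand and auto-indexing. The `per_pixel` operation and the sample data are hypothetical; the real per-pixel math is whatever the VI does.

```python
red   = [[10, 20, 30], [40, 50, 60]]   # made-up channel data
green = [[ 5, 25, 30], [45, 50, 70]]

def per_pixel(r, g):
    """Hypothetical per-pixel operation, standing in for the VI's math."""
    return 255 if r > g else 0

# Index-based version: preallocate, then replace element by element
# (loosely analogous to shift registers plus Replace Array Subset).
out_indexed = [[0] * len(red[0]) for _ in red]
for i in range(len(red)):
    for j in range(len(red[0])):
        out_indexed[i][j] = per_pixel(red[i][j], green[i][j])

# Auto-indexing analogue: iterate the arrays in lockstep, no explicit indices.
out_auto = [[per_pixel(r, g) for r, g in zip(row_r, row_g)]
            for row_r, row_g in zip(red, green)]
```

Like a FOR loop with several auto-indexed inputs, `zip` stops at the shortest input, so there are no loop limits to get wrong.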

Mads Toppe
Check out our Modbus Test Master - developed in LabVIEW

altenbach · ‎06-19-2012

Sorry, I haven't had time to look at your code yet, but the upper solution of Mads is similar to what I had in mind. The "atan" is probably one of the slower operations, so you might do one first and only do the other if the first one is "in range". If the first one fails, the second one can be skipped. You should do some statistics on typical data and reorder the operations such that whatever generally produces a deeper cut is tried first.
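A small Python sketch of the "skip the second atan" idea follows. The range limits, the three inputs, and the sample pixels are all hypothetical; only the structure (test one angle, bail out early, compute the other only if needed) reflects the suggestion above.

```python
import math

LOW, HIGH = 0.2, 1.0      # hypothetical "in range" window, in radians
second_atan_calls = 0     # counts how often the second atan is evaluated

def pixel_passes(a, b, c):
    """Return True only if both angles are in range; skip the second if the first fails."""
    global second_atan_calls
    first = math.atan2(a, b)
    if not (LOW <= first <= HIGH):
        return False                  # early exit: second atan never computed
    second_atan_calls += 1
    second = math.atan2(a, c)
    return LOW <= second <= HIGH

samples = [(1, 1, 1), (0, 1, 1), (1, 1, 0), (5, 1, 1)]   # made-up inputs
results = [pixel_passes(*p) for p in samples]
```

Here only two of the four samples reach the second `atan2`. Which test should go first depends on how often each one rejects on typical data, which is why collecting those statistics matters.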

You have two operations with two 8-bit inputs each. It might be faster to replace them with two lookup tables indexed by 16 bits (65536 entries each), completely eliminating all the orange wires and trigonometry.
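As a sketch of the lookup-table idea in Python: any function of two 8-bit inputs has only 65536 possible input pairs, so it can be computed once and then read back by index. `slow_op` is a hypothetical stand-in for one of the two operations, not the math from the VI.

```python
import math

def slow_op(a, b):
    """Hypothetical operation on two 8-bit inputs: angle of (a, b) scaled to 0..255."""
    return min(255, int(math.atan2(a, b) / (math.pi / 2) * 255))

# Build the table once: entry for (a, b) lives at index a * 256 + b.
LUT = bytes(slow_op(a, b) for a in range(256) for b in range(256))

def fast_op(a, b):
    """Same result as slow_op, using one shift, one OR, and one table read."""
    return LUT[(a << 8) | b]
```

The table costs 64 KB here, and the per-pixel work no longer involves floating point at all.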

You should also parallelize the outer FOR loop.

It actually might be more efficient to loop over the full 24-bit color array (only one autoindexing array!), looping over U32 and splitting into channels in the innermost loop using simple primitives. This would also eliminate all the splitting and joining operations. Only a benchmark can tell.
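A Python sketch of splitting a packed pixel inside the loop is shown below. The 0x00RRGGBB layout is an assumption about how the U32 color values are packed, and `process` and its `op` argument are hypothetical names.

```python
def split_rgb(pixel):
    """Split a packed U32 pixel into (r, g, b), assuming a 0x00RRGGBB layout."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return r, g, b

def process(pixels, op):
    """Apply a per-pixel operation to one array of packed pixels."""
    return [op(*split_rgb(p)) for p in pixels]

# Usage with a made-up operation: pick the largest channel value.
brightest = process([0x00112233, 0x00FF0001], max)
```

Only one array is iterated, and the channel split costs two shifts and three masks per pixel.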


altenbach · ‎06-19-2012

There are still a few questionable constructs. Did you really intend to mix the red and green in the outer shift registers? Seems odd. I changed it.

Squaring a U8 ranged number needs at least a U16 to avoid overflow, so your math is ~~a little~~ very questionable. I did not change it but you probably should.
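The overflow is easy to see with a few numbers. Python integers do not overflow, so this sketch emulates fixed-width unsigned arithmetic with a bit mask, assuming wrap-around on overflow; the function names are hypothetical.

```python
def square_u8(x):
    """Square in emulated 8-bit unsigned arithmetic (wraps modulo 256)."""
    return (x * x) & 0xFF

def square_u16(x):
    """Square in emulated 16-bit unsigned arithmetic (wraps modulo 65536)."""
    return (x * x) & 0xFFFF

wrapped = square_u8(200)        # true value 40000 does not fit in 8 bits
exact = square_u16(200)         # fits in 16 bits
sum_of_squares = 255 * 255 + 255 * 255   # worst case for two U8 inputs
```

A single square of a U8 value always fits in a U16 (255 squared is 65025), but the sum of two such squares can reach 130050, which needs more than 16 bits.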

Anyway, here are a few benchmarks. Since you have LabVIEW 9.0, ~~I am not sure if you have the parallel FOR loop~~ you have the parallel FOR loop, but it might not be as efficient as the more modern version. (I also attached the full 2011 LabVIEW version)

Compared with your modified example, which clocks in at about 19 ms, my parallel implementation with one lookup table and staggered comparisons clocks in at 1.2 ms, about 16 times faster while producing the same result. (This is on an old dual-core CPU; I will try my 16-core monster later at work :). Your mileage will vary.)

This is just a very rough draft. Modify as needed. I listed my typical times on the front panel.


altenbach · ‎06-19-2012

On my dual Xeon E5-2687W workstation (16 cores, 32 when counting hyperthreading), it does the original in 10 ms, the "ParallelU32" version in about 700 microseconds, and the "ParallelInlinedLookup" version in about 280 microseconds. (In that last one, CPU use is only about 20%. With much larger data structures it will scale even better.)

I am sure there is more slack left. 😄


altenbach · ‎06-19-2012

@johnsold wrote:

Altenbach will shortly remind us that using complex math may simplify things, also.

Yes, and it will also get rid of the overflow problem. This one is even faster.
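A rough Python illustration of why complex math helps, assuming the VI computes something like a magnitude and an angle from two channel values (the thread does not spell out the exact math, and `magnitude_and_angle` is a hypothetical name):

```python
import cmath
import math

def magnitude_and_angle(a, b):
    """Treat (a, b) as the complex number a + bi and return (magnitude, angle in radians)."""
    return cmath.polar(complex(a, b))

r, theta = magnitude_and_angle(3, 4)

# The manual route needs squares, a sum, a square root, and an atan.
manual_r = math.sqrt(3 * 3 + 4 * 4)
manual_theta = math.atan2(4, 3)

# Worst case for two U8 inputs: no integer overflow, since the math is floating point.
r_max, _ = magnitude_and_angle(255, 255)
```

Both results come out of one conversion, and because the intermediate values are floating point, the integer squaring problem disappears.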

