LabVIEW


fastest loop computation speeds

Sorry, I don't have the IMAQ stuff installed at the moment. Can you turn the three 2D arrays into diagram constants with real data* and re-attach the VI? Thanks. As has been mentioned, you could just use autoindexing instead of shift registers.

 

You are also doing the LabVIEW code in U8 while the MATLAB code uses DBL.

 

*Create three indicators for the three 2D input arrays, run the VI once, change the indicators to constants (right-click...change to constant), then delete all the IMAQ stuff.

 

 

Message 11 of 18
(1,601 Views)

Thank you for that. With that adjustment, not only do I get the correct output, but the new completion time is 0.049 seconds, which is right in the area I was looking for.

0 Kudos
Message 12 of 18
(1,600 Views)

Here you go, Altenbach. The program works as it is supposed to and, like I said above, it's within the computation time I'm looking for, but if you can figure out a way to make it run even faster, I happily welcome the adjustments. I did not readjust the display size of the arrays because, since I'm working with images, they are quite large (the truncated image I'm using is 320x240).

0 Kudos
Message 13 of 18
(1,583 Views)

Is there not a bug in the code as it is now? It seems to iterate over the rows based on the number of columns, and thus it also tries to replace non-existing elements. (Array Size outputs the number of rows as the first element, not the second.)
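Since the diagram itself is graphical, here is a rough text analogy of the row/column concern in Python. It is only a sketch: the 240-row by 320-column layout is an assumption based on the 320x240 image mentioned above, and `array_size` and `count_valid_row_indices` are made-up helper names, not LabVIEW functions.

```python
# Hypothetical 240-row by 320-column image stored as nested lists (row-major).
ROWS, COLS = 240, 320
image = [[0] * COLS for _ in range(ROWS)]

def array_size(a):
    """Mimic Array Size on a 2D array: (rows, columns), rows first."""
    return len(a), len(a[0])

def count_valid_row_indices(loop_count, a):
    """Count how many loop iterations address a row that actually exists."""
    return sum(1 for r in range(loop_count) if r < len(a))

n_rows, n_cols = array_size(image)

# Correct: the row loop uses the first size element, so every index is valid.
ok = count_valid_row_indices(n_rows, image)

# Swapped: the row loop uses the column count and runs 80 iterations too far.
swapped = count_valid_row_indices(n_cols, image)
wasted_iterations = n_cols - swapped
```

With a non-square image the swapped version either overshoots (as here) or stops early, so part of the image would never be processed.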

 

If the input arrays always have the same dimensions (?) you could simplify things by using auto-indexing:

 

indexing.PNG

 or the ability of the primitives to handle arrays directly:

 

arrays.PNG

These examples might not reproduce the results you want 100% (especially if the row/column thing is not a bug), but they serve to illustrate other ways to handle the array processing. The first one should normally be faster than the shift register-based solution. The second wastes some memory by generating the boolean array and then converting it to a numeric array, but I guess you could avoid that by changing the math a bit. Using primitives directly on arrays is slick and quick...

0 Kudos
Message 14 of 18
(1,558 Views)

Sorry, I haven't had time to look at your code yet, but the upper solution of Mads is similar to what I had in mind. The "atan" is probably one of the slower operations, so you might do one first and only do the other if the first one is "in range". If the first one fails, the second one can be skipped. You should do some statistics on typical data and reorder the operations such that whatever generally produces a deeper cut is tried first.
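In text form, the "do one test first and skip the other if it fails" idea might look roughly like the Python sketch below. The channel pairing, the angle limits, and all function names are assumptions made for illustration; the actual VI may compute something different.

```python
import math

calls = {"atan2": 0}  # counts how often the expensive operation runs

def counted_atan2(y, x):
    calls["atan2"] += 1
    return math.atan2(y, x)

def pixel_in_range(r, g, b, range_rg, range_rb):
    """Return True only if both angle tests pass.

    The second atan2 is skipped whenever the first test already fails.
    """
    angle_rg = counted_atan2(g, r)
    if not (range_rg[0] <= angle_rg <= range_rg[1]):
        return False  # early exit, second atan2 is never evaluated
    angle_rb = counted_atan2(b, r)
    return range_rb[0] <= angle_rb <= range_rb[1]

# Hypothetical pixels and limits, chosen only to exercise both paths.
pixels = [(200, 10, 10), (100, 100, 100), (10, 200, 10), (120, 110, 130)]
limits = (0.6, 1.0)
results = [pixel_in_range(r, g, b, limits, limits) for r, g, b in pixels]
```

Here two of the four pixels fail the first test, so only 6 of the 8 possible `atan2` calls are made. Which test should go first depends on statistics from real data, as noted above.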

 

You have two operations with two 8-bit inputs. It might be faster to replace them with two 16-bit lookup tables, completely eliminating all the orange wires and trigonometry.
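A minimal Python sketch of the lookup-table idea, assuming the per-pixel operation is a pure function of two 8-bit values. `slow_test` is a hypothetical stand-in for the real trig-based comparison, and the limits are invented.

```python
import math

def slow_test(a, b):
    """Hypothetical stand-in for a trig-based test on two 8-bit inputs."""
    return 0.6 <= math.atan2(b, a) <= 1.0

# Precompute once: 256 * 256 = 65536 entries, indexed by (a << 8) | b.
LUT = bytearray(65536)
for a in range(256):
    for b in range(256):
        LUT[(a << 8) | b] = 1 if slow_test(a, b) else 0

def fast_test(a, b):
    """Same answer as slow_test, but with a single table lookup."""
    return LUT[(a << 8) | b] == 1
```

The table costs 64 KB per operation and is built only once, so the inner loop reduces to joining two bytes into a 16-bit index and indexing an array.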

 

You should also parallelize the outer FOR loop.

 

It actually might be more efficient to do the loop over the full 24-bit color array (only one autoindexing array!), looping over U32 and doing the split into channels in the innermost loop using simple primitives. Only a benchmark can tell. (This would also eliminate all the splitting and joining operations.)
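As a rough illustration of splitting channels inside the innermost loop, here is a Python sketch. It assumes each U32 pixel is packed as 0x00RRGGBB; the per-pixel test (`r > g and r > b`) is invented purely so the loop has something to compute.

```python
def split_rgb(pixel):
    """Split a packed 0x00RRGGBB value into three 8-bit channels."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return r, g, b

def process(pixels):
    """Loop once over the packed array and split each pixel on the fly."""
    out = []
    for p in pixels:
        r, g, b = split_rgb(p)
        out.append(1 if (r > g and r > b) else 0)  # hypothetical test
    return out

packed = [0x00FF1020, 0x0010FF20, 0x00808080]
mask = process(packed)
```

Only one array is iterated, and no separate red, green, and blue planes need to be created beforehand.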

0 Kudos
Message 15 of 18
(1,554 Views)

There are still a few questionable constructs. Did you really intend to mix the red and green in the outer shift registers? Seems odd. I changed it.

Squaring a number in the U8 range needs at least a U16 to avoid overflow, so your math is more than a little questionable. I did not change it, but you probably should.
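To make the overflow concrete, here is a small Python sketch that emulates fixed-width unsigned math by masking the result, assuming wrap-around on overflow. The function names are made up for the example.

```python
def square_u8(x):
    """Square an 8-bit value but keep only 8 bits of the result."""
    return (x * x) & 0xFF

def square_u16(x):
    """Square an 8-bit value and keep 16 bits of the result."""
    return (x * x) & 0xFFFF

wrapped = square_u8(200)    # 40000 does not fit in 8 bits
correct = square_u16(200)   # 40000 fits in 16 bits
largest = square_u16(255)   # 65025, still below 65536
```

Any input of 16 or more already overflows the 8-bit square, since 16 * 16 = 256.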

 

Anyway, here are a few benchmarks. Since you have LabVIEW 9.0, you should have the parallel FOR loop, but it might not be as efficient as the more modern version. (I also attached the full LabVIEW 2011 version.)

 

Compared to your modified example, which clocks in at about 19 ms, my parallel implementation with one lookup table and staggered comparisons clocks in at 1.2 ms, or about 16 times faster, while producing the same result. (This is on an old dual-core CPU; I will try my 16-core monster later at work. :) Your mileage will vary.)

 

This is just a very rough draft. Modify as needed. I listed my typical times to the front panel.

Message 16 of 18
(1,534 Views)

On my dual Xeon E5-2687W workstation (16 cores, or 32 when counting hyperthreading), it does the original in 10 ms, the "ParallelU32" version in about 700 microseconds, and the "ParallelInlinedLookup" version in about 280 microseconds. (In that last one, CPU use is only about 20%. With much larger data structures it will scale even better.)

 

I am sure there is more slack left. 😄

Message 17 of 18
(1,511 Views)

@johnsold wrote:

Altenbach will shortly remind us that using complex math may simplify things, also.


Yes, and it will also get rid of the overflow problem. This one is even faster.
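The attached VI is not reproduced here, so the following Python sketch is only a guess at what "complex math" could look like in text form: two channel values are combined into one complex number, and a single polar conversion yields both the magnitude and the angle. The function name and inputs are assumptions.

```python
import cmath
import math

def magnitude_and_angle(a, b):
    """Treat (a, b) as a complex number a + bj and convert it to polar form.

    The arithmetic is done in floating point, so squaring 8-bit inputs
    cannot overflow.
    """
    return cmath.polar(complex(a, b))

r, theta = magnitude_and_angle(3, 4)
r_max, theta_max = magnitude_and_angle(255, 255)
```

One conversion replaces the separate squaring, square root, and arctangent steps, which is presumably where both the simplification and the overflow fix come from.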

 

 

Message 18 of 18
(1,495 Views)