LabVIEW


Large 2D array operation takes too much time

Solved!

Hi everyone.

 

I have a 2D array of 10 columns by 150,000 rows. Column N°6 contains a "Part ID", and I need to extract only the rows with Part ID = 1.

I tried this:

benvdv_0-1608309587975.png

(The "0, default" condition just passes the array through unchanged.) The VI works, but for 150,000 rows it takes too much time (several minutes), and the array comes from a 7 MB file... not so big! And it is far worse if I don't convert the strings to hex.

 

Why is it so slow? How can I optimize this code?

 

Any idea?

 

Any kind of help would be much appreciated...

Thanks!

 

Ben.

0 Kudos
Message 1 of 8
(2,390 Views)
Solution
Accepted by topic author benvdv

My first thought is to make it simpler with conditional auto-indexing, like this:

 

keep matching rows.png

 

See if that doesn't speed things up quite a bit. To test it I just filled a 150,000 x 10 array with random values ranging from 1 to 6 (like a dice roll). It runs in a small fraction of a second; I didn't try to time it carefully.
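For readers who don't have LabVIEW handy, conditional auto-indexing is essentially a single-pass row filter. A rough textual analogue in Python (plain lists, hypothetical column index 5 for the 1-based "column N°6"):

```python
# Textual analogue of conditional auto-indexing: keep only the rows
# whose "Part ID" column (index 5, i.e. column No. 6) equals 1.
def filter_rows(data, col=5, part_id=1):
    """Return the rows of a 2D list whose `col` value equals `part_id`."""
    return [row for row in data if row[col] == part_id]

rows = [
    [10, 20, 30, 40, 50, 1, 70, 80, 90, 100],
    [11, 21, 31, 41, 51, 2, 71, 81, 91, 101],
    [12, 22, 32, 42, 52, 1, 72, 82, 92, 102],
]
kept = filter_rows(rows)
# kept holds only the first and third rows (Part ID == 1)
```

Like the conditional indexing tunnel, this makes one pass over the data and lets the runtime manage the growing output, rather than rebuilding the array on every match.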

 

 

-Kevin P

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.
Message 2 of 8
(2,384 Views)

@benvdv wrote:

 

Why is it so slow? How can I optimize this code?


We cannot optimize a picture, so please attach your code and some default data (or a means to simulate typical inputs of the desired size). Typically it is best to operate in place on the 2D array (I've posted examples in the past). How many columns do you have?

 

The best procedure of course depends on many factors. Are matches common or rare? (e.g., if the input has 150,000 rows, does the output typically have 140,000 rows or 10 rows?)

 

If the part number is rare, just repeatedly use Search 1D Array on the extracted column. If it is abundant, use my suggestion above. You can also save an allocation by indexing out the element from the key row inside the loop.
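The "rare matches" tactic above can be sketched in Python as well: extract the key column once, then repeatedly search it for the next hit, restarting each search just past the previous one (the same role Search 1D Array's start-index input plays; names here are illustrative):

```python
# Sketch of the repeated-search approach for rare matches:
# extract the Part ID column once, then search it for the key,
# starting each search just past the previous hit.
def find_matching_rows(data, col=5, part_id=1):
    column = [row[col] for row in data]  # extract the column once
    hits = []
    start = 0
    while True:
        try:
            i = column.index(part_id, start)  # search from `start` onward
        except ValueError:
            break                             # no more matches
        hits.append(data[i])
        start = i + 1
    return hits
```

The cost is proportional to the column length plus the (small) number of hits, so it wins when matches are scarce; with abundant matches, the single-pass filter is the better fit.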

 


@benvdv wrote:

And it is far worse if I don't convert the strings to hex.

 


I don't see where you convert anything to "hex", and I don't even understand what that means in this context. You are converting to numeric, and the display format is completely irrelevant. Have you tried operating on the string array instead?

0 Kudos
Message 3 of 8
(2,359 Views)

@Kevin:

For a test, try:

  • Moving the comparison outside the loop with a Compare Elements against column 6
  • Sorting, reversing, and thresholding the boolean array
  • Preallocating a 2D array of n x 10
  • Auto-indexing a For Loop over the unsorted boolean array
  • Running an I32 counter in one shift register and the preallocated array in another
  • Driving a case structure on the boolean

Replace row "counter" with row i of the original 2D array and increment the counter when True. Conditionally exit the For Loop when the counter reaches the right size.

 

No array allocations in the loop at all, a boolean case structure, and we don't have to iterate over the rows at the end of the 2D array (since we exit on the last True).
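A minimal Python sketch of this preallocate-and-replace scheme (the mask, counter, and early exit mirror the boolean array, shift register, and conditional loop exit described above; function and variable names are my own):

```python
# Sketch of the preallocate-and-replace scheme: build the boolean
# mask, size the output without an explicit loop, then fill the
# preallocated array with a running counter and exit early.
def extract_preallocated(data, col=5, part_id=1):
    mask = [row[col] == part_id for row in data]
    n = sum(mask)                  # output size, no explicit loop
    out = [None] * n               # preallocated n-row output
    count = 0
    for keep, row in zip(mask, data):
        if keep:
            out[count] = row       # replace, never append
            count += 1
            if count == n:         # conditionally exit on the last True
                break
    return out
```

No per-iteration allocation happens inside the loop, and the loop stops as soon as the last matching row is copied.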

 

Note: if output order is not a requirement, the loop might be parallelizable (I'd have to play with it).

 

Also, it should be possible to index and bundle (boolean, row), sort the array of clusters, reverse it, and iterate the For Loop exactly as many times as needed to replace each row (the rows of interest are front-loaded in the cluster array, so the For Loop only needs to auto-index the cluster array and replace each row of the new array). Definitely parallelizable, at the cost of some memory for the cluster array.

 

Regardless, the size of the output array can be determined without any loop!


"Should be" isn't "Is" -Jay
0 Kudos
Message 4 of 8
(2,337 Views)

Here's something to try... 

 

 

altenbach_0-1608589523264.png

 

 

 

 

Do you know how long the scanning (hex string to number) takes?

 

(Note that if I reply to a post containing only pictures, I reply with pictures instead of VIs ;))

0 Kudos
Message 5 of 8
(2,326 Views)

@JÞB wrote:

@Kevin:

For a test try

[snip]

 

Regardless,  the size of the output array can be determined without any loop!


Gave it a shot (at least it's close to this idea), and with debugging disabled I got 1.4 ms for a 150,000 x 10 array. A 1,500,000 x 10 array took around 20 ms. Honestly, it took WAY longer to generate the random numbers than it did to pull out the sorted values 🙂

 

Example_VI_BD.png

 

Note that the output of Boolean To (0,1) is of type U16, and Sum Array Elements returns the same type as its input, so you'll overflow unless you upconvert the integer-ized boolean array before summing it. (For the record, my 1.4 ms was without a To U32, and the 1.5-million-row version included the To U32.)

 

Anyway, that makes me think... I wish Sum Array Elements were configurable somehow. Needing to store 1.5 million ones or zeros as 32-bit numbers isn't ideal when a U8 would do the job. I bet someone more savvy than me could sum a U8 array with some sneaky bit-shifting, but then again it only takes 20 milliseconds to traverse 1.5 million elements.
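To see why the upconvert matters: summing 150,000 ones in a 16-bit accumulator wraps modulo 65,536 and silently returns the wrong count. Python integers are arbitrary precision, so this sketch simulates the 16-bit wraparound explicitly:

```python
# Illustration of the U16 summing overflow: a 16-bit accumulator
# wraps modulo 65,536, while a wide (upconverted) sum does not.
def sum_u16(values):
    total = 0
    for v in values:
        total = (total + v) & 0xFFFF  # simulate 16-bit wraparound
    return total

ones = [1] * 150_000
wrapped = sum_u16(ones)  # 150000 % 65536 == 18928, not 150000
correct = sum(ones)      # wide sum: 150000
```

So a mask of 150,000 True values would report 18,928 matching rows if summed in U16, which is exactly the failure mode the To U32 upconvert avoids.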

0 Kudos
Message 6 of 8
(2,306 Views)

Hi everyone

 

Thank you for all your answers.

 

I tried the first solution, conditional auto-indexing, and it is fast enough for what I need to do here (less than 10 s for all my data files).

As I'm very busy before the Christmas break, I'm accepting it as the solution, but I will try the other proposed solutions later.

It could be interesting to build a test bench to measure code efficiency in the future...

 

Thank you all.

 

 

Ben.

 

 

0 Kudos
Message 7 of 8
(2,267 Views)

Anything pushing you close to 10 seconds would be something *other than* the bits of array-processing code in this thread.

 

I understand that sometimes good enough is good enough, but you should come on back here when you *do* have time. Zip up your actual code along with a typical set of files, post it here for the curious efficiency experts, and watch those seconds melt away! (Note: it's probably helpful to "Save for Previous Version..." back to, I dunno, maybe LV 2015, just to maximize the number of people who'll be able to look at it.)

 

 

-Kevin P

Message 8 of 8
(2,256 Views)