12-18-2020 10:58 AM
Hi everyone.
I have a 2D array of 10 columns by 150,000 rows. Column 6 contains a "Part ID", and I need to extract only the rows with Part ID = 1.
I tried this:
(The "0, default" condition just passes the array through the condition) The VI works, but for 150.000 rows, it takes too much time (sevral minutes) and the array is from a 7MB file... not so big! And it is far worst if I don't convert string to hex.
Why is it so slow? How can I optimize this code?
Any idea?
Any kind of help would be much appreciated...
Thanks!
Ben.
12-18-2020 11:20 AM
My first thought is to make it simpler with conditional auto-indexing, like this:
See if that doesn't speed things up quite a bit. To test it, I just filled a 150,000 x 10 array with random values ranging from 1 to 6 (like a dice roll). It runs in a small fraction of a second; I didn't try to time it carefully.
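LabVIEW diagrams don't paste as text, but here's a rough Python/NumPy sketch of what the conditional auto-indexing does; the column index (5, i.e. the 6th column) is my guess at benvdv's layout:

    import numpy as np

    # Simulate a 150,000 x 10 array of dice-roll values, as in the test above.
    rng = np.random.default_rng()
    data = rng.integers(1, 7, size=(150_000, 10))

    # Conditional auto-indexing: a For Loop auto-indexes the rows, and the
    # output tunnel's conditional terminal keeps a row only when its
    # Part ID equals 1. The mask expression below is the same filter.
    part_id_col = 5
    filtered = data[data[:, part_id_col] == 1]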
-Kevin P
12-18-2020 12:08 PM
@benvdv wrote:
Why is it so slow? How can I optimize this code?
We cannot optimize a picture, so please attach your code and some default data (or a means to simulate typical inputs of the desired size). Typically it is best to operate in place on the 2D array (I've posted examples in the past). How many columns do you have?
The best procedure of course depends on many factors. Do rows match most of the time, or rarely? (e.g. if the input has 150,000 rows, does the output typically have 140,000 rows or 10 rows?)
If the part number is rare, just repeatedly use Search 1D Array on the extracted column. If it is abundant, use my suggestion above. You can also save an allocation by indexing the element out of each row inside the loop.
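To make the rare-match case concrete, here's a hedged sketch in plain Python of "repeatedly use Search 1D Array on the extracted column" (the column index and target value are placeholders):

    def rows_with_part_id(data, col=5, target=1):
        # Extract the key column once, then search it repeatedly,
        # resuming each search just past the previous hit -- the
        # equivalent of Search 1D Array in a loop with a start index.
        column = [row[col] for row in data]
        hits = []
        start = 0
        while True:
            try:
                i = column.index(target, start)
            except ValueError:
                break            # no more matches
            hits.append(data[i])
            start = i + 1
        return hits

Only the 1D column is traversed, and full rows are copied only for the hits, which is why this wins when matches are rare.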
@benvdv wrote:
And it is far worse if I don't convert the strings to hex.
I don't see where you convert anything to "hex", and I don't even understand what this means in the current context. You are converting to numeric, and the display format is completely irrelevant. Have you tried operating on the string array instead?
12-18-2020 01:35 PM - edited 12-18-2020 01:41 PM
@Kevin
For a test, try this:
Replace row [counter] with row i from the original 2D array, and increment the counter when the comparison is True. Conditionally exit the For Loop when the counter reaches the right size.
No array allocations in the loop at all, just a Boolean case structure, and we don't have to iterate over the rows at the end of the 2D array (since we exit on the last True).
Note: if output order is not a requirement... the loop might be parallelizable (I'd have to play with it).
Also, it should be possible to index out each row, bundle it with its Boolean into a (Boolean, row) cluster, sort the array of clusters, reverse it, and iterate the For Loop exactly as many times as needed to replace each row (the rows of interest are front-loaded in the cluster array, so the For Loop only needs to auto-index the clusters and replace each row of the new array). Definitely parallelizable, at the cost of some memory for the cluster array.
Regardless, the size of the output array can be determined without any loop!
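Since LabVIEW diagrams don't paste as text, here's a rough Python/NumPy sketch of the in-place idea described above (column index and target are placeholders; NumPy row assignment stands in for Replace Array Subset):

    import numpy as np

    def compact_in_place(data, col=5, target=1):
        mask = data[:, col] == target
        n_out = int(mask.sum())          # output size known without any loop
        counter = 0
        for i in range(len(data)):       # For Loop over the rows
            if mask[i]:                  # Boolean case structure
                data[counter] = data[i]  # replace row [counter] with row i
                counter += 1
                if counter == n_out:     # conditional exit on the last True
                    break
        return data[:n_out]              # trim to just the matching rows

No per-iteration allocations, and the loop stops as soon as the last match has been moved.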
12-18-2020 02:00 PM - edited 12-21-2020 04:25 PM
Here's something to try...
Do you know how long the scanning takes (hex string to number)?
(Note that if I reply to a post with only pictures, I reply with pictures instead of VIs ;))
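If you want a quick way to answer that timing question, here's a rough sketch (Python's int(s, 16) standing in for the hex-string scan; the data is simulated):

    import time

    # Build 150,000 hex strings as stand-in file data.
    strings = ["{:X}".format(i % 6 + 1) for i in range(150_000)]

    t0 = time.perf_counter()
    values = [int(s, 16) for s in strings]   # the scan under test
    t1 = time.perf_counter()
    print("scan took {:.1f} ms".format((t1 - t0) * 1e3))

If the scan dominates, the array-filtering optimizations won't help much on their own.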
12-18-2020 04:16 PM
@JÞB wrote:
@Kevin
For a test, try this:
[snip]
Regardless, the size of the output array can be determined without any loop!
Gave it a shot (at least it's close to this idea) and with debugging disabled I got 1.4 ms for a 150,000 x 10 array. A 1,500,000 x 10 array was around 20 ms. Honestly it took WAY longer to generate the random numbers than it did to pull out the sorted values 🙂
Note that the output of Boolean To (0,1) is of type U16, and Sum Array Elements returns the same type as its input, so you'll overflow unless you upconvert the integer-ized Boolean array before summing it. (For the record, my 1.4 ms was without a To U32, and my 1.5-million-row version included the To U32.)
Anyway, that makes me think... I wish Sum Array Elements were configurable somehow. Needing to store 1.5 million ones or zeros as 32-bit numbers isn't ideal when a U8 would do the job. I bet someone savvier than me could sum a U8 array with some sneaky bit-shifting too, but then again it only takes 20 milliseconds to traverse 1.5 million elements.
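The overflow is easy to reproduce in a NumPy sketch, where the accumulator type is explicit (in LabVIEW, Sum Array Elements keeps the input type, which is the trap described above):

    import numpy as np

    flags = np.ones(150_000, dtype=np.uint16)   # Boolean-to-(0,1)-style array

    # Summing in the input type overflows: 150,000 doesn't fit in 16 bits.
    print(flags.sum(dtype=np.uint16))           # wraps to 150000 % 65536 = 18928

    # Widening first (the "To U32" fix) gives the real count.
    print(flags.sum(dtype=np.uint32))           # 150000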
12-21-2020 04:01 AM
Hi everyone
Thank you for all your answers.
I tried the first solution, conditional auto-indexing, and it is fast enough for what I need to do here (less than 10 s for all my data files).
As I'm very busy before the Christmas break, I've accepted it as the solution, but I will try the other proposed solutions later.
I think it could be interesting to build a test bench to measure code efficiency in the future...
Thank you all.
Ben.
12-21-2020 07:24 AM
Anything pushing you close to 10 sec would be something *other than* the bits of array processing code in this thread.
I understand that sometimes good enough is good enough, but you should come back here when you *do* have time. Zip up your actual code along with a typical set of files, post it here for the curious efficiency experts, and watch those seconds melt away! (Note: it's probably helpful to "Save for Previous Version..." back to, I dunno, maybe LV 2015, just to maximize the number of people who'll be able to look at it.)
-Kevin P