LabVIEW

Search for Duplicates in a 2D Array

I am working on a program that takes a user input of a job number (imported as a string).

With that job number, I need to search a .txt spreadsheet file and build a cluster of information from the items in the same row as the job number. I originally accomplished this as shown:

[Image: scan array.png]
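For readers who don't use LabVIEW, the single-match lookup above can be sketched in Python. This is a minimal sketch under assumptions: each row is a list of strings with the job number in column 0 and the value in column 1, `JobInfo` and its fields are hypothetical stand-ins for the cluster, and the sample rows are made up. Like LabVIEW's Search 1D Array, Python's `list.index` returns only the first occurrence, which is why duplicate job numbers break this approach.

```python
from dataclasses import dataclass
from typing import List, Optional

Row = List[str]

@dataclass
class JobInfo:
    # Hypothetical cluster layout; the real fields are not shown in the post.
    job: str
    value: float
    other: List[str]

def lookup_first(rows: List[Row], job: str) -> Optional[JobInfo]:
    """Find the first row whose first column equals `job` and build a JobInfo."""
    jobs = [row[0] for row in rows]  # first column: job numbers
    try:
        i = jobs.index(job)  # first occurrence only, like Search 1D Array
    except ValueError:
        return None  # Search 1D Array would return -1 here
    row = rows[i]
    return JobInfo(job=row[0], value=float(row[1]), other=row[2:])

# Hypothetical sample data, sorted by job number.
rows = [
    ["CN53600-000", "5.0", "earlier job"],
    ["CN53646-000", "12.5", "first entry"],
    ["CN53646-000", "40.0", "second entry"],
    ["CN53700-000", "8.0", "later job"],
]
print(lookup_first(rows, "CN53646-000"))  # picks the 12.5 row, not the 40.0 one
```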

 

This worked very well, until my data file changed. Now the data file can contain duplicate entries for the same job number (see the attached file). The only difference between the duplicate rows is that one of them has a smaller value in the second column. I cannot change how I receive my data, and I always need to return the row with the larger value in the second column.
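The requirement (among all rows with the requested job number, return the one with the largest second-column value) can be sketched as a plain linear scan. This is a hypothetical Python illustration, not the VI from the post: it assumes rows are lists of strings, column 0 is the job number, column 1 is a numeric value stored as text, and the sample rows are made up.

```python
from typing import List, Optional

Row = List[str]

def find_best_row(rows: List[Row], job: str) -> Optional[Row]:
    """Return the row for `job` whose second column holds the largest value.

    Visits every row, so it also works on an unsorted file.
    Returns None if the job number does not appear at all.
    """
    best: Optional[Row] = None
    for row in rows:
        if row[0] != job:
            continue
        # The second column arrives as text from the spreadsheet file.
        if best is None or float(row[1]) > float(best[1]):
            best = row
    return best

# Hypothetical sample data: job number, value, description.
rows = [
    ["CN53600-000", "5.0", "earlier job"],
    ["CN53646-000", "12.5", "first entry"],
    ["CN53646-000", "40.0", "second entry"],
    ["CN53700-000", "8.0", "later job"],
]
print(find_best_row(rows, "CN53646-000"))
```

The cost is always one full pass over the file, regardless of where the job number sits.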

 

The method I came up with is as shown:

[Image: check for duplicates.png]

 

This method seems much less efficient than the original, because even if the job number is the first item in the list, I still have to step through the entire list. The file I am working with is computer generated, so it will always be sorted by job number, but there is no guarantee whether the first, second, or third occurrence of the job will be the longest (i.e. have the largest value in the second column).

 

Any thoughts on a more efficient way to handle this? I searched around and found a post that used variant data types to remove duplicate values from a 2D array. I tried to extend that to this application, but I couldn't follow how the variants were used there. This is the original post I saw: http://forums.ni.com/t5/LabVIEW/Remove-Duplicate-Row-From-2d-Array/m-p/1211071/highlight/false#M5186...
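The variant trick in LabVIEW most likely relies on variant attributes used as a key/value lookup table (attribute name as the key). The closest Python analogue is a dict, so here is a hedged sketch of that idea applied to this problem, assuming rows are lists of strings with the job number in column 0 and a numeric value in column 1 (sample data is hypothetical). One pass keeps, per job number, the row with the largest value; each later lookup is then a single dictionary access.

```python
from typing import Dict, List

Row = List[str]

def build_best_row_table(rows: List[Row]) -> Dict[str, Row]:
    """One pass over the file: keep, per job number, the row with the
    largest second-column value (the dict plays the role of the variant
    attributes, keyed by job number)."""
    table: Dict[str, Row] = {}
    for row in rows:
        job, value = row[0], float(row[1])
        current = table.get(job)
        if current is None or value > float(current[1]):
            table[job] = row  # replace the stored row with the larger one
    return table

# Hypothetical sample data.
rows = [
    ["CN53600-000", "5.0", "earlier job"],
    ["CN53646-000", "12.5", "first entry"],
    ["CN53646-000", "40.0", "second entry"],
    ["CN53700-000", "8.0", "later job"],
]
table = build_best_row_table(rows)
print(table["CN53646-000"])
print(table.get("NO-SUCH-JOB"))  # None when the job is missing
```

This only pays off if several job numbers are looked up against the same file; for a single lookup it is still a full pass.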

 

Thanks,

Michael

Message 1 of 4

Here is one way to find all matches first; you can then add a finer search/separation if there is more than one match.

Depending on the size of the sorted array, it might be faster to search for the first match and then check whether the following rows still match.

[Image: find all matches.png]
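The second suggestion (search for the first match, then check whether the following rows still match) can be sketched in Python. Assumptions: rows are lists of strings sorted by job number, column 0 is the job number, column 1 the value; the function name and sample data are hypothetical. The loop stops as soon as it leaves the block of matching rows, so it never reads past the last duplicate.

```python
from typing import List, Optional

Row = List[str]

def best_row_sorted(rows: List[Row], job: str) -> Optional[Row]:
    """Walk to the first match, compare the consecutive duplicates, and stop
    at the first non-matching row after them (relies on the file being sorted)."""
    best: Optional[Row] = None
    found = False
    for row in rows:
        if row[0] == job:
            found = True
            if best is None or float(row[1]) > float(best[1]):
                best = row
        elif found:
            break  # sorted input: we are past the block of matches
    return best

# Hypothetical sample data, sorted by job number.
rows = [
    ["CN53600-000", "5.0", "earlier job"],
    ["CN53646-000", "12.5", "first entry"],
    ["CN53646-000", "40.0", "second entry"],
    ["CN53700-000", "8.0", "later job"],
]
print(best_row_sorted(rows, "CN53646-000"))
```

If the job number is missing, this version still reads the whole file; a check for "current job > search job" could end it earlier, provided the file's sort order matches string comparison.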

Greetings from Germany
Henrik

LV since v3.1

“ground” is a convenient fantasy

'˙˙˙˙uıɐƃɐ lɐıp puɐ °06 ǝuoɥd ɹnoʎ uɹnʇ ǝsɐǝld 'ʎɹɐuıƃɐɯı sı pǝlɐıp ǝʌɐɥ noʎ ɹǝqɯnu ǝɥʇ'


Message 2 of 4

A short question: if the jobs are sorted, then I think you do not need to search the whole list, only the next N rows below the "user job". Is this correct?

My idea would be:

1) Get the first column (the job numbers) and the second column (the values).

2) Use the function "Search 1D Array" or "Filter 1D Array" from the OpenG > OpenG Array palette. These functions return the indices of all elements in the array that match the "item to filter/search". For example, for job order "CN53646-000", the function returns a numeric array with indices 0 and 1.

3) Use a For Loop auto-indexed on the numeric array. With a shift register, keep the index of the largest value seen so far (a bubble-sort-style comparison), so that when the loop finishes it returns the index of the maximum element.

Note that the sorting would only be done among the 2 or 3 repeated job orders, not the complete list. This relates to my initial question.

4) Use this index and your original VI to build the cluster.
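Steps 1 to 4 can be sketched in Python as follows. This is a hypothetical translation, not OpenG code: `search_all` mimics what step 2 relies on (the indices of every matching element), and in `index_of_max` the variable `best` plays the role of the shift register. Rows are assumed to be lists of strings (job number in column 0, value in column 1), and the sample data is made up.

```python
from typing import List, Optional

def search_all(items: List[str], target: str) -> List[int]:
    """Step 2: indices of every element equal to `target`."""
    return [i for i, item in enumerate(items) if item == target]

def index_of_max(values: List[float], indices: List[int]) -> Optional[int]:
    """Step 3: running maximum over the matching indices only.
    `best` carries the current winner from one iteration to the next,
    like a shift register. Returns None if there are no matches."""
    best: Optional[int] = None
    for i in indices:
        if best is None or values[i] > values[best]:
            best = i
    return best

# Hypothetical sample data, sorted by job number.
rows = [
    ["CN53600-000", "5.0", "earlier job"],
    ["CN53646-000", "12.5", "first entry"],
    ["CN53646-000", "40.0", "second entry"],
    ["CN53700-000", "8.0", "later job"],
]
jobs = [row[0] for row in rows]           # step 1: job column
values = [float(row[1]) for row in rows]  # step 1: value column
hits = search_all(jobs, "CN53646-000")    # step 2: [1, 2] for these rows
best = index_of_max(values, hits)         # step 3: index of the larger value
print(rows[best])                         # step 4: build the cluster from this row
```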

Let me know if it is clear.

Message 3 of 4

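If you have a very large list of sorted items (which would raise the question of using a database 😉), it could be faster to code a successive (binary) search:

1) Look up the item in the middle of the list.

2) Step size = list length / 2^(number of lookups + 1); if the step is < 1, the item is not found.

3) Move up or down the list by the step size, depending on whether the item at the current position is greater or smaller than the one you are searching for, and look it up again. Repeat steps 2 and 3 until the item is found.

4) If found, check up and down for all matches.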
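The successive search can be sketched in Python. This sketch shrinks `lo`/`hi` bounds instead of tracking an explicit step size; both halve the remaining search range after each lookup. Assumptions: `jobs` is the job-number column, already sorted in an order that matches Python string comparison (an assumption about the file), and the function names and sample data are hypothetical. Python's standard `bisect` module would find the leftmost match directly; the manual version is shown to mirror the steps.

```python
from typing import List

def binary_search_any(jobs: List[str], job: str) -> int:
    """Return the index of some row equal to `job`, or -1 if absent."""
    lo, hi = 0, len(jobs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # look up in the middle of the remaining range
        if jobs[mid] == job:
            return mid
        if jobs[mid] < job:
            lo = mid + 1      # target is further down the list
        else:
            hi = mid - 1      # target is further up the list
    return -1

def all_matches_sorted(jobs: List[str], job: str) -> List[int]:
    """After a hit, check up and down for the neighbouring duplicates."""
    hit = binary_search_any(jobs, job)
    if hit < 0:
        return []
    first = hit
    while first > 0 and jobs[first - 1] == job:
        first -= 1
    last = hit
    while last + 1 < len(jobs) and jobs[last + 1] == job:
        last += 1
    return list(range(first, last + 1))

# Hypothetical sorted job column.
jobs = ["CN53600-000", "CN53646-000", "CN53646-000", "CN53700-000"]
print(all_matches_sorted(jobs, "CN53646-000"))
```

The returned indices can then be compared on the second column to pick the row with the larger value.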

 

 

Greetings from Germany
Henrik

Message 4 of 4