
Delete Array Duplicate for only one column


SimpleJack wrote:

1. No, the duplicates are not always adjacent and probably will not be. Duplicates can occur more than once, depending on how many times the operator retests. You see, I am going after first-pass yield, so I only care about the first entry.


If the duplicates are not necessarily adjacent, you need to use my version 3 or later. Version 3 also sorts the output by serial number. If you want to sort by the first occurrence of each serial number instead, you can make the following trivial modification.

 

 

 

In the case of many columns (55), this will be very efficient, because the bulk of the operations is done on the single serial-number column and the full dataset is touched only once at the end.
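For readers following along without LabVIEW, here is a rough Python sketch of that idea (the actual solution is a LabVIEW diagram; the function name and details below are mine, not taken from the VI): sort (serial number, row index) pairs, keep the first index of each run of equal keys, then index into the full dataset exactly once.

```python
def first_pass_rows(data, key_col=0):
    """Keep the first row for each unique value in the key column,
    in order of first occurrence (first-pass yield)."""
    # Sort (key, original index) pairs: O(N log N), touching only
    # the single key column, not all 55 columns.
    keyed = sorted((row[key_col], i) for i, row in enumerate(data))

    # One pass over the sorted keys: the first entry of each run of
    # equal keys carries the smallest original index, i.e. the
    # operator's first test of that serial number.
    keep, prev = [], None
    for key, i in keyed:
        if not keep or key != prev:
            keep.append(i)
            prev = key

    # Restore first-occurrence order (the "trivial modification"
    # above), then touch the full dataset once at the very end.
    keep.sort()
    return [data[i] for i in keep]
```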

 

The sort step in my code is O(N log N) (see also) and most operations are done "in place", while some of the alternative suggestions above are O(N²) and carry huge memory-reallocation penalties due to constant array resizing. My version will probably be orders of magnitude faster for large arrays.

 

Can you work with the attached 2010 snippet, or do you want real VIs, possibly saved for an earlier version?

Message 11 of 29

Here is how I roll:

 

[Image: RemoveDuplicateSN.png]
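In text form, the trick in that image boils down to a hash-table lookup keyed by serial number; a dict plays the role of LabVIEW's variant attributes in this hypothetical Python rendering (the names are mine, not from the diagram):

```python
def dedup_by_column(rows, key_col=0):
    """Keep the first row seen for each key value, via hash lookup
    (the dict stands in for LabVIEW variant attributes)."""
    seen = {}
    for row in rows:
        # setdefault stores the row only when the key is new, so the
        # FIRST occurrence wins without iterating in reverse.
        seen.setdefault(row[key_col], row)
    return list(seen.values())  # dicts preserve insertion order
```

A plain attribute write overwrites earlier entries and keeps the last occurrence instead, which is presumably where the following wish comes from.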

 

Another time I wish for a reverse iteration terminal:

http://forums.ni.com/t5/LabVIEW-Idea-Exchange/Reverse-Iteration-Terminal-in-For-Loop/idi-p/1174449

 

 

Message 12 of 29

I knew that if I waited long enough, somebody would bring variant attributes to the table. 😄

 

(imagine how much more intuitive it would look after this idea is implemented :D)

Message 13 of 29

Well, it still cheeses me off that I have to use a shift register there, because (at least up to LV10) the Feedback Node chokes on variants (similar code is much, much slower 😠). Assuming that is fixed in LV11 or soon thereafter, we need this as well:

 

http://forums.ni.com/t5/LabVIEW-Idea-Exchange/An-output-terminal-for-feedback-nodes-that-mirrors-the...

 

I learned to search much earlier in the idea process after I had drawn the following and was about to post it:

 

[Image: FN output.png]

Message 14 of 29

Of course my version is almost an order of magnitude faster. 😄

Message 15 of 29

@altenbach wrote:

Of course my version is almost an order of magnitude faster. 😄


It's of course also 10 times faster than my code 😞 But in your code, the speed drops by half if you sort the values as strings (i.e., if you don't first convert them to numbers).

Message 16 of 29

I only see about a factor of 3 difference at most (it depends a bit on the number of duplicates and elements), much of which is due to the fortuitous numeric conversion (myle00 stole my thunder there).

 

Personally I like the attributes in the read-a-text-file-and-remove-duplicates game for their mixture of speed (typically quite good) and flexibility (easy to key on multiple columns with multiple types).
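For the multi-column case, the same hypothetical dict sketch extends naturally: a tuple of column values acts like a combined attribute key and can mix types freely (the column indices below are made up for illustration).

```python
def dedup_by_columns(rows, key_cols=(0, 3)):
    """Keep the first row per combination of values in key_cols."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)  # mixed types are fine
        seen.setdefault(key, row)
    return list(seen.values())
```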

 

Message 17 of 29

Sorry, I was in a seminar... I only tested on an old Athlon XP, and I had eyeballed it as 800 ms / 100 ms, but it seems closer to about 6.5x. 😄

 

Yes, it depends on the number of elements and the number of duplicates. The above is for an array of 100,000 elements with ~10 duplicates of each number.

 

Here is my benchmarking VI. I am sure that other processors will give different results.
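The real benchmark is the attached VI; purely as a sketch, a minimal text-language harness in the same spirit might look like this (the data shape follows the description above, 100,000 rows with ~10 duplicates per value; everything else is an assumption):

```python
import random
import time

N, DUPES = 100_000, 10
data = [[str(random.randrange(N // DUPES)), "payload"] for _ in range(N)]

t0 = time.perf_counter()
result = first_pass_rows(data)  # the sort-based sketch from earlier
t1 = time.perf_counter()
print(f"{len(result)} unique serial numbers in {(t1 - t0) * 1e3:.1f} ms")
```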

 

(...and yes, if I skipped the numeric conversion, you would win by about a factor of three ;))

Message 18 of 29

@myle00 wrote:
It's of course also 10 times faster than my code 😞 But in your code, the speed drops by half if you sort the values as strings (i.e., if you don't first convert them to numbers).

In my benchmark, yours is about 300x slower than mine (size = 100,000, ~10 duplicates each), while for 10x smaller inputs it is only 30x slower. As mentioned above, yours is O(N²), so things really deteriorate as the sizes get bigger.

Mine seems not much worse than O(N) and thus scales about linearly with input size. (I guess the sorting is a minor part of the total ;))

 

A 10x larger array costs you 100x more, while my code slows down only about 10x for the same increase. For a million elements, mine takes 1.2 seconds (measured!), while yours would probably take around 5 minutes on the same computer (estimated, not tested).
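Written out, that scaling claim is just the asymptotics (the constants c and c' here are hypothetical):

```latex
% O(N^2): a 10x larger input costs 100x
T_{quad}(10N) = c\,(10N)^2 = 100\,c\,N^2 = 100\,T_{quad}(N)
% O(N log N): the same increase costs ~12x at N = 10^5
% (the log base cancels in the ratio)
\frac{T_{sort}(10N)}{T_{sort}(N)} = 10 \cdot \frac{\log(10N)}{\log N}
                                  = 10 \cdot \frac{6}{5} = 12
```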

Message 19 of 29

You're right. Mine is O(N²) if there are no duplicates, while the more duplicates are present, the closer it gets to O(N). The sort function seems to be O(N), so that shouldn't make yours worse than O(N) (see attached).
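That behavior matches the usual search-and-insert pattern; here is a guess at it in the same Python shorthand (not the actual VI): every incoming key is searched linearly in the output built so far, so the cost is O(N·U) for U unique keys, i.e. O(N²) with no duplicates and close to O(N) when nearly everything is a duplicate.

```python
def dedup_search_insert(rows, key_col=0):
    """Search-and-insert dedup: O(N * U) for U unique keys."""
    keys, out = [], []
    for row in rows:
        if row[key_col] not in keys:   # linear search every iteration
            keys.append(row[key_col])  # constant resizing hurts, too
            out.append(row)
    return out
```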

Message 20 of 29