LabVIEW


Most Efficient Way of Discarding Out-of-Range Data

My latest LabVIEW exercise is an attempt to improve the quality of a 1D array of data by discarding points outside of an interval centered around the data's mean.

It seems to me that there are three ways of doing this:

-Compare the data element-by-element with the two limits and discard anything that does not fit between them.

-Use Sort 1D Array followed by two Threshold 1D Array operations to find two indices between which the data points will be kept.

-Use Array Max & Min to discard the maximum and minimum points in the array until the max and min fit within the interval.

It seems to me that the second option would be the most efficient, since the questionable points can all be shuffled to the start or end of the array and then overwritten as necessary. Still, it seems to me that it would ultimately depend on the efficiency of the algorithms involved.
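Since LabVIEW block diagrams can't be shown inline in text, a rough sketch in a text language may help compare the three candidates. The Python below is only an analogy, not LabVIEW code: the function names are mine, and `bisect` stands in for Threshold 1D Array on a sorted array.

```python
import bisect
import statistics

def keep_in_range_compare(data, lo, hi):
    """Method 1: element-by-element comparison; preserves the original order."""
    return [x for x in data if lo <= x <= hi]

def keep_in_range_sort(data, lo, hi):
    """Method 2: sort, then find the two boundary indices (cf. Threshold 1D Array)."""
    s = sorted(data)
    i = bisect.bisect_left(s, lo)    # first index with value >= lo
    j = bisect.bisect_right(s, hi)   # one past the last index with value <= hi
    return s[i:j]

def keep_in_range_trim(data, lo, hi):
    """Method 3: repeatedly drop the current max/min until both fit the interval."""
    d = list(data)
    while d and (max(d) > hi or min(d) < lo):
        if max(d) > hi:
            d.remove(max(d))
        if d and min(d) < lo:
            d.remove(min(d))
    return d

# Interval centered on the mean, half-width chosen arbitrarily for illustration
data = [1.0, 9.5, 2.0, 3.0, -4.0, 2.5]
m = statistics.mean(data)
lo, hi = m - 3.0, m + 3.0
```

All three return the same set of points; they differ in ordering, memory traffic, and how their cost scales with the data.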

This probably isn't an uncommon operation and I imagine someone has contemplated this before. If not, can someone suggest what the most appropriate way to time and compare the various operations would be?
Message 1 of 9

I'm sorry that I can't answer you off the top of my head, but it seems that you have defined your question in a way where it should be easy for you or someone to make up an example of each and benchmark them. I would, but am up to my eyes at the moment. Whoever does would help the community by posting the results.

 

P.M.

Putnam
Certified LabVIEW Developer

Senior Test Engineer North Shore Technology, Inc.
Currently using LV 2012-LabVIEW 2018, RT8.5


LabVIEW Champion
Message 2 of 9
My favourite is method 1. It involves only one memory access and two comparisons per element and it's an almost fixed-time method.

The other two are heavily data dependent. A sorting operation can be very quick (if the elements are already sorted, for example), but typically involves many memory accesses and comparisons per element, plus moving data to new locations.
Using Array Max & Min means reconsidering all the elements each time you execute it. It can also be quick in special cases, for example if all the data are within the given limits.

Another consideration is that, although not specified in your exercise, in many cases the data points are to be considered part of a time series and should be printed/plotted in the original order. This requirement would rule out the sorting method.

Paolo
-------------------
LV 7.1, 2011, 2017, 2019, 2021
Message 3 of 9

Here's how I would do it:

It's pretty fast with a 100,000-element array, I'm guessing because the array is pre-sized before the loop. Also, note the To I32 conversion bullet. It's required because of a "feature" of the Add Array Elements function that matches its output datatype to its input datatype: since the Boolean To (0,1) function generates an array of I16s, Add Array Elements will return an I16, even though there's a good chance that the sum of a bunch of I16s will be greater than 32,767. Other than that little trick, everything else is pretty straightforward. It's handy that the In Range and Coerce function will take an array of numerics as an input.
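For readers without the screenshot, here is a hedged Python analogy of the same idea (the function name is mine): count the in-range points first, pre-size the output, then fill it in a single loop. Python integers don't overflow, so the I16-versus-I32 wrinkle has no analog here.

```python
def filter_presized(data, lo, hi):
    """Sketch of the pre-sized approach: count the keepers first,
    allocate the output once, then fill it without growing inside the loop."""
    in_range = [lo <= x <= hi for x in data]   # cf. In Range and Coerce on the whole array
    n_keep = sum(in_range)                     # cf. Boolean To (0,1) + Add Array Elements
    out = [0.0] * n_keep                       # cf. Initialize Array before the loop
    j = 0
    for x, ok in zip(data, in_range):
        if ok:
            out[j] = x                         # cf. Replace Array Subset
            j += 1
    return out
```

The point of the pre-sizing is that the output buffer is allocated exactly once, instead of being resized as elements are appended.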

Hope this helps,
-D

Message Edited by Darren on 03-23-2006 09:56 AM

Message 4 of 9

Just a quick comment on a question you didn't ask...

I've often preferred to use deviation from Median value rather than deviation from Mean.  Most any time I've wanted to summarize real-world data as an average, the median has done just as good at identifying the middle of a well-behaved clump of data.  Where it really shines is in weeding out major outliers and short transient glitches.  Those things can shift a mean noticeably away from the middle of the main clump, but tend to have almost no effect on a median.

Note that computing a median is likely more cpu-intensive than a mean.  Here are code challenge results to give you an idea of computation time.
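A tiny numeric illustration of that point (plain Python, data invented for the example): a single transient glitch drags the mean far from the clump but barely moves the median.

```python
import statistics

clump = [10.0, 10.2, 9.9, 10.1, 9.8, 100.0]   # well-behaved clump plus one glitch
mean_val = statistics.mean(clump)             # dragged all the way to 25.0
median_val = statistics.median(clump)         # stays at 10.05, near the clump's middle
```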

-Kevin P.

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.
Message 5 of 9
Intriguing! It is indeed handy that In Range and Coerce can take an array of numerics. Thank you!

I was thinking that, rather than initializing a new array, it might be better to keep the old array, record the indices of the out-of-range points, and then replace the individual points with new (and hopefully better) data.

But this, too, leads to a question: Would it be better to search the Boolean array for False values, or to move the comparison into a For loop? (Using Build Array would be a no-no, of course, but just as a quick example...)
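In text form, the two alternatives being weighed might look like this Python sketch (an analogy only; function names are mine):

```python
def bad_indices_via_boolean_array(data, lo, hi):
    # Build the whole Boolean array first, then scan it for False values
    # (the analog of searching the Boolean array output).
    ok = [lo <= x <= hi for x in data]
    return [i for i, flag in enumerate(ok) if not flag]

def bad_indices_in_loop(data, lo, hi):
    # Move the comparison inside the loop and collect indices directly,
    # skipping the intermediate Boolean array.
    out = []
    for i, x in enumerate(data):
        if not (lo <= x <= hi):
            out.append(i)
    return out

readings = [1.0, 20.0, -5.0, 3.0]   # out-of-range points at indices 1 and 2
```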

Message Edited by kehander on 03-23-2006 11:17 AM

Message 6 of 9

I think your second approach would be faster because doing In Range in a loop would be much faster than Search 1D Array in a loop.

-D

Message 7 of 9
But still, the Search 1D Array would have to execute far fewer times than the In Range.

In fact, I tried the Timing Template example, and even my "optimized" For Loop code appears to execute more slowly than the While Loop (i.e., it sometimes takes a millisecond with 10,000 data points rather than less than a millisecond). Or is something very wrong here?
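For what it's worth, single runs at millisecond timer resolution are very noisy; averaging over many iterations gives a much steadier comparison. A Python analogy of such a benchmark (not the LabVIEW Timing Template itself; names and sizes are mine):

```python
import random
import timeit

data = [random.gauss(0.0, 1.0) for _ in range(10_000)]
lo, hi = -2.0, 2.0

def by_comparison():
    # Method 1: single pass, two comparisons per element
    return [x for x in data if lo <= x <= hi]

def by_sorting():
    # Method 2: sort, then slice between the two threshold indices
    import bisect
    s = sorted(data)
    return s[bisect.bisect_left(s, lo):bisect.bisect_right(s, hi)]

# Average over many runs; one run at millisecond resolution is too coarse.
t1 = timeit.timeit(by_comparison, number=200)
t2 = timeit.timeit(by_sorting, number=200)
print(f"comparison: {t1:.4f} s total, sorting: {t2:.4f} s total over 200 runs")
```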
Message 8 of 9
Well, personally I would keep it simple.
  1. Often you can re-use the original array and trim the excess at the end. This has huge advantages in memory management.
  2. Keep it simple. Don't generate intermediate arrays (Booleans, indices, etc.). Touch each element only once.

The attached example (LabVIEW 7.1) shows one possibility. If you check array buffer allocations, there is exactly one (!) in the small data generation loop and nowhere else. 😄
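Without the attached VI, here is a hedged Python analogy of the same strategy (the function name is mine): reuse the original buffer, touch each element once, and trim the excess at the end.

```python
def filter_in_place(data, lo, hi):
    """Compact the in-range elements toward the front of the same list,
    then trim the tail once at the end: one pass, one buffer."""
    j = 0
    for x in data:
        if lo <= x <= hi:
            data[j] = x      # overwrite within the original buffer
            j += 1
    del data[j:]             # trim the excess in a single operation
    return data
```

Because the keepers are written back into the list being scanned (and the write index never passes the read index), no intermediate array is ever allocated.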

Message Edited by altenbach on 03-23-2006 11:12 AM

Message 9 of 9