How to delete a random set of indices from an array as fast as possible?

Hello,
My question sounds a bit like this thread but it isn't...    😞
 
When sampling position/force/resistance data I use a time-triggered sampling approach. For later analysis I need (among other things) force (Y) over position (X).
To get this I calculate the mean force from all time points with equal positions (as time is my common index). That works quite well, but with arrays of over 300,000 points it becomes a bit too slow...
These are my steps so far:
1. Sort the position array
2. Eliminate duplicates from the position array (no array reshape in loop)
3. With the data from 2. get all indices where my original position array has a given value
4. With the indices from 3. get the corresponding force values from the force raw data and calculate the mean
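In rough Python/NumPy pseudocode (just to illustrate - the real implementation is of course a LabVIEW diagram, and the function and variable names are made up):

    import numpy as np

    def mean_force_per_position(pos, force):
        unique_pos = np.unique(pos)              # steps 1 + 2: sort and drop duplicates
        mean_force = np.empty(len(unique_pos))
        for i, p in enumerate(unique_pos):
            idx = np.nonzero(pos == p)[0]        # step 3: search the whole raw array
            mean_force[i] = force[idx].mean()    # step 4: average the matching forces
        return unique_pos, mean_force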
 
In step 3 I need to search the whole raw position array as many times as there are different position values in it.
If I could delete all indices I have already found from the array being searched, that should (in theory) significantly improve performance. On the other hand, such an operation would mean a lot of array reshape operations.
 
What is the best approach to delete the found indices?
OR
Is there no benefit in deleting the indices, because the array operations needed to do this cost more time than they save?
 
Thanks!
Sören
Message 1 of 13
Only time for some quick thoughts to get the ball rolling...
 
1. Sorting a large array can be expensive.  Use only when truly necessary.
2 & 3. Identifying & eliminating duplicates may lead you to issues with "floating point equality" and inefficient "delete from array" operations.
 
One quick idea: use histogram-like binning based on "nearly equal" X positions.  For each Y force in that bin, compute a ratio Y/X.  The mean ratio times the median X value for that bin produces a representative mean Y force value for that bin.  If Y and/or X can be in the neighborhood of 0, you may need to handle such bins more carefully.
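Something like this, in Python/NumPy terms (a quick sketch under my own assumptions about the bin width and the near-zero fallback; all names are made up):

    import numpy as np

    def binned_mean_force(x, y, bin_width):
        bins = np.round(x / bin_width).astype(np.int64)   # "nearly equal" X values land in the same bin
        centers, means = [], []
        for b in np.unique(bins):
            xb, yb = x[bins == b], y[bins == b]
            if np.any(np.abs(xb) < 1e-9 * bin_width):     # bin straddles X ~ 0: the ratio blows up
                means.append(yb.mean())                   # fall back to a plain mean of Y
            else:
                means.append(np.mean(yb / xb) * np.median(xb))
            centers.append(b * bin_width)
        return np.array(centers), np.array(means)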
 
-Kevin P.
ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.
Message 2 of 13

Hi Kevin,

thanks for your answer...

1.  Is there a way to get rid of the duplicates without sorting? I sort the array (hoping NI implemented nearly ideal code 😉 ). Then I create a new array the size of the original array and step through the sorted array, copying all elements that are unique into the new array. As a last step I cut the new array after the last copied value. That way I avoid reshapes in the loop.
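In Python-like pseudocode (illustration only; the actual code is a LabVIEW loop writing into a pre-sized array):

    import numpy as np

    def unique_from_sorted(sorted_arr):
        out = np.empty_like(sorted_arr)       # pre-allocate at full size: no reshapes inside the loop
        n = 0
        for v in sorted_arr:
            if n == 0 or v != out[n - 1]:     # keep only elements that differ from the last kept one
                out[n] = v
                n += 1
        return out[:n]                        # single trim after the loop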

2. Floating point equality is not an issue in my case, as I use an incremental position counter which is polled far too often. (The whole problem came up when we found out that we need a higher sampling rate on other channels but still synchronous sampling. Before that we used the incremental position counter as the sample clock and there was no need to reduce data.) But for other raw data sets this certainly is an important point to keep in mind.

My main problem is the "delete from array" operations: I'm not sure whether they can be coded in a way that the benefit of reducing the search time outweighs their cost. Or even worse: if there is a break-even point, where is it?

Meanwhile I coded an algorithm to delete multiple indices from an array, but it seems there is still a little bug in it, so I couldn't compare its speed... work for tomorrow 😉

Sören

Message 3 of 13

Actually, in thinking about this a bit more, I realized that I'm going to need to do a fairly similar thing for an upcoming project.  I've got a device that will cycle through its positional range many times, and I'll need to perform some averaged spatial frequency analysis between the position data and some analog data.  In my app, I plan to perform all the analysis offline so I can settle for an effective method even if it isn't efficient.  But today's "offline" becomes tomorrow's "can we look at that in real-time?" so I'll want to aim for decent performance where feasible.

So now that I've got a vested interest 😉, what kind of performance requirements are you trying to meet?  Is it sort of a fuzzy requirement like, "gee, it seems like I have to wait a long time after I hit the button before the results show up in the graph?"

General thoughts: the 1D Sort function in LV is indeed pretty good.  Delete from Array can require an awful lot of memory management overhead -- search the forums for this and other articles.

I suspect that a better approach is to associate the X and Y data together first and then sort the associated data.  Here the association would be a cluster with Pos as the 1st element and Force as the 2nd.  More tidbits on sorting clusters can be found here.  After the sort, there's no need to "Delete from Array" or search for indices --  the associated Force data is already bundled with the corresponding Pos data.

The final step should be fairly straightforward using an auto-indexed For loop and some shift registers.  When the Pos value is the same as in the previous iteration, increment a "# identical Positions" count and add Force to a "Sum of Forces".  When it changes, calculate results from the left-hand shift registers and reset the values going into the right-hand shift registers.  The results can go into a pre-allocated array using "Replace Array Subset".  The size of the pre-allocated array is the number of possible unique positions.
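In Python/NumPy terms, the whole pipeline might look like this (a sketch only, assuming positions are non-negative integers that can index the pre-allocated result directly; names are made up - the LabVIEW version uses a cluster sort, shift registers, and "Replace Array Subset"):

    import numpy as np

    def mean_force_by_position(pos, force, n_unique_positions):
        order = np.argsort(pos)                  # the "sort the associated data" step
        pos_s, force_s = pos[order], force[order]

        means = np.full(n_unique_positions, np.nan)
        prev, count, total = pos_s[0], 0, 0.0    # the shift-register state
        for p, f in zip(pos_s, force_s):
            if p != prev:
                means[prev] = total / count      # position changed: write out the finished run
                prev, count, total = p, 0, 0.0
            count += 1
            total += f
        means[prev] = total / count              # flush the final run
        return means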

Post back with comments.  I'm not yet ready to implement code for my app, and would be interested in the actual performance of different methods you try.

-Kevin P.

Message 4 of 13
Hi Kevin,
just spent the whole day mixing up i and j in matrices for that problem 😞 - this really keeps your brain smoking...
 
1. The requirements: The project at the moment is just for lab use, so there's no need for speed. But as we always need position/force plots in production, and those usually count in fractions of a second, I'm in the same position as you. 😉 My personal aim is to need less time for data processing than for data acquisition, even at the maximum sampling rate. That is 333 kS/s for 1 s at the moment.
2. Delete from array: I got in touch with an NI engineer and double-checked my idea to create a second array and copy into it all elements which are not to be deleted. That avoids array resizing operations (and works quite well - the array melts away while being scanned). He agreed that this should be the fastest way, but had the idea of copying the elements not one by one but as whole blocks between the elements to be deleted. I'll test that.
But applying this method to every array resizing operation in every single loop of the algorithm is certainly the most important point here.
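This is how I picture the block-copy variant, in Python/NumPy for illustration (assuming the indices to delete are sorted and unique):

    import numpy as np

    def delete_indices(arr, sorted_indices):
        out = np.empty(len(arr) - len(sorted_indices), dtype=arr.dtype)  # one allocation, no resizing
        src = dst = 0
        for i in sorted_indices:
            n = i - src                           # length of the kept block before this deleted index
            out[dst:dst + n] = arr[src:src + n]   # copy the whole block in one operation
            dst += n
            src = i + 1                           # skip the deleted element
        out[dst:] = arr[src:]                     # copy the tail after the last deleted index
        return out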
3. I also thought about that approach. It should make programming easier, but there is that "no clusters in arrays" prejudice. I don't know if it really slows operation down, but as it makes programming easier it would not surprise me 😉 I hope to find some time to play around a bit with this one as well.
4. Meanwhile I'm working on a VI that uses a 2D array in which each column represents a Y waveform and each row corresponds to the same timestamp. You can choose which row should act as the X axis and get back a 2D array with (i-1) rows and a j size that is typically smaller than the original array. The benefit is that it works with any number of measurement channels without further programming.
A version with a fixed number of Y arrays is already running in a simulation environment, but I wasn't able to test it on the DAQ system yet. My first guess from the simulation results is that it cuts the calculation time at 300 kS down by a factor of >3 (I had some 10 seconds and now it's less than 3 - just using the counting method).
5. For the final step I calculate the arithmetic mean using a method that seems nearly the same as the one you suggested (I didn't really get the bit about the right-hand shift registers). The square mean should be easy to realize using the same method, too. But I'd like to see some more methods, like the median or just dropping all but one value - all of that without creating arrays. Any idea how to realize this without building arrays?
 
Sören
Message 5 of 13
... one more thing: Until Friday I'll be at the VIP 2006 near Munich, so don't expect more posts before then.
SE
Message 6 of 13
If you could turn the position data into an encoder, the pulses could become the external clock for data acquisition. Then you would get the y vs. x data when you acquire the samples.
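For illustration, the same configuration in today's nidaqmx Python API (device and terminal names are made up; in LabVIEW the equivalent settings go into the DAQmx Timing VI):

    import nidaqmx
    from nidaqmx.constants import AcquisitionType

    with nidaqmx.Task() as task:
        task.ai_channels.add_ai_voltage_chan("Dev1/ai0")     # e.g. the force signal
        task.timing.cfg_samp_clk_timing(
            rate=100_000,                                     # upper bound on the encoder pulse rate
            source="/Dev1/PFI8",                              # encoder pulses drive the sample clock
            sample_mode=AcquisitionType.CONTINUOUS,
        )
        samples = task.read(number_of_samples_per_channel=1000)  # one sample per encoder pulse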
Message 7 of 13

That was the way we originally worked. The method is absolutely OK if the resulting sample rate is high enough.

But in our case we need some more signals at a higher sampling rate than the encoder can provide. This data should be synchronous with the position data. Sampling at different rates is not an option, as we use an NI 6251 (at least not without using some sort of external clock).
With the VIs in preparation I can sample at the highest necessary sampling speed and get the other data along the way. Originally the problem of the different sampling rates was solved by taking more measurement cycles at different sampling rates. That worked, but we had n different measurement cycles, leaving us with a synchronization problem.

Now I hope to be able to drop the other measurement cycles and still get better -and synchronous- results.

Sören

Message 8 of 13
What were you using for the encoder, and how fast a sampling rate do you need? You should be able to get encoders with micron resolution.
Message 9 of 13
Now I'm getting a bit more confused about the app.  If you need a sampling rate higher than the encoder can provide, then the only way I can think of to accumulate multiple readings at the same position is if you have some type of bi-directional cyclical motion.  But if you originally used the encoder as a sampling clock, that seems to imply unidirectional motion.  Is the motion cyclical or unidirectional?
 
Knowing that I need to do some similar processing down the road, I did a bit of tinkering today.  The method I described earlier took ~350 msec to process (sort and then calculate average force per unique position) on 350,000 data points.  A little more tinkering gave me a method that took <100 msec.  The machine was a pretty new test PC with a 2.8 GHz Pentium.  Here's an outline, and perhaps I'll be able to scrub the code a bit to post soon.  Hopefully someone will have some even better ideas!
 
The basic idea is essentially a histogram where each possible integer position value maps to a histogram bin.
 
1. Based on knowledge of your test equipment, you can know the # of unique positions that can possibly be recorded.  Pre-initialize 2 arrays of this size before starting the main data acquisition.  Both should be full of floating-point 0's.  These will hold (a) count of entries and (b) sum of forces.  Call them the binning arrays.  My test used a size of 8000 possible positions.
 
2. An auto-indexing For loop goes through the 350,000 integer positions and floating point forces.  The binning arrays enter this loop through shift registers.
 
3. For each pair, use the integer position to map yourself to the appropriate index of the binning arrays.  Increment that element of the count array and add this iteration's force value to the sum array.
 
4. After loop completes, divide sum array by count array.  These are your average forces.  (This is why the count is computed as a floating point value.  It was very costly to allow a type coercion from an integer count array into the division function.)  Note that unsampled positions should have 0 counts and produce a NaN result from the division.
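In Python/NumPy, steps 1-4 look roughly like this (a sketch; N_POSITIONS and the names are placeholders, and positions are assumed to be non-negative integers below the array size):

    import numpy as np

    N_POSITIONS = 8000                          # known number of possible encoder positions

    def bin_forces(positions, forces):
        counts = np.zeros(N_POSITIONS)          # float 0's, so the final division needs no coercion
        sums = np.zeros(N_POSITIONS)
        for p, f in zip(positions, forces):     # the auto-indexed For loop (step 2)
            counts[p] += 1.0                    # step 3: the integer position *is* the bin index
            sums[p] += f
        with np.errstate(invalid="ignore"):     # step 4: unsampled bins give 0/0 -> NaN, as noted
            return sums / counts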
 
There are also ways to calculate a running average in the loop, but I haven't checked (yet) to see if it's faster.  The median you mentioned would need some extra post-loop steps.
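For the running average, the update would look roughly like this (sketch only; note that unsampled positions stay at 0 here instead of becoming NaN):

    import numpy as np

    N_POSITIONS = 8000
    counts = np.zeros(N_POSITIONS)
    means = np.zeros(N_POSITIONS)

    def add_sample(p, f):
        counts[p] += 1.0
        means[p] += (f - means[p]) / counts[p]   # incremental mean: m_n = m_{n-1} + (x - m_{n-1})/n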
 
I'm sure that if this were a coding challenge, someone would cut the processing time at least in half...  Anyone out there?
 
-Kevin P.
Message 10 of 13