"Build Array" slowing down for 2d string arrays as it grows.


@Darin.K wrote:

@crossrulz wrote:

At the risk of sounding stupid, does anybody know the reason inserting into an array is faster than the build array?  It seems like they should be performing the exact same function.  I remember a discussion about it being faster, but I don't remember why.


....  The blue curve looks noisy, but the spikes are quite regular and a function of the row size, so they indicate calls to the memory manager for a new buffer.  As you can see, it is quite greedy in requesting these.  Insert into Array still does some operation proportional to the size of the array, but you can see the calls to the memory manager request a doubling each time,...

 


 

We noticed the same thing when examining it way back in LV 5.

The good news is that the bad thing (a full reallocation and copy) happens less and less frequently as the array grows.

 

Over-allocate an array in a buffer (shift register), use Replace Array Subset to fill it, and Reshape Array to trim it when done.
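To make that pattern concrete, here is a minimal Python/NumPy sketch of the same idea (the function name and parameters are mine, and LabVIEW is graphical, so this is only an analogy): over-allocate once, fill in place, trim at the end.

import numpy as np

def collect_rows(row_source, n_estimate, row_len):
    """Ben's pattern: over-allocate, replace in place, trim when done."""
    buf = np.empty((n_estimate, row_len))      # over-allocated buffer
                                               # (the shift register)
    count = 0
    for row in row_source:
        if count == buf.shape[0]:              # estimate was too small:
            bigger = np.empty((2 * buf.shape[0], row_len))
            bigger[:count] = buf               # rare, amortized copy
            buf = bigger
        buf[count, :] = row                    # Replace Array Subset
        count += 1
    return buf[:count, :]                      # Reshape Array when done

# e.g. 1000 rows of length 8, with a deliberately low initial estimate:
data = collect_rows((np.full(8, i) for i in range(1000)), 256, 8)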

 

Ben

Message 11 of 20
When I am able to parallelize the loop with a shift register, then I will buy the over-allocate solution. When the performance of an algorithm becomes too strongly linked to its implementation, there is an issue in my book. I know engineers like solutions; I am more curious about the problems.

In this case I have now determined that no consideration is given to the special case of appending data by either function. Adding to the beginning or the end takes the same time, which is why the job is O(n^2) in both cases. Build Array will pad if necessary while Insert will clip, so perhaps that overhead is the speed difference (see the sketch below). Insert already recognizes the special case of appending (indices unwired); it just needs to cooperate with the memory manager to optimize this case a little better. Elegance is having code which looks good run efficiently.
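To make the pad-versus-clip difference concrete, here is a sketch following the behavior described above, with plain Python lists standing in for a 2D string array (the function names are mine, and the clipping rule is just my reading of the post, not NI documentation):

def build_array_append(matrix, row, fill=""):
    # Build Array keeps the result rectangular by padding every row
    # out to the widest width with the default value.
    width = max([len(row)] + [len(r) for r in matrix])
    pad = lambda r: r + [fill] * (width - len(r))
    return [pad(r) for r in matrix] + [pad(row)]

def insert_into_array_append(matrix, row, fill=""):
    # Insert Into Array keeps the existing width and resizes the new
    # row to fit: clipping if too long (padding if too short).
    if not matrix:
        return [list(row)]
    width = len(matrix[0])
    return matrix + [(row + [fill] * width)[:width]]

m = [["a", "b"], ["c", "d"]]
print(build_array_append(m, ["e", "f", "g"]))
# [['a', 'b', ''], ['c', 'd', ''], ['e', 'f', 'g']]  -- padded
print(insert_into_array_append(m, ["e", "f", "g"]))
# [['a', 'b'], ['c', 'd'], ['e', 'f']]               -- clipped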
Message 12 of 20

@Darin.K wrote:

[..]Coincidence?  Or perhaps Build Array is effectively performing a copy each step of the way.  yuk.[..]


This does not really surprise me. To be honest, I was always under the impression that Build Array creates a new copy each time instead of resizing the array.

I think this is the case because that is "the easy approach", at least it was a couple of years ago... ok, 10 or more years ago 😉

 

One thing I want to add here is that Build Array is therefore worse regarding memory fragmentation than other array-handling operations. So it is the most prominent example of wasting memory chunks and running out of memory because of it....

 

Norbert 

 

PS: I attached my version of the VIs for the benchmark Darin is referring to. Please note that the option "copy" introduces jitter and offset into the benchmark for Insert Into Array. It seems that if "DONT copy" is selected, this overhead (the case structure) is negligible. But when "copy" is selected, it surpasses Build Array in the benchmark, at least on my machine. This is IMHO due to different memory management for copying versus in-placing the shift register, since Build Array does the copy and the append in a single step. And again, the case structure adds a little.

Note that the benchmark repeats the algorithm twice for each method to get better results. Increasing this will improve the values further (constant "#Benchmarks to Average").

Norbert
Message 13 of 20

Is there a reason that Build Array would be a better choice than Insert?   Otherwise I don't understand why Build Array doesn't just invoke Insert under the hood.

Message 14 of 20

 

@Darin:

Interesting results!

 

@all the preallocate people:

Sorry, don't buy it.  I'm sure there will be times when I will want to optimize my code to the point where I am willing to preallocate an array, write more code to check the array each loop and enlarge it if necessary (especially in my while loops), and then more code to resize it at the end.  Until then, the good folks at NI have provided several one-click methods of doing that automatically.  The point is, one of the three (Build vs. Insert vs. autoindexing) is much less efficient.  So for now Build should be avoided (or, preferably, improved or deprecated).

 

This may not have been as much of an issue in the past, but in this day and age, with larger and larger data sets, it will start to become an issue for a lot of people.  For 15 years it was not an issue for me, but I am currently working on a project where the data processing, and thus the GUI lag, went from a 30-second delay to a 2-second delay by replacing Builds with Inserts.  The former simply would not have been acceptable to my end users.  A cautionary tale to say the least.

Message 15 of 20

@drjdpowell wrote:

Is there a reason that Build Array would be a better choice than Insert?   Otherwise I don't understand why Build Array doesn't just invoke Insert under the hood.


Well, Build Array will pad a 2D array so if that is desirable then you should choose it.  It is also more forgiving in terms of mismatched reference types and such. 

 

And before you start running out and replacing your Build Array functions, remember that this is for appending rows to a 2D array only!  For 1D appending the opposite holds.

 

Here is what it looks like when LV gets it right (and wrong).  This is a 1D benchmark with concatenating inputs appending a fixed-length array.  The only slowdowns for BA are the buffer allocations; the overall cost per append is on the order of the fixed size of the appended array, with no dependence on the length of the existing array anymore.  Insert is still O(n^2).

 

[Attachment: BuildArrayVersusInsert_4.png]
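The shape of that result is easy to reproduce outside LabVIEW. Here is a rough Python analogy (list.extend standing in for the amortized append path, list concatenation for the copy-everything path; the sizes are mine and purely illustrative):

import timeit

chunk = list(range(100))            # the fixed-length array being appended

def amortized(n):
    out = []
    for _ in range(n):
        out.extend(chunk)           # in-place append: amortized O(len(chunk))
    return out

def full_copy(n):
    out = []
    for _ in range(n):
        out = out + chunk           # brand-new list each pass: O(len(out))
    return out

for n in (200, 400, 800):
    t_fast = timeit.timeit(lambda: amortized(n), number=3)
    t_slow = timeit.timeit(lambda: full_copy(n), number=3)
    print(n, t_fast, t_slow)        # t_fast ~doubles with n, t_slow ~quadruples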

Message 16 of 20

So what you are finding is that Build Array is better for 1D arrays and Insert Into Array is better for 2D arrays?  Yeah, NI has something screwed up.  But this is still good data to have.



Message 17 of 20

@crossrulz wrote:

So what you are finding is that Build Array is better for 1D arrays and Insert Into Array is better for 2D arrays?  Yeah, NI has something screwed up.  But this is still good data to have.


My current understanding is that Insert Into Array is consistently bad, while Build Array ranges from good to even worse.  You hear anecdotes about the compiler team adding special optimizations for certain situations; it sounds like that has happened to Build Array.  In the general case it copies everything at the input into a new output buffer; in the special case it recognizes when it can leave the array alone while appending.  Insert Into Array seems to always copy everything as well, only considerably faster.

 

In the 1D case BA is very fast for appending and slow for prepending (makes sense).  Insert Into Array takes the same time regardless of insert position, and this time is close to the BA time for prepending.  Every case is the worst case for Insert; I guess there is a case for consistency.

 

The solution IMO is for Insert Into Array to recognize the special case of appending and optimize accordingly.  It already recognizes this case since this is what it does when the indices are unwired.
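What that fast path could look like, sketched as a toy growable array in Python (my construction, not NI's code): appends with spare capacity touch a single slot, while general inserts still shift the tail.

class GrowableArray:
    def __init__(self):
        self.data = [None]                 # backing buffer, capacity 1
        self.size = 0                      # elements actually in use

    def _ensure_capacity(self):
        if self.size == len(self.data):
            self.data.extend([None] * len(self.data))   # double when full

    def insert(self, value, index=None):
        self._ensure_capacity()
        if index is None or index == self.size:
            self.data[self.size] = value   # append fast path: amortized O(1)
        else:
            # general insert: shift the tail up by one slot, O(size - index)
            for i in range(self.size, index, -1):
                self.data[i] = self.data[i - 1]
            self.data[index] = value
        self.size += 1

a = GrowableArray()
for x in range(5):
    a.insert(x)                # index "unwired" -> append fast path
a.insert(99, 0)                # prepend -> pays the full shift
print(a.data[:a.size])         # [99, 0, 1, 2, 3, 4]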

Message 18 of 20

@Darin.K wrote:

[...] The solution IMO is for Insert Into Array to recognize the special case of appending and optimize accordingly.  It already recognizes this case since this is what it does when the indices are unwired.


Re: 1d BA

 

I think LV 6i was the version where NI optimized the BA, the noteworthy enhancement being that the output array is allocated based on the number of iterations of the For loop.

 

Re: insert 2D

 

The worst-case situation would be to insert at (0,0), where everything has to be moved around since the stride changes.
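In row-major storage that is easy to see. A small sketch of the flat-buffer arithmetic (my model, not LabVIEW internals):

# Element (r, c) of a rows x cols array lives at flat offset r*cols + c.
# Appending a row just extends the tail; inserting a column at index 0
# changes the stride (cols -> cols + 1), so every element's offset
# changes and the whole buffer must be rewritten.

def append_row(flat, rows, cols, new_row):
    return flat + list(new_row), rows + 1, cols      # tail grows, no moves

def insert_col0(flat, rows, cols, new_col):
    out = []
    for r in range(rows):                            # rebuild everything
        out.append(new_col[r])                       # new first element
        out.extend(flat[r * cols:(r + 1) * cols])    # old row, shifted over
    return out, rows, cols + 1

flat = [1, 2, 3, 4]                                  # a 2 x 2 array
print(append_row(flat, 2, 2, [5, 6]))                # ([1, 2, 3, 4, 5, 6], 3, 2)
print(insert_col0(flat, 2, 2, [0, 0]))               # ([0, 1, 2, 0, 3, 4], 2, 3)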

 

Ben

Message 19 of 20

@Ben wrote:

[...] I think LV 6i was the version where NI optimized the BA, the noteworthy enhancement being that the output array is allocated based on the number of iterations of the For loop. [...] The worst-case situation would be to insert at (0,0), where everything has to be moved around since the stride changes.


Nice catch on the output array being sized from the number of iterations; I think I can make sense of the picket fences now.  Doubling is clearly a much better strategy, but this works (a quick comparison of the two strategies follows below).

 

Prepending the first column in Insert 2D requires essentially the same time as appending a row, so as in 1D, every case is treated as the worst case.
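For what it is worth, a back-of-envelope comparison of the two growth strategies (my sketch of the allocation counts, not NI's allocator):

def fixed_increment(n, k):
    # Grow by a constant chunk of k elements: the regular "picket fence"
    # of reallocations, O(n/k) of them, O(n^2/k) elements moved in total.
    reallocs, moves, capacity = 0, 0, 0
    for size in range(1, n + 1):
        if size > capacity:
            capacity += k
            reallocs += 1
            moves += size - 1          # copy the existing elements over
    return reallocs, moves

def doubling(n):
    # Double the capacity when full: O(log n) reallocations and O(n)
    # elements moved in total, i.e. amortized O(1) per append.
    reallocs, moves, capacity = 0, 0, 1
    for size in range(1, n + 1):
        if size > capacity:
            capacity *= 2
            reallocs += 1
            moves += size - 1
    return reallocs, moves

print(fixed_increment(100000, 1000))   # (100, 4950000)
print(doubling(100000))                # (17, 131071)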

Message 20 of 20