07-17-2012 01:28 PM
Has anyone else seen this behavior? As a 2D string array grows fairly large, the time it takes to add another row increases steadily when using the "Build Array" function: starting from close to 0 ms, it grows by 1-2 ms roughly every 1000 rows. This does not seem to occur if you use "Insert Into Array" or if you build the array with an auto-indexing For Loop. The effect can be large: in the code I have attached, the part using "Build Array" takes about 15 seconds to execute, while the part using "Insert Into Array" takes a fraction of a second.
I've done some testing, and it appears this does not occur for numeric or boolean arrays, and it happens only for 2D arrays, not 1D (there may be a similar effect once the arrays are much larger; I didn't exhaust the testing of this).
I'm using LabVIEW PDS 2011 SP1. This is happening on both Windows and Linux, both in the development environment and as a compiled application.
07-17-2012 01:31 PM - edited 07-17-2012 01:37 PM
You should initialize the array outside of the loop. Inside the loop use replace array subset.
Edit: And watch those benchmarks. LabVIEW is pretty smart about what it does. If you have unused code, it likely will not be executed, and since your built arrays do not go anywhere, probably nothing happens at all. Also, in newer versions of LabVIEW, loops will be executed at compile time when possible; in effect you have an array constant.
07-17-2012 02:10 PM
LabVIEW can do some really smart things with arrays if it knows the size. One reason For Loops can be fast is that the compiler can preallocate the memory before the loop starts. Build Array needs to move things around in memory when the array grows and more space is needed; this is the performance hit you are seeing. There are various posts and presentations on improving LabVIEW performance, and it would be worth your while to look through some of them. Knowing a bit about how the internals of LabVIEW work can help you write more efficient code.
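For anyone who wants to see the effect outside LabVIEW, here is a rough Python sketch (illustrative only; LabVIEW is graphical and its memory manager works differently, and the sizes here are made up) contrasting copy-on-every-append with preallocation:

```python
import time

ROWS, COLS = 2000, 10  # hypothetical sizes, just for illustration

def grow_by_copy():
    # Mimics an append with no spare capacity: every row added
    # forces a brand-new buffer and a full copy of the old one.
    arr = []
    for _ in range(ROWS):
        arr = arr + [["x"] * COLS]  # new list each time -> full copy
    return arr

def preallocate_and_replace():
    # Mimics initializing the array once and replacing subsets.
    arr = [None] * ROWS
    for i in range(ROWS):
        arr[i] = ["x"] * COLS
    return arr

t0 = time.perf_counter(); grow_by_copy()
t1 = time.perf_counter(); preallocate_and_replace()
t2 = time.perf_counter()
print(f"copy-on-grow: {t1 - t0:.4f} s, preallocated: {t2 - t1:.4f} s")
```

The copy-on-grow version does quadratic work overall, which is why the per-row cost keeps creeping up as the array grows.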
07-17-2012 02:54 PM
Build Array in this case is a memory thrashing machine, Insert into Array is better, but still shows a scaling with size. Both cases give you O(n^2) time for the loop, you just have a smaller coefficient for Insert. I have added my own benchmarking below (LV9), be careful in yours because the two are running in parallel (no change in results this time). Strings are funny creatures (basically arrays, so string arrays are arrays of arrays), this is perhaps why they show atypical behavior. I can see similar, although much less drastic, differences in other datatypes, you just have to increase the array sizes a bit (about 100 elements/row).
Data for pre-allocation and replacement is not shown, it is barely distinguishable from the x-axis. Clearly the way to go, but you do not always know the size ahead of time so if you are forced to choose, at least choose Insert into Array.
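The O(n^2) figure falls out of summing the per-append copy cost. A small Python sketch of the arithmetic (an assumption about the growth pattern, not a claim about LabVIEW's internals):

```python
def total_rows_copied(n_rows):
    # If every append reallocates and copies the whole array so far,
    # appending row k copies the k existing rows, so the total is
    # 0 + 1 + ... + (n-1) = n*(n-1)/2, i.e. O(n^2).
    return sum(range(n_rows))

print(total_rows_copied(1000))  # 499500
print(total_rows_copied(2000))  # 1999000 -- roughly 4x the work for 2x rows
```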
When I can, I autoindex the For Loop since that is often parallelizable unlike using a shift register.
07-17-2012 02:59 PM
@Darin.K wrote:
Data for pre-allocation and replacement is not shown, it is barely distinguishable from the x-axis. Clearly the way to go, but you do not always know the size ahead of time so if you are forced to choose, at least choose Insert into Array.
I would still recommend preallocating provided you have an idea of the upper limit of the size and you are working with large data sets. You can always trim the unused portion of the array at the end.
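One way to sketch that upper-bound strategy in Python (the function name and sizes are mine, for illustration only; in LabVIEW this would be Initialize Array, Replace Array Subset, and a final Array Subset to trim):

```python
def collect_with_trim(producer, max_rows, cols):
    # Preallocate to a known upper bound, fill as data arrives,
    # then trim the unused tail in one final operation.
    buf = [[0] * cols for _ in range(max_rows)]
    count = 0
    for row in producer:
        buf[count] = row
        count += 1
    return buf[:count]  # trim the unused portion at the end

rows = [[i] * 4 for i in range(3)]
print(collect_with_trim(iter(rows), max_rows=10, cols=4))
```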
07-17-2012 03:14 PM
At the risk of sounding stupid, does anybody know the reason inserting into an array is faster than Build Array? It seems like they should be performing the exact same function. I remember a discussion about it being faster, but I don't remember why.
07-17-2012 03:23 PM
Darin,
Yup, precisely. I noticed this behaviour in some code I was working on that does some data manipulation in a while loop, so I can't preallocate the memory for the array.
I just found it interesting (or odd, rather) that two basic LabVIEW array functions that do the same thing have drastically different effects on performance. In fact, for years I have been naively working under the assumption that "Build Array" would have less overhead than "Insert Into Array" (I've seen some comments on this forum about "Delete from Array" being inefficient).
Cheers.
P.S. I knew someone was going to call me out on benchmarking with parallel loops, but since the fast one finished in a fraction of a second and the other took about 15 sec... well. But you're right. 😉
07-17-2012 04:14 PM
Wouldn't the arrays in both loops be constant folded? If not, what prevents that from happening?
07-17-2012 05:53 PM
@crossrulz wrote:
At the risk of sounding stupid, does anybody know the reason inserting into an array is faster than the build array? It seems to be like they should be performing the exact same function. I remember a discussion about it being faster, but I don't remember why.
I think Build Array has been around a long time and has not really been revisited much in the meantime. These things often come down to inplaceness, and it appears that Build Array assumes no inplaceness.
I have attached another benchmark plot using rows of 512 I32s this time around. The blue curve looks noisy, but the spikes are quite regular and a function of the row size, so they indicate calls to the memory manager for a new buffer. As you can see, it is quite greedy in requesting these. Insert into Array still does some operation proportional to the size of the array, but you can see the calls to the memory manager request a doubling each time, this amortizes to a constant cost per insert and is a great way to go. This is what Shift Registers do up to a point (they stop doubling at some limit), so perhaps Insert into Array allows the Shift register to manage the buffer size as opposed to Build array.
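The amortization argument can be checked with a quick count in Python (a sketch of the doubling strategy described above, not of LabVIEW's actual allocator):

```python
def copies_without_capacity(n):
    # Reallocate on every append: copy everything so far each time.
    return sum(range(n))

def copies_with_doubling(n):
    # Double the buffer only when it fills up; copies happen only
    # at the doubling points, so the total stays under 2n.
    copies, capacity, size = 0, 1, 0
    for _ in range(n):
        if size == capacity:
            copies += size       # move existing elements to the new buffer
            capacity *= 2
        size += 1
    return copies

print(copies_without_capacity(1000))  # 499500
print(copies_with_doubling(1000))     # 1023 -- amortized O(1) per insert
```

The greedy, copy-every-time strategy does hundreds of times more element moves at only 1000 inserts, and the gap widens quadratically from there.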
They both scale with n, but I am usually inclined to live with that in exchange for code that is easy to write, easy to read, and ready to be turbocharged as compiler improvements come along. (I love coding for the algorithm as opposed to a particular implementation). As it is, Insert into Array is usually good up to some quite large value which probably covers 95% of the use cases, in the rest it then makes sense to take over some of the allocation duties. Overall, it really is time for NI to revisit LV memory management strategy, in the old days it made sense to begrudge us every byte, these days not so much.
@Steve: LV probably will not fold that code, since it would result in a VI size of several MB!
@epk: Delete from Array is very efficient, it is the best way to remove an element from an array, period. It is however, a poor way to remove several elements from an array since it then involves touching many elements repeatedly which turns an O(n) operation into an O(n^2) one.
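In list terms, the difference looks like this (a Python sketch; the function names are mine, not LabVIEW primitives):

```python
def delete_one_at_a_time(data, indices):
    # Repeated single deletes: each delete shifts the whole tail,
    # O(n) per delete, so removing k elements costs O(k*n).
    out = list(data)
    for i in sorted(indices, reverse=True):  # back-to-front keeps indices valid
        del out[i]
    return out

def delete_in_one_pass(data, indices):
    # One filtering pass touches each element once: O(n) total.
    drop = set(indices)
    return [x for i, x in enumerate(data) if i not in drop]

print(delete_in_one_pass(range(10), [2, 5, 7]))  # [0, 1, 3, 4, 6, 8, 9]
```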
BTW I use a Windows specific call to get the high precision clock value in LV9 (tick count would not reveal this kind of detail), in LV10 a similar version is included in vi.lib.
07-17-2012 06:43 PM - edited 07-17-2012 06:45 PM
It gets pretty interesting: it turns out you can tweak things so that one function or the other starts requesting buffers like crazy. I tried one last experiment for now. I asked myself: what is the craziest, most inefficient thing I could do to the Insert function? How about adding an Always Copy primitive right before it. Now look what happens.
Coincidence? Or perhaps Build Array is effectively performing a copy at each step of the way. Yuk.
Despite the appearance, the total time (i.e., the integral) for Insert + Always Copy is still about 2/3 of Build Array's time.