07-17-2012 01:28 PM
Has anyone else seen this behavior? As a 2D string array grows fairly large, the time it takes to add another row increases steadily when using the "Build Array" function: starting from close to 0 ms, it grows by 1-2 ms roughly every 1000 rows. This does not seem to occur if you use "Insert Into Array" or if you build the array with an auto-indexing For Loop. The effect can be large: in the code I have attached, the part using "Build Array" takes about 15 seconds to execute, while the part using "Insert Into Array" takes a fraction of a second.
I've done some testing, and it appears this does not occur for numeric or boolean arrays, and it happens only for 2D arrays, not 1D (there may be a similar effect once the arrays are much larger; I didn't exhaust the testing of this).
I'm using LabVIEW PDS 2011 SP1. This is happening on both Windows and Linux, both in the development environment and as a compiled application.
07-17-2012 01:31 PM - edited 07-17-2012 01:37 PM
You should initialize the array outside of the loop. Inside the loop use replace array subset.
Edit: And watch those benchmarks. LabVIEW is pretty smart about what it does. If you have unused code, it likely will not be executed, and since your built arrays do not go anywhere, probably nothing happens at all. Also, in newer versions of LabVIEW, loops will be executed at compile time when possible; in effect you have an array constant.
07-17-2012 02:10 PM
LabVIEW can do some really smart things with arrays if it knows the size. One reason For Loops can be fast is that the compiler can preallocate the memory before the loop starts. Build Array needs to move things around in memory when the array grows and more space is needed; this is the performance hit you are seeing. There are various posts and presentations on improving LabVIEW performance, and it would be worth your while to look through some of them. Knowing a bit about how the internals of LabVIEW work can help you write more efficient code.
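For anyone who wants to see the effect outside LabVIEW, here is a rough Python sketch (illustrative only; LabVIEW is graphical and its memory manager works differently, and the sizes here are made up) contrasting copy-on-every-append with preallocation:

```python
import time

ROWS, COLS = 2000, 10  # hypothetical sizes, just for illustration

def grow_by_copy():
    # Mimics an append with no spare capacity: every row added
    # forces a brand-new buffer and a full copy of the old one.
    arr = []
    for _ in range(ROWS):
        arr = arr + [["x"] * COLS]  # new list each time -> full copy
    return arr

def preallocate_and_replace():
    # Mimics initializing the array once and replacing subsets.
    arr = [None] * ROWS
    for i in range(ROWS):
        arr[i] = ["x"] * COLS
    return arr

t0 = time.perf_counter(); grow_by_copy()
t1 = time.perf_counter(); preallocate_and_replace()
t2 = time.perf_counter()
print(f"copy-on-grow: {t1 - t0:.4f} s, preallocated: {t2 - t1:.4f} s")
```

The copy-on-grow version does quadratic work overall, which is why the per-row cost keeps creeping up as the array grows.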
07-17-2012 02:54 PM
Build Array in this case is a memory thrashing machine, Insert into Array is better, but still shows a scaling with size. Both cases give you O(n^2) time for the loop, you just have a smaller coefficient for Insert. I have added my own benchmarking below (LV9), be careful in yours because the two are running in parallel (no change in results this time). Strings are funny creatures (basically arrays, so string arrays are arrays of arrays), this is perhaps why they show atypical behavior. I can see similar, although much less drastic, differences in other datatypes, you just have to increase the array sizes a bit (about 100 elements/row).
Data for pre-allocation and replacement is not shown, it is barely distinguishable from the x-axis. Clearly the way to go, but you do not always know the size ahead of time so if you are forced to choose, at least choose Insert into Array.
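The O(n^2) figure falls out of summing the per-append copy cost. A small Python sketch of the arithmetic (an assumption about the growth pattern, not a claim about LabVIEW's internals):

```python
def total_rows_copied(n_rows):
    # If every append reallocates and copies the whole array so far,
    # appending row k copies the k existing rows, so the total is
    # 0 + 1 + ... + (n-1) = n*(n-1)/2, i.e. O(n^2).
    return sum(range(n_rows))

print(total_rows_copied(1000))  # 499500
print(total_rows_copied(2000))  # 1999000 -- roughly 4x the work for 2x rows
```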
When I can, I autoindex the For Loop since that is often parallelizable unlike using a shift register.
07-17-2012 02:59 PM
@Darin.K wrote:
Data for pre-allocation and replacement is not shown, it is barely distinguishable from the x-axis. Clearly the way to go, but you do not always know the size ahead of time so if you are forced to choose, at least choose Insert into Array.
I would still recommend preallocating provided you have an idea of the upper limit of the size and you are working with large data sets. You can always trim the unused portion of the array at the end.
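One way to sketch that upper-bound strategy in Python (the function name and sizes are mine, for illustration only; in LabVIEW this would be Initialize Array, Replace Array Subset, and a final Array Subset to trim):

```python
def collect_with_trim(producer, max_rows, cols):
    # Preallocate to a known upper bound, fill as data arrives,
    # then trim the unused tail in one final operation.
    buf = [[0] * cols for _ in range(max_rows)]
    count = 0
    for row in producer:
        buf[count] = row
        count += 1
    return buf[:count]  # trim the unused portion at the end

rows = [[i] * 4 for i in range(3)]
print(collect_with_trim(iter(rows), max_rows=10, cols=4))
```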
07-17-2012 03:14 PM
At the risk of sounding stupid, does anybody know the reason inserting into an array is faster than Build Array? It seems like they should be performing the exact same function. I remember a discussion about it being faster, but I don't remember why.
07-17-2012 03:23 PM
Darin,
Yup, precisely. I noticed this behaviour in some code I was working on that does some data manipulation in a while loop, so I can't preallocate the memory for the array.
I just found it interesting (or odd, rather) that two basic LabVIEW array functions that do the same thing have drastically different effects on performance. In fact, for years I have been naively working under the assumption that "Build Array" would have less overhead than "Insert Into Array" (I've seen some comments on this forum about "Delete from Array" being inefficient).
Cheers.
P.S. I knew someone was going to call me out on benchmarking with parallel loops, but since the fast one finished in a fraction of a second and the other took about 15 sec... well. But you're right. 😉
07-17-2012 04:14 PM
Wouldn't the arrays in both loops be constant folded? If not, what prevents that from happening?
07-17-2012 05:53 PM
@crossrulz wrote:
At the risk of sounding stupid, does anybody know the reason inserting into an array is faster than the build array? It seems to be like they should be performing the exact same function. I remember a discussion about it being faster, but I don't remember why.
I think Build Array has been around a long time and has not really been revisited much in the meantime. These things often come down to inplaceness, and it appears that Build Array assumes no inplaceness.
I have attached another benchmark plot using rows of 512 I32s this time around. The blue curve looks noisy, but the spikes are quite regular and a function of the row size, so they indicate calls to the memory manager for a new buffer. As you can see, it is quite greedy in requesting these. Insert into Array still does some operation proportional to the size of the array, but you can see the calls to the memory manager request a doubling each time, this amortizes to a constant cost per insert and is a great way to go. This is what Shift Registers do up to a point (they stop doubling at some limit), so perhaps Insert into Array allows the Shift register to manage the buffer size as opposed to Build array.
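The amortization argument can be checked with a quick count in Python (a sketch of the doubling strategy described above, not of LabVIEW's actual allocator):

```python
def copies_without_capacity(n):
    # Reallocate on every append: copy everything so far each time.
    return sum(range(n))

def copies_with_doubling(n):
    # Double the buffer only when it fills up; copies happen only
    # at the doubling points, so the total stays under 2n.
    copies, capacity, size = 0, 1, 0
    for _ in range(n):
        if size == capacity:
            copies += size       # move existing elements to the new buffer
            capacity *= 2
        size += 1
    return copies

print(copies_without_capacity(1000))  # 499500
print(copies_with_doubling(1000))     # 1023 -- amortized O(1) per insert
```

The greedy, copy-every-time strategy does hundreds of times more element moves at only 1000 inserts, and the gap widens quadratically from there.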
They both scale with n, but I am usually inclined to live with that in exchange for code that is easy to write, easy to read, and ready to be turbocharged as compiler improvements come along. (I love coding for the algorithm as opposed to a particular implementation). As it is, Insert into Array is usually good up to some quite large value which probably covers 95% of the use cases, in the rest it then makes sense to take over some of the allocation duties. Overall, it really is time for NI to revisit LV memory management strategy, in the old days it made sense to begrudge us every byte, these days not so much.
@Steve: LV probably will not fold that code, since it would result in a VI size of several MB!
@epk: Delete from Array is very efficient, it is the best way to remove an element from an array, period. It is however, a poor way to remove several elements from an array since it then involves touching many elements repeatedly which turns an O(n) operation into an O(n^2) one.
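In list terms, the difference looks like this (a Python sketch; the function names are mine, not LabVIEW primitives):

```python
def delete_one_at_a_time(data, indices):
    # Repeated single deletes: each delete shifts the whole tail,
    # O(n) per delete, so removing k elements costs O(k*n).
    out = list(data)
    for i in sorted(indices, reverse=True):  # back-to-front keeps indices valid
        del out[i]
    return out

def delete_in_one_pass(data, indices):
    # One filtering pass touches each element once: O(n) total.
    drop = set(indices)
    return [x for i, x in enumerate(data) if i not in drop]

print(delete_in_one_pass(range(10), [2, 5, 7]))  # [0, 1, 3, 4, 6, 8, 9]
```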
BTW I use a Windows specific call to get the high precision clock value in LV9 (tick count would not reveal this kind of detail), in LV10 a similar version is included in vi.lib.
07-17-2012 06:43 PM - edited 07-17-2012 06:45 PM
It gets pretty interesting: it turns out you can tweak things so that one function or the other starts requesting buffers like crazy. I tried one last experiment for now. I asked myself: what is the craziest, most inefficient thing I could do to the Insert function? How about adding an Always Copy primitive right before it. Now look what happens.
Coincidence? Or perhaps Build Array is effectively performing a copy at each step of the way. Yuk.
Despite the appearance, the total time (i.e., the integral) for Insert + Always Copy is still about 2/3 of Build Array's time.