LabVIEW Idea Exchange

spatry

Inplace Array Size

Status: Declined
Array Size is not copying the array in the stated problem. The LabVIEW compiler treats this situation correctly. Also, creating pass-through wires for by-value terminals on LabVIEW functions is not generally seen as a usability improvement.

I've been working with large arrays, and I've found that wire branches are killing my performance. To alleviate this I've scattered In Place Element structures all over. However, the only way I have to access an array's size without incurring a copy of the array is to track the size separately and read that value instead, which seems pretty wasteful.

 

ArraySize.png

 

I can think of two ways to implement this. The first is to add an array size border node to the In Place Element structure, but that would be awkward to use. So instead I suggest that the Array Size node itself be made in-place, as mocked up below.

 

ArraySizeInplace.png
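As a rough text-language analogy (Python here, since G diagrams can't be shown inline), the idea is that a size query should be a pure metadata read that never touches, let alone copies, the elements:

```python
# A Python list stores its length in the object header, so len() is an
# O(1) metadata read that copies no elements -- the behavior this idea
# asks of Array Size when the wire is otherwise operated on in place.
data = [0.0] * 1_000_000
n = len(data)        # size query: no element copy
data[42] = 3.14      # the same buffer can still be modified in place
assert n == 1_000_000 and data[42] == 3.14
```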

 

 

Thanks

 

 

30 Comments
smithd
Active Participant

So, LabVIEW has proven to me yet again that I do not understand it at all. I was trying to make a quick example with the Show Buffer Allocations tool to show Mr. spatry, and got this image. Note the seq-struct terminal copy below. It persists even if I remove the seq struct. Is LV really making a copy here, or is that buffer necessary for some other reason... or is it all a lie and I'm confused by something else?

A.png

 

 

 

altenbach
Knight of NI

>  Is LV really making a copy here, or is that buffer necessary for some other reason

 

If I understand this right, this has nothing to do with the presence or absence of the Array Size terminal (and the wire branch), but with constant folding. Once you replace, e.g., the zero on the left with an actual control, folding is eliminated and the extra buffer allocation disappears.
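Constant folding is a general compiler behavior, not LabVIEW-specific. As a loose analogy in a text language (CPython's own folder, not LabVIEW's compiler), an expression whose inputs are all constants gets pre-computed at compile time and stored as a constant:

```python
# CPython analogy for what altenbach describes: the expression 2 * 3 has
# only constant inputs, so the compiler folds it and bakes the result 6
# directly into the code object's constant table.
code = compile("x = 2 * 3", "<string>", "exec")
assert 6 in code.co_consts   # the folded result is a stored constant
```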

SteenSchmidt
Trusted Enthusiast

Remember that the dots merely mark possible buffer allocations. I'd suspect this is such a case, as I wouldn't expect more than the buffer alloc at the Init Array function. I suspect the IDE has trouble making sure the Replace Array Subset doesn't resize in this case. Probably just a limitation in the current version of LabVIEW.

 

/Steen

CLA, CTA, CLED & LabVIEW Champion
Darren
Proven Zealot
Status changed to: Declined
Array Size is not copying the array in the stated problem. The LabVIEW compiler treats this situation correctly. Also, creating pass-through wires for by-value terminals on LabVIEW functions is not generally seen as a usability improvement.
smithd
Active Participant

Not sure which of you is right (Steen or Alten), but they both make sense. Glad LV is not going crazy 🙂

AristosQueue (NI)
NI Employee (retired)

In this case, Alten is correct -- LV has a constant array and on each execution it makes a copy of that constant to be stomped on downstream. The first buffer dot on the Initialize node is the constant array itself -- not a "copy" but an allocation for an array. That array is used for the array size primitive. The dot on the sequence structure is the copy made for the Replace Subset node.
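AQ's explanation can be sketched in Python terms (an analogy, not LabVIEW internals): the diagram constant is allocated once, and each execution copies it before the destructive Replace, so later executions still see the original value.

```python
# Sketch of the copy flagged on the sequence structure: the constant array
# must be duplicated before being stomped on, or the next execution of the
# VI would start from the mutated data instead of the constant.
CONST = [0.0] * 8          # the constant array: one allocation, reused

def run_once():
    work = list(CONST)     # the per-execution copy (the dot on the struct)
    work[0] = 1.0          # Replace Array Subset stomps the copy in place
    return work

result = run_once()
assert result[0] == 1.0 and CONST[0] == 0.0   # the constant survives
```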

SteenSchmidt
Trusted Enthusiast

AQ: Which case am I overlooking when I think that a single instance of the constant array would suffice here? I.e., as I see it the buffer allocation at Init Array is enough - everything else can use that original instance, even in parallel, since one branch only looks at the size (which remains constant), and the other branch just replaces a value without needing to reallocate the array (it can do it in place).

 

It's also quite odd that the second buffer alloc dot disappears when you replace the constant DBL 0 with a DBL control on Init Array. This shouldn't affect the needed memory, only delay the point in time at which you have enough info to populate (init) the array.

 

Or, are you saying that LabVIEW keeps the constant array on the stack, and makes a copy for each exclusive Modifier downstream, even if there is only one Modifier and no Reader of the pre-modified array content?

 

On a separate note: how does LabVIEW choose between making an extra data copy and serializing when resolving a race condition between parallel operations? Say an Index Array and a Replace Array Subset operate in parallel on the same input array. They could share the same instance of the array if the Index Array executed before the Replace, but if the Replace happened before the Index you'd need a data copy at the wire branch feeding the two parallel ops. Is this hardcoded, or does it depend on some parameters at runtime?
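The trade-off Steen asks about can be sketched in Python terms (an analogy; how LabVIEW actually schedules this is exactly his question). If the read is ordered before the in-place write, both operations can share one buffer; if the write may run first, a defensive copy at the branch keeps the reader's view intact:

```python
# Option 1 -- serialize: read (Index Array) before the in-place write
# (Replace Array Subset). One shared buffer, no copy.
arr = [10, 20, 30]
read_val = arr[1]          # read first...
arr[1] = 99                # ...then write in place
assert read_val == 20 and arr == [10, 99, 30]

# Option 2 -- copy at the branch: the writer gets its own buffer, so
# execution order no longer matters, at the cost of a full copy.
arr2 = [10, 20, 30]
writer_copy = list(arr2)   # the data copy at the wire branch
writer_copy[1] = 99
assert arr2[1] == 20       # the reader's view is unaffected
```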

 

/Steen

CLA, CTA, CLED & LabVIEW Champion
SteenSchmidt
Trusted Enthusiast

AQ: A thought occurs to me - is the buffer alloc necessary because the constant array needs to be copied from one stack to another? One stack per thread, so for multithreaded apps you'd need the array on each thread's stack. Though in that case I'd put the array on the heap instead, with at most a pointer on each stack...

 

But then again none of that (constant array or initialized with a control value) should make the two scenarios differ. Sigh.

 

/Steen

CLA, CTA, CLED & LabVIEW Champion
AristosQueue (NI)
NI Employee (retired)

Steen -- although it is theoretically possible to reorder the instructions to avoid the data copy here, LabVIEW does not have special code for the Array Size primitive. That read is performed on a different data value than the Replace, and therefore the array gets duplicated. It's an optimization we've never bothered to apply - reordering array commands for that sort of simplification - nor do we do that sort of field simplification for any other operation. If you have a cluster, unbundle field A, and downstream bundle field B, LabVIEW will duplicate the entire cluster. It does not attempt to split the difference and say, "Well, I can actually technically share the data pointer because A is constant throughout." I believe this kind of optimization is already on our compiler backlog of optimizations we could implement.
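The cluster case AQ describes can be sketched in Python terms (an analogy using a dict for the cluster; not LabVIEW internals): today the whole cluster is duplicated, while the backlogged optimization would share the untouched field's data and copy only the top level.

```python
import copy

cluster = {"a": [1, 2, 3], "b": 0}

# Today (per AQ): bundling field B duplicates the entire cluster,
# field A's data included, even though A is only ever read.
dup = copy.deepcopy(cluster)
dup["b"] = 7
assert cluster["b"] == 0 and dup["b"] == 7
assert dup["a"] is not cluster["a"]    # A's data was copied too

# The backlogged optimization: share A's (unmodified) data and copy
# only the top-level container -- a shallow copy in Python terms.
shared = dict(cluster)
shared["b"] = 7
assert shared["a"] is cluster["a"]     # A's data pointer is shared
```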

 

As I have said before -- LabVIEW's compiler keeps getting smarter about optimizing memory copies, and at this point, it is generally smarter than any individual programmer, although in particular situations, we humans may recognize a missed optimization. This is one of those opportunities that LV missed, but I still wouldn't use that as evidence that you should generally try to outsmart the compiler. Let it do its job unless/until you notice a particular place is a hotspot in your code and then see if you can optimize that hotspot.

SteenSchmidt
Trusted Enthusiast

AQ: Ok, I hear you. That just makes it quite unpredictable when a data copy will actually occur.

 

I'm for instance working on a Real-Time app that uses 377 MB of memory when it reaches equilibrium (which takes about 15 minutes). This memory footprint is really stable, varying only +/- 200 kB over a hundred hours of runtime. It has taken a lot of effort to get there though, as we experienced many timed structures that occasionally spiked severely, making them finish late - a really periodic thing. It turns out the biggest culprit is dynamic malloc - minimize that, and Real-Time runs as smoothly as it ever will.

 

The above isn't rocket science in Real-Time apps, but getting it to work can be. Avoiding blocking functions and shared resources is a deterministic task; it can be completed perfectly. Avoiding priority switching and the overhead it entails is quite doable as well. Avoiding 1:1 code paths that simply take too many cycles to execute is trivial. But avoiding dynamic malloc is really hard when you only get a "perhaps" out of the Show Buffer Allocations tool and can't use common sense. I can never really be sure whether LabVIEW chooses to allocate more memory in a specific situation even though it technically doesn't have to. It often boils down to informed trial and verify, which isn't trivial when working with systems that can take half an hour to redeploy and as long to reach steady-state.
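The pattern Steen's RT work points at is allocate-once, modify-in-place. A minimal sketch (Python analogy; the `process` function and buffer size are illustrative, not from the original thread):

```python
# Sketch of the malloc-free steady state an RT loop needs: every buffer
# is allocated before the timed loop starts, and the loop itself only
# replaces elements in place -- no resize, no new allocation per cycle.
buf = [0.0] * 1024             # preallocated once, at startup

def process(buf, i, sample):
    buf[i] = sample            # in-place replace: no resize, no malloc
    return buf                 # the same object flows through

out = process(buf, 0, 2.5)
assert out is buf and out[0] == 2.5   # no new buffer was created
```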

 

Oh well, so much else is working brilliantly in LabVIEW, so I'm not really complaining 😉

 

Thanks for your answers!

 

Cheers,

Steen

CLA, CTA, CLED & LabVIEW Champion