LabVIEW


2D array, Array Subset, a fast method

Solved!

@Altenbach - It's messy looking, but not unmaintainable. The first in-place structure selects the row; the second gets the columns.

 

When I insert my code into your benchmark, replacing your code, everything runs slower. 🙁 This includes Blokk's original code and your code as well. Not sure what effect my code is having; maybe it is unmaintainable after all. 🙂

 

Lastly, if I do everything in place, that is, reuse the output buffer, then my code and your code give about the same result (2017 version attached), at least on my computer. (For me this is more realistic, as this is how I would design my program. I use Reshape Array in case values change from a user standpoint due to some front-panel event.) BUT your code is slower: what took 3 ms in the VI you sent me now takes 5 ms in the VI I modified. I have no idea what is going on, except that my code is covered in molasses and is infecting everything else.
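The buffer-reuse idea above can be illustrated in a textual language. This is a rough NumPy analogue of the two benchmark variants, not the actual LabVIEW code: one function allocates a fresh output array on every call, the other writes into a single preallocated buffer (the names and sizes here are made up for illustration).

```python
import numpy as np
import timeit

rows, cols = 8, 1_000_000
data = np.random.rand(rows, cols)
out = np.empty(100_000)          # preallocated buffer, reused every call

def fresh_copy(row=3, start=1000):
    # Allocates a new output array on every call.
    return data[row, start:start + out.size].copy()

def reuse_buffer(row=3, start=1000):
    # Writes into the existing buffer; no new allocation.
    np.copyto(out, data[row, start:start + out.size])
    return out

t_fresh = timeit.timeit(fresh_copy, number=200)
t_reuse = timeit.timeit(reuse_buffer, number=200)
print(f"fresh: {t_fresh:.4f}s  reuse: {t_reuse:.4f}s")
```

Both variants produce identical data; the difference is only where the output memory comes from, which is exactly the knob being compared in the attached VIs.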

 

Cheers,

mcduff

Message 11 of 18

Thanks to both of you for the hints and discussion! By the way, for some reason the Lava snippet tool has always had a bug on my PCs, and I never found time to fix it. Even if I set it to down-convert the version, it keeps it at 2017 🙂

Message 12 of 18

@mcduff wrote:

 

When I insert my code into your benchmark, replacing your code, everything runs slower.


I think the main reason is that all this explicit inplaceness in your code confuses the compiler, and it adds a safety net in the form of a buffer allocation right inside the case structure (the allocation dot can be eliminated by adding an Always Copy node right after the tunnel).

 

This buffer allocation for a new copy of the entire input array is significant overhead.

 

We can eliminate measuring that extra penalty by placing the case structure outside the timing code (see attached). Now we are no longer measuring that buffer allocation overhead (it still happens, but outside the measured time interval!), and both code versions now execute about 4x faster and are still about equal in speed.
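The same benchmarking hygiene applies in any language: keep setup and one-time allocations outside the timed region. A minimal Python sketch (the `data.copy()` here merely stands in for the forced buffer copy; it is not LabVIEW's actual behavior):

```python
import time
import numpy as np

data = np.random.rand(4, 1_000_000)

# Variant 1: the extra allocation sits inside the timed region,
# so its cost pollutes the measurement.
t0 = time.perf_counter()
work = data.copy()                # stands in for the forced buffer copy
subset = work[1, 10:100_010]      # the operation actually under test
inside = time.perf_counter() - t0

# Variant 2: allocate before starting the clock, then time only the
# operation under test. The copy still happens, just outside the
# measured interval.
work = data.copy()
t0 = time.perf_counter()
subset = work[1, 10:100_010]
outside = time.perf_counter() - t0

print(f"copy timed: {inside*1e3:.3f} ms, copy excluded: {outside*1e3:.3f} ms")
```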

 

Somebody from the LabVIEW compiler team could provide more insight, but I think your explicit inplaceness forces the buffer copy and prevents certain compiler optimizations. Just a wild guess!

 

 

Message 13 of 18

Wow, confusing the compiler. Maybe my code should be moved to the Rube-Goldberg thread.

 

That buffer allocation is due to a hanging shift register, I think. If we connect everything, that is, the input array, via a shift register, then that buffer allocation goes away; see attached.

Snap4.png

 

 

 

mcduff

Message 14 of 18

But now you have made it unfair for Altenbach's latest code, since you have a shift register that is in place with yours but unneeded in his, where it causes a copy of the data (the upper shift register).  And to be completely honest, you should not include the case selector in the benchmark.  I know that time is minuscule but, as we have observed, it can cause all kinds of compiler issues.


GCentral
There are only two ways to tell somebody thanks: Kudos and Marked Solutions
Unofficial Forum Rules and Guidelines
"Not that we are sufficient in ourselves to claim anything as coming from us, but our sufficiency is from God" - 2 Corinthians 3:5
Message 15 of 18

If I am going against Altenbach, I need a big head start if I am going to have any sort of chance. 🙂

 

I agree with you about the case selector.

 

I'll give you my use case for a 2D array subset similar to Blokk's original request.

 

I have a program that controls a DAQmx device where the user can select anywhere from 1 to 8 channels. Each channel can have up to 1 million points. I do not want to display 8 million points on a graph, so I do a min/max decimation that depends on the width of the plot along with the endpoints on the x-axis. (The user may have zoomed in on a feature.) The data for each channel is a row in a 2D matrix; the start and end indices for the columns give you the region you are focused on.
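For readers unfamiliar with min/max decimation, here is a hedged NumPy sketch of the general idea, not mcduff's actual VI: for each pixel column of the plot, keep the bucket's minimum and maximum so that narrow peaks survive the downsampling. The function name and signature are hypothetical.

```python
import numpy as np

def minmax_decimate(row, start, end, width):
    """Min/max decimation of row[start:end] down to 2*width points.

    `width` is the plot width in pixels; for each of the `width`
    buckets, the min and max are kept so peaks are not lost.
    """
    seg = row[start:end]
    n = seg.size - seg.size % width        # trim so buckets divide evenly
    buckets = seg[:n].reshape(width, -1)   # one row per pixel column
    out = np.empty(2 * width)
    out[0::2] = buckets.min(axis=1)        # interleave min/max so the
    out[1::2] = buckets.max(axis=1)        # plotted envelope is preserved
    return out

channel = np.arange(1000.0)                # one row of the 2D data matrix
plot = minmax_decimate(channel, 0, 1000, 10)
```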

 

Snap5.png

 

The Decimate Data subVI and its VIs are attached. (Sorry, they are a bit messy.) For my real use cases everything is green, that is, buffers are recycled and reused. You are correct that Altenbach's method is faster if the buffers are not reused. But for me the benchmark is then false, because it is a scenario that I would not use in a real program. I tend to use old laptops for DAQ systems where memory is the biggest constraint, that is, I'll sacrifice some speed for memory.

 

Both you and Altenbach are Knights, and I have learned a lot from your posts, so thank you for all your help and advice over the years.

 

Cheers,

mcduff

 

Message 16 of 18

The problem is that the LabVIEW compiler has many optimizations, especially around memory, that you are telling it not to perform.  You are just reading data out of part of the array.  That branch is as much of a data copy as using Array Subset (it is actually a very small allocation to create a pointer to data in the larger array).  And since you are not updating anything in the large array, the In Place Element structures are doing nothing for you.  Simpler is better here.

 

Of course, I have been proven wrong before.  But this is what I have been told and observed over the years: things will work in place as much as possible by default when only a read is performed.


Message 17 of 18

I agree that simpler is better. It is quite possible that my convoluted code works better with my convoluted solutions. There is also weirdness with the in-place structure; see http://forums.ni.com/t5/LabVIEW/LabVIEW-2015-Buffer-Allocation-Bug/td-p/3300392/page/2

(Simple operations like Add have now been fixed in LV2017; see the above thread.)

 

I was/am using NI's Tools to diagnose my issues, which were similar to Blokk's original request.

 

In the Decimate Data VIs I posted previously, if I did not reuse the plot buffer, the Desktop Execution Trace Toolkit would show an initial buffer allocation for the plot data, then on every iteration a buffer resize of zero. Once I started to reuse the buffer, that resize of zero was eliminated. According to the Profile Performance and Memory tool, the VI was using fewer resources and less memory in that mode. There was also no memory allocation for any of the array subsets in that decimation VI.

 

Here's the part of the trace that shows the only buffer being used is the plot data buffer, nothing else (four channels, 1 million samples per channel). The first entry allocates the whole data buffer: 4 channels x 1M samples x double precision equals 32 MB (in the data loop). In the Decimate Data VI only the plot buffer size changes. You can see the same change in the main VI when it displays the data as a graph.

 

Snap2.png

 

 

In this following case

Snap1.png

I get a buffer allocation for the Index Array operation; if the loop is parallelized, then the buffer allocation is multiplied by the parallel count. I have tried to use in-place and Reshape Array here, but the performance cost is worse than the memory cost, so I would rather use memory here. (VI Analyzer tells me I should use in-place operations for that VI.)

 

I would prefer simpler solutions but I'm a complex guy, a different complex than Altenbach. (Referring to a previous thread https://forums.ni.com/t5/LabVIEW/Trim-Whitespace-vi-is-not-re-entrant-Why/td-p/3646261 Message 10)

 

But I always try to follow your and Altenbach's advice.

 

Cheers,

mcduff

 

 

 

Message 18 of 18