
memory manager optimization

I have not (yet), because it currently does not support RT targets, and I'm not sure how valid the results are when the code is compiled for, e.g., the Windows PC RTE. I'll try it anyway on some non-RT test code just to see what it does, though.

QFang
-------------
CLD LabVIEW 7.1 to 2016
Message 31 of 50

@QFang wrote:

Part of what you are saying is at odds with the internal group within NI that deals with RT systems. They told me that any time you hit the memory manager on (vxWorks? older?) cRIO targets, you incur a fairly substantial penalty.


I haven't had any direct contact with anyone at NI about this - did they give you any more detailed information? LabVIEW doesn't have a way to do an allocation that doesn't also copy into that just-allocated memory (there's no equivalent of malloc, unless you call DSNewPtr, which is another story entirely), so I don't think our statements conflict.

My point was that you should focus on copies, not on allocations; if you do that, a side effect will be a reduction in new buffers allocated. Also look to reduce your use of variable-sized data structures (particularly strings, although they're unavoidable with TCP communication).

You're unlikely to be in a situation where you allocate a new scalar on every loop iteration; its size is fixed, so it should be allocated once (on the first loop iteration) and be reused on every following cycle, unless you're pushing that scalar into a queue or doing something else that would hold onto it for more than one iteration.
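
To make the copies-versus-allocations distinction concrete, here's a loose C analogy (illustrative only - this is not what LabVIEW actually generates):

    #include <stdlib.h>
    #include <string.h>

    /* The expensive pattern: a fresh buffer plus a full copy on every
       cycle of the loop. */
    void per_iteration_copy(const double *src, size_t n, size_t iters)
    {
        for (size_t i = 0; i < iters; i++) {
            double *buf = malloc(n * sizeof *buf); /* new buffer each cycle */
            memcpy(buf, src, n * sizeof *buf);     /* ...and a full copy   */
            /* ... work on buf ... */
            free(buf);
        }
    }

    /* The cheap pattern: one allocation up front, then the same buffer
       is overwritten in place on every following cycle. */
    void reuse_buffer(size_t n, size_t iters)
    {
        double *buf = malloc(n * sizeof *buf);     /* allocated once */
        for (size_t i = 0; i < iters; i++)
            for (size_t j = 0; j < n; j++)
                buf[j] = (double)i;      /* in place, no new buffer */
        free(buf);
    }

The scalar case is just the degenerate version of the second function: one fixed-size slot, written over and over.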

Message 32 of 50

@mcduff wrote:

My use case may differ, and by no means am I a programmer or an expert, but I like DVRs for the following reason:

 

I usually have multiple loops in my application, one that handles instruments, one that handles data, one that handles the UI, etc. When I download large data sets from an instrument, I like to put that data in a DVR, that way I am only sending a reference to the other loops, not a data copy. Within a loop, you are typically correct ...
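
In C terms, the DVR pattern is roughly "hand the other loops a pointer to the big block instead of a copy of it" - a sketch with made-up names, and it glosses over the access serialization a real DVR gives you:

    #include <stdlib.h>

    typedef struct {
        double *samples;   /* the large downloaded data set */
        size_t  count;
    } big_block_t;

    /* Instrument loop: allocate and fill the block once. */
    big_block_t *acquire_block(size_t n)
    {
        big_block_t *blk = malloc(sizeof *blk);
        blk->samples = malloc(n * sizeof *blk->samples);
        blk->count   = n;
        /* ... instrument download fills blk->samples ... */
        return blk;
    }

    /* The data and UI loops receive only this pointer - a handful of
       bytes - never a copy of the samples themselves. */
    double first_sample(const big_block_t *blk)
    {
        return blk->count ? blk->samples[0] : 0.0;
    }

The part the sketch leaves out is that a DVR also serializes access between the loops, which raw pointers don't.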

 

The simplest LabVIEW code - the one using the minimum necessary number of nodes - is often the most efficient.

 

Look at the snippet: the top case is more complicated but uses less memory than the bottom case. (Run them separately; I was too lazy to make two snippets.)

 

Cheers,

mcduff

 

example.png


Let's go back a bit... Is this example showing a potential slowdown of [Float] (array) + NaN (scalar)? The output of the Add primitive is, of course, an array of NaN the size of the input array, so perhaps replacing the Add with an Initialize Array of that size with value NaN could be cheaper? If so, that might be worth a side discussion.
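
In loose C terms, the difference being asked about here is a read-modify-write versus a pure fill (illustrative only):

    #include <math.h>
    #include <stddef.h>

    /* Add primitive with a NaN scalar: every element of the input is
       still read before being overwritten. */
    void add_nan(double *a, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = a[i] + NAN;    /* anything + NaN is NaN */
    }

    /* Initialize Array equivalent: the old values are never read. */
    void init_nan(double *a, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = NAN;
    }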

 

WORSE: the compiler should throw out each of the "cases" shown as dead code. Neither output is used, so neither operation should actually be performed! This needs a bit of explanation as to why either is better. Looking at the snippet, the code is void and should compile to a NOOP.
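
For comparison, a C compiler behaves exactly the way described here: with the result unused, the work is dead and is typically optimized away entirely (a sketch; try it at -O2):

    #include <stddef.h>

    /* 'sum' never escapes this function, so the whole loop is dead code
       and usually compiles to an empty function. */
    void dead_sum(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        (void)sum;
    }

    /* Returning the result makes it observable, so the loop must run. */
    double live_sum(const double *a, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }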


"Should be" isn't "Is" -Jay
Message 33 of 50

Pretty much, nathand - we have the same goal, somewhat different approach. For some reason I just can't seem to nail down the whole IPE thing; the more I stare at it and the more I read, the more questions I get. Every time it feels clearer, though... maybe this is the time it 'sits' for good.

 

I do appreciate you taking time (again) to answer and help out with these discussions.

 

Based on our discussions so far, I'm still not certain about the following diagram (below). I now understand that since I'm not operating ON the cluster data (except for the 'message' array), there is nothing gained by having those cluster value updates inside the IPE; on the other hand, it seems I may as well do it all in the IPE since I have it (no harm?).

 

What I'm not sure about, based on your previous points about replacing array elements, is whether there is even a point in using the IPE at all here. On the one hand, I'm operating on a data array in the cluster (one of the IPE textbook use cases?); on the other hand, I might just as well do the same operation with a single "Bundle by Name". Note that the cluster control is a dummy control to prevent constant folding.

 

cluster ipe.png 

OR I could just do this - but would that re-use the memory without invoking the memory manager, or not?

no ipe.png

 

OR, since the above will (supposedly) duplicate the whole cluster, perhaps this way (note the cluster constant now has an empty array)?

no ipe w array.png

QFang
-------------
CLD LabVIEW 7.1 to 2016
Message 34 of 50

Personally I'd recommend the simpler, and more readable, "reshape array" to 256 elements. All three of your versions currently contain an unnecessary 256-element control, a copy of which needs to be maintained because it's never changed, so you're not saving yourself much versus a constant.

Let's say you had the front panel of the VI open - then it's obvious that the code has to make a copy of the 256-element array, because the copy displayed on the front panel doesn't change. When the front panel is closed, LabVIEW can use the front-panel data directly without making a copy, but that array still needs to be reinitialized to its default value on the next call (and again, it's copying data that's slow, not the memory allocation), and the array value is stored with the VI. Take a look at the buffer allocations (note: I haven't recreated this to check, so I don't know exactly what it will show, but I'm curious). Reshape array avoids storing an array anywhere.

 

Depending on the source of that message array and its typical length, there might be better alternatives.

 

There's no harm, but also no advantage, in an IPE here. It will work just like a typical bundle except it takes up more diagram space. I recommend your third version, but with reshape array instead of replace array subset.

 

One other very minor optimization would be to replace In Range and Coerce with Max & Min, since an array size will never be less than 0. For that matter, if you can accept the array having fewer than 256 elements (but never more), then you could wire the Max & Min output to the reshape array size. Shrinking an array or keeping it the same length won't result in a new buffer allocation, so you'd save yourself any copy there.
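
The clamp-and-shrink idea in C terms (a sketch; the 256-element cap comes from the example above):

    #include <stddef.h>

    #define MAX_LEN 256

    /* Max & Min instead of In Range and Coerce: sizes can't go below 0,
       so only the upper bound needs enforcing. */
    size_t clamp_len(size_t len)
    {
        return len > MAX_LEN ? MAX_LEN : len;
    }

    /* Shrinking is just recording a smaller logical length against the
       same buffer - no new allocation, no copy. */
    typedef struct { double *data; size_t len; } array_t;

    void shrink_to(array_t *a, size_t new_len)
    {
        if (new_len <= a->len)
            a->len = new_len;   /* buffer untouched */
    }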

Message 35 of 50

The IPE is useful when we want to place restrictions on the scheduler and dictate where, when, and how a buffer is used. It does nothing the LV compiler cannot do all by itself if we structure the code correctly.

 

So rather than telling the compiler via gesture and innuendo (how we structure the diagram), we can now do it explicitly.

 

But sometimes "minding the compiler's business" is less than optimal.

 

Nuff on the IPE.

 

The age-old solution I have used to control and limit memory allocations is an Action Engine where a shift register is used as the buffer. Use only in-place operations like "replace array subset", or, if forced to use a subVI, make sure the connectors are on the root of the subVI's diagram so the compiler can see there is no need to copy the data in the buffer.

 

The key to this approach may require you to turn your structure inside out: rather than passing data into and out of the AE, put all of the functions that use the buffer INSIDE the AE.

 

Some file operations will duplicate the data passed to them. The raw binary byte writes can be used to write one byte at a time, so the only buffer required is a byte.
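
A loose C sketch of this Action Engine shape - one function owns a persistent buffer (the shift register) and the operations come to the buffer instead of the buffer going to them (names are made up):

    #include <stdio.h>
    #include <string.h>

    #define BUF_LEN 4096

    typedef enum { AE_REPLACE_SUBSET, AE_WRITE_BYTE } ae_action_t;

    void buffer_ae(ae_action_t action, const unsigned char *data,
                   size_t offset, size_t len, FILE *f)
    {
        static unsigned char buf[BUF_LEN];   /* the "shift register" */

        switch (action) {
        case AE_REPLACE_SUBSET:              /* in place, no new buffer */
            if (offset <= BUF_LEN && len <= BUF_LEN - offset)
                memcpy(buf + offset, data, len);
            break;
        case AE_WRITE_BYTE:                  /* the function comes to the
                                                buffer, not vice versa */
            if (offset < BUF_LEN)
                fwrite(buf + offset, 1, 1, f);  /* one byte out, never a
                                                   duplicate of the buffer */
            break;
        }
    }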

 

Ben

Retired Senior Automation Systems Architect with Data Science Automation | LabVIEW Champion | Knight of NI and Prepper
Message 36 of 50

The 8-ball says ...

answer Jeff's Question

 

As far as I know, the bottom case always uses more memory, no matter how you slice it. The compiler cannot (as far as I know) schedule the atan operation before reusing the bottom array to go NaN. The whole point of the plus is to reuse that buffer, not make a new one; if I initialize an array, I make a new one. In the top case, since it is going point by point, no new array is needed - thus cheaper in memory.
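
The buffer-identity point in C terms (loose analogy): the Add can land its output in the input's own buffer, while Initialize Array necessarily produces a second one:

    #include <stdlib.h>
    #include <math.h>

    /* '+' reusing the incoming buffer: same pointer in, same pointer out. */
    double *nan_via_add(double *a, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] += NAN;
        return a;              /* no new array exists */
    }

    /* Initialize Array: a brand-new buffer, while the original array is
       still alive somewhere upstream. */
    double *nan_via_init(size_t n)
    {
        double *b = malloc(n * sizeof *b);
        for (size_t i = 0; i < n; i++)
            b[i] = NAN;
        return b;              /* two arrays now exist */
    }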

 

Hope this answers your question.

 

Cheers,

mcduff

Message 37 of 50

I think Jeff was making the point that LV is smart enough to eliminate code where the result is never used. You have to wire the output to something, even the edge of a sequence structure, to trick LV into compiling the otherwise "dead code".

 

Ben

Retired Senior Automation Systems Architect with Data Science Automation | LabVIEW Champion | Knight of NI and Prepper
Message 38 of 50

@Ben wrote:

I think Jeff was making the point that LV is smart enough to eliminate code where the result is never used. You have to wire the output to something, even the edge of a sequence structure, to trick LV into compiling the otherwise "dead code".

 

Ben


 

 

Sorry for the misunderstanding.

 

Cheers,

mcduff

Message 39 of 50

@QFang wrote:

Queues on RT are of course potentially a really bad idea if you are on RT for the sake of deterministic execution times. If you don't care about jitter etc., they are still somewhat fickle, but have less CPU overhead than the RT FIFOs.


As I recall, Queues are a bad idea for passing data out of a Deterministic Loop (i.e. an RT Timed Loop) -- that's what the RT FIFO is for.  However, once the data are out of the time-critical part, I thought that Queues were recommended for Producer/Consumer sorts of actions.

 

Example -- RT Loop taking samples from analog and digital channels at 1 kHz. Data passed by RT FIFO to "Data Packager" (a Producer), which assembles data into "packets" of 50 samples and sends them (via Queue) to the Consumer, the Network Streams "Deliver to Host" routine.
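
In C terms, the RT FIFO side of that pipeline looks roughly like a preallocated single-producer/single-consumer ring: nothing blocks and nothing is allocated at runtime, which is what keeps the timed loop deterministic (a sketch; capacities and names are made up, and the struct must be zero-initialized before use):

    #include <stdatomic.h>
    #include <stddef.h>

    #define FIFO_LEN 1024                 /* fixed, preallocated */

    typedef struct {
        double        buf[FIFO_LEN];
        atomic_size_t head, tail;         /* one producer, one consumer */
    } rt_fifo_t;

    /* Producer side - the 1 kHz timed loop. Never blocks, never allocates;
       if the consumer falls behind, the sample is dropped, not delayed. */
    int fifo_push(rt_fifo_t *f, double x)
    {
        size_t h    = atomic_load(&f->head);
        size_t next = (h + 1) % FIFO_LEN;
        if (next == atomic_load(&f->tail))
            return 0;                     /* full */
        f->buf[h] = x;
        atomic_store(&f->head, next);
        return 1;
    }

    /* Consumer side - the "Data Packager", which batches 50 samples per
       packet before queueing them toward the network loop. */
    int fifo_pop(rt_fifo_t *f, double *x)
    {
        size_t t = atomic_load(&f->tail);
        if (t == atomic_load(&f->head))
            return 0;                     /* empty */
        *x = f->buf[t];
        atomic_store(&f->tail, (t + 1) % FIFO_LEN);
        return 1;
    }

Once data has crossed that boundary, an ordinary blocking Queue on the non-deterministic side is fine - there, throughput matters and jitter doesn't.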

 

Bob Schor

Message 40 of 50