LabVIEW

memory manager optimization

I'm carrying a pre-allocated U8 buffer array on a shift register, dumping the 'active' portion of it to disk when it is getting 'close to full'.

 

My overall goal is to reduce calls to the memory manager as much as possible for this particular code.  I insert data into the buffer array using in-place "array split / replace sub-arrays" which seems like a good idea(?), but when I go to get the 'data' portion of the buffer, am I better off using the in-place to get the sub-array (the size of which is close to the max size of the buffer array itself), or is it better to use 'Array Subset'?
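
If it helps to see the pattern in text form, here is a rough C analogy of what I'm trying to preserve (a hypothetical sketch with made-up names, not my actual code): the buffer is allocated once up front, and the whole question is whether flushing the 'active' portion needs a second allocation or not.

/* Hypothetical C analogy of the pre-allocated buffer pattern above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE  (1u << 20)            /* allocated once, 1 MiB            */
#define FLUSH_AT  (BUF_SIZE - 4096u)    /* "close to full" threshold        */

static unsigned char buf[BUF_SIZE];     /* lives for the whole run          */
static size_t        used = 0;          /* size of the 'active' portion     */

/* Option A: hand the file layer a view into the existing buffer (no copy). */
static void flush_in_place(FILE *f)
{
    fwrite(buf, 1, used, f);
    used = 0;
}

/* Option B: copy the active portion out first (extra allocation + copy),
   which is what I suspect the Array Subset branch may be doing. */
static void flush_with_copy(FILE *f)
{
    unsigned char *tmp = malloc(used);  /* memory manager call              */
    if (!tmp) return;
    memcpy(tmp, buf, used);
    fwrite(tmp, 1, used, f);
    free(tmp);                          /* and another one                  */
    used = 0;
}

static void append(const unsigned char *data, size_t n, FILE *f)
{
    if (used + n > FLUSH_AT)
        flush_in_place(f);              /* or flush_with_copy(f)            */
    memcpy(buf + used, data, n);        /* in-place "replace subset"        */
    used += n;
}

int main(void)
{
    FILE *f = fopen("dump.bin", "wb");
    if (!f) return 1;
    unsigned char chunk[256] = {0};
    for (int i = 0; i < 10000; i++)
        append(chunk, sizeof chunk, f);
    flush_in_place(f);
    fclose(f);
    return 0;
}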

 

My thinking is that the 'in-place' might be better since I'm not creating a copy of the main array, while if I wire the main array to 'Array Subset' I have a branch/split on that wire, which creates a (new) copy of the whole array (except the Show Buffer Allocations tool seems to indicate I don't get one on the input, but I do on the output... sort of the opposite of the in-place structure)?

 

On the other hand, the in-place structure shows two buffer allocations whose total size equals the size of the buffer array, plus another allocation on the Write to Binary File VI anyway...

 

I always have a hard time trying to 'reason' or 'think' myself to the ideal solution in this type of problem. 😕  Pointers and insight would be appreciated.  Again, I'm trying to reduce dynamic memory alloc/dealloc as much as possible, not necessarily looking to maximize (CPU) performance.

in place or not.png

 

QFang
-------------
CLD LabVIEW 7.1 to 2016
0 Kudos
Message 1 of 50
(4,044 Views)

Unfortunately, I think somebody from NI will have to chime in here to give a definite answer.

 

But based on my limited understanding, Array Subset will not make a copy of the entire array, only of the subset you are getting.  Supposedly, the compiler is smart enough to schedule the read functions on an array before the write functions when they are on the same wire (branches count).  If there are multiple write functions, then a copy will be made for the second write function.
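
As a rough C analogy of that point (hypothetical code, and glossing over how the LabVIEW compiler actually schedules things), the cost of the read is sized by the subset you ask for, not by the source array:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The returned buffer is sized by the requested subset, not by the
   whole source array; the caller frees it. */
static unsigned char *array_subset(const unsigned char *arr, size_t start, size_t len)
{
    unsigned char *sub = malloc(len);
    if (sub)
        memcpy(sub, arr + start, len);
    return sub;
}

int main(void)
{
    static unsigned char big[1000000];                 /* the "whole array"    */
    unsigned char *view = array_subset(big, 0, 4096);  /* only 4 KiB allocated */
    if (view) {
        printf("copied %zu bytes\n", (size_t)4096);
        free(view);
    }
    return 0;
}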

 

I'm not completely sold on the IPE helping here, based on what I just stated.


GCentral
There are only two ways to tell somebody thanks: Kudos and Marked Solutions
Unofficial Forum Rules and Guidelines
"Not that we are sufficient in ourselves to claim anything as coming from us, but our sufficiency is from God" - 2 Corinthians 3:5
0 Kudos
Message 2 of 50
(4,032 Views)

I always use a construct like your first diagram, the in-place structure.

 

Cheers,

mcduff

 

 

0 Kudos
Message 3 of 50
(4,027 Views)

Good question.  I don't know the answer, and even if I said I did, why should you believe me (and who says that this would be true on your machine and with your version of LabVIEW)?

 

Besides, I'm a Scientist, so my answer would be "Do the Experiment".  Write a little test routine, put timing code around it, and see how long it takes to do a reasonable number of writes similar to what you'd want to do with your code.  Do it with the In Place method, then with the Subset method, and compare.
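
If it helps to see the shape of such a test, here is a bare-bones skeleton of the harness as a hypothetical C sketch rather than the actual VI (in LabVIEW you would bracket each variant with timing reads, e.g. Tick Count (ms), instead):

#include <stdio.h>
#include <time.h>

#define ITERATIONS 100000

int main(void)
{
    clock_t t0 = clock();
    for (int i = 0; i < ITERATIONS; i++) {
        /* variant A goes here: e.g. flush via an in-place view */
    }
    clock_t t1 = clock();
    for (int i = 0; i < ITERATIONS; i++) {
        /* variant B goes here: e.g. flush via a copied subset */
    }
    clock_t t2 = clock();

    printf("variant A: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("variant B: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}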

 

Several years ago, Darren Nattinger (who I believe still holds the title of The World's Fastest LabVIEW Coder) gave a very interesting talk at NI Week asking "Which way is faster?"  The audience voted among three alternatives, and in most cases the correct answer (which he demonstrated by running the code with timers surrounding it) was chosen by the smallest number of people.

 

You have to "do the experiment" ...

 

Bob Schor

 

P.S. -- when you've done it, let us know the results!

0 Kudos
Message 4 of 50
(4,026 Views)

I would like to chime in if you are testing for memory.

 

Use the Windows Task Manager to watch the memory while you do the two tests. Make an array of 1 million or so elements of the data type you have; with 1 million points it should be easier to tell the memory usage. Check the Task Manager to see if any copies of the data are made. I did this in the past (I do not remember which LabVIEW version); the IPE was better for memory, not sure about speed. I do not know if newer versions have changed so that case 2 uses the same amount of memory as case 1.
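
A hypothetical C sketch of the same idea (made-up sizes, not LabVIEW code): each allocation of the million-element array shows up as a visible step in Task Manager, so an unexpected copy is easy to spot.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N (1000u * 1000u)                   /* ~1 million elements            */

int main(void)
{
    double *a = malloc(N * sizeof *a);      /* first ~8 MB step               */
    if (!a) return 1;
    memset(a, 1, N * sizeof *a);            /* touch the pages so they commit */
    puts("array allocated - check Task Manager, then press Enter");
    getchar();

    double *b = malloc(N * sizeof *b);      /* a second ~8 MB step = a copy   */
    if (!b) { free(a); return 1; }
    memcpy(b, a, N * sizeof *a);
    puts("copy made - check Task Manager again, then press Enter");
    getchar();

    free(b);
    free(a);
    return 0;
}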

 

Cheers,

mcduff

 

 

0 Kudos
Message 5 of 50
(4,012 Views)

McDuff is clearly another Scientist, who says "Do the Experiment" ...  Lay On, McDuff ...

 

BS

0 Kudos
Message 6 of 50
(4,006 Views)

Oh, I run 'experiments' all day long, trust me... My issue here is that I am not aware of a good way to run a MEMORY test on an RT target.  I already know that an IPE often (but not always) takes LONGER to run than not using it... But I am at a loss for how to test memory allocation/deallocation (on an RT system).

 

I'm actually the opposite of mcduff when it comes to usage of IPEs.  A year ago or so, I started making extensive use of IPEs, until I found that they really ate up a lot of CPU, at least in the cases and in the manner I tried to use them back then, so my general take is to never use them unless there is a very good reason/chance that they will actually be beneficial.

 

Even on a Windows platform, I'm not quite sure how I would get detailed information.  Keep in mind, memory usage isn't the exact metric I'm looking for... I'm looking for whether the memory is re-used without calls to the memory handler (on an RT target) or not.  Blergh... I feel like I'm doing a poor job explaining what I mean. Sorry, guys.
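
If it were text code, the thing I want to measure would look something like this hypothetical C sketch (a counting wrapper around the allocator, nothing LabVIEW- or RT-specific): the interesting number is how many times the allocator gets invoked, not how big the process is.

#include <stdio.h>
#include <stdlib.h>

static unsigned long alloc_calls = 0;

/* Wrapper so every trip through the allocator gets counted. */
static void *counted_malloc(size_t n)
{
    alloc_calls++;
    return malloc(n);
}

#define CAP (1u << 20)

int main(void)
{
    unsigned char *buf = counted_malloc(CAP);       /* allocate once up front */
    if (!buf) return 1;

    for (unsigned i = 0; i < 10u * CAP; i++)
        buf[i % CAP] = (unsigned char)i;            /* re-use the same block  */

    printf("allocator calls: %lu\n", alloc_calls);  /* ideally stays at 1     */
    free(buf);
    return 0;
}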

QFang
-------------
CLD LabVIEW 7.1 to 2016
0 Kudos
Message 7 of 50
(3,988 Views)

According to the LabVIEW 2014 Profile Performance and Memory tool ...

 

This Case 1 snippet is the best

Case1.png

 

Strangely, without the DVR it is worse than the Array Subset; snippet below.

 

Untitled4.png

 

I think you need to test in your application, i.e., in a subVI with an array in and an array out; if everything is inlined, then no copies should be made.

 

Cheers,

mcduff

 

0 Kudos
Message 8 of 50
(3,985 Views)

By the way, the reason I'm taking such a keen interest here is that I'm reworking a 'system log engine' that runs on my RT targets and logs messages, along with time and system memory/CPU statistics, to a file from time to time.  The old version of the engine carried a string on a shift register that it kept bundling to until it hit a certain threshold and the string was written to file... this would create a (noticeable) 'ramp' effect over time on memory usage on the RT targets.

Now, I'm also chasing and testing for memory leaks on these targets as they go into the field and are expected to operate 24/7 for hundreds of days.  The task of positively identifying the presence or absence of a memory leak is MUCH simpler if my code overall is 'quiet' and 'consistent' in its memory usage.  As such, most of my RT code makes use of pre-allocated arrays and other techniques to reduce and prevent memory allocations and de-allocations.  This makes even smaller leaks (4-byte references, anyone?) stand out much sooner in my testing.  This system log engine is one of the largest/last sources of 'noise' in my application's memory usage. 🙂
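
As a hypothetical C analogy of that difference (made-up names, not the actual engine): the old grow-on-append style calls the memory manager for every message, while the pre-allocated style touches it once at startup and then stays flat.

#include <stdlib.h>
#include <string.h>

/* Old style: grow the log string on every message, one allocator call
   per append, so process memory ramps up between file writes. */
static char *append_grow(char *log, size_t *len, const char *msg)
{
    size_t n = strlen(msg);
    char *bigger = realloc(log, *len + n + 1);
    if (!bigger) return log;
    memcpy(bigger + *len, msg, n + 1);
    *len += n;
    return bigger;
}

/* New style: append into a buffer allocated once; memory stays flat. */
#define LOG_CAP 65536
static char   log_buf[LOG_CAP];
static size_t log_len = 0;

static void append_fixed(const char *msg)
{
    size_t n = strlen(msg);
    if (log_len + n >= LOG_CAP)
        log_len = 0;                    /* real engine: write to file first */
    memcpy(log_buf + log_len, msg, n);
    log_len += n;
}

int main(void)
{
    char  *log = NULL;
    size_t len = 0;
    for (int i = 0; i < 1000; i++) {
        log = append_grow(log, &len, "message\n");  /* memory ramps      */
        append_fixed("message\n");                  /* memory stays flat */
    }
    free(log);
    return 0;
}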

 

Also, it is already better than the old version of this system logging loop... now I'm just going full-on OCD to make it as good as I can while I'm in there tinkering with it.  Once done, it will be published on our VIPM repository, so I want it as 'good as I can get it'. 🙂

QFang
-------------
CLD LabVIEW 7.1 to 2016
0 Kudos
Message 9 of 50
(3,973 Views)

mcduff, could you snag a screenshot of the Profile Performance and Memory result for Case 1 and Case 2?  What metric do you use to say Case 1 is 'best'?  (Sorry if that sounds like a silly question; I'm just curious how you made that determination of 'best'.)

QFang
-------------
CLD LabVIEW 7.1 to 2016
0 Kudos
Message 10 of 50
(3,969 Views)