01-05-2016 11:53 AM
I am adding two arrays with 1e6 elements each; the addition loops 1000 times. When I use the buffer allocation examination tool, it shows a buffer allocation at one of the inputs of the Add node. When I trace the performance, it reports 1000 memory resizes with a change of 0. The addition takes what, in my opinion, is a very long time: 6 msec on an i5 2.1 GHz processor. Is it related to the memory resize operation, or is it normal processing time? If it is caused by the memory resize, how can I avoid it?
01-05-2016 12:07 PM
@muh1 wrote:
I am adding two arrays with 1e6 elements each; the addition loops 1000 times.
Perhaps you should give a better example of exactly what your issue is. With this example, I see that you do not need a loop at all since the same value will always be going into the indicator. Are you always adding the same array with itself? If so, why not just multiply by 2?
01-05-2016 12:13 PM
01-05-2016 12:56 PM
This example is quite adequate to demonstrate the point - memory allocation and execution time. Is there anything wrong with it in this respect? Do you have an answer to my question for this particular example?
In reality I add two matrices, one of which comes from an outer product and the other from a shift register. Does it matter?
01-05-2016 01:12 PM
It proves that there is a memory resize and that the execution time is some 6 msec. It is not intended to prove anything else. However, the real-life example has approximately the same execution time.
Actually you are wrong - disabling debugging does not eliminate the loop, the execution time remains essentially the same.
Indicators are not updated in each operation unless they are set to synchronous display (in which case the execution time goes up by a factor of 4 or so), so this is not a problem, really.
Flat sequence structure with tick count before and after.
Just to make sure, I've modified the circuit slightly:
Surprisingly, the buffer allocation dot at the Add input has disappeared, but the time did not change much. It is probably the normal calculation time for this processor after all. LabVIEW's memory allocation remains a mystery, though.
01-05-2016 01:22 PM - edited 01-05-2016 01:48 PM
What is a circuit? What is the point of the shift register? If I see words and things like that I tend to question everything else.
@muh1 wrote:
Flat sequence structure with tick count before and after.
You need to be significantly more detailed. What is before the structure, what is after, what is in each frame? What are the debugging settings?
@muh1 wrote:
Indicators are not updated in each operation unless they are set to synchronous display (in which case the execution time goes up by a factor of 4 or so), so this is not a problem, really.
Yes, FP updates are asynchronous by default, but the transfer buffer still needs to be written with each iteration, so you are doing significantly more work by having the terminal inside the loop. You are still creating way too much overhead.
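To make the overhead concrete, here is a rough textual analogy in Python (a LabVIEW diagram is graphical, so this is only a stand-in). The names `run`, `transfer_buffer` and `update_inside_loop` are made up for illustration and do not correspond to any LabVIEW API; the copy into `transfer_buffer` plays the role of writing the indicator's transfer buffer.

```python
def run(iterations, data, update_inside_loop):
    """Add an array to itself repeatedly and count copies into a display buffer.

    Hypothetical model: one copy of the result is made every time the
    'indicator' is written, whether or not the front panel redraws.
    """
    transfer_buffer = None
    copies = 0
    result = []
    for _ in range(iterations):
        result = [x + x for x in data]
        if update_inside_loop:
            # Terminal inside the loop: one full copy per iteration.
            transfer_buffer = list(result)
            copies += 1
    if not update_inside_loop:
        # Terminal outside the loop: a single copy of the final result.
        transfer_buffer = list(result)
        copies += 1
    return transfer_buffer, copies


data = [1.0, 2.0, 3.0]
inside_buffer, inside_copies = run(1000, data, update_inside_loop=True)
outside_buffer, outside_copies = run(1000, data, update_inside_loop=False)
print(inside_copies, outside_copies)
```

Both variants end with the same displayed value, but under this model the first one copies the whole array on every iteration and the second copies it once.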
01-05-2016 04:18 PM
>> buffer allocation dot at the Add input has disappeared
It did not disappear, it moved to the shift register. This example and the previous ones need at least 2 arrays to store data: the original and the result. Technically the Add operation uses 3 (input data 1, input data 2, and the result), but it puts the result into one of the possible input spaces instead of allocating all 3.
The display indicator is independent from these; it needs separate space, which is why you have a dot on the indicator.
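The difference between allocating a third buffer and reusing one of the inputs can be sketched in text form. This is a Python illustration only, not what the LabVIEW compiler actually generates; `add_new_buffer` and `add_in_place` are hypothetical names.

```python
from array import array


def add_new_buffer(a, b):
    """Allocate a third buffer for the result; both inputs stay untouched."""
    return array("d", (x + y for x, y in zip(a, b)))


def add_in_place(a, b):
    """Write the result into the storage of `a`; no third buffer is created."""
    for i in range(len(a)):
        a[i] += b[i]
    return a


a = array("d", [1.0, 2.0, 3.0])
b = array("d", [10.0, 20.0, 30.0])

c = add_new_buffer(a, b)  # three buffers exist now: a, b and c
d = add_in_place(a, b)    # still only a and b; d is the same object as a
print(c is a, d is a)
```

In the second call the sum overwrites the first input, which is the "result goes into one of the input spaces" case described above.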
01-05-2016 04:27 PM - edited 01-05-2016 04:28 PM
@muh1 wrote:
This example is quite adequate to demonstrate the point - memory allocation and execution time. Is there anything wrong with it in this respect?
Yes, there are several things. One is that the compiler is smarter than you think. Besides constant folding, it will also cache results and do other things that make this an invalid test, on top of UI indicators being updated during execution. Generally you need randomized data and code to measure timing, along with other settings, to get an accurate gauge of performance.
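As a rough sketch of what such a test looks like in text form (Python here, as a stand-in for a LabVIEW diagram): random inputs, timing taken only around the operation, the result consumed outside the timed region, and the best of several runs reported. The function name `time_add` and the sizes are arbitrary choices for illustration.

```python
import random
import time


def time_add(n=100_000, repeats=5):
    """Return (best time in seconds, checksum) for one element-wise add of n values."""
    # Randomized inputs, so the result cannot be precomputed from constants.
    a = [random.random() for _ in range(n)]
    b = [random.random() for _ in range(n)]
    best = float("inf")
    checksum = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        c = [x + y for x, y in zip(a, b)]
        elapsed = time.perf_counter() - start
        # Use the result outside the timed region so the work is never unused.
        checksum += c[0] + c[-1]
        best = min(best, elapsed)
    return best, checksum


best, checksum = time_add()
print(f"best of 5 runs: {best * 1e3:.3f} ms")
```

Taking the minimum over several runs reduces the influence of whatever else the machine was doing at the time; nothing is displayed or copied to a UI inside the timed part.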
01-06-2016 03:32 AM
What you see is a test circuit, simplified as far as possible.
Tick count before the execution of the loop, tick count after, arranged in a flat sequence.
I've moved the terminal out of the loop in the latest iteration. Not much change.
Let me ask you a simple question: do you have an answer to my original one or not? Let me reiterate, it is simple enough: does a memory resize with change 0 slow down execution or not? Yes/no. Secondary questions: why is there a memory buffer reallocation at the input of an Add of two arrays of the same size? Is 4 msec a reasonable time for adding two arrays of 1e6 complex dbl numbers? I prefer not to pointlessly discuss the diagram, which I introduced just to illustrate the questions. The questions are self-contained.
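One way to sanity-check the timing is to work out how much data the addition has to move. The sketch below is back-of-the-envelope arithmetic only. It assumes a complex double occupies 16 bytes and that an Add reads two arrays and writes one (three arrays touched); both are assumptions of this estimate, not measurements from the thread.

```python
ELEMENTS = 1_000_000     # 1e6 elements per array, as stated in the post
BYTES_PER_ELEMENT = 16   # complex double: two 8-byte floats (assumption)


def implied_bandwidth_gb_s(seconds, arrays_touched=3):
    """Memory traffic in GB/s if one Add reads two arrays and writes one."""
    bytes_moved = ELEMENTS * BYTES_PER_ELEMENT * arrays_touched
    return bytes_moved / seconds / 1e9


for ms in (4, 6):
    print(f"{ms} ms -> {implied_bandwidth_gb_s(ms / 1e3):.1f} GB/s")
```

Each array is 16 MB, so one Add touches about 48 MB, which works out to roughly 12 GB/s at 4 msec and 8 GB/s at 6 msec. If the operation is limited by main-memory traffic rather than arithmetic, which is plausible for arrays this large but not verified here, those figures are in the range one would expect from a laptop or desktop processor of that generation.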
01-06-2016 03:40 AM
I've posed questions which are not directly limited or related to this illustration. If the answer to those particular questions is "Depends", I can elaborate more on the illustration. If not, I see no point in doing so.
Regarding optimisation: my problem is that execution is too slow (or so I think, maybe it is normal). Can optimisation slow down the execution? If not, why mention it? I am not trying to precisely measure the performance. I am trying to see whether it is way too slow for some reason. The indicator in the loop is a possibility, of course, but no, it is not the cause. For example, the profiler reports the VI "CM Add" from the NI library taking 6 msec to add two 1000x1000 matrices. That is how I noticed the problem and then wrote this maximally simple example.