09-16-2010 11:16 AM
I’m trying to optimize an application that uses several megabytes of data in arrays of clusters and strings, and I was hoping to use the Data Value Reference functions to speed things up. This has certainly helped where an array is passed unchanged through several subVIs, but I would really like to index a big array by reading elements from the original buffer belonging to a subVI. The problem with the Data Value Reference Read Element border node is that it makes a new buffer allocation, where I would really like some way to index an element from the original buffer.
The code below runs about 40 times faster (LabVIEW 2009) than the example above, presumably because the buffer is not being allocated on every iteration.
Is there a way to index an array directly from the data value reference?
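For intuition, here is a rough sketch in plain Python (not LabVIEW) of the cost at issue: dereferencing by copying a large buffer on every read, versus indexing the original buffer in place. The sizes and iteration count are arbitrary.

```python
# Hypothetical analogy: copy-per-read vs. in-place indexing of a shared buffer.
import time

data = bytearray(1_000_000)   # ~1 MB buffer, standing in for the array behind the DVR

# Dereference-by-copy: a full buffer allocation on every iteration.
start = time.perf_counter()
for i in range(100):
    snapshot = bytes(data)    # copies all 1 MB just to read one element
    element = snapshot[i]
copy_time = time.perf_counter() - start

# Index the original buffer directly: no per-iteration allocation.
start = time.perf_counter()
for i in range(100):
    element = data[i]
index_time = time.perf_counter() - start

print(copy_time > index_time)  # the per-iteration copy dominates as the buffer grows
```

The gap widens with buffer size, which is why a copy-free indexing path matters for multi-megabyte datasets.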
09-16-2010 05:28 PM
I ran the code through the Desktop Execution Trace Toolkit looking for memory operations, and it doesn't look like the DVR dereference is causing a memory allocation. If I set a breakpoint at the end of each loop, I see a 1MB allocation at the beginning of the loop, and 1MB freed somewhere in each case. The action engine case has an additional Alloc/Free pair.
I understand that the Profile buffer dots aren't always right, and sometimes they are maybes.
My LV2009 SP1 times are approximately:
Reference = 164ms (baseline)
Action Engine = 394ms
Single-Element Queue = 742ms
Data Value Reference = 446ms
Data Value Reference outside loop = 199ms
Data Value Reference outside loop normal index = 174ms
DVR SubVI = 629ms
Basic = 181ms
Times stay roughly the same for 1k and 100MB.
Times are about 50ms less each in 2010.
The DVR subVI case merely handles the DVR read/write in a subVI inside a for loop, like the normal DVR case, just packaged. DECT shows that it closes the same handle that the DVR opens in the top-level VI. I'm willing to bet that the ~250ms difference I see reading the DVR a million times vs. once is the overhead from its mutex behavior.
The most interesting thing is that my relative times are much different than DFGray's: AE and DVR roughly the same, SEQ lagging a bit.
09-17-2010 08:26 AM
If you wrap the DVR around the loop instead of dereferencing it inside the loop, you will probably see a much faster response. The reason is that every time it is called, the DVR acquires a global mutex (a semaphore, if you will) to prevent anything else from accessing the data. This is a performance issue, and your example shows it.
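The mutex cost described above can be sketched in plain Python (not LabVIEW): acquiring a lock on every iteration versus once around the whole loop. The iteration count is arbitrary.

```python
# Hypothetical analogy: per-iteration locking vs. locking once around the loop.
import threading
import time

lock = threading.Lock()
data = list(range(100_000))
total = 0

# Lock inside the loop: analogous to a DVR read element inside the loop.
start = time.perf_counter()
for x in data:
    with lock:              # acquire/release on every element
        total += x
per_iteration = time.perf_counter() - start

# Lock around the loop: analogous to wrapping the DVR around the loop.
start = time.perf_counter()
with lock:                  # acquire/release once
    for x in data:
        total += x
around_loop = time.perf_counter() - start

print(per_iteration > around_loop)  # the per-iteration acquire/release is the overhead
```

The work done per iteration is identical in both loops; only the lock traffic differs, which isolates the acquisition cost.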
The buffer viewer is conservative. The buffer it shows coming out of the DVR dereference terminal is statically allocated: an empty data structure used under error conditions (e.g. when the DVR input is invalid). The gold standard is still your OS memory monitor (Task Manager on Windows), used to dynamically watch LabVIEW's memory use as you single-step through the code. Doing this confirms that the memory buffer is "fake".
09-21-2010 09:06 PM
blawson wrote:
Reference = 164ms (baseline)
Action Engine = 394ms
Single-Element Queue = 742ms
Data Value Reference = 446ms
Data Value Reference outside loop = 199ms
Data Value Reference outside loop normal index = 174ms
DVR SubVI = 629ms
Basic = 181ms
...
The most interesting thing is that my relative times are much different than DFGray's: AE and DVR roughly the same, SEQ lagging a bit.
I saw similar results, but the action engine consistently outperforms the DVR. Then if I wrap the DVR and SEQ into subVIs and in-line them all, the AE is another 2x faster.
Test VI DataValueReferenceDemo-1.vi in lv2010, as downloaded:
Reference = 120ms
Action Engine = 228ms
Single-Element Queue = 469ms
Data Value Reference = 318ms
Data Value Reference outside loop = 106ms
Data Value Reference outside loop normal index = 106ms
Basic = 106ms
Starred cases have subVIs running with the new in-lining feature:
Reference = 114ms
* Action Engine = 95ms
* Single-Element Queue = 443ms
* Data Value Reference = 293ms
Data Value Reference outside loop = 102ms
Data Value Reference outside loop normal index = 101ms
Basic = 99ms
Not sure why the AE is even faster than "basic", but it is!
@Zing wrote:
Is there a way to index an array directly from the data value reference?
I don't think so, but it sure would be nice. DVRs do seem to be a great way to protect your data, but they are not the end-all solution for huge datasets, especially arrays of clusters.
09-21-2010 09:51 PM - edited 09-21-2010 09:57 PM
^ Remember, though, that the AE adds a memory buffer. This is a show-stopper if you're pushing around a lot of data.
I didn't really play around with execution flavors and inlining. Does an inlined FG not add a memory buffer? Is inlining the FG in this example at all relevant to real-world performance?
I also noticed that occasionally the AE would take about 4x as long, repeatably, until I restarted LabVIEW. I believe the main thing that affects the speed of the AE is how fast your system can make the buffer copy. If you've maxed out your free RAM, it will take more time than if you have plenty of room, and this will also vary significantly from system to system. I don't think the other methods depend as much on RAM performance, which could explain why you, me, and Damien all see different relative times.
09-21-2010 11:26 PM
@blawson wrote:
^ remember though that the AE adds a memory buffer. This is a show-stopper if you're pushing around a lot of data.
I didn't really play around with execution flavors and inlining. does an inlined FG not add a memory buffer? Is inlining the FG in this example at all relevant to real-world performance?
I also noticed that occasionally the AE would take about 4x as long, repeatably, until I restarted labview. I believe that the main thing that affects the speed of the AE is how fast your system is able to make the buffer copy. If you've maxed out your free RAM, it will take more time than if you have plenty of room, and also this will vary significantly from system to system. I don't think the other methods depend so much on RAM performance, which could explain why you, me, and Damien all see different relative times.
A memory buffer does not equal a data copy. I don't think my version could have AE executing so quickly with a data copy.
If the diagram code is only reading the data, which is common for large datasets, then the LabVIEW compiler can often determine that the data will not be modified, and a copy will not be made. However, you may still see a buffer allocation on the diagram. That's based on my limited understanding, since the documentation is pretty sparse about buffer allocations vs. data copies. (You may want to read the vaguely related topic http://lavag.org/topic/7307-another-reason-why-copy-dots-is-a-bad-name-for-buffer-allocations).
I would agree that in-lining shouldn't have any effect on memory buffers, unless by in-lining the code the compiler is able to see that a copy is unnecessary, whereas it might not be able to tell if the array goes into a subVI. That all depends on how much the compiler can optimize across subVI calls.
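The "a memory buffer does not equal a data copy" distinction can be illustrated in plain Python (not LabVIEW): a new buffer descriptor can reference existing data without duplicating it, while a true copy gets its own allocation.

```python
# Hypothetical analogy: a shared read view vs. an actual data copy.
data = bytearray(b"abcdefgh")

view = memoryview(data)   # new object, but it shares the underlying bytes
copy = bytes(data)        # a genuine data copy: a separate allocation

data[0] = ord("z")        # mutate the original buffer

print(chr(view[0]))   # prints "z" -- the view sees the change, so no copy was made
print(chr(copy[0]))   # prints "a" -- the copy still holds the old value
```

This is the sense in which a diagram can show a "buffer" at a terminal even when no data is actually duplicated.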
09-25-2010 03:00 PM
I think you hit the nail on the head there.
From watching the memory performance on my machine, I'm quite certain that the AE creates a data copy (but only one). Then again, it could be the way I coded it, or the fact that I was using 2009 (and not 2010's compiler).
09-25-2010 06:34 PM
@jdunham wrote:
@blawson wrote:
^ remember though that the AE adds a memory buffer. This is a show-stopper if you're pushing around a lot of data.
I didn't really play around with execution flavors and inlining. does an inlined FG not add a memory buffer? Is inlining the FG in this example at all relevant to real-world performance?
I also noticed that occasionally the AE would take about 4x as long, repeatably, until I restarted labview. I believe that the main thing that affects the speed of the AE is how fast your system is able to make the buffer copy. If you've maxed out your free RAM, it will take more time than if you have plenty of room, and also this will vary significantly from system to system. I don't think the other methods depend so much on RAM performance, which could explain why you, me, and Damien all see different relative times.
A memory buffer does not equal a data copy. I don't think my version could have AE executing so quickly with a data copy.
If the diagram code is only reading the data, which is common for large datasets, then the labview compiler can often determine that the data will not be modified and a copy will not be made. However you may still see a buffer allocation on the diagram. That's based on my limited understanding , since the documentation is pretty sparse about buffer allocations vs. data copies. (You may want to read the vaguely related topic http://lavag.org/topic/7307-another-reason-why-copy-dots-is-a-bad-name-for-buffer-allocations).
I would agree that in-lining shouldn't have any effect on memory buffers, unless by in-lining the code, the compiler is able to see that a copy is unnecessary whereas it might not be able to tell if the array goes into a subvi. That all depends on how much their compiler can optimize through different subvi calls.
To my understanding, in-lining doesn't affect the total memory buffered for an object, but it does affect how much memory is buffered at a given time. So if, for example, an array had 500 elements (each an I8), it would only have to buffer enough memory for a single I8 rather than 500 x I8 for an entire copy of the array. This could be very useful when manipulating large amounts of data.