"Data Value Reference" and "In Place Element Structure" performance

kaneem · ‎05-12-2011

Having just upgraded from 8.6.1 to 2010, I have begun testing the Data Value Reference (DVR) and In Place Element Structure (IPES) features, aiming to improve many of the array manipulation processes we have. However, to my surprise, I have discovered that they actually take much longer to do the same array manipulation than the traditional "Index + Replace" array methods.

The attached VI contains the test I was performing. An array of 5e6 int32's is created; then, 10e6 times, the array is randomly indexed, the indexed element's value is incremented by 1, and the element is written back in.
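For readers without the VI at hand, the loop described above can be sketched in text form. This is only a rough Python illustration of the "index, increment, replace" pattern, not the attached LabVIEW code; the function name `index_and_replace`, the fixed seed, and the scaled-down sizes are assumptions made so the sketch runs quickly.

```python
import random
from array import array


def index_and_replace(num_elements, num_updates, seed=0):
    """Pick a random element, add 1, and write it back, num_updates times."""
    rng = random.Random(seed)  # fixed seed so repeated runs touch the same indices
    data = array("i", [0]) * num_elements  # signed C ints (typically 32-bit)
    for _ in range(num_updates):
        i = rng.randrange(num_elements)  # random index into the array
        value = data[i]                  # "Index Array"
        data[i] = value + 1              # "Replace Array Subset"
    return data


# Sizes scaled down from the 5e6 elements / 10e6 updates used in the post.
result = index_and_replace(num_elements=5_000, num_updates=10_000)
```

Every update adds exactly 1 to some element, so the sum of the array equals the number of updates, which gives a simple correctness check when comparing variants of the loop.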

The test VI does the above process twice: 1) first using the traditional method, and then 2) doing the same thing using the DVR + IPES methods.

For my tests, the results each time are around:

1) Traditional method: 2,240ms

2) DVR/IPES method: 5,410ms.

This was the complete opposite of what I was expecting. At the very least, I would have expected the processing times to be the same, but based on other forum discussions on this, the referencing method should have been 4 times faster. My understanding was that these data references would give us the speed and efficiency that we ex-C developers are used to when handling large arrays with pointers. Am I misunderstanding what the DVR is giving us in LabVIEW?

In addition, I noticed something unexpected. If you look at the attached .png file, you can see the 'buffer allocations' (indicated by the black dots on the outputs of some of the VIs). I am highly surprised and disappointed to see one on the "Data Value Reference Read" output!!! If this is truly a buffer that is created when you are dereferencing, then that totally invalidates the point of using DVRs (again, as I understand their intended use).

I would be grateful for any feedback on the above test results, and also on my understanding of what DVRs are supposed to bring to the table (from the perspective of a C developer thinking LabVIEW is finally giving them pointers).

Best Regards,

Kane Evans-McLeod.

FraggerFox · ‎05-13-2011

Nice find. Even I was of the opinion that DVRs are faster than normal "by value" definitions.

However, my thinking in this case is that a DVR has to go to the address where the value is stored, so it has to do an operation similar to "value at address of" (the "&" operator in C). That takes more time, since it first has to go to the referenced address and then fetch the value.

But this contradicts the "by reference" performance concept, and I am now confused about why this is still considered "performance efficient".

@kaneem wrote:

In addition, I noticed something unexpected. If you look at the attached .png file, you can see the 'buffer allocations' (indicated by the black dots on the outputs of some of the VIs). I am highly surprised and disappointed to see one on the "Data Value Reference Read" output!!! If this is truly a buffer that is created when you are dereferencing, then that totally invalidates the point of using DVRs (again, as I understand their intended use).

I agree, there is no point in using a DVR if this is happening. We shouldn't need to create a separate buffer to do this if we are using DVRs!

Someone who is more experienced in using DVRs in LabVIEW would be able to help, but this thing has also left me confused!

-FraggerFox!
Certified LabVIEW Architect, Certified TestStand Developer
"What you think today is what you live tomorrow"

FraggerFox · ‎05-13-2011

I went ahead and removed the DVRs to check just the In Place Element Structure's execution performance. As you can see from the attached VI, the Iteration Value (In Place) takes a long time to reach the final value, so there is a large difference in execution time between using the In Place structure directly with values and using the traditional method.

-FraggerFox!

tst · ‎05-13-2011

There are two examples in this thread, and each performs like this for a different reason:

1. The second example (reply 3) is actually easier, because it simply has a bug - it's missing the shift register, which causes the array to be duplicated each time through the loop. Put the SR back in and the performance is back to normal.
2. In the first example, the issue is with the DVR. A DVR is NOT a pointer. It is a reference to a specific value. LV can guarantee that all places in the code will work on the same data by locking and unlocking it, but this has performance implications - you have to go through this lock every time through the loop, which takes the extra time you see.

Remember that the main design goal for DVRs is NOT to improve performance. It is to allow safe concurrent access to a single piece of data from multiple places in the code, so for something like this, you would deref the array once outside the loop and then put it back into the reference. Passing the DVR itself through the SR is meaningless (unless the loop runs 0 times), as the reference doesn't change.
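The difference between locking on every iteration and dereferencing once outside the loop can be sketched in text form. The Python below is only an analogy, not LabVIEW's actual DVR implementation: `ValueRef` is a hypothetical stand-in for a DVR (a value guarded by a lock), and entering the `with` block plays the role of the In Place Element Structure's read/write border nodes.

```python
import threading


class ValueRef:
    """Hypothetical stand-in for a DVR: a mutable value guarded by a lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()  # "dereference": take exclusive access
        return self._value

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()  # "write back": let other accessors in
        return False


def update_locking_every_iteration(ref, indices):
    """Lock, change one element, unlock; repeated for every index."""
    for i in indices:
        with ref as data:
            data[i] += 1


def update_locking_once(ref, indices):
    """Lock once around the whole loop, then release."""
    with ref as data:
        for i in indices:
            data[i] += 1
```

Both functions leave the data in the same final state; the second pays the lock cost once instead of once per element. The trade-off is that it holds the lock for the entire loop, so any other code waiting on the same reference is blocked for longer.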

As for the buffer dot, that shouldn't be an issue, as a buffer dot does not necessarily indicate a new memory allocation. In this case, it just means that the value is put on the wire. You can see the same thing inside the IPES.

___________________
Try to take over the world!

Ben · ‎05-13-2011

For the most part, the DVR and In Place functions do not give any better performance than well-written LV code that does not use them. What they do is make it easier to recognize and implement efficient code. Or stated another way... If you have good kick-butt code written before the DVR and In Place, it will not get much improvement by switching over.

Where the DVR shines is when you are dealing with "By-ref" patterns in LVOOP. The DVR provides an excellent foundation to implement those designs.

Now if Christian comes back and contradicts me... forget everything I said.

Ben

Retired Senior Automation Systems Architect with Data Science Automation · LabVIEW Champion · Knight of NI · Prepper

kaneem · ‎05-15-2011

Thank you all for your feedback.

FraggerFox wrote:

However, my thinking in this case is that a DVR has to go to the address where the value is stored, so it has to do an operation similar to "value at address of" (the "&" operator in C). That takes more time, since it first has to go to the referenced address and then fetch the value.

I have to admit, I am not up to scratch on x86 indexed addressing modes, but potentially, as you say, the dereferencing may not be as efficient here as I had hoped. Furthermore, as you attempted to demonstrate by removing the DVR, the DVR does indeed appear to be the source of the slowdown.

tst wrote:

2. In the first example, the issue is with the DVR. A DVR is NOT a pointer. It is a reference to a specific value. LV can guarantee that all places in the code will work on the same data by locking and unlocking it

This in itself is probably the biggest element of confusion. The term "pass by reference" in C-style languages implies a pointer, as does the term "dereference", i.e. look at the data that the pointer is pointing to. So to me, a reference and a pointer are one and the same. If NI have used the term reference in a different context, then ouch - a curve ball for me!

tst wrote:

Remember that the main design goal for DVRs is NOT to improve performance. It is to allow safe concurrent access to a single piece of data from multiple places in the code, so for something like this, you would deref the array once outside the loop and then put it back into the reference. Passing the DVR itself through the SR is meaningless (unless the loop runs 0 times), as the reference doesn't change.

I do agree, the test VI I uploaded here was not comparing apples with apples, as the way you might use each of the methods could vary immensely. I decided to put the dereferencing inside the loops to test the worst-case usage of these new DVR methods. To put this in perspective, I should explain how we currently pass our large (let's say 10MB for argument's sake) arrays around the many VIs that might call on the data. We use single-element queues. This provides the "locking" protection that the DVR boasts, as well as the best-performing means of passing and processing the data.
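The single-element queue pattern mentioned above can be sketched in text form. This is a Python analogy using the standard `queue.Queue`, not the LabVIEW queue primitives; the helper names `make_seq` and `increment_element` are made up for illustration.

```python
import queue


def make_seq(initial):
    """Create a queue that holds exactly one element: the shared data."""
    seq = queue.Queue(maxsize=1)
    seq.put(initial)
    return seq


def increment_element(seq, index):
    """Dequeue the data, modify it, and enqueue it again."""
    data = seq.get()  # queue is now empty, so any other caller blocks in get()
    try:
        data[index] += 1
    finally:
        seq.put(data)  # hand the data back, releasing one waiting caller


shared = make_seq([0, 0, 0])
increment_element(shared, 1)
increment_element(shared, 1)
```

While one caller holds the dequeued data, the empty queue acts as the lock: everyone else waits until the element is put back. Dequeuing and enqueuing around each small update costs one lock round trip per update, which is the same per-access overhead that applies when a DVR is dereferenced inside a loop.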

I have uploaded a new VI showing this queue method, and also the results. As you can see looking at the "Time ms (Queue)" of 215ms, it is faster than any of the other 4 methods tested here! So what does DVR bring that a single length queue does not?

Ben wrote:

For the most part, the DVR and In Place functions do not give any better performance than well-written LV code that does not use them. What they do is make it easier to recognize and implement efficient code. Or stated another way... If you have good kick-butt code written before the DVR and In Place, it will not get much improvement by switching over.

Yes, I am beginning to see this. In the initial investigation, I read this thread...

http://forums.ni.com/t5/LabVIEW/Dr-Damien-s-Development-LabVIEW-2009-Data-Value-Reference/m-p/956429

... in which the author specifically says: "On my machine, the Data Value Reference is roughly twice as fast as the single-element queue, which is roughly twice as fast as the action engine", which got me quite excited! But as can be seen from my test VIs, this performance increase escapes me!

I apologise for the long posts, but the performance is important to me, and I persuaded my boss to go down this upgrade path from 8.6.1 to 2010, so I need to gather as much information as I can before our next talk!

Any further feedback would be appreciated.

Best regards,

Kane Evans-McLeod.

kaneem · ‎05-15-2011

I want to amend the above post. I made an error in the VI where I had the queues. To compare correctly with the DVR dereferencing inside the IPES structure, I should have had the Enqueue and Dequeue functions inside the For Loop.

Having made these changes, it now shows that the DVR method (543ms) is faster than the single-length Queue method (926ms).

So yes I can see the benefit here of using DVR.

I think I can come to my own conclusion now (taking on board the above comments) that DVR is a candidate for replacing the Queues, but its dereferencing needs to be minimised as it still impacts a lot on performance. And I do agree with the prior comment that it makes for better diagrams and readability when using DVR and IPES. I think I see the way forward now.

Thank you,

Kane.

tst · ‎05-16-2011

The terminology when moving between languages can be confusing, especially when things behave in similar, but subtly different ways. In any case, you should note that C (or C++, at least) does have the concept of references (the ampersand operator mentioned earlier), although they're also obviously not quite the same as LV references.

In any case, references are common in LV. For example, the queue primitives you mentioned all operate on a queue reference. The obtain primitive provides it and all the others use it as an input to refer to a specific queue. Functionally, DVRs are almost identical to the SEQ method you used (you can see some discussion of this here), so you should be able to use them as a replacement.


DFGray · ‎05-16-2011

To add a bit more fuel to the fire, I would not consider most of the benchmarks I did in LabVIEW versions prior to 2009 to be valid from 2009 onward. In LabVIEW 2009, National Instruments introduced a new optimizing compiler. As such, some techniques that used to work don't work, or don't work as well, any more. For example, a feedback node used to be faster than a single-cycle WHILE loop for use as a functional global. However, the single-cycle WHILE loop is now as fast or faster because the compiler team optimized it, knowing it was a common use case.

Going forward, this will continue to be the case, as the compiler team will continue to optimize things and will concentrate on common use cases.

In this particular example, the simple first expression almost always beats the modified second on my machine in LabVIEW 2010 SP1. Not by much, but the overhead of the explicit InPlace structure slightly hobbles the compiler. When in doubt, use the wires.

One final comment. The buffer allocation coming out of the DVR dereference is a default set of data that is allocated in case the data in the DVR is empty or non-existent. It is a static allocation, so should not cost any time and very little memory.

deserio@florida · ‎07-17-2013

Has anyone recognized that the DVR + IPES examples have the DVR dereferencing inside the For Loop? The dereferencing should be outside the loop. Only the IPES should be inside. It then gives the same results as the traditional method.
