03-06-2013 11:30 AM
This is a simple question: is there any way to increase the execution speed of the VI below? I am parallelizing the outer loop, but given the sequential nature of the inner loop, I am not sure there is anything else I can do. The problem is that I am attempting to crunch a lot of data - I am producing about 1.25 MS/s, accumulating it over 10 s (Data In), and then performing the calculations below over several integration periods (iTime; currently 0.001, 0.01, 0.1, 1, and 3 s) - and this is causing an overrun in some other loops. Any help is appreciated.
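For reference, the computation being described is roughly the following in text form (a minimal NumPy sketch, not the actual VI; SAMPLE_RATE, data_in, and i_times are illustrative names, and random data stands in for the real acquisition):

import numpy as np

SAMPLE_RATE = 1.25e6                             # 1.25 MS/s, per the post
data_in = np.random.rand(int(SAMPLE_RATE * 10))  # stand-in for the 10 s buffer
i_times = [0.001, 0.01, 0.1, 1.0, 3.0]           # integration periods in seconds

for t in i_times:
    chunk = int(SAMPLE_RATE * t)    # samples per integration period
    n = data_in.size // chunk       # number of complete chunks
    # reshape returns a view (no copy); mean(axis=1) averages each chunk
    means = data_in[:n * chunk].reshape(n, chunk).mean(axis=1)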
03-06-2013 11:48 AM
Two places I would start:
Use Array Subset instead of Delete from Array.
Write your own Mean VI which uses SGL precision.
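In text form, the second suggestion amounts to something like this (a hedged NumPy sketch of a hand-rolled single-precision mean; mean_sgl is an illustrative name, not an existing function):

import numpy as np

def mean_sgl(x):
    # Sum and divide entirely in 32-bit floats. This can be noticeably
    # faster than a 64-bit mean, at the cost of accuracy on long arrays.
    x32 = np.asarray(x, dtype=np.float32)
    return np.sum(x32, dtype=np.float32) / np.float32(x32.size)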
03-06-2013 11:54 AM
Hmmm... doesn't using Array Subset incur a calculation that Delete from Array doesn't? I will have at least a multiplication node (i * chunk size) in the inner loop that was not there before, to keep track of the proper offset. On the other hand, that gets rid of the shift register. Is there a significant performance difference between Array Subset and Delete from Array?
It looks like you are correct about the single-precision math - it can be considerably faster. I will look into this.
Thanks, Darin.
Matt
03-06-2013 12:16 PM
@mtat76 wrote:
Hmmm... doesn't using Array Subset incur a calculation that Delete from Array doesn't? I will have at least a multiplication node (i * chunk size) in the inner loop that was not there before, to keep track of the proper offset. On the other hand, that gets rid of the shift register. Is there a significant performance difference between Array Subset and Delete from Array?
It looks like you are correct about the single-precision math - it can be considerably faster. I will look into this.
Thanks, Darin.
Matt
You should test in LV12; in previous versions Delete From Array is a dog, even when deleting from the end of the array. I did a simple test: create an array of 50,000 random numbers and take the average of consecutive chunks of 400 elements using both methods. I even added a Reverse Array to the Subset test to match the behavior of Delete from Array.
Results:
Delete: 2.7 msec
Subset: 200 usec
Your mileage may vary.
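For readers without LabVIEW handy, the benchmark described above can be approximated like this (a sketch only - np.delete stands in for Delete from Array, slicing stands in for Array Subset, and absolute timings will not match LabVIEW's):

import time
import numpy as np

data = np.random.rand(50_000)
chunk = 400

# "Delete from Array" style: shrink a working copy, allocating every pass.
t0 = time.perf_counter()
work = data.copy()
means_delete = []
while work.size >= chunk:
    means_delete.append(work[:chunk].mean())
    work = np.delete(work, np.s_[:chunk])
t_delete = time.perf_counter() - t0

# "Array Subset" style: index into the original buffer, copying nothing.
t0 = time.perf_counter()
means_subset = [data[i*chunk:(i+1)*chunk].mean()
                for i in range(data.size // chunk)]
t_subset = time.perf_counter() - t0

print(f"delete: {t_delete*1e3:.2f} msec, subset: {t_subset*1e3:.2f} msec")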
03-06-2013 12:20 PM
@mtat76 wrote:
Hmmm... doesn't using Array Subset incur a calculation that Delete from Array doesn't? I will have at least a multiplication node (i * chunk size) in the inner loop that was not there before, to keep track of the proper offset. On the other hand, that gets rid of the shift register. Is there a significant performance difference between Array Subset and Delete from Array?
Yes, there is. Using Array Subset, LabVIEW can create a "sub-array" that's just a pointer to the start location within the original array, along with the length of the subset. The data in the array itself doesn't need to be moved or copied, and you'll be able to run the entire loop without ever making a copy of Data In. Right now, with the shift register and Delete from Array, every iteration of the outer loop needs to make a new copy of Data In to reinitialize the inner shift register.
As an alternative to converting to single-precision math, you might also get a speed increase from converting Data In to double-precision once, before it enters the outer loop.
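The same view-versus-copy distinction exists in other array libraries, which may help as a mental model (a NumPy analogy, not a statement about LabVIEW's internals):

import numpy as np

data_in = np.arange(10.0)
sub = data_in[2:6]                         # a view: pointer + length, no data copied
print(np.shares_memory(sub, data_in))      # True

removed = np.delete(data_in, np.s_[:2])    # always allocates and copies
print(np.shares_memory(removed, data_in))  # False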
03-06-2013 03:12 PM
Wow! Thanks, Darin. I don't really understand why, but the single-precision mean runs about 70x faster than the double-precision Mean VI provided by LabVIEW. That is remarkable! Here is the VI that I used to test:
Fairly straightforward. I am going to reexamine some of my other routines - we were getting tight on computational power anyway, and the DBL mean is a big hit if you are calling it over and over but don't need the precision.
Cheers, Matt
03-06-2013 03:15 PM
You may want to redo your test, making sure that the two calculations don't happen in parallel. As you have it now, the speed comparison is not necessarily very accurate.
03-06-2013 03:34 PM - edited 03-06-2013 03:36 PM
Here's a much better version of your speed test. On my machine, the calculation time for DBL is almost exactly twice as long as for SGL. Interestingly, if you remove the conversion to SGL, it takes the native LabVIEW version (array sum divided by array size) about the same amount of time as the Mean VI (which calls a DLL).
(EDIT: sorry, in my initial post I wrote "fast" where I should have written "long" in the second sentence. I've now corrected that.)
03-06-2013 03:51 PM
You had me worried there for a second, Nathan. I am still getting 60-70x on an RT PXI chassis (quad-core 8110).
Here is the output of the Ratio graph in the code above. The DBL calculation runs in about 600 ms while the SGL calculation runs in about 8 ms. What are you getting?
03-06-2013 03:58 PM - edited 03-06-2013 03:59 PM
Please try the code I posted (it's a snippet; you can simply drag it to your desktop and from there onto a block diagram - no need to rewrite or change anything).
I get about 12 ms for DBL and 6 ms for SGL.
There are several very important things about the way the benchmark is arranged in my VI (thanks, Altenbach, for the tips on this board over the years!). Nothing happens in parallel with the sequence structure, so while the sequence structure is executing it is the ONLY thing executing - and that includes updating front panel controls, which can be very slow. All the inputs enter the sequence structure in the first frame, and all outputs exit from the last frame. I'd need to find the reference, which I don't have time to do right now, but I believe a frame in a sequence structure can execute as soon as it has all its data available (and the preceding frame has executed), even if following frames cannot yet execute; likewise, items outside the sequence structure can execute as soon as data is available from a given frame, even if the rest of the sequence hasn't finished (in your VI, the time calculations below the sequence structure).
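The same discipline carries over to a text-based benchmark: time each case strictly in sequence, keep all reporting outside the timed regions, and keep the results live so the work cannot be skipped or overlapped (a hedged Python sketch of the structure, not of the posted VI):

import time
import numpy as np

data = np.random.rand(5_000_000)

# "Frame 1": time the DBL mean with nothing else running.
t0 = time.perf_counter()
mean_dbl = data.mean()
t1 = time.perf_counter()

# Convert once, outside any timed region.
data32 = data.astype(np.float32)

# "Frame 2": time the SGL mean only after frame 1 has finished.
t2 = time.perf_counter()
mean_sgl = data32.mean(dtype=np.float32)
t3 = time.perf_counter()

# Reporting happens only after both timed regions are complete.
print(f"DBL: {(t1 - t0)*1e3:.1f} ms  SGL: {(t3 - t2)*1e3:.1f} ms")
print(mean_dbl, mean_sgl)  # use the results so the work stays live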