LabVIEW

Help with algorithm?

Hello all,

I need some help with decreasing the computational time associated with my vi. I have attached the vi which is responsible for my calculations. In the vi, I start with some random generators to produce my data, but the only area of interest is what is between my crude method of timers, which determines how long the calculation takes.

You could probably use some background first. I am a grad student, and this is for a new medical imaging system to which I am applying the usual data acquisition/calculation/display process. I am taking data from a PCI9812 data acquisition card at approximately 8 MS/s. This data is split into 2 arrays, and these arrays are passed into a series of for loops for calculation, which eventually produces 3 display graphs. I have only attached the calculation vi, as this is where most of the time is being spent according to the vi profiler. I am using a dual Xeon 2.4 GHz computer and realize that splitting my data set into parallel sets will decrease the computational time, but I was wondering whether there were improvements I could make to my basic algorithm that could be used in each of the parallel loops. From reading many posts in this forum I gather that manipulating arrays chews up a lot of processor time and could be the reason for my slow calculation. Can anyone make any suggestions on how to improve my algorithm? If you require any further info please let me know.

Thanks a lot,

Azazel

Pentium 4, 3.6GHz, 2 GB Ram, Labview 8.5, Windows XP, PXI-5122, PCI-6259, PCI-6115
Message 1 of 7
I tried several alternatives, such as using a 3D array (512x1000x8) to avoid the nested FOR loops, or a 1000x8 array to replace the inner FOR loop, without any success. I don't think there is much room for improvement, since the X and Y outputs are arrays of cumulative sums, which require array slicing.

Just a comment : I believe your shift registers should be initialized...

Good luck !

CC
Chilly Charly    (aka CC)
Message 2 of 7
Azazel,

I agree with chilly charly that you should definitely initialize the shift registers, else results from previous calculations will leak through.
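
As a rough text-language analogy (Python, since a block diagram cannot be pasted as text), an uninitialized shift register behaves like an accumulator that survives between calls. The names `make_running_sum`, `uninitialized`, and `initialized` are made up for this sketch and do not come from the attached VI.

```python
def make_running_sum(initial=None):
    """Return a summing function that mimics a loop with a shift register.

    initial=None models an uninitialized shift register: the accumulator
    keeps whatever value the previous call left behind.
    """
    state = {"acc": 0.0}

    def run(values):
        if initial is not None:
            state["acc"] = initial  # initialized: start fresh on every call
        for v in values:
            state["acc"] += v
        return state["acc"]

    return run


uninitialized = make_running_sum()
initialized = make_running_sum(initial=0.0)

data = [1.0, 2.0, 3.0]
first_u, second_u = uninitialized(data), uninitialized(data)
first_i, second_i = initialized(data), initialized(data)

print(first_u, second_u)  # 6.0 12.0 -> the second run carries the first run's total
print(first_i, second_i)  # 6.0 6.0  -> both runs agree
```

The second uninitialized run returns a different answer for the same input, which is the "leak" described above.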

I don't think you can expect miracles, but it seems to me that combining the two arrays into a complex array speeds things up by about 30% on my PIII. I've attached a crude modification to show the approach; modify as needed (it might even contain bugs, so check the result with a known good dataset).
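
The idea of packing the two real arrays into one complex array can be sketched in Python. This only illustrates that the two forms give the same numbers for a cumulative sum. The function names and sample data are hypothetical, the attached VI's actual calculation is not reproduced here, and the roughly 30% gain was measured in LabVIEW, not in this code.

```python
from itertools import accumulate


def cumulative_separate(x, y):
    """Two real cumulative sums, one pass per array."""
    return list(accumulate(x)), list(accumulate(y))


def cumulative_complex(x, y):
    """One complex cumulative sum: the real part carries x, the imaginary part y."""
    z = [complex(a, b) for a, b in zip(x, y)]
    return list(accumulate(z))


x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 0.25, 0.125, 0.0625]

sx, sy = cumulative_separate(x, y)
sz = cumulative_complex(x, y)

# The complex result splits back into the two real results.
print([s.real for s in sz] == sx)
print([s.imag for s in sz] == sy)
```

Comparing the two versions on the same input is also a cheap way to do the "known good dataset" check suggested above.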

For more consistent timing results, you should really isolate the calculation part from the other stuff. In your example, there is no guarantee that the random generator for the second array executes before or after the first time stamp. The same applies on the right side: you might be including the FP (front panel) updates of some of the indicators (=slow!) before the second timestamp. See my example for a possibility.
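
The same measurement discipline, sketched in Python for illustration: generate the data first, time only the calculation, and display afterwards. `calculate` is a hypothetical stand-in, not the algorithm in the attached VI. A text language runs these statements in order automatically, whereas on a LabVIEW diagram the order has to be forced by data flow or a sequence structure.

```python
import random
import time


def calculate(x, y):
    """Placeholder for the calculation under test (hypothetical)."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * a + b * b
    return total


# 1. Generate all input data before the clock starts.
rng = random.Random(0)
x = [rng.random() for _ in range(100_000)]
y = [rng.random() for _ in range(100_000)]

# 2. Time only the calculation.
start = time.perf_counter()
result = calculate(x, y)
elapsed = time.perf_counter() - start

# 3. Display results only after the clock has stopped.
print(f"result={result:.3f}  elapsed={elapsed * 1000:.2f} ms")
```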

Of course YMMV 😉
Message 3 of 7
Good job Altenbach : down from 1.45 to 0.85 s on my Athlon 2600+.

CC
Chilly Charly    (aka CC)
Message 4 of 7
Thanks for the approach Altenbach, on my test computer my time went from 1.8 to 1.4 s on my Athlon 1800+!!! I can't wait to try it on my dual. I still have to verify the code with some data sets but I will keep you guys informed....thanx again!!!

Cheers,

Azazel
Message 5 of 7
OK, I quickly looked at it again over my morning coffee and there are two improvements that give at least 20% combined speedup.
(1) Major: Change the indexing in the inner loop to avoid the second indexing of the rotated array (see attached).
(2) A minor improvement seems to be gained if the inverse tangent is done after the transposition (saving one transpose operation).

Since these arrays are only for display, you might ask yourself if it is sufficient to do all calculations in single precision (SGL).
On my rig, it gives another 40% speedup and you still get about 6 good significant digits. This is plenty for e.g. 16bit grayscale! 🙂
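
To put a number on the precision claim, here is a small Python check that rounds a few doubles to single precision and compares the rounding error with one 16-bit gray level. The sample values and the helper `to_single` are arbitrary choices for this sketch. It measures a single rounding only, so error that accumulates over long sums in SGL can be larger and is worth checking against real data.

```python
import struct


def to_single(value):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


samples = [0.1, 1.0 / 3.0, 3.14159265358979, 12345.6789]

# Worst relative error introduced by storing each sample as SGL.
worst = max(abs(to_single(v) - v) / abs(v) for v in samples)

# Relative size of one step on a 16-bit grayscale.
gray16_step = 1.0 / 65535

print(f"worst relative rounding error: {worst:.2e}")
print(f"one 16-bit gray level:         {gray16_step:.2e}")
```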

The attached modification implements the above mentioned two improvements but is also set to SGL. It can easily be switched back to DBL by changing the representation of the single diagram constant (zero) in the outer loop and the representation of the array indicators.

Combined, the attached solution in SGL mode is about 2.25x faster than my earlier solution above. I am sure quite a few more things could be tweaked to improve the speed.
Message 6 of 7
WOW!!! Thanks Altenbach, your second attempt is **bleep** good. I tried it with some test data and the results were great. I still have a lot to learn about algorithm design but you have given me sooooo many good ideas. With my dual and your changes this thing will fly...I will keep tweaking, but if you have any more brainstorms....by all means let me know!!!!!

Thanks again

Azazel
Message 7 of 7