LabVIEW


Optimizing nested loops for speed

Solved!

Hi. I need some help optimizing the following MATLAB code for speed in LabVIEW:

-----------------------------------

a=rand(1024, 200);
b=rand(1024, 512);
c=zeros(200,512);
tic
 for i=1:200
  for j=1:512
   c(i,j)=abs(sum(a(:,i).*b(:,j)));
  end
 end
toc

-------------------------------------------------------------

Basically, I have two 2D arrays, a and b: one with 1024 rows and 200 columns, the other with 1024 rows and 512 columns. I have to multiply (dot product) each column of a with each column of b, sum, and take the absolute value of the result. The output, c, will be a 2D array of size 200x512. Is there a way to do it without using two nested loops? The two 2D arrays, a and b, are produced by my hardware.

What is the best strategy to get maximum speed? I've got quite a powerful processor.
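For anyone following along in a text language, the MATLAB snippet above maps to this NumPy sketch (Python here is purely illustrative; the variable names mirror the original code):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1024, 200))   # hardware output: 1024 rows x 200 columns
b = rng.random((1024, 512))   # hardware output: 1024 rows x 512 columns

# Nested-loop version: dot each column of a with each column of b,
# then take the absolute value, filling a 200 x 512 result.
c = np.empty((200, 512))
for i in range(200):
    for j in range(512):
        c[i, j] = abs(np.dot(a[:, i], b[:, j]))
```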

 

Thanks a lot!

 

Message 1 of 20

No matter what you do, you have to generate the 200x512-element array, so why are you set on not letting autoindexing loops select the columns for you? You can right-click the For Loop to enable parallelization, though as I found out last week, it doesn't always pay to do this!

Message 2 of 20

I've tried various tactics, including autoindexing and parallelization. Unfortunately, none of them provides the speed I would like to have. The thing is that the nested loop selects each column of b 200 times, so I hope there is a way to avoid this: select the columns once and keep them in memory for future use.

 

Thanks a lot for your reply. 

Message 3 of 20

Using LV primitives appears to be somewhat (12%) faster than calling Dot Product.vi numerous times.

 

Lynn

 

Nested Loops.png

Message 4 of 20

I think you have it backwards there; it's actually slower to use the primitives on my machine. You have the two time fields in reverse order.

 

However when I enable parallelization on the loops, they roughly double in speed and also become tied for speed.  Hmm.

Message 5 of 20

@Kyle97330 wrote:

I think you have it backwards there, it's actually slower to use the primitives on my machine.  You have the two time fields in reverse order.

 

However when I enable parallelization on the loops, they roughly double in speed and also become tied for speed.  Hmm.


There's an overhead to activating parallelization, so generally only the outer loop should be parallel.

/Y

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified LabVIEW Developer
Message 6 of 20

I am getting about 100 ms using Dot Product and 140 ms using multiply and sum. If I enable parallelism on the outer loop with 8 instances, they both drop to about 20 ms. If I enable parallelism on the inner loop as well, they hardly change, still around 18-20 ms.

 

For the OP: do you also get about 20 ms? Do you need it to run faster than that? How long does your MATLAB code take?

Message 7 of 20
Solution accepted by Adrian_Bradu

So I looked at this problem and said, "Seems familiar, but different ...". Did a little math, coded it up, and improved the speed (with absolutely no parallelism) by a factor of 100. How about them apples?

 

If we let A and B be the 1024 x 200 and 1024 x 512 arrays and treat them as matrices, the problem is mathematically equivalent to taking the absolute value of the transpose of A times B, where "times" is matrix multiplication. I guessed that NI would have included Matrix Multiplication in its bag of tricks, and perhaps optimized it, so I coded it up, adding timing.

 

Comparing Lynn's double-For multiply-sum-abs formulation against the "do it as a matrix" approach on my machine, the Array method took 1004 ms, the Matrix method took 10 ms, and the results differed by less than one part in a million (I generated random (0, 1) floats and compared by subtracting the two arrays and checking that the absolute values of all the differences were less than 1E-6).
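Bob's equivalence and his comparison check can be sketched in NumPy (Python used purely for illustration; the actual benchmarks are LabVIEW VIs):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.random((1024, 200))   # random (0, 1) floats, as in Bob's test
B = rng.random((1024, 512))

# Array method: explicit multiply, sum, abs for every column pair.
C_array = np.empty((200, 512))
for i in range(200):
    for j in range(512):
        C_array[i, j] = abs(np.sum(A[:, i] * B[:, j]))

# Matrix method: C = |A^T B| as one optimized matrix multiplication.
C_matrix = np.abs(A.T @ B)

# Bob's check: subtract the two arrays and verify every absolute
# difference is below 1E-6.
assert np.max(np.abs(C_array - C_matrix)) < 1e-6
```

The speedup comes from the matrix multiply running in one optimized library call instead of 200 x 512 separate dot products.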

 

Bob Schor

Message 8 of 20
Solution accepted by Adrian_Bradu

Nice Work Bob!

 

You can increase the speed a bit further, 20-30%, by using your method exactly the same way but with the high-performance Multicore Analysis & Sparse Matrix Toolkit:

 

http://sine.ni.com/nips/cds/view/p/lang/en/nid/210525

 

 

cheers,

mcduff

Message 9 of 20

Thanks, Bob.

 

I am going to have to learn some linear algebra one of these days.

 

And I need to get a faster computer: Bob's VI on my machine runs in 13 ms for the matrix method and 2 seconds for my method.

 

Lynn

Message 10 of 20