12-03-2015 05:15 PM
Hi. I need some help optimizing the following MATLAB code for speed in LabVIEW:
-----------------------------------
a = rand(1024, 200);      % 1024 rows x 200 columns
b = rand(1024, 512);      % 1024 rows x 512 columns
c = zeros(200, 512);      % preallocate the result
tic
for i = 1:200
    for j = 1:512
        c(i,j) = abs(sum(a(:,i) .* b(:,j)));
    end
end
toc
-----------------------------------
Basically, I have two 2D arrays, a and b: one with 1024 rows and 200 columns, the other with 1024 rows and 512 columns. What I have to do is multiply (dot product) each column of a with each column of b, sum, and take the absolute value of the result. The output, c, will be a 2D array of size 200x512. Is there a way to do it without using two nested loops? The two 2D arrays, a and b, are produced by my hardware.
What is the best strategy to get maximum speed? I've got quite a powerful processor.
Thanks a lot!
12-03-2015 05:48 PM
No matter what you do, you have to generate the 200x512-element array, so why are you set on not letting autoindexing on the For Loops select the columns for you? You can right-click on the For Loop to enable parallelization, though as I found out last week, it doesn't always pay to do this!
12-03-2015 05:57 PM
I've tried various tactics, including autoindexing + parallelization. Unfortunately it does not give me the speed I would like. The thing is that the nested loop selects each column of b 200 times over, so I hope there is a way to avoid that: select the columns once, keep them in memory, and reuse them.
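To make that concrete in MATLAB terms (just a sketch of the idea, not the LabVIEW code), what I mean is hoisting the column selection out of the inner loop so each column of a is extracted only once:
-----------------------------------
a = rand(1024, 200);   % stand-ins for the hardware data
b = rand(1024, 512);
c = zeros(200, 512);
for i = 1:200
    ai = a(:, i);              % select the column once, reuse it 512 times
    for j = 1:512
        c(i, j) = abs(sum(ai .* b(:, j)));
    end
end
-----------------------------------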
Thanks a lot for your reply.
12-03-2015 06:15 PM
Using LV primitives appears to be somewhat (12%) faster than calling Dot Product.vi numerous times.
Lynn
12-03-2015 07:35 PM
I think you have it backwards there; it's actually slower to use the primitives on my machine. You have the two time fields in reverse order.
However, when I enable parallelization on the loops, they both roughly double in speed and end up tied with each other. Hmm.
12-04-2015 03:23 AM
@Kyle97330 wrote:
I think you have it backwards there; it's actually slower to use the primitives on my machine. You have the two time fields in reverse order.
However, when I enable parallelization on the loops, they both roughly double in speed and end up tied with each other. Hmm.
There's an overhead to activating parallelization, so generally only the outer loop should be parallel.
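As a rough MATLAB analogue (a sketch only, assuming the Parallel Computing Toolbox is available), that advice amounts to using parfor on the outer loop and leaving the inner loop serial:
-----------------------------------
a = rand(1024, 200);   % stand-ins for the hardware data
b = rand(1024, 512);
c = zeros(200, 512);
parfor i = 1:200                   % parallelize the outer loop only
    row = zeros(1, 512);
    for j = 1:512                  % inner loop stays serial on each worker
        row(j) = abs(sum(a(:, i) .* b(:, j)));
    end
    c(i, :) = row;                 % c is a sliced output variable
end
-----------------------------------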
/Y
12-04-2015 09:37 AM
I am getting about 100ms using dot product and 140ms using multiply and sum. If I enable the parallelism in the outer loop with 8 instances, they both drop to about 20ms. If I enable parallelism on the inner loop as well, they hardly change, still around 18-20ms.
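For reference, here is a rough MATLAB analogue of the two formulations being compared (a sketch with tic/toc like the original snippet, not the LabVIEW benchmark itself):
-----------------------------------
a = rand(1024, 200);
b = rand(1024, 512);
c1 = zeros(200, 512);
c2 = zeros(200, 512);

tic                    % multiply, sum, abs (like the LV primitives version)
for i = 1:200
    for j = 1:512
        c1(i, j) = abs(sum(a(:, i) .* b(:, j)));
    end
end
toc

tic                    % dot() (loosely analogous to calling Dot Product.vi)
for i = 1:200
    for j = 1:512
        c2(i, j) = abs(dot(a(:, i), b(:, j)));
    end
end
toc
-----------------------------------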
For the OP: Can you also run it in ~20 ms? Do you need it to run faster than that? How long does your MATLAB code take?
12-04-2015 10:15 AM
So I looked at this problem, said "Seems familiar, but different ...". Did a little math, coded it up, and improved (with absolutely no parallelism) the speed by a factor of 100. How about them apples?
If we let A and B be the 1024 x 200 and 1024 x 512 arrays, and turn them into matrices, the problem is (mathematically) equivalent to the absolute value of the transpose of A times B, where "times" is matrix multiplication. I "guessed" that NI would have included Matrix Multiplication in its bag of tricks, and perhaps optimized it, so I coded it up, adding timing.
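In MATLAB notation (a sketch of the equivalence only, not the LabVIEW diagram), the whole double loop collapses to abs(a'*b):
-----------------------------------
a = rand(1024, 200);                 % A: 1024 x 200
b = rand(1024, 512);                 % B: 1024 x 512
c_loop = zeros(200, 512);
for i = 1:200                        % the original double-loop formulation
    for j = 1:512
        c_loop(i, j) = abs(sum(a(:, i) .* b(:, j)));
    end
end
c_mat = abs(a' * b);                 % |A' * B| in one matrix multiplication
max(abs(c_loop(:) - c_mat(:)))       % agreement check: should be well below 1E-6
-----------------------------------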
Using Lynn's first double-For multiply-sum-abs formulation and a "do it as a matrix" version on my machine, the Array method took 1004 msec, the Matrix method took 10 msec, and the results differed by less than one part in a million (I generated random (0, 1) floats and did the comparison by subtracting the two arrays and checking that the absolute values of all the differences were less than 1E-6).
Bob Schor
12-04-2015 10:33 AM
Nice work, Bob!
You can increase the speed a bit more, 20-30%, by using your method exactly the same way but with the high-performance Multicore Analysis & Sparse Matrix Toolkit.
http://sine.ni.com/nips/cds/view/p/lang/en/nid/210525
cheers,
mcduff
12-04-2015 10:37 AM
Thanks, Bob.
I am going to have to learn some linear algebra one of these days.
And I need to get a faster computer. Bob's VI on my computer runs in 13 ms for the matrix method and 2 seconds for my method.
Lynn