Code optimization for computing unit vectors x-y-z

Ubik) · ‎02-03-2019

Hello,

Do you have suggestions for improving the code below, which computes unit vectors?

i.e. unit_V = V/norm(V)

In this code, each row of the array (with random values) represents the 3 components x-y-z of a vector to normalize.
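For readers without the attachment, the operation can be sketched in text form. This is only a minimal Python illustration of the formula above, not the LabVIEW diagram from the post; the function name `normalize_rows` and the handling of zero-length vectors are assumptions made for the example.

```python
import math

def normalize_rows(vectors):
    """Return a new list in which each [x, y, z] row is scaled to unit length."""
    result = []
    for x, y, z in vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            # Assumption: a zero-length vector has no direction, so leave it unchanged.
            result.append([x, y, z])
        else:
            inv = 1.0 / norm  # one division, then three multiplications
            result.append([x * inv, y * inv, z * inv])
    return result

unit = normalize_rows([[3.0, 0.0, 4.0], [0.0, 2.0, 0.0]])
```

Computing the reciprocal once and multiplying three times is the usual reason a fast reciprocal square root looks attractive for this problem.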

My first idea was to use another implementation of the reciprocal square root (https://forums.ni.com/t5/LabVIEW/Fast-Reciprocal-Square-Root-with-Labview/m-p/3889359), but it is not faster in LabVIEW.
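For context, the widely known bit-level approximation of the reciprocal square root for single-precision floats can be sketched as follows. This is an assumption about the kind of technique meant; the implementation in the linked thread is not reproduced here and may differ. The function name `fast_rsqrt` is hypothetical, and in Python this is purely illustrative, not fast.

```python
import math
import struct

def fast_rsqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for x > 0 using the float32 bit trick."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Initial guess from the well-known magic constant.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step to refine the guess.
    return y * (1.5 - 0.5 * x * y * y)

approx = fast_rsqrt(4.0)
exact = 1.0 / math.sqrt(4.0)
relative_error = abs(approx - exact) / exact  # small, but not zero
```

The result is only approximate, so it trades accuracy for speed, and on current CPUs the hardware square root is often fast enough that the trick brings no gain.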

Ubik) · ‎02-03-2019

Attached is the algorithm that I am trying to improve.

altenbach · ‎02-03-2019

Looks pretty good.

(One thing I would change: don't wire the P terminal and don't count the CPUs. Same thing, less code ;))

LabVIEW Champion.

GerdW · ‎02-04-2019

Hi Ubik,

what about getting rid of that loop and using polymorphism?

Which array size (number of vectors) do you want to process?

Best regards,
GerdW

using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019

Ubik) · ‎02-04-2019

@GerdW

The In Place Element Structure inside the parallelized loop is 10 times faster than the polymorphic version. That's not surprising.

@altenbach

If I don't connect the P terminal, the performance is significantly degraded. I don't understand why, even though I set the loop parallelism to "Automatically partition iterations". I have seen this behaviour many times, so I always connect the terminal.

Thanks a lot for your help. I had hoped for a better solution, because I have to process around 2*10^7 vectors for a vision application with a "Live" mode.

altenbach · ‎02-04-2019

@Ubik) wrote:

@GerdW

The In Place Element Structure inside the parallelized loop is 10 times faster than the polymorphic version. That's not surprising.

Yes, the loop-free version without the IPE always seems much slower. Another slow solution is "unit vector", which has the advantage of very simple code (it does much more, so it will be slow).

@Ubik) wrote:
If I don't connect the P terminal, the performance is significantly degraded. I don't understand why, even though I set the loop parallelism to "Automatically partition iterations". I have seen this behaviour many times, so I always connect the terminal.

This has not been my experience. Unwired should be identical according to the help.

@Ubik) wrote:

Thanks a lot for your help. I had hoped for a better solution, because I have to process around 2*10^7 vectors for a vision application with a "Live" mode.

Well, calculate how fast it is per vector. Does it really need to be DBL?
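To make the two questions concrete, here is a rough back-of-the-envelope sketch. The vector count comes from the post above; the 1.0 s total time is a placeholder chosen for illustration, not a measurement from this thread, and the helper names are hypothetical.

```python
N_VECTORS = 2 * 10**7   # from the post: around 2*10^7 vectors
COMPONENTS = 3          # x, y, z
BYTES_DBL = 8           # double precision
BYTES_SGL = 4           # single precision

def array_megabytes(n_vectors, bytes_per_value):
    """Size of the n x 3 array in megabytes (10^6 bytes)."""
    return n_vectors * COMPONENTS * bytes_per_value / 1e6

def per_vector_ns(total_seconds, n_vectors):
    """Average time spent per vector, in nanoseconds."""
    return total_seconds / n_vectors * 1e9

dbl_mb = array_megabytes(N_VECTORS, BYTES_DBL)   # 480 MB for DBL
sgl_mb = array_megabytes(N_VECTORS, BYTES_SGL)   # 240 MB for SGL

# Placeholder: if one pass over the whole array took 1.0 s,
# that would be about 50 ns per vector.
example_ns = per_vector_ns(1.0, N_VECTORS)
```

Halving the element size halves the amount of memory that has to be moved, which matters when the work per vector is as small as it is here.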

LabVIEW Champion.

robdye · ‎04-04-2019

The performance discrepancy described here between the ParFor implementation and the "vectorized" (or "polymorphic primitive") implementation was brought to my attention and here is my explanation about what is going on:

TL;DR: the ParFor is faster because it manages to avoid copying the contents of the 2-D array. The vectorized implementation suffers from two copies: first stripping the columns out into three 1-D arrays (contiguous pieces of memory), then reconstructing the 2-D array after the vectorized operations.

Details: I hope the annotations on these pictures will suffice for the details. You will see in the vectorized implementation that I have made the first data copies explicit with the "Always Copy" primitives. Removing these primitives does not prevent the copies; LabVIEW will just put implicit copies just before the vectorized operations. I made them explicit in frame 2 so that I could measure the cost.
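Since the pictures are not reproduced in text, the data movement described above can be sketched roughly as follows. Python lists do not model LabVIEW's memory layout or its compiler, so this only illustrates where the two extra copies occur; the function names are hypothetical and non-zero vectors are assumed.

```python
import math

def normalize_in_place(rows):
    """Row-wise, like the ParFor version: each row is overwritten where it lives."""
    for row in rows:
        x, y, z = row
        inv = 1.0 / math.sqrt(x * x + y * y + z * z)
        row[0], row[1], row[2] = x * inv, y * inv, z * inv

def normalize_vectorized(rows):
    """Column-wise, like the vectorized version: split, compute, rebuild."""
    # Copy 1: strip the columns out into three separate 1-D arrays.
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    zs = [r[2] for r in rows]
    # Whole-array arithmetic on the 1-D arrays.
    inv = [1.0 / math.sqrt(x * x + y * y + z * z) for x, y, z in zip(xs, ys, zs)]
    xs = [x * i for x, i in zip(xs, inv)]
    ys = [y * i for y, i in zip(ys, inv)]
    zs = [z * i for z, i in zip(zs, inv)]
    # Copy 2: reconstruct the 2-D array from the three columns.
    return [[x, y, z] for x, y, z in zip(xs, ys, zs)]

data = [[3.0, 0.0, 4.0], [1.0, 2.0, 2.0]]
rebuilt = normalize_vectorized(data)  # new array, original untouched
normalize_in_place(data)              # same array, contents replaced
```

Both functions compute the same values; the difference is only how much of the array is copied along the way.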

I also attached the VI (v 2018).

mcduff · ‎04-07-2019

Since NI is looking at this, I thought I'd chime in with the "worst" solution and ask for an explanation.

It is a hybrid of the in-place structure and the vectorized implementation. It suffers from the most buffer copies. The question is why?

Look at the screenshot for the buffer allocations: they are everywhere! Why is that? In this case, in-place is terrible; maybe pass this along to the compiler team.

2015 version attached.

A buffer dot on everything!

EDIT: Sometimes there are two buffer dots on operations, like multiply!!
