LabVIEW


Code optimization for computing unit vectors x-y-z

Hello,

Do you have any suggestions for improving the code below, which computes unit vectors?

i.e. unit_V = V/norm(V)

 

In this code, each row of the array (with random values) represents the 3 components x-y-z of a vector to normalize.
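LabVIEW code is graphical, so here is a rough text-language analogy instead: a minimal Python sketch, assuming the N×3 array is stored as one flat, row-major `array('d')` buffer (the memory layout of a LabVIEW 2-D DBL array). It normalizes each row in place, computing one reciprocal square root per vector and multiplying the three components by it. The helper name `normalize_rows_in_place` is hypothetical, and skipping zero-length rows is an assumption, since the post does not say how they should be handled.

```python
import math
from array import array


def normalize_rows_in_place(buf: array) -> None:
    """Scale every (x, y, z) row of a flat, row-major buffer to unit length.

    Hypothetical helper; zero-length rows are left unchanged (assumption).
    """
    if len(buf) % 3 != 0:
        raise ValueError("buffer length must be a multiple of 3")
    for i in range(0, len(buf), 3):
        x, y, z = buf[i], buf[i + 1], buf[i + 2]
        sq = x * x + y * y + z * z
        if sq == 0.0:
            continue  # a zero vector has no direction; leave it as-is
        inv = 1.0 / math.sqrt(sq)  # one reciprocal square root per vector
        buf[i] = x * inv           # three multiplications instead of three divisions
        buf[i + 1] = y * inv
        buf[i + 2] = z * inv


if __name__ == "__main__":
    v = array("d", [3.0, 0.0, 4.0, 0.0, 0.0, 2.0])
    normalize_rows_in_place(v)
    print([round(c, 12) for c in v])  # [0.6, 0.0, 0.8, 0.0, 0.0, 1.0]
```

Computing `1/sqrt` once and then multiplying avoids three divisions per vector; whether that helps in LabVIEW depends on what the compiler already does, so it is something to benchmark rather than a guaranteed win.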

 

My first idea was to use another implementation of the reciprocal square root (https://forums.ni.com/t5/LabVIEW/Fast-Reciprocal-Square-Root-with-Labview/m-p/3889359), but it's not faster in LabVIEW.
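For reference, the linked thread is about the well-known "fast inverse square root" bit trick. A minimal Python sketch of it, assuming 32-bit IEEE 754 floats and using the standard `struct` module to reinterpret the bits (the function name `fast_rsqrt` is hypothetical):

```python
import math
import struct


def fast_rsqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for x > 0 with the classic float32 bit trick.

    Hypothetical helper. After one Newton step the relative error is
    roughly 0.2% at most, far coarser than DBL precision.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bits as uint32
    bits = 0x5F3759DF - (bits >> 1)                       # magic-constant first guess
    y = struct.unpack("<f", struct.pack("<I", bits))[0]   # uint32 bits back to float
    return y * (1.5 - 0.5 * x * y * y)                    # one Newton-Raphson step


if __name__ == "__main__":
    for x in (0.25, 2.0, 25.0):
        print(x, fast_rsqrt(x), 1.0 / math.sqrt(x))
```

In pure Python this is far slower than `1.0 / math.sqrt(x)`, and on current CPUs the hardware square-root instructions are usually fast enough that the trick rarely pays off in compiled code either, which would be consistent with it not helping in LabVIEW. Its accuracy (about 0.2%) may also be too coarse for a DBL pipeline.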

 

TEST__ComputeUnitVectord.png

 

 

 

Message 1 of 8

The attachment contains the algorithm I am trying to improve.

Message 2 of 8

Looks pretty good.

 

(One thing I would change: don't wire the P terminal and don't count the CPUs. Same thing, less code ;))

Message 3 of 8

Hi Ubik,

 

what about getting rid of that loop and using polymorphism?

check.png

Which array size (number of vectors) do you want to process?
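GerdW's loop-free version relies on LabVIEW's polymorphic arithmetic primitives working on whole arrays. A rough Python analogy (the helper `normalize_columns` is hypothetical and assumes no zero-length rows) splits the interleaved buffer into three contiguous column arrays, does the arithmetic column by column, and interleaves the results into a new buffer:

```python
import math
from array import array


def normalize_columns(buf: array) -> array:
    """Column-wise ("vectorized") unit vectors for a flat, row-major DBL buffer.

    Hypothetical helper; assumes no zero-length rows. Returns a new buffer.
    """
    xs, ys, zs = buf[0::3], buf[1::3], buf[2::3]  # copy 1: three contiguous 1-D arrays
    inv = array("d", (1.0 / math.sqrt(x * x + y * y + z * z)
                      for x, y, z in zip(xs, ys, zs)))
    out = array("d", [0.0]) * len(buf)  # copy 2: a new buffer for the rebuilt 2-D array
    out[0::3] = array("d", (x * k for x, k in zip(xs, inv)))
    out[1::3] = array("d", (y * k for y, k in zip(ys, inv)))
    out[2::3] = array("d", (z * k for z, k in zip(zs, inv)))
    return out
```

Note the extra buffers: the three column arrays, the intermediate `inv` array, and the rebuilt output, whereas a row-by-row loop can work entirely inside the original buffer. The NI explanation later in this thread attributes the slowdown of the vectorized LabVIEW version to this kind of copying.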

Best regards,
GerdW


using LV2016/2019/2021 on Win10/11+cRIO, TestStand2016/2019
Message 4 of 8

@

The version with the In Place Element Structure inside the parallelized loop is 10 times faster. That's not surprising.

 

If I don't wire the P terminal, performance degrades significantly. I don't understand why, since I set the loop parallelism to "Automatically partition iterations". I have noticed this behaviour many times, so I always wire the terminal.

 

Thanks a lot for your help. I was hoping for a better solution, because I have to process around 2*10^7 vectors in a vision application with a "Live" mode.

Message 5 of 8

Ubik wrote:

@

The version with the In Place Element Structure inside the parallelized loop is 10 times faster. That's not surprising.

 


Yes, loop-free code without the IPE always seems to be much slower. Another slow solution is the "Unit Vector" function, which has the advantage of very simple code (it does much more, so it will be slow).

 

UnitVector.png

 


Ubik wrote:

If I don't wire the P terminal, performance degrades significantly. I don't understand why, since I set the loop parallelism to "Automatically partition iterations". I have noticed this behaviour many times, so I always wire the terminal.

 


This has not been my experience. According to the help, leaving it unwired should behave identically.


Ubik wrote:

 

Thanks a lot for your help. I was hoping for a better solution, because I have to process around 2*10^7 vectors in a vision application with a "Live" mode.


Well, calculate how long it takes per vector. Does it really need to be DBL?
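Doing that calculation for 2*10^7 vectors gives a feel for the scale. The sketch below computes the raw buffer size in DBL and SGL, and the per-vector time budget if every vector had to be processed once per frame; the 30 frames/s rate is purely an assumption for illustration, not a figure from this thread.

```python
from array import array

N_VECTORS = 2 * 10**7  # figure quoted in this thread
FRAME_RATE_HZ = 30.0   # hypothetical live-mode frame rate (assumption)


def buffer_bytes(n_vectors: int, typecode: str) -> int:
    """Bytes occupied by an n x 3 array whose elements use the given array typecode."""
    return n_vectors * 3 * array(typecode).itemsize


dbl = buffer_bytes(N_VECTORS, "d")  # DBL: 8 bytes per element
sgl = buffer_bytes(N_VECTORS, "f")  # SGL: 4 bytes per element
budget_ns = 1e9 / FRAME_RATE_HZ / N_VECTORS  # if every vector is processed once per frame

print(f"DBL: {dbl / 1e6:.0f} MB, SGL: {sgl / 1e6:.0f} MB")  # DBL: 480 MB, SGL: 240 MB
print(f"budget per vector at {FRAME_RATE_HZ:.0f} frames/s: {budget_ns:.2f} ns")  # 1.67 ns
```

Switching to SGL halves the amount of memory the loop has to stream through, which is likely to matter for an operation this simple, at the cost of roughly 7 significant decimal digits instead of about 16.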

 

Message 6 of 8

The performance discrepancy described here between the ParFor implementation and the "vectorized" (or "polymorphic primitive") implementation was brought to my attention. Here is my explanation of what is going on:

 

TL;DR: the ParFor is faster because it manages to avoid copying the contents of the 2-D array. The vectorized implementation suffers from two copies: first stripping the columns out into three 1-D arrays (contiguous pieces of memory), then reconstructing the 2-D array after the vectorized operations.

 

Details: I hope the annotations on these pictures are enough. In the vectorized implementation, you will see that I have made the first data copies explicit with the "Always Copy" primitives. Removing these primitives does not prevent the copies; LabVIEW simply inserts implicit copies just before the vectorized operations. I made them explicit in frame 2 so that I could measure their cost.
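A back-of-the-envelope count of the copies described above, assuming N vectors of DBL data (8 bytes per element) and ignoring any temporary arrays produced by the individual vectorized operations. B is the number of extra bytes written into newly allocated buffers per call:

$$
\begin{aligned}
S &= 3 \cdot 8 \cdot N \ \text{bytes} \quad (\text{size of the } N \times 3 \text{ DBL array}) \\
B_{\text{vectorized}} &\ge \underbrace{S}_{\text{split into three 1-D arrays}} + \underbrace{S}_{\text{rebuild the 2-D array}} = 2S \\
B_{\text{ParFor + IPE}} &= 0
\end{aligned}
$$

For N = 2×10^7 this lower bound is 2S = 9.6×10^8 bytes (about 960 MB) copied per call, on top of the arithmetic itself.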

 

I have also attached the VI (LabVIEW 2018).

 

UnitVector- ParFor vs Vectorizer-BD (002).PNG

 

UnitVector- ParFor vs Vectorizer - FP (002).PNG

 

Message 7 of 8

Since NI is looking at this, I thought I'd chime in with the "worst" solution and ask for an explanation.

It is a hybrid of the in-place structure and the vectorized implementation. It suffers from the most buffer copies. The question is: why?

 

Look at the buffer allocations in the screenshot: they are everywhere! Why is that? In this case, in-place is terrible; maybe pass this along to the compiler team.

2015 version attached.

A buffer dot on everything!

 

EDIT: Some operations, like Multiply, even have two buffer dots!!

Message 8 of 8