Hi there,
I currently use CUDA in LabVIEW for matrix multiplication, as shown in the attachment;
the GPU is used to compute a large amount of data.
So I wonder: how can I multiply a large number of vectors (each of them) by an N*N matrix to produce a new set of vectors?
like this:
x0, y0, z0   (1st vector)
x1, y1, z1   (2nd vector)
...
x1000, y1000, z1000
These vectors are stored as a 3*1000 array.
How are these multiplication operations (each vector by the N*N matrix) distributed across the cores of the GPU?
Any advice, please.
I have no idea about the specifics of CUDA,
but for splitting the problem into smaller parts,
you could just let different parts of the vector multiplication run on different cores/processes/magic.
By which I mean: the matrix multiplication rule is "row by column" (loosely translated),
so you let the multiplication of row1-by-column1 run on the first core,
and row2-by-column1 on the next, and so forth.
(Assuming right-sided vector multiplication, but left-sided multiplication would be similar.)
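I can't speak to the LabVIEW toolkit side, but as a rough CUDA sketch of that split (everything below is illustrative and made up, not taken from your VI), the usual mapping is one GPU thread per vector, with each thread applying the whole 3x3 matrix to its own vector:

```cuda
#include <cuda_runtime.h>

// One thread per vector: thread i reads vector i, multiplies it by the 3x3
// matrix A (row-major), and writes the transformed vector back out.
__global__ void transformVectors(const float *A,   // 3x3 matrix, row-major
                                 const float *in,  // 3*N values, vector i at in[3*i]
                                 float *out,       // 3*N values
                                 int numVectors)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numVectors) return;

    float x = in[3 * i + 0];
    float y = in[3 * i + 1];
    float z = in[3 * i + 2];

    out[3 * i + 0] = A[0] * x + A[1] * y + A[2] * z;
    out[3 * i + 1] = A[3] * x + A[4] * y + A[5] * z;
    out[3 * i + 2] = A[6] * x + A[7] * y + A[8] * z;
}

// Launch example: 256 threads per block, enough blocks to cover all vectors.
// transformVectors<<<(numVectors + 255) / 256, 256>>>(d_A, d_in, d_out, numVectors);
```

The GPU scheduler spreads those threads across its cores, so the per-vector multiplications run in parallel without you dividing the work by hand.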
@jwscs wrote:
I have no idea about the specifics of CUDA,
but for splitting the problem into smaller parts,
you could just let different parts of the vector multiplication run on different cores/processes/magic.
By which I mean: the matrix multiplication rule is "row by column" (loosely translated),
so you let the multiplication of row1-by-column1 run on the first core,
and row2-by-column1 on the next, and so forth.
(Assuming right-sided vector multiplication, but left-sided multiplication would be similar.)
But the vector-matrix multiplication is done internally on the GPU, and I can't divide it!
I want to multiply each vector by a specified matrix, such as A, and store the result back into the vector.
Why did you upload that bad-quality screenshot? Even worse, packed in a zip. Can you upload the VI itself?
@Blokk wrote:
Why did you upload that bad-quality screenshot? Even worse, packed in a zip. Can you upload the VI itself?
Oh, I'm sorry for that; I attached the VI here.
Please help me.
OK, let's forget CUDA for a while and only use the CPU first. As I understood it, you want to multiply 1000 3-dimensional vectors by a single 3x3 matrix. What you could do: bundle the 1000 vectors (1D arrays in LabVIEW) into a 2D array, transpose it, and finally multiply the single matrix by this 3x1000 matrix. The result matrix will contain the result vectors as columns. See the snippet below. As I imagine, you could do the same with CUDA and get the results in one step.
Edit: actually, for 1000 such small vectors you do not need CUDA... If you do not need to upscale this operation later, just use the CPU...
Edit 2: "these vectors are stored as a 3*1000 array." OK, so you already have the vertex matrix. Then just multiply the two matrices, and you get the results in the columns...
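For reference, here is the same idea written out as a plain C sketch (my own textual illustration of the approach, with made-up names, not the attached LabVIEW snippet): each column of the result matrix is A applied to one vector.

```cuda
#include <stddef.h>

// CPU sketch of the one-step approach: treat the 1000 vectors as the columns
// of a 3xN matrix V and compute R = A * V with a single matrix-matrix
// multiply. Column i of R is then A applied to vector i.
void multiplyMatrixByVectors(const double A[9],  // 3x3, row-major
                             const double *V,    // 3xN, row-major: V[r*n + i]
                             double *R,          // 3xN, row-major
                             size_t n)
{
    for (size_t i = 0; i < n; ++i) {            // one column (vector) at a time
        for (int r = 0; r < 3; ++r) {           // one row of A at a time
            R[r * n + i] = A[r * 3 + 0] * V[0 * n + i]
                         + A[r * 3 + 1] * V[1 * n + i]
                         + A[r * 3 + 2] * V[2 * n + i];
        }
    }
}
```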
@Blokk wrote:
OK, let's forget CUDA for a while and only use the CPU first. As I understood it, you want to multiply 1000 3-dimensional vectors by a single 3x3 matrix. What you could do: bundle the 1000 vectors (1D arrays in LabVIEW) into a 2D array, transpose it, and finally multiply the single matrix by this 3x1000 matrix. The result matrix will contain the result vectors as columns. See the snippet below. As I imagine, you could do the same with CUDA and get the results in one step.
Oh yes,
by Build Array. Thank you for your help.
@Blokk wrote:
Edit: actually, for 1000 such small vectors you do not need CUDA... If you do not need to upscale this operation later, just use the CPU...
Indeed, I have a 3D object with hundreds of thousands of vertices, and the A matrix isn't constant; it is produced after many operations. Why don't I need the GPU? I want to compare the GPU and the CPU; can I get a significant difference in execution times?
Secondly, I would ask you about real-time execution on the GPU: if I change any parameter in the matrix, or even import a moving 3D object (i.e. variable vertices), is that supported in LabVIEW when using the GPU toolkit?
@ssara wrote:
Oh yes,
by Build Array. Thank you for your help.
Blokk wrote: Edit: actually, for 1000 such small vectors you do not need CUDA... If you do not need to upscale this operation later, just use the CPU...
Indeed, I have a 3D object with hundreds of thousands of vertices, and the A matrix isn't constant; it is produced after many operations. Why don't I need the GPU? I want to compare the GPU and the CPU; can I get a significant difference in execution times?
Secondly, I would ask you about real-time execution on the GPU: if I change any parameter in the matrix, or even import a moving 3D object (i.e. variable vertices), is that supported in LabVIEW when using the GPU toolkit?
Since you already have your vectors in a 2D array, you do not need the Build Array function, as I wrote in the edit.
I said you do not need CUDA for only 1000 vectors. If you need to upscale, then you might need it. But I already explained this, so why do you ask?
I am not really familiar with CUDA in LabVIEW, so I can't really help further. I imagine you can hold the two 2D arrays in GPU memory and only update their contents as needed. But since I have no idea what your actual procedure is, I cannot really help more. One important thing is to minimize the frequency of data copies between CPU RAM and GPU RAM; these copies slow things down...
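As a rough illustration of that last point (a plain CUDA sketch with made-up names, reusing the transformVectors kernel sketched earlier in the thread, not the LabVIEW GPU Analysis Toolkit API): upload the large vertex array once, keep it resident in GPU memory, and per update copy only the tiny 3x3 matrix before relaunching the kernel.

```cuda
#include <cuda_runtime.h>

// Kernel from the earlier sketch: one thread per vector.
__global__ void transformVectors(const float *A, const float *in, float *out,
                                 int numVectors);

// Hypothetical host-side loop: the expensive vertex copy happens once, and
// each update only transfers the 9-element matrix across the PCIe bus.
void runUpdates(const float *h_vertices, const float *h_matrices,
                int numVectors, int numUpdates)
{
    float *d_A, *d_in, *d_out;
    cudaMalloc(&d_A, 9 * sizeof(float));
    cudaMalloc(&d_in, 3 * numVectors * sizeof(float));
    cudaMalloc(&d_out, 3 * numVectors * sizeof(float));

    // One-time upload of all vertices.
    cudaMemcpy(d_in, h_vertices, 3 * numVectors * sizeof(float),
               cudaMemcpyHostToDevice);

    for (int step = 0; step < numUpdates; ++step) {
        // Only the new 3x3 matrix for this step is copied to the GPU.
        cudaMemcpy(d_A, h_matrices + 9 * step, 9 * sizeof(float),
                   cudaMemcpyHostToDevice);
        transformVectors<<<(numVectors + 255) / 256, 256>>>(d_A, d_in, d_out,
                                                            numVectors);
    }
    cudaDeviceSynchronize();

    cudaFree(d_A);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

The transformed vectors stay in d_out on the GPU, so you would only copy them back to the host when you actually need them.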
Since you already have your vectors in a 2D array, you do not need the Build Array function, as I wrote in the edit.
OK, I understood that; the vectors are not in a 2D array as I need them.
I said you do not need CUDA for only 1000 vectors. If you need to upscale, then you might need it. But I already explained this, so why do you ask?
Sorry for the misunderstanding; I need to upscale.
Thank you,