Hi there,
I currently use CUDA in LabVIEW for matrix multiplication, as shown in the attachment;
the GPU is used to compute a large amount of data.
So I wonder: how can I multiply a large number of vectors (each of them) by an N*N matrix to produce a new set of vectors?
like this:
x0, y0, z0   (1st vector)
x1, y1, z1   (2nd vector)
...
x1000, y1000, z1000
These vectors are stored as a 3*1000 array.
How are these multiplication operations (each vector by the N*N matrix) distributed across the cores of the GPU?
Any advice, please.
I have no idea about the specifics of CUDA,
but for splitting the problem into smaller parts,
you could just let different parts of the vector multiplication run on different cores/processes/magic.
By which I mean: the matrix multiplication rule is "row by column" (loosely translated),
so you let the multiplication of row1-by-column1 run on the first core,
and row2-by-column1 on the next, and so forth.
(Assuming right-sided vector multiplication, but left-sided multiplication would be similar.)
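I can't speak to the LabVIEW toolkit side, but as a rough CUDA sketch of that split (everything below is illustrative and made up, not taken from your VI), the usual mapping is one GPU thread per vector, with each thread applying the whole 3x3 matrix to its own vector:

```cuda
#include <cuda_runtime.h>

// One thread per vector: thread i reads vector i, multiplies it by the 3x3
// matrix A (row-major), and writes the transformed vector back out.
__global__ void transformVectors(const float *A,   // 3x3 matrix, row-major
                                 const float *in,  // 3*N values, vector i at in[3*i]
                                 float *out,       // 3*N values
                                 int numVectors)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numVectors) return;

    float x = in[3 * i + 0];
    float y = in[3 * i + 1];
    float z = in[3 * i + 2];

    out[3 * i + 0] = A[0] * x + A[1] * y + A[2] * z;
    out[3 * i + 1] = A[3] * x + A[4] * y + A[5] * z;
    out[3 * i + 2] = A[6] * x + A[7] * y + A[8] * z;
}

// Launch example: 256 threads per block, enough blocks to cover all vectors.
// transformVectors<<<(numVectors + 255) / 256, 256>>>(d_A, d_in, d_out, numVectors);
```

The GPU scheduler spreads those threads across its cores, so the per-vector multiplications run in parallel without you dividing the work by hand.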
@jwscs wrote:
I have no idea about the specifics of CUDA,
but for splitting the problem into smaller parts,
you could just let different parts of the vector multiplication run on different cores/processes/magic.
By which I mean: the matrix multiplication rule is "row by column" (loosely translated),
so you let the multiplication of row1-by-column1 run on the first core,
and row2-by-column1 on the next, and so forth.
(Assuming right-sided vector multiplication, but left-sided multiplication would be similar.)
But the vector-matrix multiplication is done internally on the GPU, and I can't divide it!
I want to multiply each vector by a specified matrix, such as A, and store the result back into the vector.
Why did you upload that bad-quality screenshot? Even worse, packed in a zip. Can you upload the VI itself?
@Blokk wrote:
Why did you upload that bad-quality screenshot? Even worse, packed in a zip. Can you upload the VI itself?
Oh, I'm sorry for that; I attached the VI here.
Please help me.
OK, let's forget CUDA for a while and only use the CPU first. As I understood it, you want to multiply 1000 3-dimensional vectors by a single 3x3 matrix. What you could do: bundle the 1000 vectors (1D arrays in LabVIEW) into a 2D array, transpose it, and finally multiply the single matrix by this 3x1000 matrix. The result matrix will contain the result vectors as columns. See the snippet below. As I imagine, you could do the same with CUDA and get the results in one step.
Edit: actually, for 1000 such small vectors you do not need CUDA... If you do not need to upscale this operation later, just use the CPU...
Edit 2: "these vectors are stored as a 3*1000 array." OK, so you already have the vertex matrix. Then just multiply the two matrices, and you get the results in the columns...
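For reference, here is the same idea written out as a plain C sketch (my own textual illustration of the approach, with made-up names, not the attached LabVIEW snippet): each column of the result matrix is A applied to one vector.

```cuda
#include <stddef.h>

// CPU sketch of the one-step approach: treat the 1000 vectors as the columns
// of a 3xN matrix V and compute R = A * V with a single matrix-matrix
// multiply. Column i of R is then A applied to vector i.
void multiplyMatrixByVectors(const double A[9],  // 3x3, row-major
                             const double *V,    // 3xN, row-major: V[r*n + i]
                             double *R,          // 3xN, row-major
                             size_t n)
{
    for (size_t i = 0; i < n; ++i) {            // one column (vector) at a time
        for (int r = 0; r < 3; ++r) {           // one row of A at a time
            R[r * n + i] = A[r * 3 + 0] * V[0 * n + i]
                         + A[r * 3 + 1] * V[1 * n + i]
                         + A[r * 3 + 2] * V[2 * n + i];
        }
    }
}
```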
@Blokk wrote:
OK, let's forget CUDA for a while and only use the CPU first. As I understood it, you want to multiply 1000 3-dimensional vectors by a single 3x3 matrix. What you could do: bundle the 1000 vectors (1D arrays in LabVIEW) into a 2D array, transpose it, and finally multiply the single matrix by this 3x1000 matrix. The result matrix will contain the result vectors as columns. See the snippet below. As I imagine, you could do the same with CUDA and get the results in one step.
Oh yes,
by Build Array. Thank you for your help.
@Blokk wrote:
Edit: actually, for 1000 such small vectors you do not need CUDA... If you do not need to upscale this operation later, just use the CPU...
Indeed, I have a 3D object with hundreds of thousands of vertices, and the A matrix isn't constant; it is produced after many operations. Why don't I need the GPU? I want to compare the GPU and the CPU; can I get a significant difference in execution times?
Secondly, I would ask you about real-time execution on the GPU: if I change any parameter in the matrix, or even import a moving 3D object (i.e. variable vertices), is that supported in LabVIEW when using the GPU toolkit?
@ssara wrote:
Oh yes,
by Build Array. Thank you for your help.
Blokk wrote: Edit: actually, for 1000 such small vectors you do not need CUDA... If you do not need to upscale this operation later, just use the CPU...
Indeed, I have a 3D object with hundreds of thousands of vertices, and the A matrix isn't constant; it is produced after many operations. Why don't I need the GPU? I want to compare the GPU and the CPU; can I get a significant difference in execution times?
Secondly, I would ask you about real-time execution on the GPU: if I change any parameter in the matrix, or even import a moving 3D object (i.e. variable vertices), is that supported in LabVIEW when using the GPU toolkit?
Since you already have your vectors in a 2D array, you do not need the Build Array function, as I wrote in the edit.
I said you do not need CUDA for only 1000 vectors. If you need to upscale, then you might need it. But I already explained this, so why do you ask?
I am not really familiar with CUDA in LabVIEW, so I can't really help further. I imagine you can hold the two 2D arrays in GPU memory and only update their contents as needed. But since I have no idea what your actual procedure is, I cannot really help more. One important thing is to minimize the frequency of data copies between CPU RAM and GPU RAM; these copies slow things down...
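As a rough illustration of that last point (a plain CUDA sketch with made-up names, reusing the transformVectors kernel sketched earlier in the thread, not the LabVIEW GPU Analysis Toolkit API): upload the large vertex array once, keep it resident in GPU memory, and per update copy only the tiny 3x3 matrix before relaunching the kernel.

```cuda
#include <cuda_runtime.h>

// Kernel from the earlier sketch: one thread per vector.
__global__ void transformVectors(const float *A, const float *in, float *out,
                                 int numVectors);

// Hypothetical host-side loop: the expensive vertex copy happens once, and
// each update only transfers the 9-element matrix across the PCIe bus.
void runUpdates(const float *h_vertices, const float *h_matrices,
                int numVectors, int numUpdates)
{
    float *d_A, *d_in, *d_out;
    cudaMalloc(&d_A, 9 * sizeof(float));
    cudaMalloc(&d_in, 3 * numVectors * sizeof(float));
    cudaMalloc(&d_out, 3 * numVectors * sizeof(float));

    // One-time upload of all vertices.
    cudaMemcpy(d_in, h_vertices, 3 * numVectors * sizeof(float),
               cudaMemcpyHostToDevice);

    for (int step = 0; step < numUpdates; ++step) {
        // Only the new 3x3 matrix for this step is copied to the GPU.
        cudaMemcpy(d_A, h_matrices + 9 * step, 9 * sizeof(float),
                   cudaMemcpyHostToDevice);
        transformVectors<<<(numVectors + 255) / 256, 256>>>(d_A, d_in, d_out,
                                                            numVectors);
    }
    cudaDeviceSynchronize();

    cudaFree(d_A);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

The transformed vectors stay in d_out on the GPU, so you would only copy them back to the host when you actually need them.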
Since you already have your vectors in a 2D array, you do not need the Build Array function, as I wrote in the edit.
OK, I understood that; the vectors are not in a 2D array as I need them.
I said you do not need CUDA for only 1000 vectors. If you need to upscale, then you might need it. But I already explained this, so why do you ask?
Sorry for the misunderstanding; I need to upscale.
Thank you,