04-25-2012 05:54 AM
Hello,
I am trying to write a DLL function that allocates CUDA memory and returns a pointer to that CUDA (device) memory.
A second function should accept this pointer and do the calculation.
I want these operations to be separate because I need to run many calculations on the same data, and I am trying to avoid repeatedly copying the same data to GPU memory (it takes a lot of time).
My DLL:
main.cpp:
extern "C" __declspec(dllexport) int cuda_Malloc ( float *i, void *i_d, int N ){
for( int x=0; x<N; x++ )
i[x]=(float)x;
kernel_cuda_Malloc( i, i_d, N );
return 0;
}
extern "C" __declspec(dllexport) int cuda_Calculation ( void *i_d, float *result, int N ) {
kernel_cuda_calculation( (float *)i_d, result, N );
return 0;
}
simple.cu:
__global__ void kernelTest( float *i, int N ){
unsigned int tid = blockIdx.x*blockDim.x + threadIdx.x;
if ( tid<N )
i[tid] += 10;
}
int kernel_cuda_Malloc( float *i, void *i_d, int N ){
cudaMalloc( (void**)&i_d, N*sizeof( float ) );
cudaMemcpy( i_d, i, N*sizeof( float ), cudaMemcpyHostToDevice );
return 0;
}
void kernel_cuda_calculation( float *i_d, float *result, int N ){
dim3 threads; threads.x = 240;
dim3 blocks; blocks.x = ( N/threads.x ) + 1;
kernelTest<<< blocks, threads >>>( i_d, N ); // launch configuration order is <<< grid, block >>>
cudaMemcpy( result, i_d, N*sizeof( float ), cudaMemcpyDeviceToHost );
cudaFree( i_d );
}
I am not able to get the pointer "i_d" out of the "cuda_Malloc" function in LabVIEW.
The code is a modification of https://decibel.ni.com/content/docs/DOC-20353
04-25-2012 06:15 AM
Here is a similar question, but the poster managed to solve it, so there is no correct answer:
04-25-2012 11:16 AM - edited 04-25-2012 11:16 AM
You will need to pass the CUDA pointer as a reference to the pointer so it can be returned to LabVIEW. Then configure that parameter in the Call Library Node as a pointer-sized integer passed by reference. If you pass the pointer by value into kernel_cuda_Malloc() (and also cuda_Malloc(), of course), then the pointer will never get passed out of either function.
Make sure to consistently treat this pointer as a pointer-sized integer everywhere in your LabVIEW diagram. You most likely can't access it in a LabVIEW diagram directly without going explicitly through CUDA library functions, since that pointer might not be located in the heap at all, and that is the only memory LabVIEW can access directly.
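The difference between the two ways of passing the pointer can be sketched in plain C++. This is only an illustration under stated assumptions: fake_device_malloc, alloc_by_value, alloc_by_reference and release are hypothetical names that are not part of the posted DLL, and ordinary host memory stands in for cudaMalloc() so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for cudaMalloc(): same "void **" out-parameter shape,
// but it allocates ordinary host memory (assumption made for this sketch).
static int fake_device_malloc(void **dev_ptr, std::size_t bytes) {
    *dev_ptr = std::malloc(bytes);
    return *dev_ptr ? 0 : 1;
}

// By value: the callee only changes its own copy of the pointer,
// so the caller's variable keeps whatever it held before the call.
int alloc_by_value(void *i_d, int n) {
    int err = fake_device_malloc(&i_d, static_cast<std::size_t>(n) * sizeof(float));
    std::free(i_d);  // nobody else can ever reach this block, so release it here
    return err;
}

// By reference: the callee writes the new address through "i_d",
// so the caller's variable is updated.
int alloc_by_reference(void **i_d, int n) {
    return fake_device_malloc(i_d, static_cast<std::size_t>(n) * sizeof(float));
}

// Matching release function for the by-reference variant.
void release(void *i_d) { std::free(i_d); }
```

Calling alloc_by_value(p, 16) with p initialized to a null pointer leaves p null, while alloc_by_reference(&p, 16) stores the new address in p, which is the behaviour needed to hand the pointer back to the caller.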
04-26-2012 07:20 AM
Hello Rolf Kalbermatter,
thank you for the reply.
I have changed the code according to your advice (changed parts are highlighted).
I pass a dummy value of zero to the Call Library Node as i_d, and I expect that after cuda_Malloc finishes, i_d will contain the device pointer, but it is still zero.
Call Library Node configuration for i_d:
Type: Numeric
Data type: Signed 32-bit Integer.
My DLL:
main.cpp:
extern "C" __declspec(dllexport) int cuda_Malloc ( float *i, void **i_d, int N ){
for( int x=0; x<N; x++ )
i[x]=(float)x;
kernel_cuda_Malloc( i, i_d, N );
return 0;
}
extern "C" __declspec(dllexport) int cuda_Calculation( void *i_d, float *result, int N ){
kernel_cuda_calculation( (float *)i_d, result, N );
return 0;
}
simple.cu:
__global__ void kernelTest( float *i, int N ){
unsigned int tid = blockIdx.x*blockDim.x + threadIdx.x;
if ( tid<N )
i[tid] += 10;
}
int kernel_cuda_Malloc( float *i, void **i_d, int N ){
cudaMalloc( (void**)&i_d, N*sizeof( float ) );
cudaMemcpy( i_d, i, N*sizeof( float ), cudaMemcpyHostToDevice );
return 0;
}
void kernel_cuda_calculation( float *i_d, float *result, int N ){
dim3 threads; threads.x = 240;
dim3 blocks; blocks.x = ( N/threads.x ) + 1;
kernelTest<<< blocks, threads >>>( i_d, N ); // launch configuration order is <<< grid, block >>>
cudaMemcpy( result, i_d, N*sizeof( float ), cudaMemcpyDeviceToHost );
}
04-26-2012 08:23 AM - edited 04-26-2012 08:26 AM
And did you also change the configuration of that parameter in the Call Library Node?
You also forgot to remove the pointer operator inside kernel_cuda_Malloc().
These have little to do with LabVIEW, CUDA or the Call Library Node, but are gaps in the basic understanding of pointers. Read through a C textbook and work specifically through the chapter about pointers.
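A minimal sketch of what the allocation helper could look like once i_d is declared as "void **" may help here. It is not the poster's final code: fake_device_malloc and alloc_and_copy are hypothetical names, and host memory with std::memcpy stands in for cudaMalloc() and cudaMemcpy() so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for cudaMalloc(): same "void **" out-parameter,
// but backed by host memory (assumption made for this sketch).
static int fake_device_malloc(void **dev_ptr, std::size_t bytes) {
    *dev_ptr = std::malloc(bytes);
    return *dev_ptr ? 0 : 1;
}

// Sketch of the allocation helper once i_d is declared as "void **".
int alloc_and_copy(const float *host, void **i_d, int n) {
    const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);

    // Form in the posted code: fake_device_malloc((void **)&i_d, bytes);
    // "&i_d" is the address of the local parameter, so the caller's
    // variable would never be written; the cast only hides the mismatch.
    int err = fake_device_malloc(i_d, bytes);  // pass i_d itself
    if (err != 0) return err;

    // The copy needs the allocated address, which is "*i_d", not "i_d".
    // std::memcpy stands in for cudaMemcpy( ..., cudaMemcpyHostToDevice ).
    std::memcpy(*i_d, host, bytes);
    return 0;
}
```

The same two changes would apply to the real calls: hand i_d straight to the allocator, and dereference it wherever the device address itself is needed.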
04-26-2012 08:27 AM
I pass a dummy value of zero to the Call Library Node as i_d, and I expect that after cuda_Malloc finishes, i_d will contain the device pointer, but it is still zero.
Call Library Node configuration for i_d:
Type: Numeric
Data type: Signed 32-bit Integer.
04-26-2012 08:31 AM
You changed the pointer from being passed by value to being passed by reference in the C code, so you should make the corresponding change in the Call Library Node too.
04-26-2012 08:45 AM
I do not know how to configure Call Library Node properly.
I do not know which option is the right one.
04-26-2012 08:55 AM
You do know the difference between passing a parameter by value and by reference? And you do realize that these correspond to the two possible values of the Pass control in the Call Library Node configuration?
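Seen from the caller's side, the two settings could be sketched as follows. The functions demo_create, demo_add_ten and demo_destroy are hypothetical and only illustrate the calling pattern; host memory again stands in for device memory. The function that creates the handle receives a pointer-sized integer by reference, while the functions that use it receive the same integer by value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical create function: the handle parameter is passed BY REFERENCE,
// so the function can write the new address into the caller's integer.
extern "C" int demo_create(std::uintptr_t *handle, int n) {
    float *block = static_cast<float *>(std::calloc(n, sizeof(float)));
    if (!block) return 1;
    *handle = reinterpret_cast<std::uintptr_t>(block);
    return 0;
}

// Hypothetical calculation function: the handle is passed BY VALUE,
// because the function only needs to read the stored address.
extern "C" int demo_add_ten(std::uintptr_t handle, float *result, int n) {
    float *block = reinterpret_cast<float *>(handle);
    for (int k = 0; k < n; ++k) {
        block[k] += 10.0f;     // data stays allocated between calls
        result[k] = block[k];  // copy the current values back to the caller
    }
    return 0;
}

// Hypothetical cleanup function, also taking the handle by value.
extern "C" void demo_destroy(std::uintptr_t handle) {
    std::free(reinterpret_cast<void *>(handle));
}
```

A caller would start with a handle of zero, pass its address to demo_create, and then wire the returned number unchanged into every later call. Using a pointer-sized integer rather than a fixed 32-bit one keeps the handle intact in a 64-bit process as well.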