CUDA device memory allocation and returns the device ptr.

SNSVSN · ‎04-13-2011

Hello everybody,
I have a strange problem and I hope I have come to the right place to find some help.
For my project I am trying to use CUDA to do flat field correction on an incoming sequence of images. In my current setup I have created a CUDA DLL in Visual Studio and I call this DLL from LabVIEW. LabVIEW acquires the images from the frame grabber and sends them to the GPU for processing via the CUDA DLL. Everything works, and the flat field correction itself is fine.

However, my current implementation is not efficient. The correction needs two constant images, the flat field and the dark field. At the moment, every time the DLL is called I copy the flat field and the dark field to device memory for the calculation. Since these images are constant, is there a way to copy them to device memory once, outside the image acquisition loop, and free the device memory after the acquisition is done? In other words, I want to do a cudaMalloc and a cudaMemcpy of the two constant images outside the acquisition loop, use the device pointers inside the loop, and free them at the end.

I tried to create a separate DLL that only does the cudaMalloc and returns the device pointer. I call this DLL before the acquisition loop, but the value returned in the device pointer after the DLL has finished executing is 0.

I went through a document by AndreyDmitriev in which he returns a device pointer address from the DLL. I tried to do the same, but it does not work.

Any assistance on this matter would be great.

Regards
SNS

Kevin_H. · ‎04-14-2011

I think your strategy of using a separate DLL to perform this function is good. I'm not sure I understand the application 100%, but it sounds like you have an operation that should happen just once outside of the loop, after which the rest of the application continues executing. If that is true, then your strategy is good.

I don't have any experience with CUDA, but it looks like the malloc function you're calling returns an error state. Have you checked the error state to make sure it reports success? Do you know what the significance of a 0 device pointer is? This forum post looks useful for understanding how these calls should be structured, but I'm only going off of what I know of C programming: http://forums.nvidia.com/index.php?showtopic=172554

I assume you're testing your DLL calls before implementing them in LabVIEW. Are you getting the same results there? Have you tried posting to the NVIDIA forums? Let me know if any of this helps or if you have further information I could use to try to help more!
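The error check suggested above could look roughly like the following. This is only a sketch, not the original poster's code: the function name `ffc_upload_constant`, the `DLL_EXPORT` macro, and the choice to hand the device address back as a 64-bit integer through an output parameter are all assumptions made for illustration. The thread never states why the pointer came back as 0; passing the pointer by value instead of by address is one common cause in C, which is why the sketch writes the result through `out_handle`.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>
#include <stddef.h>

#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Hypothetical export: allocates device memory and uploads one constant
// image (for example the flat field or the dark field).
//   host_image: pixel data supplied by the caller
//   num_bytes:  size of the image in bytes
//   out_handle: receives the device address as a 64-bit integer (0 on failure)
// Returns the cudaError_t value as an int (0 means cudaSuccess).
DLL_EXPORT int ffc_upload_constant(const void *host_image, uint64_t num_bytes,
                                   uint64_t *out_handle)
{
    if (out_handle == NULL) return (int)cudaErrorInvalidValue;
    *out_handle = 0;
    if (host_image == NULL || num_bytes == 0) return (int)cudaErrorInvalidValue;

    void *dev_ptr = NULL;
    cudaError_t err = cudaMalloc(&dev_ptr, (size_t)num_bytes);
    if (err != cudaSuccess) return (int)err;   // report the failure to the caller

    err = cudaMemcpy(dev_ptr, host_image, (size_t)num_bytes,
                     cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        cudaFree(dev_ptr);                     // do not leak on a failed upload
        return (int)err;
    }

    // Write the address through the output parameter so the caller sees it.
    *out_handle = (uint64_t)(uintptr_t)dev_ptr;
    return (int)cudaSuccess;
}
```

On the LabVIEW side, the Call Library Function Node would presumably pass `out_handle` as an unsigned 64-bit numeric with "Pointer to Value" and wire the returned int to an error check, so a non-zero return code shows up before the handle is ever used.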

SNSVSN · ‎04-15-2011

Kevin,

Thanks for your reply. I was able to get it working. Before, I was not getting the pointer out of the DLL properly. Once I was able to get the pointer out properly, the program ran, and it shortened my runtime by a considerable amount.

On the same subject of memory allocation: I want to allocate page-locked memory for the images when the LabVIEW program starts. I do not know how to modify the memory allocation done by LabVIEW so that the images end up in page-locked memory.

Regards

SNSVSN

rolfk · ‎04-15-2011

@SNSVSN wrote:

Kevin,

Thanks for your reply. I was able to get it working. Before, I was not getting the pointer out of the DLL properly. Once I was able to get the pointer out properly, the program ran, and it shortened my runtime by a considerable amount.

On the same subject of memory allocation: I want to allocate page-locked memory for the images when the LabVIEW program starts. I do not know how to modify the memory allocation done by LabVIEW so that the images end up in page-locked memory.

Regards

SNSVSN

I don't think that is really an option. The image structures need to be allocated with the LabVIEW memory manager functions in order to be manageable by LabVIEW later on, and there is no way to modify the memory manager functions yourself.

Basically, if you allocate the memory in your own way, LabVIEW will get into deep trouble when it tries to manage the image later on, including when it tries to deallocate it. And if you let LabVIEW allocate the image, you can't tell it to do a special page-locked allocation. Also, standard malloc() doesn't support page locking at all. That can only be handled by CUDA or the OS kernel API, and you can't slip such memory underneath the LabVIEW memory manager. That manager simply allocates application memory in the application's virtual memory heap, not in the physical memory space.
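One way to stay within these constraints, offered here only as a sketch, is to leave the LabVIEW image where it is and let the DLL own a separate page-locked staging buffer allocated with `cudaMallocHost`. Each frame is then copied from the LabVIEW buffer into the staging buffer before it is transferred to the device. The names `staging_init`, `staging_upload`, `staging_free` and the `DLL_EXPORT` macro are hypothetical, and the extra host-side `memcpy` has a cost of its own, so whether this is faster than a plain `cudaMemcpy` from pageable memory would have to be measured.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>
#include <string.h>

#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

static void  *g_pinned = NULL;  // page-locked host buffer owned by the DLL
static void  *g_dev    = NULL;  // device buffer for the incoming frame
static size_t g_bytes  = 0;     // capacity of both buffers in bytes

// Call once before the acquisition loop.
DLL_EXPORT int staging_init(uint64_t num_bytes)
{
    if (num_bytes == 0 || g_pinned != NULL) return (int)cudaErrorInvalidValue;

    cudaError_t err = cudaMallocHost(&g_pinned, (size_t)num_bytes);
    if (err != cudaSuccess) { g_pinned = NULL; return (int)err; }

    err = cudaMalloc(&g_dev, (size_t)num_bytes);
    if (err != cudaSuccess) {
        cudaFreeHost(g_pinned);
        g_pinned = NULL;
        g_dev = NULL;
        return (int)err;
    }
    g_bytes = (size_t)num_bytes;
    return (int)cudaSuccess;
}

// Call inside the loop: LabVIEW keeps ownership of labview_image.
DLL_EXPORT int staging_upload(const void *labview_image, uint64_t num_bytes)
{
    if (g_pinned == NULL || labview_image == NULL ||
        num_bytes == 0 || num_bytes > g_bytes)
        return (int)cudaErrorInvalidValue;

    memcpy(g_pinned, labview_image, (size_t)num_bytes);   // pageable -> pinned
    return (int)cudaMemcpy(g_dev, g_pinned, (size_t)num_bytes,
                           cudaMemcpyHostToDevice);       // pinned -> device
}

// Call once after the acquisition loop.
DLL_EXPORT void staging_free(void)
{
    if (g_dev != NULL)    { cudaFree(g_dev);        g_dev = NULL; }
    if (g_pinned != NULL) { cudaFreeHost(g_pinned); g_pinned = NULL; }
    g_bytes = 0;
}
```

The design choice is that LabVIEW never sees the pinned memory, so its memory manager is not affected; the DLL allocates and frees the staging buffer itself.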

Rolf Kalbermatter

DEMO, Electronic and Mechanical Support department, room 36.LB00.390

brano.s · ‎04-24-2012

Hi SNSVSN,

I am dealing with the same problem and have not solved it yet.

Would you be so kind as to help me?

I am not sure how to get the pointer out properly and how to pass it back to the second DLL.

Thanks

Regards,

Brano.
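The thread never shows how the pointer was finally passed around, so the following is only one plausible sketch of the second half: a function that receives device addresses as 64-bit integers, casts them back to pointers, and uses them. The names `ffc_correct`, `ffc_release`, `flat_field_kernel` and the `DLL_EXPORT` macro are hypothetical, the images are assumed to be `float` arrays already on the device, and the formula `gain * (raw - dark) / (flat - dark)` is the usual textbook form of flat field correction rather than anything stated in this thread. A device address is only meaningful inside the process and CUDA context that allocated it, so both DLLs would have to be loaded into the same LabVIEW process; as far as I know, CUDA runtimes before 4.0 also tied the context to the calling host thread.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>

#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// One thread per pixel: out = gain * (raw - dark) / (flat - dark).
__global__ void flat_field_kernel(const float *raw, const float *flat,
                                  const float *dark, float *out,
                                  int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float denom = flat[i] - dark[i];
        // Guard against division by zero for dead pixels.
        out[i] = (denom != 0.0f) ? gain * (raw[i] - dark[i]) / denom : 0.0f;
    }
}

// Hypothetical export: all four handles are device addresses that an
// earlier call returned to LabVIEW as unsigned 64-bit integers.
DLL_EXPORT int ffc_correct(uint64_t raw_handle, uint64_t flat_handle,
                           uint64_t dark_handle, uint64_t out_handle,
                           int n, float gain)
{
    if (raw_handle == 0 || flat_handle == 0 || dark_handle == 0 ||
        out_handle == 0 || n <= 0)
        return (int)cudaErrorInvalidValue;

    // Turn the integer handles back into device pointers.
    const float *raw  = (const float *)(uintptr_t)raw_handle;
    const float *flat = (const float *)(uintptr_t)flat_handle;
    const float *dark = (const float *)(uintptr_t)dark_handle;
    float       *out  = (float *)(uintptr_t)out_handle;

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    flat_field_kernel<<<blocks, threads>>>(raw, flat, dark, out, n, gain);

    cudaError_t err = cudaGetLastError();      // launch errors
    if (err != cudaSuccess) return (int)err;
    return (int)cudaDeviceSynchronize();       // execution errors
}

// Call once per handle after the acquisition loop has finished.
DLL_EXPORT int ffc_release(uint64_t handle)
{
    if (handle == 0) return (int)cudaSuccess;  // nothing to free
    return (int)cudaFree((void *)(uintptr_t)handle);
}
```

In LabVIEW the handles would then just be U64 wires that run from the allocation call, through the loop, to the release call.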

Tunde_S · ‎04-25-2012

Hi brano.s,

This thread was started over a year ago. As such, I am not sure if SNSVSN is still monitoring this thread. I also am not sure how SNSVSN got the ptr out.

However, I believe the NI GPU Computing Community may have the answer you are looking for.

Regards,

Tunde S.
Applications Engineer
National Instruments

brano.s · ‎04-26-2012

Hi Tunde S.,

Thank you for the advice. I will try it.

Regards,

Brano.
