04-13-2011 03:19 PM
Hello everybody,
I have a strange problem, and I hope I have come to the right place to find some help.
For my project I am trying to use CUDA to do flat-field correction on an incoming sequence of images. In my current setup I have created a CUDA DLL in Visual Studio and call it from LabVIEW. LabVIEW acquires the images from the frame grabber and sends them to the GPU for processing via the CUDA DLL. Everything works, and the flat-field correction itself is correct. However, my current implementation is not efficient. The correction needs two constant images, the flat field and the dark field. In the current setup, every time the DLL is called I copy the flat field and the dark field to device memory for the calculation. Since these images are constant, is there a way to copy them to device memory once, outside the image acquisition loop, and free the device memory after the acquisition is done? In other words, I want to do a cudaMalloc and a cudaMemcpy of the two constant images outside the acquisition loop, use the device pointers inside the loop, and free them at the end.
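To make the intended split concrete, here is a minimal sketch of the per-frame step, assuming the flat and dark images already live in device memory and their addresses are passed in as 64-bit integers. The function name `ffcProcessFrame`, the `FFC_API` macro, the `float` pixel type, and the caller-supplied `gain` are assumptions for illustration, not the code used in this thread. The kernel uses the common form of flat-field correction, `gain * (raw - dark) / (flat - dark)`.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>

#ifdef _WIN32
#define FFC_API extern "C" __declspec(dllexport)
#else
#define FFC_API extern "C"
#endif

// Common flat-field correction: out = gain * (raw - dark) / (flat - dark).
// gain is typically the mean of (flat - dark); here the caller supplies it.
__global__ void flatFieldKernel(const float* raw, const float* flat,
                                const float* dark, float* out,
                                int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float denom = flat[i] - dark[i];
        // Guard against pixels where flat == dark.
        out[i] = (denom != 0.0f) ? gain * (raw[i] - dark[i]) / denom : 0.0f;
    }
}

// Hypothetical per-frame entry point. flatHandle and darkHandle are device
// addresses that were allocated and filled once, before the loop started.
FFC_API int ffcProcessFrame(const float* hostRaw, float* hostOut, int n,
                            uint64_t flatHandle, uint64_t darkHandle,
                            float gain)
{
    if (!hostRaw || !hostOut || n <= 0 || !flatHandle || !darkHandle)
        return (int)cudaErrorInvalidValue;

    const float* dFlat = reinterpret_cast<const float*>(static_cast<uintptr_t>(flatHandle));
    const float* dDark = reinterpret_cast<const float*>(static_cast<uintptr_t>(darkHandle));
    size_t bytes = (size_t)n * sizeof(float);
    float* dRaw = NULL;
    float* dOut = NULL;

    // Only the frame itself is transferred here; the constant images are not.
    // (These two scratch buffers could be hoisted out of the loop as well.)
    cudaError_t err = cudaMalloc((void**)&dRaw, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dOut, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dRaw, hostRaw, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        flatFieldKernel<<<blocks, threads>>>(dRaw, dFlat, dDark, dOut, n, gain);
        err = cudaGetLastError();   // reports launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dRaw);   // cudaFree(NULL) is a no-op
    cudaFree(dOut);
    return (int)err;  // 0 == cudaSuccess
}
```

Returning the `cudaError_t` value as an `int` lets the LabVIEW side check every call for a non-zero status instead of silently continuing with bad data.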
I tried to create a separate DLL that only does the cudaMalloc and returns the device pointer. I call this DLL before the acquisition loop, but the device pointer it returns is 0 once the call has finished.
I went through a document by AndreyDmitriev in which he returns a device pointer address from a DLL. I tried to do the same, but it does not work.
Any assistance on this matter would be great.
Regards
SNS
04-14-2011 11:39 AM
I think your strategy of using a separate DLL for this is good. I'm not sure I fully understand the application, but it sounds like you have an operation that should happen just once, outside the loop, with the rest of the application continuing afterwards. If so, your approach is sound.
I don't have any experience with CUDA, but it looks like the malloc function you're calling returns an error status. Have you checked that status to make sure the call succeeded? Do you know what a device pointer of 0 signifies? This forum post looks useful for understanding how these calls should be structured, though I'm only going off what I know of C programming: http://forums.nvidia.com/index.php?showtopic=172554
I assume you're testing your DLL calls before implementing them in LabVIEW; are you getting the same results there? Have you tried posting to the NVIDIA forums? Let me know if any of this helps, or if you have further information I could use to help more!
04-15-2011 07:40 AM
Kevin,
Thanks for your reply. I was able to get it working. The problem was that I was not getting the pointer out of the DLL properly; once I did, the program ran and the runtime dropped considerably.
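The working code is not shown in this thread, so the following is only a sketch of one way to get a device pointer out of a DLL. A frequent cause of a pointer that comes back as 0 is passing it by value, so that cudaMalloc writes into a local copy; whether that was the cause here is an assumption. The sketch returns the address through an out-parameter as a 64-bit integer, which LabVIEW can hold as a plain numeric. The names `ffcUploadConstant`, `ffcFreeConstant`, and `FFC_API` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>

#ifdef _WIN32
#define FFC_API extern "C" __declspec(dllexport)
#else
#define FFC_API extern "C"
#endif

// Call once before the acquisition loop, once per constant image.
// On success *handleOut holds the device address as an integer.
FFC_API int ffcUploadConstant(const float* hostImage, int n, uint64_t* handleOut)
{
    if (!hostImage || !handleOut || n <= 0)
        return (int)cudaErrorInvalidValue;
    *handleOut = 0;

    float* dImage = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    // cudaMalloc needs the ADDRESS of the pointer so it can write into it.
    cudaError_t err = cudaMalloc((void**)&dImage, bytes);
    if (err != cudaSuccess)
        return (int)err;

    err = cudaMemcpy(dImage, hostImage, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        cudaFree(dImage);
        return (int)err;
    }

    // Hand the address back through the out-parameter, not by value.
    *handleOut = (uint64_t)(uintptr_t)dImage;
    return 0;   // 0 == cudaSuccess
}

// Call once after the acquisition loop has finished.
FFC_API int ffcFreeConstant(uint64_t handle)
{
    return (int)cudaFree((void*)(uintptr_t)handle);
}
```

On the LabVIEW side, `handleOut` would be configured in the Call Library Function Node as an unsigned 64-bit numeric passed as a pointer to value, and the resulting number wired into the later calls. The address is only meaningful inside the same process and CUDA context; on CUDA runtimes older than 4.0 each host thread had its own context, so all calls would also need to run on the same thread.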
Along the same lines of memory allocation: I want to allocate page-locked memory for the images when the LabVIEW program starts. I do not know how to modify the memory allocation done by LabVIEW so that the images end up in page-locked memory.
Regards
SNSVSN
04-15-2011 03:01 PM
I don't think that is really an option. The image structures need to be allocated with the LabVIEW memory manager functions in order to be manageable by LabVIEW later on, and there is no way to modify the memory manager functions yourself.
Basically, if you allocate the memory in your own way, LabVIEW will get into deep trouble when it tries to manage the image later on, including when it tries to deallocate it. And if you let LabVIEW allocate the image, you can't tell it to do a special page-locked allocation. Also, standard malloc() doesn't support page locking at all. That can only be handled by CUDA or the OS kernel API, and you can't slip such memory underneath the LabVIEW memory manager. That manager simply allocates from the application's virtual-memory heap, not from physical memory.
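Given that constraint, one possible workaround (a sketch, not something tried in this thread) is to leave the LabVIEW image alone and let the DLL own a separate page-locked staging buffer, allocated once with `cudaHostAlloc`. Each frame is copied from the LabVIEW buffer into the staging buffer and transferred from there. The function names and the `FFC_API` macro are hypothetical, and the extra host-side copy may cancel out the gain, so this would need to be measured.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>
#include <string.h>

#ifdef _WIN32
#define FFC_API extern "C" __declspec(dllexport)
#else
#define FFC_API extern "C"
#endif

// Call once at program start: allocates a page-locked host buffer that is
// owned by the DLL, not by the LabVIEW memory manager.
FFC_API int ffcAllocStaging(int n, uint64_t* stagingOut)
{
    if (!stagingOut || n <= 0)
        return (int)cudaErrorInvalidValue;
    *stagingOut = 0;

    void* pinned = NULL;
    cudaError_t err = cudaHostAlloc(&pinned, (size_t)n * sizeof(float),
                                    cudaHostAllocDefault);
    if (err == cudaSuccess)
        *stagingOut = (uint64_t)(uintptr_t)pinned;
    return (int)err;
}

// Per frame: copy the LabVIEW-owned image into the pinned buffer, then
// transfer from pinned memory to an existing device buffer.
FFC_API int ffcUploadViaStaging(const float* labviewImage, int n,
                                uint64_t staging, uint64_t deviceBuffer)
{
    if (!labviewImage || n <= 0 || !staging || !deviceBuffer)
        return (int)cudaErrorInvalidValue;

    size_t bytes = (size_t)n * sizeof(float);
    void* pinned = (void*)(uintptr_t)staging;
    memcpy(pinned, labviewImage, bytes);          // pageable -> pinned
    return (int)cudaMemcpy((void*)(uintptr_t)deviceBuffer, pinned, bytes,
                           cudaMemcpyHostToDevice);
}

// Call once at program end. Pinned memory must be released with
// cudaFreeHost, not free().
FFC_API int ffcFreeStaging(uint64_t staging)
{
    return (int)cudaFreeHost((void*)(uintptr_t)staging);
}
```

Page-locked memory tends to pay off most when combined with `cudaMemcpyAsync` and streams, so that the transfer of one frame overlaps the processing of another; the synchronous copy above is kept only to keep the sketch short.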
04-24-2012 08:29 AM
Hi SNSVSN,
I am dealing with the same problem and have not solved it yet.
Would you be so kind as to help me?
I am not sure how to get the pointer out properly and how to pass it back to the second DLL.
Thanks
Regards,
Brano.
04-25-2012 08:00 PM
Hi brano.s,
This thread was started over a year ago, so I am not sure whether SNSVSN is still monitoring it. I am also not sure how SNSVSN got the pointer out.
However, I believe the NI GPU Computing Community may have the answer you are looking for.
Regards,
04-26-2012 04:03 AM
Hi Tunde S.,
Thank you for the advice. I will try it.
Regards,
Brano.