I just noticed that I forgot to address the second part of your comments: calling CUDA libraries directly. The reason this isn't a proposed solution is described in:
LVCUDA - Why Do I Need A Compute Context PDF.zip
It is possible to call the libraries directly, but there are unexpected side effects. The options for calling the libraries directly are explained on pages 9-10 of the document stored at:
http://decibel.ni.com/content/docs/DOC-7707
If you choose to call the libraries directly, you can post issues, but support may be limited. The approaches covered in the document are known because they were the initial implementations used for this module. The conclusion was that those implementations had too many issues to be a viable solution for most LabVIEW customers.
Darren