I see... so really you're comparing a known VC++ program against an attempt to implement the same thing in LabVIEW. It seems entirely possible that you would see a performance difference, but in my opinion not because LabVIEW is less optimized. Probably the only way to really find out what's going on is to look at the VC++ code and duplicate its approach--something you obviously can't do if you don't have the source.
I wonder if there is a middle-ground solution for you: cache the data, either in memory or on disk, then spawn a parallel process to transfer the cached data to Excel. That should improve your processor usage by reducing the number of transfers that take place, at the cost of Excel not always having the latest data when you open the file--there would always be between 1 and n data points not yet logged, where n is the size of your cache.
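LabVIEW is graphical so I can't sketch it as a VI here, but the caching idea translates to any language. Here's a rough Python sketch of the concept--all the names (`BatchLogger`, `flush_fn`, `batch_size`) are made up for illustration, and the flush callback is a stand-in for whatever actually writes to Excel:

```python
# Hypothetical sketch of the caching idea: accumulate points in memory
# and flush them in batches of n, so transfers happen once per n points
# instead of once per point. All names here are invented for illustration.
class BatchLogger:
    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn      # callback that writes one batch (e.g. to Excel)
        self.batch_size = batch_size  # n: max points that can sit unlogged
        self.cache = []

    def log(self, point):
        self.cache.append(point)
        if len(self.cache) >= self.batch_size:
            self.flush()

    def flush(self):
        # Transfer everything cached so far in a single operation.
        if self.cache:
            self.flush_fn(self.cache)
            self.cache = []

written = []                          # stand-in for the Excel sheet
logger = BatchLogger(written.extend, batch_size=4)
for i in range(10):
    logger.log(i)
# 10 points with a batch of 4: two full flushes done, 2 points still cached
```

The trade-off is exactly the one mentioned above: between one flush and the next, up to `batch_size` points exist only in the cache, so a final `flush()` on shutdown matters.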
Also, when logging to Excel, are you doing it immediately after your DAQ routine, in the same loop? Have you tried using a queue to process the data separately, so that your DAQ loop puts data in the queue and another loop pulls it out and sends it to Excel? With that architecture in place, if performance were still poor it would be very easy to modify the program as I described earlier and test whether caching the data improves things. Parallel loops also make it less likely that the Excel transfers will delay your DAQ operations to the point that they stop working correctly.
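In LabVIEW terms this is the standard producer/consumer pattern with queue VIs; as a conceptual Python sketch (the loop bodies are placeholders I invented, not your actual DAQ or Excel code), it looks like this:

```python
# Conceptual producer/consumer sketch: one loop acquires data (stand-in
# for the DAQ loop) and a separate loop consumes it (stand-in for the
# Excel-logging loop), so slow logging never blocks acquisition.
import queue
import threading

data_q = queue.Queue()
logged = []                      # stand-in for the Excel sheet

def daq_loop():
    # Producer: acquire points and drop them in the queue immediately.
    for i in range(20):          # placeholder for reading DAQ samples
        data_q.put(i)
    data_q.put(None)             # sentinel: acquisition finished

def excel_loop():
    # Consumer: pull points and "log" them at its own pace.
    while True:
        point = data_q.get()
        if point is None:
            break
        logged.append(point)     # placeholder for writing to Excel

producer = threading.Thread(target=daq_loop)
consumer = threading.Thread(target=excel_loop)
producer.start(); consumer.start()
producer.join(); consumer.join()
# Every point reaches the logger without the producer ever waiting on it
```

The key property is that the DAQ side only ever enqueues, so however long the Excel side takes, the acquisition timing is unaffected--the queue just grows in the meantime.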