Hi Dwivedi,
I haven't used any DAQ cards in a long time, but I have written data acquisition software for FieldPoint that worked equally reliably on P3 boxes, P4 boxes (legacy and HT), and a dual Xeon box. I started my early installations as CVI 6 projects, and have since recompiled and redistributed them as CVI 7.1 code. I never had one lick of trouble. Admittedly, if you are using DAQ or DAQmx, you may have issues that I don't have.
I did a very specific redesign for the dual Xeon box installation, to pull all inbound data through a single callback and then put it into a thread-safe queue as directly and quickly as possible. NI suggests that you generally not channel multiple DAQ callbacks through a single function, but my personal experience is that a 'big-iron' box can handle almost 950 FieldPoint callbacks per second, and a 'baby-iron' box can handle about half of that. I have never been able to log a true inbound DAQ misbehavior with the single-callback style, even though I had a few false scares initially. Using undocumented or unsuggested approaches is always done at your own risk, of course.
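As a rough illustration of the single-callback-into-a-queue pattern described above, here is a minimal sketch in plain C. It is not the code from the installation: the `Sample` type, `QUEUE_CAPACITY`, and the `on_sample` function are hypothetical names, and a pthread mutex stands in for CVI's own thread-safe queue (the `CmtNewTSQ` family in the Utility Library) so that the example is self-contained. The point is only that the callback copies the reading into the queue and returns immediately, leaving all real work to whichever thread drains the queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 1024  /* hypothetical size; tune to your burst rate */

/* One inbound reading. The fields are illustrative, not a FieldPoint type. */
typedef struct {
    int    channel;
    double value;
} Sample;

/* Fixed-size ring buffer guarded by a single mutex. */
typedef struct {
    Sample          items[QUEUE_CAPACITY];
    size_t          head;   /* index of the oldest queued sample */
    size_t          count;  /* number of samples currently queued */
    pthread_mutex_t lock;
} SampleQueue;

static void queue_init(SampleQueue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Returns 1 on success, 0 if the queue is full (the sample is dropped). */
static int queue_push(SampleQueue *q, Sample s)
{
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY) {
        q->items[(q->head + q->count) % QUEUE_CAPACITY] = s;
        q->count++;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Returns 1 and fills *out if a sample was available, 0 if the queue is empty. */
static int queue_pop(SampleQueue *q, Sample *out)
{
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* The single callback: copy the reading into the queue and return at once. */
static int on_sample(SampleQueue *q, int channel, double value)
{
    Sample s = { channel, value };
    return queue_push(q, s);
}
```

A worker thread would call `queue_pop` in a loop and do the slow work (logging, database writes) there. Dropping samples when the queue is full is just one possible policy; counting the drops makes it easy to tell whether the queue is sized correctly.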
If you are extremely diligent about preventing multiple threads (and therefore multiple logical processors) from accessing the same variables simultaneously, and if you heed the warnings in the function panels about a status variable or return value not being thread-safe, you generally should be OK. If any function calls that you use are not thread-safe, then protect them with thread locks or a similar mechanism.
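To make the locking advice concrete, here is a small sketch of a shared status record that is only ever touched while holding a lock. The names (`SharedStatus`, `status_record`, `status_snapshot`) are made up for illustration, and a pthread mutex is used as a portable stand-in; in CVI you would more likely reach for the Utility Library locks (`CmtNewLock`, `CmtGetLock`, `CmtReleaseLock`), which follow the same acquire/release shape.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Data shared between the acquisition thread and, say, the UI thread. */
typedef struct {
    pthread_mutex_t lock;
    long            samples_seen;
    double          last_value;
} SharedStatus;

static SharedStatus g_status = { PTHREAD_MUTEX_INITIALIZER, 0, 0.0 };

/* Writer side: update both fields as one unit while holding the lock. */
static void status_record(double value)
{
    pthread_mutex_lock(&g_status.lock);
    g_status.samples_seen++;
    g_status.last_value = value;
    pthread_mutex_unlock(&g_status.lock);
}

/* Reader side: copy the fields out under the lock, then work on the copies. */
static void status_snapshot(long *samples_seen, double *last_value)
{
    assert(samples_seen != NULL && last_value != NULL);
    pthread_mutex_lock(&g_status.lock);
    *samples_seen = g_status.samples_seen;
    *last_value   = g_status.last_value;
    pthread_mutex_unlock(&g_status.lock);
}
```

Copying the values out and releasing the lock before using them keeps the lock held for as short a time as possible, which matters when the writer is a high-rate callback.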
It doesn't sound like you are sampling too quickly, so you might run some speed tests of just the DAQ portion of your code to see what the limit is on the different types of boxes. You may find that even at extremely high DAQ rates you can't log any malfunctions; that would mean you probably have a glitch or timing issue in some other portion of your code.
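One way to approach such a speed test is a small harness that drives the inbound handler in isolation and reports how many calls per second it sustains. The sketch below is only an assumption about how that could look: `measure_handler_rate`, `events_per_second`, and `counting_handler` are hypothetical names, the samples are synthetic rather than coming from real hardware, and `clock()` measures CPU time rather than wall-clock time, so treat the number as a rough upper bound.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature of the handler under test; illustrative, not an NI callback type. */
typedef void (*SampleHandler)(int channel, double value);

/* Rate in events per second; returns 0.0 if the interval was too short to measure. */
static double events_per_second(long events, double elapsed_seconds)
{
    return (elapsed_seconds > 0.0) ? (double)events / elapsed_seconds : 0.0;
}

/* Feed `iterations` synthetic samples to the handler and report the rate. */
static double measure_handler_rate(SampleHandler handler, long iterations)
{
    clock_t start, end;
    long i;

    assert(handler != NULL && iterations > 0);
    start = clock();
    for (i = 0; i < iterations; i++) {
        handler((int)(i % 8), (double)i);  /* 8 pretend channels */
    }
    end = clock();
    return events_per_second(iterations,
                             (double)(end - start) / CLOCKS_PER_SEC);
}

/* Trivial stand-in handler that only counts calls. */
static long g_handled = 0;

static void counting_handler(int channel, double value)
{
    (void)channel;
    (void)value;
    g_handled++;
}
```

Swapping `counting_handler` for the real inbound handler, and running the same harness on each type of box, gives a baseline to compare against the rate the application actually needs.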
I would also mention that all of the software that runs alongside your code can have a profound impact on performance. Even though the dual Xeon box might have only about 200% more 'power' than a single legacy P4, my CVI code can handle almost 10x the load on it. Since I have lots of disk I/O to a database engine, hyperthreading absorbs all of the blocks and stalls and lets other portions of my code continue to run. I'm not saying that your install will behave the same way, but I am saying that some really unpredictable results can occur with hyperthreading, even more so with multi-processor hyperthreading.
Orlan