03-09-2010 10:51 AM
03-10-2010 08:00 AM
Hi Oskar,
Glad to see the solution works for you!
Since the solution introduces an additional copy of the sub-matrix, I am not surprised that it is 1.6x slower than the built-in A x B VI on the Core 2 machine.
MKL uses different optimized code paths on different processors, and I know Intel provides code tuned specifically for the Core 2 CPU. So it is possible that the VI runs faster on the Core 2 than on the Xeon processor.
Best Regards,
Michael