01-08-2020 09:31 AM
Hi!
I am trying to run my code on a Dell PowerEdge server with four Intel Xeon processors in it.
My code currently runs on a single processor (18 cores, 36 threads), and I can't figure out how to use the full available power.
The machine is running Windows Server 2012 R2.
Does anyone have an idea?
01-08-2020 09:45 AM
Post your code, you'll get plenty of answers.
Most likely your code is entirely sequential and the LV compiler recognizes there'd be benefit in assigning different parts to different cores. Try putting down 3 independent parallel while loops that do nothing but iterate fast and update an indicator from their iteration terminal 'i' so you can see that they're running. I'll bet you see 3 cores getting used because the compiler recognizes the benefit of splitting up parallel work to separate cores.
-Kevin P
01-08-2020 10:14 AM
Hi Kevin,
Thank you for your answer.
I didn't post my code because there are more than 50 VIs in it ^_^. It is a simulation of a process for my research, and I am trying to run as many simulations as I can.
The code uses a for loop with parallelization enabled.
My machine has four processors with 36 threads each, so 144 threads in total.
I get 36 simulations running in parallel, but no more...
01-08-2020 11:54 AM
Ok, I think I misunderstood your problem. It now appears that you *are* seeing the parallelization benefit from 18 cores of 1 CPU, but are *not* seeing any benefit from the 3 additional CPUs.
At this point, I'll have to bow out due to lack of experience and in-depth knowledge.
Anyone else?
(Also noticed a typo in my first reply, 1st sentence of 2nd paragraph should have said "no benefit". Though again, I was mistakenly replying to a wrong understanding of the problem.)
-Kevin P
01-08-2020 12:08 PM
We can't verify that you have everything set up to work the way you would like it to without at least seeing a stripped-out version of your code - e.g., all the loops in place with a minimum of stuff - except leave the code inside the parallelized FOR loops intact, so we can examine whether or not the compiler thinks it is actually appropriate to handle the code inside in a parallel fashion.
01-08-2020 12:34 PM
These are hyperthreaded cores, so the max speedup will be around 4x18 = 72 (the number of physical cores), not 144.
We really need to see a simplified version of the code. Is the vast majority of the loop code reentrant? Do you have a critical section? Do the parallel instances share resources?
How did you configure the parallel for loop?
01-08-2020 01:17 PM - edited 01-08-2020 01:23 PM
I would really love it if you could run a quick benchmark using my program (requires the LabVIEW 2015 runtime engine). This way we can tell how much speedup you can potentially get using highly optimized code.
As you can see from my benchmark table, I get a >16x speedup on my dual 8-core Xeon, and another user with a quad Xeon (48 cores) got a >43x speedup (even though hyperthreading was disabled in the BIOS in that case).
Now somebody needs to buy me one of these. 😄
01-09-2020 03:21 AM
I have run the benchmark (but using the LabVIEW 2018 runtime engine).
The results are the same as what I get with my program: only one physical processor is running... I can see it in Resource Monitor.
My best scores: 73.78 Hz (serialized) and 1204.23 Hz (parallelized), so about 16.3x.
I wonder if the problem comes from the Windows Server 2012 R2 license? Even though it should be OK.
Does anyone have an idea?
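As a rough sanity check on those numbers, here is a small Python sketch (not part of the benchmark; it only redoes the arithmetic from the figures quoted in this thread). It assumes ideal scaling of one unit of speedup per physical core, which is a simplification, and `cpus_worth` is a helper name made up for this illustration.

```python
# Figures quoted in this thread; nothing here is measured by the script.
serial_hz = 73.78       # best serialized benchmark rate
parallel_hz = 1204.23   # best parallelized benchmark rate
cores_per_cpu = 18      # physical cores per Xeon
cpus = 4                # processors in the server

speedup = parallel_hz / serial_hz
physical_cores = cores_per_cpu * cpus


def cpus_worth(speedup, cores_per_cpu):
    """Express a speedup as 'number of fully used CPUs'.

    Assumption: ideal scaling, one unit of speedup per physical core.
    """
    return speedup / cores_per_cpu


print(f"speedup: {speedup:.2f}x")
print(f"physical cores in the machine: {physical_cores}")
print(f"equivalent CPUs in use: {cpus_worth(speedup, cores_per_cpu):.2f}")
```

Under that assumption, the measured speedup corresponds to a bit less than one CPU's worth of physical cores, which matches what Resource Monitor shows.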
01-09-2020 03:43 AM
Another issue is that parallel for loops are limited to 64 instances in LabVIEW.
I tried creating very simple code with four loops, each set to 36 parallel iterations (I also tried 18 iterations), but the result is the same: only one processor is working.
I have been able to get two processors running at full power simultaneously by running my program in two Windows sessions. But it's not reliable and doesn't work for more than two processors.
01-09-2020 10:40 AM
@tdcpopiou wrote:
Another issue is that parallel for loops are limited to 64 instances in LabVIEW.
You can raise the limit to 256 by changing an entry in the ini file:
ParallelLoop.MaxNumLoopInstances=256
(details)
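If you would rather script the change than edit the file by hand, here is a minimal Python sketch. It assumes the entry belongs in a `[LabVIEW]` section of the ini file; the sample text and the `SomeExistingKey` entry are placeholders made up for the illustration, and `set_ini_key` is a hypothetical helper, not something shipped with LabVIEW. It works on a string, so reading and writing the actual file (and backing it up first) is left to you.

```python
def set_ini_key(text, section, key, value):
    """Return ini text with key=value set inside [section].

    Overwrites the key if it already exists in that section, otherwise
    appends it to the section; creates the section if it is missing.
    """
    header = f"[{section}]"
    out = []
    in_section = False    # currently inside the target section
    seen_section = False  # target section exists somewhere in the text
    done = False          # key has been written
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without having found the key:
            # append it before the next section header.
            if in_section and not done:
                out.append(f"{key}={value}")
                done = True
            in_section = stripped == header
            seen_section = seen_section or in_section
        elif in_section and stripped.split("=", 1)[0].strip() == key:
            line = f"{key}={value}"  # overwrite the existing entry
            done = True
        out.append(line)
    if not done:
        if not seen_section:
            out.append(header)
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Placeholder ini content, for illustration only.
sample = "[LabVIEW]\nSomeExistingKey=1\n"
updated = set_ini_key(sample, "LabVIEW", "ParallelLoop.MaxNumLoopInstances", "256")
print(updated)
```

Operating on text rather than using a full ini parser keeps the rest of the file (key order, capitalization) untouched.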