LabVIEW

What happens when you specify more parallel loop iterations than you have CPUs?


I have an application that does the same set of stuff up to 10 times for different hardware resources. For example, I have a list of 10 COM ports that I want to send serial commands to. I want them to be executed as close to synchronous as possible, but up to 200 ms out of sync would be acceptable. Currently I use a for-loop configured for 10 parallel iterations, and assume that LabVIEW will juggle the iterations among processors as needed, executing them as concurrently as possible. Experimentally, LabVIEW indeed seems to create 10 parallel loops even when running on a computer with only 4 logical processors, otherwise the 10-element rendezvous inside the loop wouldn't be able to complete, right?

 

My question is, am I doing something verboten with my number of loop iterations? According to the LabVIEW help for the For Loop Iteration Parallelism Dialog Box: "If you plan to distribute the VI to multiple computers, set Number of generated parallel loop instances equal to the maximum number of logical processors you expect any of those computers to ever contain." Clearly I am violating this advice yet it seems to work. Is my performance going to essentially be the same as if I had 10 blocks of code in parallel on the block diagram?

 

From reading "How Many Threads Does LabVIEW Allocate?" and the links at that page, it appears that, at worst, LabVIEW is thread-starved and switching threads between iterations, but my application runs slowly enough to accommodate this sub-optimal situation. At best, LabVIEW has allocated 4 threads per execution system, so as long as I have at least 3 processors there are more threads than parallelized loop iterations. It's all a bit confusing.
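
The rendezvous argument in the question can be illustrated outside LabVIEW. The following is a minimal Python sketch, not LabVIEW code: `threading.Barrier` stands in for the 10-element rendezvous and a thread pool stands in for the parallel loop instances. The function name `run_rendezvous` and the 1-second timeout are assumptions made for the illustration. If fewer workers than barrier parties were actually running, nobody would get past the barrier, which matches the reasoning that a completed rendezvous implies 10 live instances.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_rendezvous(parties: int, workers: int, timeout_s: float = 1.0) -> int:
    """Run `parties` tasks on a pool of `workers` threads.

    Every task waits at a barrier sized for `parties`. Returns the number of
    tasks that got past the barrier before the timeout (hypothetical helper).
    """
    barrier = threading.Barrier(parties)

    def task(_index: int) -> bool:
        try:
            barrier.wait(timeout=timeout_s)
            return True
        except threading.BrokenBarrierError:
            # Raised on timeout, or when another waiter already timed out.
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(task, range(parties)))


if __name__ == "__main__":
    print("10 workers:", run_rendezvous(parties=10, workers=10))
    print(" 4 workers:", run_rendezvous(parties=10, workers=4))
```

With 10 workers every task passes the barrier; with 4 workers the first four block until the timeout breaks the barrier, so none pass.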

____
Ryan R.
R&D
0 Kudos
Message 1 of 7
(5,835 Views)

If you have more allocated parallel loops than logical processors, the For loop will max out at the number of logical processors. The main reason for "parallelized" For loops is performance. If you want more truly parallel loops than your logical processor count, you will need to create more loops next to each other, or use nested For loops.

Cheers


--------,       Unofficial Forum Rules and Guidelines                                           ,--------

          '---   >The shortest distance between two nodes is a straight wire>   ---'


0 Kudos
Message 2 of 7
(5,829 Views)

There is a hard upper limit on the number of parallel instances (see this idea for details).

More parallel instances than cores are sometimes useful if you want each parallel instance to have, e.g., its own reentrant copy of a subVI, or as in your case.

You need to be careful because the parallel instances don't necessarily always execute in the same order or pattern. You can get the actual instance from the parallel ID output.
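
As a rough analogue of that point (a Python sketch, not a description of how LabVIEW schedules iterations), the code below hands iterations to a fixed number of worker "instances" through a shared queue. The names `parallel_for`, `body`, and `instance_id` are hypothetical. Results are stored by iteration index, so the output array stays ordered, while which instance handled which iteration can change from run to run.

```python
import queue
import threading


def parallel_for(n_iterations: int, n_instances: int, body):
    """Run body(i, instance_id) for each i across n_instances worker threads.

    Results are stored by iteration index, so the output order is stable even
    though the assignment of iterations to instances is not.
    """
    work = queue.Queue()
    for i in range(n_iterations):
        work.put(i)
    results = [None] * n_iterations

    def instance(instance_id: int) -> None:
        while True:
            try:
                i = work.get_nowait()
            except queue.Empty:
                return  # no iterations left for this instance
            results[i] = body(i, instance_id)

    threads = [threading.Thread(target=instance, args=(p,))
               for p in range(n_instances)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Each result records the value and the instance that computed it.
    for value, instance_id in parallel_for(10, 4, lambda i, p: (i * i, p)):
        print(f"value={value:3d} computed by instance {instance_id}")
```

Any logic that depends on a particular instance always seeing a particular iteration would be fragile here, which is why reading the instance ID explicitly is the safer choice.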

0 Kudos
Message 3 of 7
(5,819 Views)

I think that your interpretation of what happens when you have more parallel loops requested than you have processors available is accurate.

 

Slide 25 of this document shows a behind-the-scenes view of the parallel For loop. https://forums.ni.com/t5/IDLE-CLUG-Cambridge-LabVIEW-User/CLUG-Oct-2013-Dr-Richard-Thomas-LabVIEW-Co...

0 Kudos
Message 4 of 7
(5,813 Views)
Solution
Accepted by RnDMonkey


In this case (where you are interacting with external hardware and have a rendezvous) there will be an inherent delay in each iteration. So "oversubscribing", or enabling more parallel instances than logical processors, actually improves performance by executing another iteration during the wait time. In fact, you are not "violating advice" by oversubscribing. You are using that technique correctly! You just might not have read about it. See esp. PP 4
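
The effect of oversubscribing I/O-bound work can be sketched in Python (an analogy only; the timings say nothing about LabVIEW itself). Here `fake_serial_exchange` is a hypothetical stand-in for a serial command that spends 0.2 s waiting on the device, and the pool size plays the role of the number of parallel instances.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_serial_exchange(port_index: int, wait_s: float = 0.2) -> int:
    """Hypothetical stand-in for a serial round trip: mostly waiting, no CPU work."""
    time.sleep(wait_s)
    return port_index


def timed_run(n_ports: int, n_workers: int):
    """Run one exchange per port on n_workers threads; return (seconds, replies)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        replies = list(pool.map(fake_serial_exchange, range(n_ports)))
    return time.perf_counter() - start, replies


if __name__ == "__main__":
    for workers in (4, 10):
        elapsed, _ = timed_run(n_ports=10, n_workers=workers)
        print(f"{workers:2d} workers: {elapsed:.2f} s")
```

With 4 workers the 10 exchanges run in three waves (roughly 0.6 s); with 10 workers they all wait at the same time (roughly 0.2 s), regardless of how many processors the machine has.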


"Should be" isn't "Is" -Jay
Message 5 of 7
(5,806 Views)

Here's a VI I made a bit ago that uses nested parallel For loops in order to operate more parallel code than the single parallelized For loop. If you look at the Ticks output, you'll see they all operated at approximately the same time. Remember, though, like altenbach said, the order of the loop execution is not predictable.

[Image: Get All Data NestedNested.png (VI snippet)]

 

You could also set up code to Call and Collect, which can launch a whole bunch of asynchronous processes.
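
The call-and-collect pattern has a close analogue in most text languages: start every task first, then gather the results afterwards. Below is a minimal Python sketch of that idea. `query_device` and the port names are hypothetical placeholders, and the code does not model LabVIEW's asynchronous call nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def query_device(port_name: str) -> str:
    """Hypothetical stand-in for sending a command and reading the reply."""
    return f"{port_name}: OK"


def call_and_collect(port_names):
    """Start one task per port ("call"), then wait for every result ("collect")."""
    with ThreadPoolExecutor(max_workers=max(1, len(port_names))) as pool:
        pending = [pool.submit(query_device, name) for name in port_names]
        # Results come back in submission order, whatever order tasks finish in.
        return [future.result() for future in pending]


if __name__ == "__main__":
    for line in call_and_collect([f"COM{n}" for n in range(1, 11)]):
        print(line)
```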

 

That all being said, if your setup works just fine without needing the loops to all be 100% matching, don't worry about implementing the above.

Cheers




Message 6 of 7
(5,805 Views)

Thank you, Altenbach, James, Jeff, and NLutz, for the affirmation. This definitely makes me feel more comfortable with my program. I started down quite a rabbit hole when I decided to learn about multithreading and compiler behavior in LabVIEW!

____
Ryan R.
R&D
0 Kudos
Message 7 of 7
(5,762 Views)