LabVIEW Idea Exchange

jspinozzi · ‎03-29-2017

I have code I want to run in parallel, and I've confirmed that if I write the code 3 times to execute in parallel, it works without problems. Yet when I put the code once in a For Loop with iteration parallelism enabled and set to 3 parallel instances, it only runs 2 at a time because my target cRIO has only two processors. I suppose this is a requirement if we want the code to be truly parallel. But in my case, I'm satisfied with pseudo-parallel execution, i.e. whatever happens when I write the code 3 times in parallel and the copies appear to execute in parallel. So, as with a Timed Loop, I'd like an option to set iteration parallelism to either a) target the available processors, or b) launch the iterations in parallel asynchronously regardless of the number of processors, perhaps not truly in parallel (with appropriate warnings). See attached image.

Note - Specific to Softmotion: I realize there are theoretically other ways to do this asynchronously built into Softmotion, but they did not execute as expected. Yet this is just about the loop iteration parallelism.

jspinozzi · ‎03-31-2017

.

jspinozzi · ‎03-31-2017

OK So I started this reply writing that I was 99% sure this is intended behavior. If you only have 2 processors, it will only run 2 at a time. No way around it.

I just re-read all the Help and other topics, and they imply or mention processors a LOT but do not specifically say iterations = processors. So I think maybe I just found a half-explanation and extrapolated from it. Then I noticed a confusing explanation of what it will do based on what you wire to P: for 0 or not wired, P = MinimumOf(Number of logical processors, Dialog value). That is, it will use the lower of the number you set in the Loop Parallelization dialog ("Dialog Value") and the number of processors. Thus if I set 6 in the dialog but don't wire P, it will use 2.

So I re-coded to use my single piece of code in a loop, set 6 in the dialog, and did not wire P. It runs 2, then 2, then 2.

Then I wired P=6, and now it runs 6 in parallel as expected. Sigh.

Now that I've re-read all the help and topics yet again, I find "If you leave the input of the parallel instances terminal unwired, LabVIEW automatically detects the number of logical processors in the machine and uses it as the default parallel instances terminal value." as well as the repeated advice "If you plan to distribute the VI to multiple computers, set Number of generated parallel loop instances equal to the maximum number of logical processors you expect any of those computers to ever contain." Thus I now re-interpret the dialog setting to be the max number you might EVER want, not how many you actually want now. Or said differently: if you don't wire P, you haven't specified what you want. The dialog value isn't a setting for what you want to run, it's a setting for the most you ever expect to run. Perhaps when I re-read the dialog description now, that will be painfully obvious in hindsight.

So never mind the feature as titled... perhaps instead some improvement to the dialog explaining what the setting means, or live feedback on how many instances will actually run at runtime, such as showing "You've selected 6 maximum, on this machine it will run 2 because P is not wired"... something more intuitive.

jspinozzi · ‎03-31-2017

OK darnit, now I've gone back and explained it to a colleague, and I wasn't wrong after all. The setting in the dialog is either really clear, or it's perfectly unclear. It definitely says "the number you want to run", and then contradicts itself. See the underlined statements here.

Number of generated parallel loop instances—Specifies the number of For Loop instances you want LabVIEW to generate at compile time. If you plan to distribute the VI to multiple computers, set Number of generated parallel loop instances equal to the maximum number of logical processors you expect any of those computers to ever contain.

Regardless, then it also says what will happen if you don't wire P, and that seems to be correct, because the solution for my problem was wiring P. The feature request may be to clarify all this in the interface.

robdye · ‎03-31-2017

Here is how the ParFor loop works:
The number you specify in the Configure Iteration Parallelism dialog - let's call it W, for number of workers - is used to transform your ForLoop into W different, independent instances of your original ForLoop. Literally, the compiler generates W instances that can be executed in parallel. W is therefore the maximum amount of parallelism that can be exhibited by that ParFor.

The ParFor does NOT directly specify that any instance execute on a particular thread or processor. The code generated by the transform decides how many of the generated instances will be used as follows:
If P is wired, the value provided on that wire is pinned to be in the range [1 to W], and that is the number of instances that will be scheduled to run. Values of 0 and -1 are treated specially: 0 means run a number of instances equal to the number of CPU cores on the machine on which we are running. -1 means run all of the instances specified in the dialog, which is equal to W. If P is unwired, it is treated the same as wiring the value of zero.
After the ParFor "scheduler" (the code generated for the ParFor that decides how to use the generated instances) has decided the value of P, it divvies up the data to that many worker instances, and schedules those instances on the run queue. It is then left up to the execution system's pool of execution threads to dequeue them and execute them.
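For readers who think better in text code, here is a minimal Python sketch of the P rule described above. It is only a model of the description in this post, not LabVIEW's actual generated code. The function name `instances_to_schedule` and its parameters are made up for illustration, and two details are assumptions: that the core count used for P = 0 is also limited to W (which matches the MinimumOf wording quoted from the Help earlier in the thread), and that negative values other than -1 are pinned up to 1.

```python
from typing import Optional


def instances_to_schedule(p: Optional[int], w: int, cores: int) -> int:
    """Return how many of the W generated instances would be scheduled.

    p     -- value wired to the P terminal, or None if P is unwired
    w     -- number of generated instances set in the dialog (W in the post)
    cores -- number of CPU cores on the machine running the code
    """
    if w < 1 or cores < 1:
        raise ValueError("w and cores must both be at least 1")
    if p is None:
        p = 0  # unwired P is treated the same as wiring 0
    if p == -1:
        return w  # -1 means run every generated instance
    if p == 0:
        p = cores  # 0 means one instance per CPU core
    # Pin the result to the range [1, W]. Assumption: this also applies to
    # the core count and to negative values other than -1.
    return max(1, min(p, w))


# The cases from this thread: W = 6 in the dialog, 2-core cRIO.
print(instances_to_schedule(None, 6, 2))  # P unwired
print(instances_to_schedule(6, 6, 2))     # P wired to 6
```

Under these assumptions the unwired case gives 2 and the wired case gives 6, which lines up with the "2, then 2, then 2" versus "6 in parallel" behavior reported above.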

At this point, the number of available threads can determine how much parallelism will actually occur. As I mentioned in an earlier post, LabVIEW allocates 2x (or 4x) more threads per pool than the number of CPUs (with a minimum of 4 threads). If these threads are not busy doing something else, they will wake up immediately and begin executing the worker instances. If they are busy, they are designed to take a break every 55 ms (or so) and see if there is something waiting in the run queue to be executed. This design can be defeated by uncooperative code, such as DLLs that call the OS Sleep() API, or otherwise hog their thread. Sometimes drivers do this.
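As a rough worked example of the thread count mentioned above, the arithmetic can be written out as follows. This is a sketch of one reading of "2x (or 4x) the number of CPUs, with a minimum of 4", not a documented LabVIEW formula; the function name `threads_per_pool` and the `multiplier` parameter are hypothetical, and which multiplier applies in a given case is not stated in this thread.

```python
def threads_per_pool(cpus: int, multiplier: int = 2) -> int:
    """Estimate threads per pool as multiplier * cpus, but never below 4."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if multiplier not in (2, 4):
        raise ValueError("the post only mentions multipliers of 2 and 4")
    return max(4, multiplier * cpus)


# A 2-core target under each multiplier mentioned in the post.
print(threads_per_pool(2, multiplier=2))
print(threads_per_pool(2, multiplier=4))
```

On this reading a 2-core target would have 4 threads per pool at 2x and 8 at 4x, so the number of worker instances that can be in flight at once depends on which case applies and on what else those threads are doing.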

jspinozzi: I am sorry that you were confused by the description of the ParFor behavior in the Help. I will bring this to the attention of our tech writers and include a link to this thread.

X. · ‎03-31-2017

So you can have 6 threads used in parallel on a 2 core processor using loop parallelization?

robdye · ‎03-31-2017

X: If you have 6 threads available, yes, they can all be used to execute a ParFor (with at least 6 worker instances) with "virtual" 6-way parallelism. These 6 threads, however, are at the mercy of the OS scheduler, which has only 2 processors to work with. It will, in standard OS fashion, multiplex those 6 threads by suspending 4 at a time while allowing 2 to actually make progress. This level of scheduling is invisible to LabVIEW, and to all processes. LabVIEW's scheduler works one level higher than the OS scheduler, putting pointers to little chunks of work (we call them clumps) that it has generated on a run queue of its own, and allowing execution threads to dequeue them at their own speed.
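The same idea can be tried outside LabVIEW. The sketch below is a plain Python analogy, not LabVIEW's scheduler: it starts a number of workers on a thread pool and uses a barrier that only opens once all of them have arrived, so it returns True only if every worker was in flight at the same moment. The names `run_workers`, `pool_size` and `timeout_s` are made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_workers(n_workers: int, pool_size: int, timeout_s: float = 5.0) -> bool:
    """Return True if all n_workers were in flight at the same moment."""
    barrier = threading.Barrier(n_workers)

    def worker(index: int) -> int:
        # Blocks until n_workers threads have arrived. Raises
        # BrokenBarrierError if they do not all arrive within timeout_s.
        barrier.wait(timeout=timeout_s)
        return index

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(worker, i) for i in range(n_workers)]
        try:
            results = sorted(f.result() for f in futures)
        except threading.BrokenBarrierError:
            return False
    return results == list(range(n_workers))


# Six workers on six threads: all of them meet at the barrier,
# no matter how many cores the OS has to multiplex them onto.
print(run_workers(6, pool_size=6))
# Six workers on only two threads: the barrier can never fill up.
print(run_workers(6, pool_size=2, timeout_s=0.5))
```

The first call succeeds even on a 2-core machine because a thread waiting at the barrier is suspended by the OS and costs no CPU time. The second call fails because only two workers can ever be running at once, which is the "if you have 6 threads available" condition in the reply above.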

X. · ‎03-31-2017

OK, I had not realized that.

Darren · ‎05-01-2019

Any idea that has received less than 2 kudos within 2 years after posting will be automatically declined.

Iteration Parallelism NOT tied to processors