10-17-2018 04:31 PM
I have a 1D array of values that I want to replace in place and in parallel. The indices of the elements to replace are given by an array of non-repeating integers. Since the elements and their values are independent, I should be able to replace them in parallel and in place, but I'm not sure how.
The VI shown below demonstrates the result I want, but the for loop is not parallelizable (clearly because of the shift register). I tried using an In Place Element Structure to pass the array into the for loop by reference, but it was considerably slower than the shift register method below. Am I overlooking something simple?
Attached is my example VI.
10-17-2018 04:41 PM
Looks fine to me. Have you found this to be a bottleneck in your code? There is no need to reverse the array, since you are replacing the elements. If you were deleting elements, then starting at the back of the array would make sense.
10-17-2018 04:52 PM
In my larger application I have a large array (representing an image), and as time goes on, simulated events require random pixels in the image to change. For example, I may be commanded to change a random set of array elements, say [10,500,0,8,37,999] (as shown in the non-repeating integer array box). Given that each element can be replaced in parallel, I'd like to parallelize the replace loop (to the extent the number of CPU cores allows), because there are slowdowns when I'm commanded to change a large number of elements. So in essence, the non-repeating integer array is a known value and I want to parallelize the loop where I replace the elements in the double array.
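For readers without LabVIEW, the operation described above can be sketched in text form. This is only a rough Python equivalent, assuming the VI does nothing more than loop over the index array and overwrite one element per iteration; the names `replace_in_place`, `image`, and `new_values` are made up for illustration and are not taken from the attached VI.

```python
def replace_in_place(values, indices, new_values):
    """Overwrite values[indices[k]] with new_values[k], one element per iteration.

    The single list carried through the loop plays the role of the
    shift register: each iteration works on the result of the previous one.
    """
    for idx, new in zip(indices, new_values):
        values[idx] = new
    return values


# Hypothetical 1000-element "image" and the index set quoted in the post.
image = [0.0] * 1000
indices = [10, 500, 0, 8, 37, 999]
new_values = [1.0] * len(indices)
replace_in_place(image, indices, new_values)
```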
10-17-2018 04:53 PM - edited 10-17-2018 05:00 PM
Hi Brandon,
In order to operate on the array in-place and in parallel, you need to cast it to a DVR. This is essentially handling the variable by reference instead of by value, so you don't need a shift register to ensure you are maintaining state between loop iterations, allowing parallelization of the for loop. Code attached.
10-17-2018 05:10 PM
DJColeslaw, I don't think that will help. See point 4 in the whitepaper:
"4. The In Place Element Structure has a pair of nodes for Data Value Reference Read/Write for dereferencing and rereferencing the data, respectively. This structure blocks the execution of other structures using the same reference until it is finished and the data has been rereferenced."
10-17-2018 05:34 PM - edited 10-17-2018 05:45 PM
@DJColeslaw, thank you for the quick response.
This may allow the for loop to access the elements in parallel, but I'm not sure why the by-reference version, even parallelized, is considerably slower than the unparallelized shift register. As mentioned in my original post, when I tried this it was unexpectedly slow.
Attached are the shift register and DVR VIs with some benchmark code. On my PC with 12 cores, the DVR method is ~100x slower than the shift register method.
By all means, take a look and let me know if I have done something wrong.
10-17-2018 06:43 PM - edited 10-17-2018 06:45 PM
Hi Brandon,
See my comment above. The DVR blocks all other instances from accessing it, so you can't actually access it in parallel; the other for loop instances are waiting for it to become free. On top of that, you have the overhead of the DVR and of the parallel for loop, so it makes sense that it is slower than a shift register.
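As a loose text analogy for the blocking described above (not LabVIEW code, and not a model of how DVRs are implemented), the sketch below puts every write behind one shared lock. The lock stands in for the exclusive access of the In Place Element Structure; `replace_with_lock` and its parameters are hypothetical names. The loop is nominally parallel, yet only one worker can be inside the locked section at a time, so the writes still happen one after another, with thread and lock overhead added on top.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def replace_with_lock(values, indices, new_values, workers=4):
    """'Parallel' replace in which every write must first take one shared lock."""
    lock = threading.Lock()  # assumption: stands in for the DVR's exclusive access

    def task(pair):
        idx, new = pair
        with lock:  # only one worker at a time gets past this line
            values[idx] = new

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so every task finishes before returning.
        list(pool.map(task, zip(indices, new_values)))
    return values
```

The result is the same as the plain sequential loop; the extra machinery only adds cost, which is consistent with the benchmark reported earlier in the thread.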
10-17-2018 06:57 PM
Brandon, Gregory,
Sorry, I both overlooked the statement about having tried DVRs already and sort of missed the point of the question. I had some stuff come up as I was responding and apparently didn't negotiate the context switching very well. Gregoryj is right on all counts. Using DVRs doesn't actually allow for concurrent access of the array elements (preventing that is the point of the In Place Element Structure); it just allows you to parallelize the loop that is operating on the array. For some reason, I had it in my head that the main motivation was parallelizing the loop rather than replacing the elements as quickly as possible. I retract my previous post and apologize for the fumble.
10-17-2018 09:16 PM
Getting back to the original question:
No, you aren't missing something simple.
The compiler *needs to* disallow this combination of using a shift register to replace values in place and loop parallelization. Think about it. At compile time, there's no way for the compiler to know whether the array of indices will end up containing any repeat values. So to be data-safe, it has to assume that it *might*. And if there are repeats, then the *order* in which the replacements occur affects the final value of the array. Thus, since parallelization would make the order of replacements indeterminate, it could make the resulting data indeterminate, and therefore it *needs to* be disallowed.
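A tiny text illustration of the point about repeats, written as a Python sketch rather than LabVIEW (the helper `apply_in_order` is a made-up name): when the same index appears twice, the final array depends on which replacement runs last.

```python
def apply_in_order(values, indices, new_values, order):
    """Apply the replacements in the given order and return a new list."""
    out = list(values)
    for k in order:
        out[indices[k]] = new_values[k]
    return out


base = [0, 0, 0]
indices = [1, 1]        # index 1 is repeated
new_values = [10, 20]

forward = apply_in_order(base, indices, new_values, [0, 1])
backward = apply_in_order(base, indices, new_values, [1, 0])

print(forward)   # [0, 20, 0]
print(backward)  # [0, 10, 0]
```

With non-repeating indices the two orders would give identical results, but that is a property of the run-time data, which the compiler cannot see.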
-Kevin P
10-17-2018 09:36 PM - edited 10-17-2018 10:02 PM
No, you cannot replace these elements in parallel, but your original method is efficient and fast (~10 ns per replacement). I am pretty sure this is not the time-limiting operation overall. (Make sure to also disable debugging.)
Of course, you could split the array into parts, split the indices according to the part each one falls in, and process each part in parallel. However, splitting the array and reassembling it at the end will probably cost you more overall.
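The split-and-reassemble idea can be sketched in text form as follows. This is a Python illustration under stated assumptions, not LabVIEW code: indices are non-negative and in range, the array is cut into equal-sized contiguous pieces, and `partitioned_replace` and `parts` are hypothetical names. Each worker owns one piece and only the indices that land in it, so no two workers touch the same data.

```python
from concurrent.futures import ThreadPoolExecutor


def partitioned_replace(values, indices, new_values, parts=4):
    """Split the array, route each replacement to its piece, then reassemble."""
    n = len(values)
    chunk = max(1, -(-n // parts))  # ceiling division: elements per piece

    # Route each (index, value) pair to the piece that owns that index,
    # converting the global index to an offset within the piece.
    buckets = [[] for _ in range(parts)]
    for idx, new in zip(indices, new_values):
        buckets[idx // chunk].append((idx % chunk, new))

    # Copy out the pieces (this copy is part of the extra cost).
    pieces = [values[p * chunk:(p + 1) * chunk] for p in range(parts)]

    def work(p):
        piece = pieces[p]
        for local_idx, new in buckets[p]:
            piece[local_idx] = new
        return piece

    with ThreadPoolExecutor(max_workers=parts) as pool:
        done = list(pool.map(work, range(parts)))

    # Reassemble into one array (more extra cost).
    return [x for piece in done for x in piece]
```

The routing pass, the copies, and the final concatenation each touch data at least as often as the plain loop does, which is why this approach is unlikely to pay off when each replacement costs only a few nanoseconds.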
Can you explain how this all fits into the main program? What else is going on? What are the rate-limiting processes? This is definitely not one of them. The real bottlenecks might be elsewhere, and those could be tuned.