LabVIEW


Parallel Consumers Maintain Data Order

Solved!

So I have pulses of time data coming in and I am running them through an FFT plus some other math. The result of this math should be displayed in real time and in sequence. With just one block of code for the math, the computation lags significantly behind the DAQ (~10%, which adds up over long collection times). So I placed the math in a subVI, made it non-reentrant, and placed two copies in parallel to receive the data from a tag channel. From each of these, I drop the result back onto a single channel wire to rejoin at a single subsequent consumer.

 

Currently, when I generate my data I bundle it with its index in the measurement series, and rebuild the result array on the other side of the math by replacing array elements. Is this the correct way to go about this, or is there another way to handle parallel worker threads for streamed data? If all the data were already present I would just run it through a parallelized for loop, but this seems a bit trickier.
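
In text-language pseudocode, the idea is roughly this (Python just to illustrate the data flow; the block count, block length, worker count, and the FFT stand-in are made-up placeholders, not my actual numbers):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor, as_completed

N_BLOCKS, BLOCK_LEN = 16, 1024   # made-up placeholders; the real run is far bigger

def process(i, block):
    """Stands in for the FFT + math subVI; the index rides along with the result."""
    return i, np.abs(np.fft.rfft(block))          # placeholder math

blocks = [np.random.rand(BLOCK_LEN) for _ in range(N_BLOCKS)]    # fake "acquired" blocks
results = np.zeros((N_BLOCKS, BLOCK_LEN // 2 + 1))               # preallocated result array

with ThreadPoolExecutor(max_workers=2) as pool:                  # two parallel workers
    futures = [pool.submit(process, i, b) for i, b in enumerate(blocks)]
    for f in as_completed(futures):              # results arrive in whatever order they finish
        i, spectrum = f.result()
        results[i] = spectrum                    # drop each into its slot by index, like Replace Array Subset
```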

 

I can try to make a MWE when I get to the lab if this isn't clear.
Thanks for the help.

Message 1 of 23

If you made the subVIs non-reentrant, you're not getting any benefit, because only one instance of the subVI can run at a time and it will block the others.

 

Now if you actually made the subVI re-entrant, then multiple copies can run in parallel without blocking each other, which will improve your overall processing time.
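
If a text-language analogy helps (a rough sketch with invented names, not anything LabVIEW actually generates): non-reentrant is like one shared instance behind a lock, while preallocated clones are like independent instances that never block each other.

```python
import threading, time

_shared_lock = threading.Lock()     # the single shared "instance"

def non_reentrant_math(block):
    """Analogy for a non-reentrant subVI: every caller goes through the same instance."""
    with _shared_lock:              # a second caller blocks here until the first finishes
        time.sleep(0.1)             # stand-in for the FFT + math
        return block

class PreallocatedClone:
    """Analogy for a preallocated clone: each call site gets its own instance and state."""
    def math(self, block):
        time.sleep(0.1)             # same work, but separate clones never block each other
        return block

# Two threads calling non_reentrant_math take ~0.2 s in total (serialized);
# two threads each using their own PreallocatedClone take ~0.1 s (parallel).
```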

 

I think your scheme of providing an index with each piece, and sending all the data on to a final consumer that puts it all back in order, is the perfect way to go.  Do anything that is good for memory management, and I'd say using Replace Array Subset instead of any kind of array building and growing is best.

 

Hopefully your test results will show that what you have proposed is an improvement over your original code.

Message 2 of 23

Oh man, that is what I get for posting in the morning! I actually meant Preallocated clones. I haven't done much with these settings, so I got confused on the names.

At any rate, thank you for the input. I already implemented it yesterday in rough form but wanted to make sure this was the sensible way to go. I've only been using LV for a few months and know there are plenty of little tricks I don't know about.

Message 3 of 23

Um, why not use a Producer/Consumer Design Pattern?  As fast as the data are acquired, put them on a Queue and send them to a single Consumer Loop to process.  Having one loop running at full speed will "consume" your data, maintaining the order and using the free cycles available.  If you are only doing processing (like FFTs), there might not be much benefit in dividing the processing into two parallel loops and worrying about recombining them.  If part of the Processing is, itself, "time-blocking" (like big I/O to disk, fancy plots, etc.), you may want to give the Consumer a second Queue onto which it puts the data for the "slow stuff" (acting as a Producer) for yet another Consumer.  Such "priority-ordered" (a term I just invented) Producer/Consumer patterns aren't that unusual (I've got several) and keep the CPU humming away, processing the most critical things (data acquisition) first, the next most critical (initial data "massaging") second, and the least critical (so-called because they can "lag behind", time-wise) last.
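
In text-language terms, the chain might look something like this (a Python sketch with placeholder queues and stand-in work, just to show the shape of the idea):

```python
import queue
import threading

acq_q  = queue.Queue()   # DAQ Producer -> fast Consumer
slow_q = queue.Queue()   # fast Consumer (acting as Producer) -> slow Consumer

def daq_producer():
    for i in range(100):
        acq_q.put(("block", i))          # stand-in for acquired data
    acq_q.put(None)                      # sentinel: acquisition finished

def fast_consumer():
    """Time-critical massaging only; anything slow gets deferred to another loop."""
    while (item := acq_q.get()) is not None:
        massaged = item                  # stand-in for quick processing (FFT, scaling, ...)
        slow_q.put(massaged)             # defer disk writes / fancy plots
    slow_q.put(None)

def slow_consumer():
    """Disk I/O, plotting, anything that is allowed to lag behind."""
    while (item := slow_q.get()) is not None:
        pass                             # write to disk, update displays, ...

for fn in (daq_producer, fast_consumer, slow_consumer):
    threading.Thread(target=fn).start()
```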

 

Bob Schor

Message 4 of 23

I am already using a producer consumer setup. DAQ >---Tag Channel--> FFTx2 >---Tag Channel---> Further processing

A little more about my setup: each block of data is on the order of 4 ms long, with a sample rate such that I have ~100k samples per block. A single measurement includes ~100k blocks. Already by the time the 10k-th block is acquired, the FFT has lagged 1k blocks behind. I don't want my data collection to be finished but still waiting for 10k more FFTs to complete.

Message 5 of 23

100k samples per 4 msec block means that you're sampling at 25 MHz.  That's putting you into a realm where it's no longer trivial to stream your data around the way it would be at, say, 5 kHz.

 

If you post your code, someone may spot something that could be done better.  My main initial thought is to keep the DAQ "producer" loop *tight*.  As in, do nothing but read the data and pass it directly into the Enqueue function.  Do not branch the data wire anywhere else.  Put no other time consuming function in the loop.  Just read and enqueue.

 

For everything else downstream, the side effect of bogging down and falling behind is merely lag.  For the DAQ loop, the possible side effect is an error that stops the entire data stream in its tracks until you restart your acquisition program.  That's why I'd put primary importance on making the DAQ loop as lean as possible -- that's where the harshest consequences come from.
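
In sketch form (Python, with read_block and data_q as placeholders for the actual DAQmx Read call and the queue/channel), the producer loop should be nothing more than this:

```python
import queue

def daq_loop(read_block, data_q: queue.Queue, n_blocks: int):
    """Producer loop: read and enqueue, nothing else.

    read_block is a placeholder for the actual DAQmx Read call; there is
    no display, no file I/O, and no branching of the data wire in here.
    """
    for i in range(n_blocks):
        data_q.put((i, read_block()))    # tag with the block index and hand off immediately
```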

 

 

-Kevin P

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.
Message 6 of 23

So you are trying to "stream-process" a 25MHz signal for 400 seconds, a total of 10 Giga-samples, and you are finding that it takes a while to process them.

 

Suggestion -- get a faster PC, preferably with as many cores as you can get (an i7 or better), or see if you can get your hands on a machine with a fast FPGA section to handle the FFTs.

 

Bob Schor

Message 7 of 23

@Bob_Schor wrote:

Um, why not use a Producer/Consumer Design Pattern? 

 

Bob Schor


I assumed that was already happening when ConnorP started talking about merging the results through the channel wires.  I'm visualizing the problem as: data is acquired, it is passed off to a consumer loop, and the subVI there takes longer to process each packet than the data takes to come in, so the queue is growing.  And the results of the processed data are lagging further and further behind the original acquisition.  But if there were two consumer loops working in parallel on different packets of data, then the time spent is less, and the final consumer loop would take each packet and put it back together in the correct order.
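
Something like the reorder buffer below is all that final consumer loop would have to do (a Python sketch; emit is a placeholder for whatever happens to the ordered results):

```python
import queue

def reordering_consumer(results_q: queue.Queue, n_blocks: int, emit):
    """Pulls (index, result) packets from the parallel workers and hands them on in order.

    emit is a placeholder for whatever comes next: display, save to disk, etc.
    """
    pending = {}                              # early arrivals, keyed by index
    expected = 0
    while expected < n_blocks:
        i, result = results_q.get()
        pending[i] = result
        while expected in pending:            # flush everything that is now contiguous
            emit(pending.pop(expected))
            expected += 1
```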

 

ConnorP, does my interpretation describe your original problem correctly?  If not, can you post your original code so we can look at it?

 

 

Message 8 of 23

One needs to look closely at Consumer loops and see if there are code elements that are not CPU-intensive, but are slow because "the hardware is slow".  Examples are displays (especially scrolling Charts) and I/O.  If it is all "computation", then it is unclear (to me) that breaking a task into parallel (identical) sub-Tasks and later recombining the results will result in much speed improvement.  At the very least, one should test this assumption with simulated (or, perish forbid, even "real" data).

 

Certain "processing" can be sped up.  If you are acquiring data at 1 kHz, it doesn't make sense to plot every point, as you can't see that time detail.  Plotting every 20th or 50th point not only speeds things up, but lets you see the data.  If you are saving to disk, so what if the disk write lags by seconds (or minutes), as long as it gets saved?  However, this should also be checked, as you really don't want the queue to keep growing (and requiring more buffer space, which takes time, which takes more time).  Maybe invest in a fast SSD?
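
The decimate-before-plotting part is trivial in any language; here's a Python sketch (the factor of 50 is arbitrary):

```python
import numpy as np

def decimate_for_display(block: np.ndarray, factor: int = 50) -> np.ndarray:
    """Keep every Nth point for the plot; the full-rate data still goes to disk untouched."""
    return block[::factor]
```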

 

Bob Schor

 

P.S. -- I recently did just such a "reality" test on some code I'd written that I thought was the Cat's Pajamas.  Then I calculated the data rate coming in over TCP/IP if we ran with all the Stations (24) actively sending, and found we were trying to cram hundreds of megabits of data over a 100-BaseT Ethernet line.  Oops.  Not to mention the size of the Image buffers thanks to our (only slightly higher than Version 1) Frame Rate ...  Double-oops.

Message 9 of 23
Solution
Accepted by topic author ConnerP

I did this once for a CPU-intensive operation that was single-threaded internally.  By having one "worker" loop per core I could get all cores running on different bits of the data.  To reorder, I used what is effectively "Futures".  These were temporary single-element queues (SEQs).  I would attach a new SEQ to each "job" placed on the Queue that the workers consumed from.  The workers would enqueue their results to that SEQ.  In parallel, I passed the SEQs to the ultimate consumer of the results, which would wait on each SEQ in order (releasing it when finished).  Since the SEQs were sent in order, it did not matter what order the workers finished their jobs.
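
In text-language terms the pattern was roughly the following (a Python sketch; process() is a stand-in for the real math, and the per-job single-element queues play the role of the Futures):

```python
import queue
import threading

def process(data):
    """Placeholder for the CPU-heavy, order-insensitive math."""
    return data * 2

N_WORKERS, N_JOBS = 4, 100   # e.g. one worker loop per core

job_q    = queue.Queue()     # jobs for the worker pool
future_q = queue.Queue()     # the same SEQs, in submission order, for the final consumer

def worker():
    while (item := job_q.get()) is not None:
        data, seq = item
        seq.put(process(data))           # fill this job's "future"

def submit(data):
    seq = queue.Queue(maxsize=1)         # the "future": a fresh single-element queue per job
    job_q.put((data, seq))               # any worker may pick the job up...
    future_q.put(seq)                    # ...but the consumer sees the SEQs in submission order

def final_consumer():
    for _ in range(N_JOBS):
        seq = future_q.get()             # next SEQ, in order
        result = seq.get()               # blocks until that particular job has finished
        # use result here; order is preserved even though workers finish out of order

threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
threads.append(threading.Thread(target=final_consumer))
for t in threads:
    t.start()
for i in range(N_JOBS):
    submit(i)
for _ in range(N_WORKERS):
    job_q.put(None)                      # sentinels to stop the worker loops
```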

Message 10 of 23