06-13-2018 08:30 AM
Hi all,
Can the cycle time of a producer-consumer structure be improved by adding a second, sequential consumer? That is, the first consumer processes the data and queues the result to the second consumer. Sort of a producer – consumer/producer – consumer chain (see the VI snippets attached).
I'm asking because I use the producer-consumer structure in my application. However, when I’m running other software packages simultaneously on the PC, the consumer cannot keep up with the data acquisition rate.
The producer grabs images from a camera at 10 fps. The consumer processes the image, uses that information as input for a PID controller, and logs the data (including the frame). The cycle time of this occasionally rises to 120-125 ms, which is too long.
I optimised the data logging by keeping the log file open instead of opening/closing it on each write, and by using a buffer so data is written only once every second. The computer has an i7-3770 (4 cores/8 threads), so there is not a lot to be gained there. I also tried disabling hyperthreading to boost the per-thread capacity, but this had no effect.
My suspicion is that data logging takes too much time. My thought is to send the output data to a second consumer for the file interactions, essentially making the image processing loop both consumer and producer. The idea of a producer-consumer structure is to parallelise data acquisition and processing. I want to parallelise this once more, but obviously there is a limit to how often one can do this.
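In text form, the idea is roughly this (a rough Python analog of the attached VI snippets; the names, timings and placeholder processing steps here are mine, not the actual implementation):

```python
# Rough Python analog of the proposed producer -> consumer/producer -> consumer
# chain. The real code is LabVIEW VIs; everything below is a placeholder sketch.
import queue
import threading
import time

acq_queue = queue.Queue()   # producer -> consumer 1 (image processing + PID)
log_queue = queue.Queue()   # consumer 1 -> consumer 2 (file logging)

def producer():
    for frame_no in range(50):           # stand-in for the camera at 10 fps
        acq_queue.put(("frame", frame_no))
        time.sleep(0.1)
    acq_queue.put(None)                  # signal end of acquisition

def consumer_1():
    while True:
        item = acq_queue.get()
        if item is None:
            log_queue.put(None)          # pass the shutdown signal along
            break
        result = item                    # placeholder for processing + PID update
        log_queue.put(result)            # hand the slow file I/O to consumer 2

def consumer_2():
    while True:
        item = log_queue.get()
        if item is None:
            break
        pass                             # placeholder for writing data + frame to disk

for f in (producer, consumer_1, consumer_2):
    threading.Thread(target=f).start()
```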
What do you think? I’m curious to hear your opinions on this.
06-13-2018 08:44 AM
Yes, the idea of decoupling data acquisition from data logging is good. However, if loop 2 spins faster than loop 3, the queue buffer will grow and eventually eat all the memory.
I would try using Flush Queue in loop 3, logging large sets of data at once.
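In rough Python terms the idea is something like this (just an illustrative stand-in; LabVIEW's Flush Queue returns all currently queued elements in a single call):

```python
# Illustrative stand-in for Flush Queue: drain everything currently queued
# and hand it back as one batch to be written in a single file operation.
# 'log_queue' and the batch-writing step are placeholders.
import queue

def flush(log_queue):
    batch = []
    while True:
        try:
            batch.append(log_queue.get_nowait())
        except queue.Empty:
            break
    return batch

# in loop 3: write the whole batch at once, e.g. write_batch(flush(log_queue))
```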
06-13-2018 09:03 AM
@_Y_ wrote:
I would try using Flush Queue in loop 3, logging large sets of data at once.
I have found it better to use a conditional FOR loop to limit the amount of data you pull off the queue at once. It helps with memory allocation. The idea is to use a FOR loop and autoindex the values from the queue to create an array of values. If the queue times out (set the timeout to something reasonable, not 0), stop the FOR loop and do not index that value (conditional tunnels). You can set N to something like 5 or 10 so that you get at most that many items.
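A rough Python stand-in for that logic (the real thing is a LabVIEW FOR loop with conditional tunnels; N and the timeout below are just example values):

```python
# Pull at most N items off the queue per pass; stop early if the dequeue
# times out. Mirrors a conditional, auto-indexing FOR loop around Dequeue Element.
import queue

def dequeue_up_to(q, n=10, timeout=0.1):
    items = []
    for _ in range(n):                    # N caps the batch size (e.g. 5 or 10)
        try:
            items.append(q.get(timeout=timeout))
        except queue.Empty:               # timeout: stop and don't index that value
            break
    return items
```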
06-13-2018 09:17 AM
I have found it better to use a conditional FOR loop to limit the amount of data you pull off the queue at once.
If an upper limit on the data is needed, a FOR loop will do the work (although Flush Queue should be a little faster). I think there are many possible solutions, and the choice should depend on the selected method of data logging.
06-13-2018 09:18 AM
@_Y_ wrote:
Yes, the idea of decoupling data acquisition from data logging is good. However, if loop 2 spins faster than loop 3, the queue buffer will grow and eventually eat all the memory.
Thanks for the reply.
In the current situation Loop 2 takes about 120 ms. I suspect about 70 ms goes into image processing and 50 ms into logging the data.
Loop 2 always runs at 10 Hz, because it only runs when Loop 1 queues an element. This happens at fixed intervals and under normal circumstances the buffer only holds a single item at a time. Loop 3 would work similarly: it would only run if Loop 2 adds an element to the queue.
If the processing and logging combined take 120 ms, then I would say that by splitting them, each should be able to run well under 100 ms. At some point it is of course not possible to create extra parallel processes, so I was wondering if this reasoning still holds.
As I take it from your reply, it will probably still work in my case?
06-13-2018 09:31 AM
The essential thing to think about is: which code is most critical to stay on pace? Which code can be allowed to lag a while and later catch up?
You mention a control algorithm, so I'll venture that neither your producer (10 fps images) nor your first consumer (PID algorithm) can afford to lag because you don't want to have large or variable latency in your control loop.
The other data destination is for logging, and that *can* afford to lag. I'd probably follow the previous suggestions to slow down the logging loop so it retrieves larger chunks of data at a time and writes them much less often than 10 times per second. Maybe once every 2-10 sec? It's probably a good idea to put the logging loop in a separate subVI in case you want to experiment with assigning execution systems or priorities. Caution: LabVIEW's default, automatic method of splitting work across threads and cores is pretty tough to beat. Still, it's nice to have the option to try.
You could also consider putting the PID algorithm right in the image capture loop. I'd probably lean toward trying to do things that way as it more clearly expresses the immediacy of the relationship between the image data and the PID calculation.
-Kevin P
P.S. More replies came in while I was composing this. Whether you can squeeze 120 ms of work into 100 ms is an "it depends" situation. Since it sounds like your loop 2 code does the processing and logging in *series* (70 ms + 50 ms), there's a good chance that the 120 ms is driven more by *sequencing* than by total CPU load. In that case it *will* help to defer the 50 ms of logging to another loop that can operate in parallel. But I'd still plan to iterate the logging loop much slower than 10 Hz. (Although OS-based file caching may make my reasoning entirely moot: it may make no difference at all whether you write 1 image frame at a time 10 times a second or 20 image frames at a time once every 2 seconds.)
06-13-2018 11:09 AM
Thanks for the reply, Kevin. You are spot on: the acquisition and first consumer (PID control) cannot lag. The data logging can afford to lag a little. The first consumer knows two states: Idle and Measuring. Data is only logged during a measurement. One measurement takes only several minutes, with some idle time before the next sample comes. So even if the second consumer lagged during the measurement, there would be plenty of time to clear the buffer.
All in all, my best bet would be to split the data processing and data logging steps. With regard to the buffer, I have already attempted something similar by storing the output data in an array of 10 elements and writing it every 10 iterations.
If I were to implement this in the producer/consumer structure, how would you suggest doing so? My idea was to have the second consumer on an infinite timeout, so that it only runs when data is available. The way I see that working is that I fill up a buffer (for example an array of clusters) and queue it every 1-2 seconds. I would stick to my first consumer loop opening/closing the file, so I have to make sure that the file is not closed before the second consumer has cleared its buffer (if that still makes sense).
06-13-2018 12:40 PM
A separate logging loop with an infinite timeout on the queue is a reasonable option. Then you would be servicing each packet of data at 10 Hz, but you can still choose to build up 10 or so of these packets internally before writing them. But as I mentioned before, it might be the case that the disc caching built into the OS makes it not matter whether you write data 1 image at a time at 10 Hz or 10 images at a time at 1 Hz.
An infinite timeout queue is a pretty normal way to set up a consumer loop. It's also pretty normal to establish a sentinel value that means "shut the loop down gracefully".
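In rough Python terms that consumer skeleton looks something like this (the names and the choice of sentinel are placeholders; in LabVIEW the infinite timeout is just the default -1 wired to Dequeue Element):

```python
# Consumer loop with a blocking (infinite-timeout) dequeue and a sentinel
# value that tells the loop to shut down gracefully.
import queue

SENTINEL = None                          # producer enqueues this when it's done

def handle(item):
    pass                                 # placeholder: buffer and/or write to file

def logging_loop(log_queue):
    while True:
        item = log_queue.get()           # blocks until data arrives (infinite timeout)
        if item is SENTINEL:
            break                        # graceful shutdown on the sentinel value
        handle(item)
```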
-Kevin P
06-14-2018 04:44 AM
Thanks for the reply again.
I will implement the separated data processing and logging then. I like the idea of a sentinel value as well; I'll see how I can add this.
As for timing, how would you solve this? If Consumer 1 fills the queue at its normal pace and Consumer 2 waits until X items are in the queue, data might be lost if Consumer 1 stops the measurement between these intervals.
Furthermore, Consumer 1 is responsible for opening/closing the log files. Consumer 2 writes data to these files. If the data logger lags behind and the buffer fills, it could be that Consumer 2 is still clearing the queue while the file is already closed by Consumer 1.
Maybe the nicest solution is to send commands (open, write, close) to Consumer 2, and create my own buffer there. That way I do not build up packets in the queue. Not sure if that makes a difference?
06-14-2018 07:23 AM
1. But we're taking the approach where Consumer 2 *doesn't* wait. It has a dequeue with an infinite timeout and extracts data as soon as it can. The option to build up 10 or so data packets before writing would be something handled internally to the Consumer 2 loop.
2. Make your logging loop responsible for opening and closing the file. Then you can easily make sure all legit data is written before the file is closed. I regularly do this kind of thing using a Queued Message Handler pattern.
The key to it is that the queue carries a (typedef'ed) cluster which bundles a message with a variant whose real type can be different from one message to the next. This provides a great deal of flexibility. You can have an "Open File" message bundled with a file path, and a "New Data" message bundled with your image data. Consumer 2 just needs to convert the variant back to the correct real datatype.
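A rough Python sketch of that message-handling idea (in LabVIEW the payload is a variant inside the typedef'ed cluster; here it is just the second element of a tuple, and the message names and payloads are only examples):

```python
# Queued-message-handler style consumer: each queue element is a
# (message, payload) pair, and the payload's type depends on the message.
import queue

def consumer_2(msg_queue):
    log_file = None
    while True:
        message, payload = msg_queue.get()          # blocking dequeue
        if message == "Open File":
            log_file = open(payload, "w")           # payload: file path
        elif message == "New Data":
            if log_file is not None:
                log_file.write(str(payload) + "\n")  # payload: image / PID data
        elif message == "Close File":
            if log_file is not None:
                log_file.close()                    # queued data was already written
                log_file = None
        elif message == "Shutdown":
            break                                   # sentinel-style graceful exit

# Consumer 1 would send, for example:
#   msg_queue.put(("Open File", "run_001.txt"))
#   msg_queue.put(("New Data", pid_output))
#   msg_queue.put(("Close File", None))
```

Because the logging loop owns the open/close, a "Close File" message only executes after every "New Data" message queued before it has been handled, which is exactly the ordering guarantee you were worried about.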
-Kevin P