03-12-2020 02:50 AM
Hi,
I have been developing code to read 12 DAQ cards with 16 channels each. The rate is 100 kS/s and they are all synchronized. My approach is to read all of them in one producer loop and send the data to a consumer loop using an RT FIFO. The system is an RT PXI. My question is whether I should split the producer into 2 loops with 6 cards each, or keep it as one loop. The reason I'm wondering is that I think having two loops would reduce the CPU load on a single core by using a second core to do half the task. Did I get that right? Is there any negative aspect to doing so? BTW, it is continuous sampling.
best regards
Ahmed
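(The pattern Ahmed describes is language-agnostic, so here is a minimal Python sketch of a producer/consumer pair joined by a bounded queue, standing in for the LabVIEW loops and the RT FIFO. `read_daq_chunk` is a hypothetical stand-in for the synchronized DAQmx Read.)

```python
import queue
import threading

def read_daq_chunk(n_channels=192, n_samples=1000):
    # Hypothetical stand-in for one synchronized DAQmx Read:
    # returns a 2D chunk, one row per channel.
    return [[0.0] * n_samples for _ in range(n_channels)]

def producer(q, n_chunks):
    for _ in range(n_chunks):
        q.put(read_daq_chunk())   # bounded queue gives backpressure
    q.put(None)                   # sentinel: acquisition finished

def consumer(q, received):
    while True:
        chunk = q.get()
        if chunk is None:
            break
        received.append(len(chunk))  # here you would forward over TCP instead

q = queue.Queue(maxsize=8)        # analogous to a sized RT FIFO
received = []
t_p = threading.Thread(target=producer, args=(q, 5))
t_c = threading.Thread(target=consumer, args=(q, 5 * [0] and received))
t_p.start(); t_c.start()
t_p.join(); t_c.join()
print(received)                   # five chunks of 192 channels each
```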
03-12-2020 03:39 AM
That is quite a sizeable amount of data.
What are you doing with the data? Are you doing the same thing to every channel?
I would be keen to keep it in one loop to make your code easier to follow. Extra efficiency can be had by running code in parallel within that loop and by using parallel loops. If you are using parallel loops, be wary that some of the functions shipped with LabVIEW are not re-entrant (for example the pixmap tools, which I have recently fallen foul of).
Having said that, if you aren't doing the same thing to all of your data, there would be an argument for sending it to different loops. I would still sooner see it in the same loop, though, with this sort of layout. Each subVI will run in parallel, as long as they don't share the same non-reentrant VIs.
Without knowing exactly what you are doing in your processing loop it is difficult to give you more advice, I'm afraid.
Also, beware...asking a question like this you are likely to get multiple ways of doing the same thing all of which are right.
03-12-2020 07:36 AM
I would be more concerned about the consumer loop than the producer loop, unless you're doing a lot of data processing in the producer loop. Assuming that the data is not all processed the same way, I would consider consolidating data by processing type and sending it to separate processing loops. Processing in separate loops (I would likely make them subVIs with relevant icons) allows the code to be more independent and can be highly readable (at the top level you can make an Enqueue subVI for each data type, so it's easy to see what data is going where). If all of the data is processed the same way, then other parallelization methods are appropriate.
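(The "consolidate by processing type" idea above can be sketched in Python. The channel-to-type mapping and the two processing types below are invented purely for illustration; each queue stands in for the RT FIFO feeding one dedicated processing loop.)

```python
import queue

# One queue per processing loop; the mapping is hypothetical.
queues = {"filter": queue.Queue(), "fft": queue.Queue()}
channel_type = {0: "filter", 1: "fft", 2: "filter"}

def dispatch(chunk_by_channel):
    # Group channels by processing type, then do one enqueue
    # per consumer loop (one "Enqueue subVI" per data type).
    grouped = {}
    for ch, data in chunk_by_channel.items():
        grouped.setdefault(channel_type[ch], []).append(data)
    for kind, block in grouped.items():
        queues[kind].put(block)

dispatch({0: [1.0], 1: [2.0], 2: [3.0]})
print(queues["filter"].qsize(), queues["fft"].qsize())
```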
03-12-2020 09:49 AM
I'd echo the concerns from johntrich that the consumer may be the bottleneck, and perhaps the more important place to look to optimize parallel processing.
I'm reminded of a thread where I first heard of the concept of "futures". <search, search, search...> Ah yes, here's a key message from that thread, but it'll be good to read it from the beginning.
The situation there *may be* a bit different in that it's concerned with a situation where the chunks of data shipped off for parallel processing must eventually be re-assembled in the original order (even if the parallel "workers" don't finish the work in that order).
-Kevin P
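(The "futures" idea Kevin mentions, where chunks are processed by parallel workers but reassembled in their original order, has a compact Python analogue: `Executor.map` returns results in submission order even when workers finish out of order. A sketch, with an artificial out-of-order delay:)

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def process(chunk):
    time.sleep(random.uniform(0, 0.01))  # workers finish out of order
    return chunk * 2

chunks = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() reassembles results in the original chunk order,
    # regardless of which worker finished first.
    results = list(pool.map(process, chunks))
print(results)
```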
03-12-2020 09:54 AM
The producer loop reshapes the 2D arrays coming from three DAQ read tasks; the first task is the master containing 16 channels, and the two other tasks are slaves containing 5 and 6 cards respectively. All the reshaped arrays are placed into a predefined array that is sent through the RT FIFO to the consumer loop. The consumer loop sends the data to another PC for processing (we use two network cards to send the data). This works fine and we haven't seen any problem with it yet. In other words, there are no re-entrancy issues. I was just thinking that splitting the producer and consumer into two loops each might be a better approach, since I'm using two TCP connections to send the data.
03-12-2020 09:56 AM
There is no processing in the consumer loop; it just sends the data to a different PC over two TCP connections.
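(Fanning one stream out over two TCP links, as described here, can be sketched in Python. `socket.socketpair` stands in for the two real NIC connections; the halving scheme is an invented illustration, not necessarily how the actual VI splits the data.)

```python
import socket

# Two independent byte streams, one per (pretend) network card.
link_a, peer_a = socket.socketpair()
link_b, peer_b = socket.socketpair()

def send_split(payload):
    # First half on one link, second half on the other,
    # halving the per-link bandwidth requirement.
    mid = len(payload) // 2
    link_a.sendall(payload[:mid])
    link_b.sendall(payload[mid:])

send_split(b"0123456789")
half_a = peer_a.recv(64)
half_b = peer_b.recv(64)
print(half_a, half_b)
```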
03-12-2020 11:01 AM
I'm not a deep expert on how LabVIEW breaks down block diagram code for different cores and threads, especially under RT. For raw data processing / transfer bandwidth, it makes *intuitive* sense to me that getting more physical cores involved should help. But splitting up the work into more threads on a given core probably doesn't.
I would guess it'd be worthwhile to make sure each consumer is on a distinct core as it receives data from 1 or more RTFIFOs and delivers it out a distinct physical network card.
It may or may not help to split up the producer. Reshape operations that need no resizing will often require negligible CPU. But I'm not sure how much CPU may be involved in your "array replace" operations.
I *do* know that under Windows, I would typically aim to wire data direct from DAQmx Read to Enqueue with no other operations or wire branching in between. When I do that, the data isn't ever really copied. LabVIEW recognizes the pattern, the Queue is granted (temporary) ownership of the array memory space & pointer, and the consumer's Dequeue operation can subsequently take instant ownership without data copying as well.
-Kevin P
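(Kevin's zero-copy point has a loose Python analogue: a queue hands over a reference to the buffer rather than duplicating it, provided nothing else forces a copy, much like wiring DAQmx Read straight to Enqueue with no branches. A minimal sketch:)

```python
import queue

buf = bytearray(1024 * 1024)  # pretend this is one DAQ chunk
q = queue.Queue()
q.put(buf)                    # enqueues a reference, not a copy
out = q.get()                 # consumer takes ownership of the same buffer
print(out is buf)             # same memory; no data was copied
```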
03-12-2020 11:32 AM
@ahmalk71 wrote:
Hi,
I have been developing code to read 12 DAQ cards with 16 channels each. The rate is 100 kS/s and they are all synchronized. My approach is to read all of them in one producer loop and send the data to a consumer loop using an RT FIFO. The system is an RT PXI. My question is whether I should split the producer into 2 loops with 6 cards each, or keep it as one loop. The reason I'm wondering is that I think having two loops would reduce the CPU load on a single core by using a second core to do half the task. Did I get that right? Is there any negative aspect to doing so? BTW, it is continuous sampling.
best regards
Ahmed
I'll make some assumptions here:
This seems like a lot, but on PXI should be manageable.
Some suggestions and questions:
mcduff
03-12-2020 12:09 PM
Hi Mcduff,
We chose RT to be on the safe side and avoid BSODs; that is one reason. Another reason is that we actually have two of these systems, each pushing 76.8 MB/s through the network to the host Windows PC, which has 4 network cards and hosts the NI disk array. I know about the DAQmx TDMS write function, but unfortunately RT does not support disk arrays. There are other reasons we have chosen this solution which I'd rather not talk about :-/. Don't get me wrong, the system is working fine; it is just that we need to add some functionality to the system, and I was wondering whether splitting the loops is a good idea.
Best regards
Ahmed
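(The 76.8 MB figure is easy to sanity-check from the numbers earlier in the thread, assuming 4-byte samples, e.g. single-precision floats; with raw I16 data it would be half that.)

```python
# 12 cards x 16 channels x 100 kS/s x 4 bytes per sample
cards, channels_per_card, rate_hz, bytes_per_sample = 12, 16, 100_000, 4
throughput = cards * channels_per_card * rate_hz * bytes_per_sample
print(throughput / 1e6, "MB/s")  # 76.8 MB/s, matching Ahmed's figure
```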
03-12-2020 12:17 PM
This is what I believe as well. I know that you can specify which core to use with a timed loop, but I think two while loops running in parallel will also be split across two cores if the processes are demanding. The array replace operation works on a predefined fixed-size array, so it does not allocate memory and should therefore be less demanding, I think.
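(The preallocation point generalizes beyond LabVIEW: replacing elements inside a fixed-size buffer reuses the same memory, whereas building a new array each iteration would allocate. A small Python sketch of the same idea:)

```python
buffer = [0.0] * 12      # preallocated once, fixed size
before = id(buffer)

def replace_block(buf, offset, block):
    # Equal-length slice assignment mutates the buffer in place,
    # like LabVIEW's Replace Array Subset; no new array is created.
    buf[offset:offset + len(block)] = block
    return buf

replace_block(buffer, 4, [1.0, 2.0, 3.0])
print(id(buffer) == before, buffer)
```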