03-12-2020 02:50 AM
Hi,
I have been developing code to read 12 DAQ cards with 16 channels each. The rate is 100 kS/s and they are all synchronized. My approach is to read all of them in one producer loop and send the data to a consumer loop using an RT FIFO. The system is an RT PXI. My question is whether I should split the producer into 2 loops with 6 cards each, or keep it as one loop. The reason I'm wondering is that I think having two loops would reduce the CPU load on a single core by using a second core to do half the task. Did I get that right? Is there any negative aspect to doing so? BTW, it is continuous sampling.
best regards
Ahmed
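(The pattern Ahmed describes is language-agnostic, so here is a minimal Python sketch of a producer/consumer pair joined by a bounded queue, standing in for the LabVIEW loops and the RT FIFO. `read_daq_chunk` is a hypothetical stand-in for the synchronized DAQmx Read.)

```python
import queue
import threading

def read_daq_chunk(n_channels=192, n_samples=1000):
    # Hypothetical stand-in for one synchronized DAQmx Read:
    # returns a 2D chunk, one row per channel.
    return [[0.0] * n_samples for _ in range(n_channels)]

def producer(q, n_chunks):
    for _ in range(n_chunks):
        q.put(read_daq_chunk())   # bounded queue gives backpressure
    q.put(None)                   # sentinel: acquisition finished

def consumer(q, received):
    while True:
        chunk = q.get()
        if chunk is None:
            break
        received.append(len(chunk))  # here you would forward over TCP instead

q = queue.Queue(maxsize=8)        # analogous to a sized RT FIFO
received = []
t_p = threading.Thread(target=producer, args=(q, 5))
t_c = threading.Thread(target=consumer, args=(q, 5 * [0] and received))
t_p.start(); t_c.start()
t_p.join(); t_c.join()
print(received)                   # five chunks of 192 channels each
```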
03-12-2020 03:39 AM
That is quite a sizeable amount of data.
What are you doing with the data? Are you doing the same thing to every channel?
I would be keen to keep it in one loop to make your code easier to follow. Extra efficiency can be had by running code in parallel within that loop and by using parallel loops. If you are using parallel loops, be wary that some of the functions shipped with LabVIEW are not re-entrant (for example the pixmap tools, which I have recently fallen foul of).
Having said that, if you aren't doing the same thing to all of your data, there would be an argument for sending it to different loops. I would still sooner see it in the same loop, though, with this sort of layout. Each subVI will run in parallel, as long as they don't share the same non-reentrant VIs.
Without knowing exactly what you are doing in your processing loop it is difficult to give you more advice, I'm afraid.
Also, beware...asking a question like this you are likely to get multiple ways of doing the same thing all of which are right.
03-12-2020 07:36 AM
I would be more concerned about the consumer loop than the producer loop, unless you're doing a lot of data processing in the producer loop. Assuming that the data is not all processed the same way, I would consider consolidating data by processing type and sending it to separate processing loops. Processing in separate loops (I would likely make them subVIs with relevant icons) allows the code to be more independent and can be highly readable (at the top level you can make an Enqueue subVI for each data type, so it's easy to see what data is going where). If all of the data is processed the same way, then other parallelization methods are appropriate.
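(The "consolidate by processing type" idea above can be sketched in Python. The channel-to-type mapping and the two processing types below are invented purely for illustration; each queue stands in for the RT FIFO feeding one dedicated processing loop.)

```python
import queue

# One queue per processing loop; the mapping is hypothetical.
queues = {"filter": queue.Queue(), "fft": queue.Queue()}
channel_type = {0: "filter", 1: "fft", 2: "filter"}

def dispatch(chunk_by_channel):
    # Group channels by processing type, then do one enqueue
    # per consumer loop (one "Enqueue subVI" per data type).
    grouped = {}
    for ch, data in chunk_by_channel.items():
        grouped.setdefault(channel_type[ch], []).append(data)
    for kind, block in grouped.items():
        queues[kind].put(block)

dispatch({0: [1.0], 1: [2.0], 2: [3.0]})
print(queues["filter"].qsize(), queues["fft"].qsize())
```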
03-12-2020 09:49 AM
I'd echo the concerns from johntrich that the consumer may be the bottleneck, and perhaps the more important place to look to optimize parallel processing.
I'm reminded of a thread where I first heard of the concept of "futures". <search, search, search...> Ah yes, here's a key message from that thread, but it'll be good to read it from the beginning.
The situation there *may be* a bit different in that it's concerned with a situation where the chunks of data shipped off for parallel processing must eventually be re-assembled in the original order (even if the parallel "workers" don't finish the work in that order).
-Kevin P
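(The "futures" idea Kevin mentions, where chunks are processed by parallel workers but reassembled in their original order, has a compact Python analogue: `Executor.map` returns results in submission order even when workers finish out of order. A sketch, with an artificial out-of-order delay:)

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def process(chunk):
    time.sleep(random.uniform(0, 0.01))  # workers finish out of order
    return chunk * 2

chunks = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() reassembles results in the original chunk order,
    # regardless of which worker finished first.
    results = list(pool.map(process, chunks))
print(results)
```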
03-12-2020 09:54 AM
The producer loop reshapes the 2D arrays coming from three DAQ read tasks; the first task is the master containing 16 channels, and the two other tasks are slaves containing 5 and 6 cards respectively. All the reshaped arrays are placed into a predefined array that is sent through the RT FIFO to the consumer loop. The consumer loop sends the data to another PC for processing (we use two network cards to send the data). This works fine and we haven't seen any problem with it yet. In other words, there are no re-entrancy issues. I was just thinking that splitting the producer and consumer into two loops each might be a better approach, since I'm using two TCP connections to send the data.
03-12-2020 09:56 AM
There is no processing in the consumer loop; it just sends the data to a different PC over two TCP connections.
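(Fanning one stream out over two TCP links, as described here, can be sketched in Python. `socket.socketpair` stands in for the two real NIC connections; the halving scheme is an invented illustration, not necessarily how the actual VI splits the data.)

```python
import socket

# Two independent byte streams, one per (pretend) network card.
link_a, peer_a = socket.socketpair()
link_b, peer_b = socket.socketpair()

def send_split(payload):
    # First half on one link, second half on the other,
    # halving the per-link bandwidth requirement.
    mid = len(payload) // 2
    link_a.sendall(payload[:mid])
    link_b.sendall(payload[mid:])

send_split(b"0123456789")
half_a = peer_a.recv(64)
half_b = peer_b.recv(64)
print(half_a, half_b)
```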
03-12-2020 11:01 AM
I'm not a deep expert on how LabVIEW breaks down block diagram code for different cores and threads, especially under RT. For raw data processing / transfer bandwidth, it makes *intuitive* sense to me that getting more physical cores involved should help. But splitting up the work into more threads on a given core probably doesn't.
I would guess it'd be worthwhile to make sure each consumer is on a distinct core as it receives data from 1 or more RTFIFOs and delivers it out a distinct physical network card.
It may or may not help to split up the producer. Reshape operations that need no resizing will often require negligible CPU. But I'm not sure how much CPU may be involved in your "array replace" operations.
I *do* know that under Windows, I would typically aim to wire data direct from DAQmx Read to Enqueue with no other operations or wire branching in between. When I do that, the data isn't ever really copied. LabVIEW recognizes the pattern, the Queue is granted (temporary) ownership of the array memory space & pointer, and the consumer's Dequeue operation can subsequently take instant ownership without data copying as well.
-Kevin P
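(Kevin's zero-copy point has a loose Python analogue: a queue hands over a reference to the buffer rather than duplicating it, provided nothing else forces a copy, much like wiring DAQmx Read straight to Enqueue with no branches. A minimal sketch:)

```python
import queue

buf = bytearray(1024 * 1024)  # pretend this is one DAQ chunk
q = queue.Queue()
q.put(buf)                    # enqueues a reference, not a copy
out = q.get()                 # consumer takes ownership of the same buffer
print(out is buf)             # same memory; no data was copied
```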
03-12-2020 11:32 AM
@ahmalk71 wrote:
Hi,
I have been developing code to read 12 DAQ cards with 16 channels each. The rate is 100 kS/s and they are all synchronized. My approach is to read all of them in one producer loop and send the data to a consumer loop using an RT FIFO. The system is an RT PXI. My question is whether I should split the producer into 2 loops with 6 cards each, or keep it as one loop. The reason I'm wondering is that I think having two loops would reduce the CPU load on a single core by using a second core to do half the task. Did I get that right? Is there any negative aspect to doing so? BTW, it is continuous sampling.
best regards
Ahmed
I'll make some assumptions here:
This seems like a lot, but on PXI should be manageable.
Some suggestions and questions:
mcduff
03-12-2020 12:09 PM
Hi Mcduff,
We chose RT to be on the safe side and avoid BSODs; that is one reason. Another reason is that we actually have two of these systems, each pushing 76.8 MB/s through the network to the host Windows PC, which has 4 network cards and hosts the NI disk array. I know about the DAQmx TDMS write function, but unfortunately RT does not support disk arrays. There are other reasons we have chosen this solution which I'd rather not talk about :-/. Don't get me wrong, the system is working fine; it is just that we need to add some functionality to the system, and I was wondering whether splitting the loops is a good idea.
Best regards
Ahmed
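(The 76.8 MB figure is easy to sanity-check from the numbers earlier in the thread, assuming 4-byte samples, e.g. single-precision floats; with raw I16 data it would be half that.)

```python
# 12 cards x 16 channels x 100 kS/s x 4 bytes per sample
cards, channels_per_card, rate_hz, bytes_per_sample = 12, 16, 100_000, 4
throughput = cards * channels_per_card * rate_hz * bytes_per_sample
print(throughput / 1e6, "MB/s")  # 76.8 MB/s, matching Ahmed's figure
```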
03-12-2020 12:17 PM
This is what I believe as well. I know that you can specify which core to use with a timed loop, but I think two while loops running in parallel will also be split across two cores if the processes are demanding. The array replace operation works on a predefined fixed-size array, so it does not allocate memory and should therefore be less demanding, I think.
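(The preallocation point generalizes beyond LabVIEW: replacing elements inside a fixed-size buffer reuses the same memory, whereas building a new array each iteration would allocate. A small Python sketch of the same idea:)

```python
buffer = [0.0] * 12      # preallocated once, fixed size
before = id(buffer)

def replace_block(buf, offset, block):
    # Equal-length slice assignment mutates the buffer in place,
    # like LabVIEW's Replace Array Subset; no new array is created.
    buf[offset:offset + len(block)] = block
    return buf

replace_block(buffer, 4, [1.0, 2.0, 3.0])
print(id(buffer) == before, buffer)
```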