03-13-2020 06:42 AM
"nothing in that loop happens in parallel": that was new to me :-/. Anyhow, I'm using a while loop; the idea of using a timed loop was to make sure that two cores are used rather than letting LV make that decision. I know the rule "don't touch it if it works", but I'm a freak when it comes to optimization 🙂
03-13-2020 08:12 AM
@ahmalk71 wrote:
The file format that our customer uses supports only SGL. Changing the data to SGL at the beginning of the acquisition will save us unnecessary scaling downstream, and since it hasn't been an issue I haven't given it much thought. I have included a snippet showing how I want to change the code to two producer/consumer loop pairs. This is just a quick sketch to show how I was thinking; I haven't put in the sync details and other stuff, so please don't worry about that.
Regarding the bandwidth reduction from using raw data instead: we use the NI-4497, which uses a 24-bit ADC. Correct me if I'm wrong, but wouldn't using I16 as output reduce the dynamic range of the signal?
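For reference, the dynamic-range concern can be checked with quick arithmetic. This is a hypothetical back-of-envelope sketch of ideal integer-format dynamic range, not the 4497's actual noise-floor spec:

```python
import math

def dynamic_range_db(bits):
    """Ideal dynamic range of a signed integer format with the given bit depth."""
    return 20 * math.log10(2 ** (bits - 1))

# Comparing raw 24-bit ADC codes against truncation to I16.
dr_24 = dynamic_range_db(24)   # ~138.5 dB ideal
dr_16 = dynamic_range_db(16)   # ~90.3 dB ideal
print(f"24-bit: {dr_24:.1f} dB, 16-bit: {dr_16:.1f} dB, loss: {dr_24 - dr_16:.1f} dB")
```

So yes, in the ideal case truncating to I16 gives up about 48 dB, although whether that matters depends on the device's real noise floor.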
Best regards
Ahmed
Don't have LabVIEW here at the moment to check these out, but here are some observations:
If I were doing this project I would use Channel Expansion; then there would be only 1 task and 1 data stream. I would get the raw data in I32 form. I would then need only 1 FIFO or Queue to stream the data. The tasks you are doing are not CPU intensive, and on PXI, not even bandwidth intensive. Having multiple loops for each task makes things harder.
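The single-stream structure described above could be sketched like this. This is a hypothetical Python analogue, where `read_block()` stands in for one DAQmx Read covering all channels of a channel-expanded task:

```python
import queue
import threading

# One acquisition (producer) loop feeding one queue,
# one consumer loop doing all downstream work.
data_q = queue.Queue()
STOP = object()  # sentinel to shut the consumer down

def read_block(i):
    # Placeholder for "DAQmx Read (raw I32, all channels x N samples)".
    return [i] * 4

def producer(n_blocks):
    for i in range(n_blocks):
        data_q.put(read_block(i))   # one task, one stream, one queue
    data_q.put(STOP)

def consumer(out):
    while True:
        block = data_q.get()
        if block is STOP:
            break
        out.append(sum(block))      # stand-in for scaling/logging work

results = []
t1 = threading.Thread(target=producer, args=(3,))
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)  # [0, 4, 8]
```

The point is that with one task there is exactly one queue to size and monitor, rather than a pair of loops per card.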
mcduff
03-13-2020 08:48 AM - edited 03-13-2020 09:04 AM
Trust me on this one: I'm using a 6674T timing and sync card for synchronization and PXI_Clk10 as the refClk.src in DAQmx Timing, so getting 100k is not a problem. The code is on an offline PC and I'm not allowed to show any parts of the actual code.
I think you will lose precision either way, since the scaling is done in DBL and you need the final data in SGL.
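The precision point can be checked numerically: SGL has a 24-bit significand, so a raw 24-bit ADC code fits exactly, but the extra digits produced by DBL scaling do not. A quick sketch using Python's `struct` to emulate SGL rounding (the scale factor here is made up for illustration):

```python
import struct

def to_sgl(x):
    """Round a Python float (DBL) to single precision (SGL) and back."""
    return struct.unpack('f', struct.pack('f', x))[0]

code = 2**23 - 1                    # largest positive 24-bit ADC code
print(to_sgl(code) == code)         # True: the integer code fits exactly in SGL
scaled = code * 1.0000001           # hypothetical result of DBL scaling
print(to_sgl(scaled) == scaled)     # False: low-order bits are rounded off
```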
If you run your system in continuous mode, the number of samples decides the size of the buffer, not the actual number of samples you want to read. Letting the DAQmx Read wait for the actual number to read is CPU demanding; I think it polls for the data. That was at least my experience.
The original code reads 100 ms blocks. In this update we want to prepare the system for a future upgrade where we need to read 10 ms of data. I had some problems with reading 10 ms of data before and didn't put much effort into playing with it, so I went for 100 ms. This is one of the reasons I want to split the loops.
Edited: pooling to polling 😕
03-13-2020 09:31 AM
@ahmalk71 wrote:
Trust me on this one, I'm using a 6674T time and sync card for synchronization and the PXI_Clk10 as the refClk.src in DAQmx Timing, so getting 100k is not a problem.
It is not about getting ~100k; I am just unconvinced that 100k is the actual sample rate. I think it is 101.2 kSa/s.
I have synced an NI6366 and an NI4499 together using refClk.src. These cards could not be combined using Channel Expansion; in addition, I had the sample rate for each card set to a different value. The NI6366 had 8 channels at 2 MSa/s and the NI4499 had 16 channels at 204.2 kSa/s, for an effective raw data rate of ~45 MByte/s, or roughly half your data rate. Using the methods I described earlier, on a Windows machine, I had no problem running continuously for days. (I could also display the data, do FFTs, and filter the data.)
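Those numbers check out, assuming I16 raw samples for the NI6366 and I32 for the NI4499 (the raw sample widths are my assumption, not stated above):

```python
# Back-of-envelope check of the ~45 MB/s raw data rate.
ni6366 = 8 * 2_000_000 * 2      # 8 ch x 2 MSa/s x 2 bytes (assumed I16)
ni4499 = 16 * 204_200 * 4       # 16 ch x 204.2 kSa/s x 4 bytes (assumed I32)
total_mb_s = (ni6366 + ni4499) / 1e6
print(round(total_mb_s, 1))  # 45.1
```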
I think it is better to combine everything into a single loop for pulling data from the cards, plus another dequeuing loop. This has been my experience.
@ahmalk71 wrote:
If you run your system in continuous mode, the number of samples decides the size of the buffer, not the actual number of samples you want to read. Letting the DAQmx Read wait for the actual number to read is CPU demanding; I think it polls for the data. That was at least my experience.
Not true. Look at the help: you are manually setting the buffer, which OVERRIDES the automatic input buffer allocation.
You can decide how many points to read with each Read VI. I usually use a DAQmx Event, "N samples in buffer", to trigger an event case that reads the data. No polling there. (Look at the example Continuous Input with Events.) You are polling every 5 ms in your code, on top of any polling the Read VI does. Just guessing here, but I think that before reading the data, the Read VI will query the number of available points; it does not know that you just made that query.
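The polling-vs-event distinction can be illustrated outside LabVIEW with a condition variable: the reader sleeps until a full block is available instead of waking every few milliseconds to ask "is there data yet?". This is a hypothetical Python analogue of the "N samples in buffer" event, not DAQmx itself:

```python
import threading

N = 1000          # block size, analogous to "N samples" for the event
buf = []
cond = threading.Condition()

def driver_writes(samples):
    """Stand-in for the DAQmx driver filling its input buffer."""
    with cond:
        buf.extend(samples)
        if len(buf) >= N:
            cond.notify()        # fire the "N samples in buffer" event

def event_reader(out):
    with cond:
        cond.wait_for(lambda: len(buf) >= N)   # no polling loop here
        out.append(len(buf))     # read exactly one full block
        del buf[:N]

got = []
t = threading.Thread(target=event_reader, args=(got,))
t.start()
for _ in range(4):
    driver_writes([0.0] * 250)   # four quarter-blocks arrive
t.join()
print(got)  # [1000]
```

The reader consumes zero CPU while waiting, which is the benefit the event-driven DAQmx example gives you over a 5 ms polling loop.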
I don't have a lot of experience with RT, but loops in RT can have priorities and starve other loops. Once again, your task is not CPU bound; trying to parallelize your code sometimes does not give you much benefit. In your case, for the pseudo-code shown, I don't think you gain anything from splitting into more loops; it only becomes harder to debug, change in the future, etc.
mcduff
03-13-2020 09:47 AM
Sorry, my mistake. I just added that to ensure that I'd get a large buffer size, since I used 1000 samples as the input in DAQmx Timing. The original code does not have the buffer set. I use 100k as the input in DAQmx Timing, and the number of samples to read is a constant somewhere else in the code.
Well, the rate we have got and verified is 100000.000000021.
Well, I'll check the DAQmx event; I've never tried it before. Thanks for mentioning that.
03-13-2020 09:52 AM
I have looked into this doc: https://www.ni.com/pdf/manuals/371235h.pdf. On page 2-21 it says that you can get different timing than the on-board one.
03-13-2020 09:57 AM
@ahmalk71 wrote:
Sorry, my mistake. I just added that to ensure that I'd get a large buffer size, since I used 1000 samples as the input in DAQmx Timing. The original code does not have the buffer set. I use 100k as the input in DAQmx Timing, and the number of samples to read is a constant somewhere else in the code.
Well, the rate we have got and verified is 100000.000000021.
Well, I'll check the DAQmx event; I've never tried it before. Thanks for mentioning that.
I have never used a 6674T timing and sync card; I guess that is overriding the internal sample clock.
03-13-2020 10:03 AM
@ahmalk71 wrote:
I have looked into this doc: https://www.ni.com/pdf/manuals/371235h.pdf. On page 2-21 it says that you can get different timing than the on-board one.
Look at pages 2-27 to 2-29; that is what I was originally thinking of. I did not know the 44xx could be coerced; your rate is shown on page 2-29.
mcduff
03-13-2020 09:35 PM
Rejoining the conversation, a lot's been going on since I last checked in.
1. My typical bias in Windows is to make the DAQ loop as lean as possible, because the buffer overflow error that could happen if it bogs down is irreversible. On the other hand, if a consumer loop bogs down, it just makes a backlog build up in the queue.
This thinking may not be fully appropriate for RT, where the RTFIFO has a fixed size. Thus, a consumer loop that bogs down has more drastic consequences, possibly leading to data loss. (If I recall correctly; it's been a *looooong* time since I last used RTFIFOs under RT.)
So I'm now thinking it may be reasonable for your DAQ loop and your consumer loop to each take on some, but not all, of the demanding work (data copying, memory use, CPU usage).
Your DAQ loop has the DBL->SGL conversion and the data copy into the fixed size array. This might be further copied when pushed into the RTFIFO, I'm not sure. Your consumer loop will presumably just wire from the RTFIFO "dequeue" output to the network stream "write" input.
It seems to me that you'd balance the load better by doing the DBL->SGL conversion in the consumer loop. Your fixed-size producer-loop accumulator array would use 2x the memory this way, but it never grows, so that should be OK. (Though this raises the question: why the big mismatch between high-end 24-bit DAQ devices on one hand and a file format that only supports SGL precision on the other?)
Reading raw 32-bit integers instead of DBLs remains an option, but based on recent info in the thread I'd now look to do the scaling to SGL in the consumer loop.
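The "2x the memory" trade-off from point 1 is easy to quantify. A back-of-envelope sketch with a hypothetical channel count (the thread gives the 100 kSa/s rate and 100 ms blocks, but not the OP's channel count):

```python
# Per-block memory for a DBL accumulator vs. an SGL one.
n_channels = 16          # assumption, not stated for the OP's task
rate = 100_000           # Sa/s per channel (from the thread)
block_s = 0.1            # 100 ms blocks (from the thread)

samples_per_block = int(n_channels * rate * block_s)
dbl_bytes = samples_per_block * 8    # keep DBL in the producer accumulator
sgl_bytes = samples_per_block * 4    # convert to SGL before accumulating

print(samples_per_block)             # 160000
print(dbl_bytes // 1024, "KiB DBL")  # 1250 KiB
print(sgl_bytes // 1024, "KiB SGL")  # 625 KiB
```

At these sizes the fixed DBL accumulator costs an extra ~625 KiB per block, which on a typical RT target is a modest price for moving the conversion work out of the DAQ loop.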
2. The suggestion from mcduff about channel expansion sounds like an excellent idea to me, provided your devices support it.
3. I only ever worked on one project that used a Time & Sync device similar to the 6674T. I seem to remember it being able to generate an almost infinitely variable high-frequency clock via DDS. I could then route this clock out to a PXI trigger line (or maybe some more specialized signal line on the backplane?) from which the DSA devices could use it as a timebase for deriving their own sample clocks, according to rules like those in the linked manual.
So it makes sense to me that the right choice of DDS clock frequency would allow for a pretty precise 100 kHz sample clock frequency.
4. The posted snippet has quite a few unwired error ins/outs. The real code doesn't ignore errors, right?
-Kevin P
03-13-2020 10:13 PM
I found that Channel Expansion works for almost any combination of cDAQ devices; for PXI it is much more limited. The OP has identical cards, which should work for Channel Expansion.
mcduff