03-13-2020 06:42 AM
"nothing in that loop happens in parallel": that was new to me :-/. Anyhow, I'm using a while loop; the idea of using a timed loop was to make sure that two cores are used rather than letting LV make that decision. I know the rule "don't touch it if it works", but I'm a freak when it comes to optimization 🙂
03-13-2020 08:12 AM
@ahmalk71 wrote:
The file format that our customer uses supports only SGL. Changing the data to SGL at the beginning of the acquisition will save us unnecessary scaling downstream, and since it hasn't been an issue I haven't given it much thought. I have included a snippet showing how I want to change the code to two producer/consumer loop pairs. This is just a quick sketch to show how I was thinking; I haven't put in the sync details and other stuff, so please don't worry about that.
Regarding the bandwidth reduction from using raw data instead: we use the NI-4497, which uses a 24-bit ADC. Correct me if I'm wrong, but wouldn't using I16 as output reduce the dynamic range of the signal?
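For reference, the dynamic-range concern can be checked with quick arithmetic. This is a hypothetical back-of-envelope sketch of ideal integer-format dynamic range, not the 4497's actual noise-floor spec:

```python
import math

def dynamic_range_db(bits):
    """Ideal dynamic range of a signed integer format with the given bit depth."""
    return 20 * math.log10(2 ** (bits - 1))

# Comparing raw 24-bit ADC codes against truncation to I16.
dr_24 = dynamic_range_db(24)   # ~138.5 dB ideal
dr_16 = dynamic_range_db(16)   # ~90.3 dB ideal
print(f"24-bit: {dr_24:.1f} dB, 16-bit: {dr_16:.1f} dB, loss: {dr_24 - dr_16:.1f} dB")
```

So yes, in the ideal case truncating to I16 gives up about 48 dB, although whether that matters depends on the device's real noise floor.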
Best regards
Ahmed
Don't have LabVIEW here at the moment to check these out, but here are some observations:
If I were doing this project I would use Channel Expansion; then there would be only 1 task and 1 data stream. I would get the raw data in I32 form. I would then need only 1 FIFO or Queue to stream the data. The tasks you are doing are not CPU intensive, and on PXI, not even bandwidth intensive. Having multiple loops for each task makes things harder.
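The single-stream structure described above could be sketched like this. This is a hypothetical Python analogue, where `read_block()` stands in for one DAQmx Read covering all channels of a channel-expanded task:

```python
import queue
import threading

# One acquisition (producer) loop feeding one queue,
# one consumer loop doing all downstream work.
data_q = queue.Queue()
STOP = object()  # sentinel to shut the consumer down

def read_block(i):
    # Placeholder for "DAQmx Read (raw I32, all channels x N samples)".
    return [i] * 4

def producer(n_blocks):
    for i in range(n_blocks):
        data_q.put(read_block(i))   # one task, one stream, one queue
    data_q.put(STOP)

def consumer(out):
    while True:
        block = data_q.get()
        if block is STOP:
            break
        out.append(sum(block))      # stand-in for scaling/logging work

results = []
t1 = threading.Thread(target=producer, args=(3,))
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)  # [0, 4, 8]
```

The point is that with one task there is exactly one queue to size and monitor, rather than a pair of loops per card.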
mcduff
03-13-2020 08:48 AM - edited 03-13-2020 09:04 AM
Trust me on this one: I'm using a 6674T timing and sync card for synchronization and PXI_Clk10 as the refClk.src in DAQmx Timing, so getting 100k is not a problem. The code is on an offline PC and I'm not allowed to show any parts of the actual code.
I think you will lose precision either way, since the scaling is done in DBL and you need the final data in SGL.
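The precision point can be checked numerically: SGL has a 24-bit significand, so a raw 24-bit ADC code fits exactly, but the extra digits produced by DBL scaling do not. A quick sketch using Python's `struct` to emulate SGL rounding (the scale factor here is made up for illustration):

```python
import struct

def to_sgl(x):
    """Round a Python float (DBL) to single precision (SGL) and back."""
    return struct.unpack('f', struct.pack('f', x))[0]

code = 2**23 - 1                    # largest positive 24-bit ADC code
print(to_sgl(code) == code)         # True: the integer code fits exactly in SGL
scaled = code * 1.0000001           # hypothetical result of DBL scaling
print(to_sgl(scaled) == scaled)     # False: low-order bits are rounded off
```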
If you run your system in continuous mode, the number of samples decides the size of the buffer, not the actual number of samples you want to read. Letting the DAQmx Read wait for the actual number to read is CPU demanding; I think it polls for the data. That was at least my experience.
The original code reads 100 ms blocks. In this update we want to prepare the system for a future upgrade where we need to read 10 ms of data. I had some problems with reading 10 ms of data before and didn't put much effort into playing with it, so I went for 100 ms. This is one of the reasons I want to split the loops.
Edited: pooling to polling 😕
03-13-2020 09:31 AM
@ahmalk71 wrote:
Trust me on this one, I'm using a 6674T time and sync card for synchronization and the PXI_Clk10 as the refClk.src in DAQmx Timing, so getting 100k is not a problem.
It is not about getting ~100k; I am just unconvinced that 100k is the actual sample rate. I think it is 101.2 kSa/s.
I have synced an NI6366 and an NI4499 together using refClk.src. These cards could not be combined using Channel Expansion; in addition, I had the sample rate for each card set to a different value. The NI6366 had 8 channels at 2 MSa/s and the NI4499 had 16 channels at 204.2 kSa/s, for an effective raw data rate of ~45 MByte/s, or roughly half your data rate. Using the methods I described earlier, on a Windows machine, I had no problem running continuously for days. (I could also display the data, do FFTs, and filter the data.)
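Those numbers check out, assuming I16 raw samples for the NI6366 and I32 for the NI4499 (the raw sample widths are my assumption, not stated above):

```python
# Back-of-envelope check of the ~45 MB/s raw data rate.
ni6366 = 8 * 2_000_000 * 2      # 8 ch x 2 MSa/s x 2 bytes (assumed I16)
ni4499 = 16 * 204_200 * 4       # 16 ch x 204.2 kSa/s x 4 bytes (assumed I32)
total_mb_s = (ni6366 + ni4499) / 1e6
print(round(total_mb_s, 1))  # 45.1
```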
I think it is better to combine everything into a single loop for pulling data from the cards, plus another dequeuing loop. This has been my experience.
@ahmalk71 wrote:
If you run your system in continuous mode, the number of samples decides the size of the buffer, not the actual number of samples you want to read. Letting the DAQmx Read wait for the actual number to read is CPU demanding; I think it polls for the data. That was at least my experience.
Not true. Look at the help: you are manually setting the buffer, which OVERRIDES the automatic input buffer allocation.
You can decide how many points to read with each Read VI. I usually use a DAQmx Event, "N samples in buffer", to trigger an event case that reads the data. No polling there. (Look at the example Continuous Input with Events.) You are polling every 5 ms in your code, on top of any polling the Read VI does. Just guessing here, but I think that before reading the data, the Read VI will query the number of available points; it does not know that you just made that query.
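The polling-vs-event distinction can be illustrated outside LabVIEW with a condition variable: the reader sleeps until a full block is available instead of waking every few milliseconds to ask "is there data yet?". This is a hypothetical Python analogue of the "N samples in buffer" event, not DAQmx itself:

```python
import threading

N = 1000          # block size, analogous to "N samples" for the event
buf = []
cond = threading.Condition()

def driver_writes(samples):
    """Stand-in for the DAQmx driver filling its input buffer."""
    with cond:
        buf.extend(samples)
        if len(buf) >= N:
            cond.notify()        # fire the "N samples in buffer" event

def event_reader(out):
    with cond:
        cond.wait_for(lambda: len(buf) >= N)   # no polling loop here
        out.append(len(buf))     # read exactly one full block
        del buf[:N]

got = []
t = threading.Thread(target=event_reader, args=(got,))
t.start()
for _ in range(4):
    driver_writes([0.0] * 250)   # four quarter-blocks arrive
t.join()
print(got)  # [1000]
```

The reader consumes zero CPU while waiting, which is the benefit the event-driven DAQmx example gives you over a 5 ms polling loop.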
I don't have a lot of experience with RT, but loops in RT can have priorities and starve other loops. Once again, your task is not CPU bound; trying to parallelize your code sometimes does not give you much benefit. In your case, for the pseudo-code shown, I don't think you gain anything from splitting into more loops; it only becomes harder to debug, change in the future, etc.
mcduff
03-13-2020 09:47 AM
Sorry, my mistake. I just added that to ensure that I'd get a large buffer size, since I used 1000 samples as the input in DAQmx Timing. The original code does not have the buffer set. I use 100k as the input in DAQmx Timing, and the number of samples to read is a constant somewhere else in the code.
Well, the rate we have got and verified is 100000.000000021.
Well, I'll check the DAQmx event; I've never tried it before. Thanks for mentioning that.
03-13-2020 09:52 AM
I have looked into this doc: https://www.ni.com/pdf/manuals/371235h.pdf. On page 2-21 it says that you can get different timing than the on-board one.
03-13-2020 09:57 AM
@ahmalk71 wrote:
Sorry, my mistake. I just added that to ensure that I'd get a large buffer size, since I used 1000 samples as the input in DAQmx Timing. The original code does not have the buffer set. I use 100k as the input in DAQmx Timing, and the number of samples to read is a constant somewhere else in the code.
Well, the rate we have got and verified is 100000.000000021.
Well, I'll check the DAQmx event; I've never tried it before. Thanks for mentioning that.
I have never used a 6674T timing and sync card; I guess that is overriding the internal sample clock.
03-13-2020 10:03 AM
@ahmalk71 wrote:
I have looked into this doc: https://www.ni.com/pdf/manuals/371235h.pdf. On page 2-21 it says that you can get different timing than the on-board one.
Look at pages 2-27 to 2-29; that is what I was originally thinking of. I did not know the 44xx could be coerced; your rate is shown on page 2-29.
mcduff
03-13-2020 09:35 PM
Rejoining the conversation, a lot's been going on since I last checked in.
1. My typical bias in Windows is to make the DAQ loop as lean as possible, because the buffer overflow error that could happen if it bogs down is irreversible. On the other hand, if a consumer loop bogs down, it just makes a backlog build up in the queue.
This thinking may not be fully appropriate for RT, where the RTFIFO has a fixed size. Thus, a consumer loop that bogs down has more drastic consequences, possibly leading to data loss. (If I recall correctly; it's been a *looooong* time since I last used RTFIFOs under RT.)
So I'm now thinking it may be reasonable for your DAQ loop and your consumer loop to each take on some, but not all, of the demanding work (data copying, memory use, CPU usage).
Your DAQ loop has the DBL->SGL conversion and the data copy into the fixed size array. This might be further copied when pushed into the RTFIFO, I'm not sure. Your consumer loop will presumably just wire from the RTFIFO "dequeue" output to the network stream "write" input.
It seems to me that you'd balance the load better by doing the DBL->SGL conversion in the consumer loop. Your fixed-size producer-loop accumulator array would use 2x the memory this way, but it never grows, so that should be OK. (Though this raises the question: why the big mismatch between high-end 24-bit DAQ devices on one hand and a file format that only supports SGL precision on the other?)
Reading raw 32-bit integers instead of DBLs remains an option, but based on recent info in the thread I'd now look to do the scaling to SGL in the consumer loop.
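The "2x the memory" trade-off from point 1 is easy to quantify. A back-of-envelope sketch with a hypothetical channel count (the thread gives the 100 kSa/s rate and 100 ms blocks, but not the OP's channel count):

```python
# Per-block memory for a DBL accumulator vs. an SGL one.
n_channels = 16          # assumption, not stated for the OP's task
rate = 100_000           # Sa/s per channel (from the thread)
block_s = 0.1            # 100 ms blocks (from the thread)

samples_per_block = int(n_channels * rate * block_s)
dbl_bytes = samples_per_block * 8    # keep DBL in the producer accumulator
sgl_bytes = samples_per_block * 4    # convert to SGL before accumulating

print(samples_per_block)             # 160000
print(dbl_bytes // 1024, "KiB DBL")  # 1250 KiB
print(sgl_bytes // 1024, "KiB SGL")  # 625 KiB
```

At these sizes the fixed DBL accumulator costs an extra ~625 KiB per block, which on a typical RT target is a modest price for moving the conversion work out of the DAQ loop.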
2. The suggestion from mcduff about channel expansion sounds like an excellent idea to me, provided your devices support it.
3. I only ever worked on one project that used a Time & Sync device similar to the 6674T. I seem to remember it being able to generate an almost infinitely variable high-frequency clock via DDS. I could then route this clock out to a PXI trigger line (or maybe some more specialized signal line on the backplane?) from which the DSA devices could use it as a timebase for deriving their own sample clocks, according to rules like those in the linked manual.
So it makes sense to me that the right choice of DDS clock frequency would allow for a pretty precise 100 kHz sample clock frequency.
4. The posted snippet has quite a few unwired error ins/outs. The real code doesn't ignore errors, right?
-Kevin P
03-13-2020 10:13 PM
I found that Channel Expansion works for almost any combination of cDAQ devices; for PXI it is much more limited. The OP has identical cards, which should work for Channel Expansion.
mcduff