04-06-2017 10:16 PM
I'm running a very time-sensitive application with the NI PCI-6110 and a BNC-2110. I noticed that the scan time when acquiring data was approximately 3 times longer than it should be and started probing around.
I attached a VI that simply initializes a DAQ input and runs a loop while the device reads finite data. I also included a small indicator that keeps track of how long the loop takes to execute. I've tried this with real hardware and with a simulated 6110 DAQ device added in MAX, with the same result.
Shouldn't the overhead in this case be under 1 µs, or am I missing something? Why is the overhead an almost exact multiple of the time it takes to fill the read buffer?
04-07-2017 09:32 AM
Blunt remark: you really can't justify any of your conclusions based on the code you attached. Quick survey of the notable issues:
1. No sequencing to know when "Tick Count" executes relative to the code you're trying to time.
2. Inclusion of signal processing (spectral measurement) and GUI updates (the graph) within the data acq loop you're characterizing.
3. Use of implicit auto-start on the DAQmx Read adds extra overhead. Better to start explicitly. Even better to call DAQmx Control Task with "commit" action before the loop.
4. You haven't clearly defined what you want to time. I'll assume it's the time from starting the task until receiving all data. I'll *NOT* include any of the overhead involved in stopping & restarting the task, or analyzing and displaying the data.
5. Not sure what you define as "overhead" but running under Windows, 1 usec is *not* a reasonable amount of overhead to expect for calls into a hardware driver.
Here's a modified version of your VI that I made and quickly tried on a slower-sampling X Series board. I typically saw less than 1 msec of "overhead" when collecting nominally either 5 or 50 msec of data. I ran at a 500 kHz sample rate, 2500 or 25000 samples, 1000 or 100 loops.
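For reference, here's roughly the structure I'm timing, written out in text form. This is only a sketch using the nidaqmx Python API as a stand-in for the LabVIEW VIs; the device name, channel, and numbers are placeholders, but the explicit start and the "commit" before the loop (point 3) are the parts that matter.

import time
import nidaqmx
from nidaqmx.constants import AcquisitionType, TaskMode

RATE = 500_000            # Hz, matches the test described above
SAMPS_PER_READ = 25_000   # nominally 50 msec of data per iteration
N_LOOPS = 100

with nidaqmx.Task() as ai:
    ai.ai_channels.add_ai_voltage_chan("Dev1/ai0")   # hypothetical device/channel
    ai.timing.cfg_samp_clk_timing(RATE,
                                  sample_mode=AcquisitionType.FINITE,
                                  samps_per_chan=SAMPS_PER_READ)
    # Commit before the loop so each stop/restart only bounces between the
    # "committed" and "running" states instead of re-reserving and
    # re-programming the hardware every iteration.
    ai.control(TaskMode.TASK_COMMIT)

    for _ in range(N_LOOPS):
        t0 = time.perf_counter()
        ai.start()                                        # explicit start, no auto-start
        ai.read(number_of_samples_per_channel=SAMPS_PER_READ)
        ai.stop()
        elapsed = time.perf_counter() - t0
        overhead = elapsed - SAMPS_PER_READ / RATE        # time beyond the nominal acquisition
        print(overhead)

Note there's no processing or display inside the timed loop (points 1 and 2).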
-Kevin P
04-07-2017 03:12 PM
I made a mistake in the VI I posted: there was supposed to be a "DAQmx Start Task" at the beginning of the loop (though that didn't make a difference in the overhead).
1. I am trying to time the execution of the loop itself, not a specific section of code inside the loop. I assumed the execution time would be very close to the time it takes to read the samples at a given rate, but in reality it was a multiple of the read-buffer time, e.g. 15 ms for 25k samples at 5 MS/s when it should be close to 5 ms. The multiple increases when adding channels, so 15 ms -> 20 ms if I declare ai1:2, for example.
2. The VI I'm using manipulates the data read from the buffer, so I included a simple chart and PSD, though they do not add significantly to the overhead of the loop. I use GUI updates (Vision displays) in the signal acquisition loop, so this was meant to approximate what I'm actually doing.
3. Calling DAQmx Control Task with "commit" has brought the time per loop down to 5 ± 1 ms, thanks for this. I knew I was missing something important that added quite a bit of overhead. I will need to look further into why the overhead was so large otherwise.
4. Again, I am timing everything, including stopping/starting the task (because the tasks are based on a start trigger connected to the output signal generator).
5. I define overhead as the difference between the time it should take to acquire the samples at a given rate and read the buffer, and the time it actually takes to execute a single loop iteration, e.g. 25k samples at 5 MS/s should take 5 ms, but the time per loop is 15 ms -> 10 ms of overhead.
I hope this makes things clearer.
I tried using this setup together with the signal generator, but unfortunately the overhead did not improve much. Is there anything specific that should be done if the read task is set to start on a trigger from the signal generator?
04-07-2017 05:41 PM
1. The trouble with characterizing the *entire* loop is that it includes some things that probably do not *need* to be in the loop.
A. You may not need to stop and start if you can set up hardware retriggering. With your board, you may need to do it in a roundabout way, though.
B. The processing and display could be moved into another independent loop using a queue (see the sketch after this numbered list). If the *only* destination for the DAQ data from AI Read is straight to an Enqueue function, it'll execute lightning fast. That basically transfers ownership of the data to the queue without needing to make a copy of it. The other loop takes ownership back when it dequeues in order to process and display.
When you do this, the CPU still does the same total work, but your DAQ loop doesn't have to wait for processing and display to finish before it can loop back around on the next iteration. While waiting (~5 msec) for the next chunk of AI data, the other loop can be processing and displaying the current one.
2. See #1.
3. For more info, search for "DAQmx state model".
4. Let's come back to this below. This seems like a key requirement driving your concerns with timing, and I don't understand it clearly.
5. Ok got it, but subject to further discussion of #4, starting below.
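To make point 1B concrete, here's a rough text sketch of the producer/consumer split. It uses Python threading and a queue as a stand-in for LabVIEW queues and parallel loops; the read call and the processing stub are placeholders.

import queue
import threading

data_q = queue.Queue()

def acq_loop(ai_task, samps_per_read, n_loops):
    # Producer: the only thing done with each chunk is to enqueue it.
    for _ in range(n_loops):
        chunk = ai_task.read(number_of_samples_per_channel=samps_per_read)
        data_q.put(chunk)          # hand the data off; no analysis or display here
    data_q.put(None)               # sentinel so the consumer knows to quit

def process_and_display(chunk):
    pass                           # placeholder for the spectral measurement + graph update

def process_loop():
    # Consumer: runs in parallel, so the DAQ loop never waits on processing/display.
    while True:
        chunk = data_q.get()
        if chunk is None:
            break
        process_and_display(chunk)

# The two loops would run side by side, e.g.:
# threading.Thread(target=process_loop, daemon=True).start()
# acq_loop(ai, 25_000, 100)

In LabVIEW the enqueue hands over the buffer without a copy; either way, the acquisition loop goes straight back to waiting on the next chunk while the other loop crunches the current one.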
I saw no sign of hardware triggering in your AI task. So how *exactly* do you react to the "start trigger connected to the output signal generator"?
Also, why *exactly* is the overhead an issue for your app? What would be the absolute ideal relationship between the external signal generator, the "start trigger", and the chunks of AI data?
Just want to be sure that the specific overhead I'm looking to minimize is actually relevant, and that I'm not missing other stuff that's also important.
-Kevin P
04-07-2017 11:38 PM
1.
A. This may need to be done, as waiting/stopping/starting the task is probably slowing things down quite a bit. I think using continuous mode will speed things up quite a bit, but I need to know how to set up the triggers so that the DAQmx Read starts whenever the DAQmx Write begins, and the DAQ read stops whenever it has reached a specific number of samples. Would using a Start Trigger and a Reference Trigger work in this case?
B. I don't think the data processing to the Vision Image Display (IMAQ) adds any significant overhead. This comes down to how fast the CPU is, but it would be interesting to implement a queue so that data processing is handed off to another loop.
The application is similar to laser confocal microscopy. I am using two galvanometers, one for the x direction and one for the y direction. I drive the galvanometers with the outputs of the signal generators. Upon starting the signal generators, the DAQ input would start reading data via a Start Trigger. Two detectors are being used, so there are two input channels.
I have two different DAQ outputs: one is on-demand for the y-direction galvanometer, meaning I declare the output without a sample clock and write single voltage points, and the other is declared with a sample clock in waveform mode.
I attached a simplified VI (testDAQ2_simple) showing how I am currently setting everything up. You can replace WAV GEN with just a waveform signal generator; the on-demand output uses a single-point generator (though in this case I set it all to zero for simplicity).
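Roughly, in text form, the two output styles look like this. This is only a sketch using the nidaqmx Python API with hypothetical device/channel names and a placeholder ramp; the real VI builds the actual waveform with the WAV GEN subVI.

import numpy as np
import nidaqmx
from nidaqmx.constants import AcquisitionType

# On-demand output for the y galvo: no sample clock, single voltage points.
ao_y = nidaqmx.Task()
ao_y.ao_channels.add_ao_voltage_chan("Dev1/ao1")
ao_y.write(0.0)                      # software-timed, one point per call

# Buffered, hardware-timed output for the x galvo: sample clock + waveform.
ao_x = nidaqmx.Task()
ao_x.ao_channels.add_ao_voltage_chan("Dev1/ao0")
ao_x.timing.cfg_samp_clk_timing(4_000_000,
                                sample_mode=AcquisitionType.FINITE,
                                samps_per_chan=20_000)            # 5 ms at 4 MS/s
ao_x.write(np.linspace(-1.0, 1.0, 20_000), auto_start=False)      # placeholder ramp
ao_x.start()
ao_x.wait_until_done()

ao_x.close()
ao_y.close()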
The overhead is important to the end user, as they do not want to wait over 3x the time the scan is supposed to take. The user needs to be able to see changes line by line.
The ideal relationship between the output and input would be synchronization via triggering. The input should remain active while the output is active. I think the CPU is fast enough that there's no noticeable overhead from processing the chunks of data.
I attached another (reshapeRMS_setLINE) VI showing roughly how the data is processed once read from the buffer.
04-10-2017 11:05 AM
1.
A. I suspect there's more to be gained by sharing sample clock signals between AO and AI tasks than from triggering.
Now, a few more questions to home in on things. I'm trying to make sure that none of the hurdles to get over have been inadvertently self-imposed.
6. You seem to be looking for 5 MHz sampling from AI, segmented into 25000 sample (5 msec) chunks. What's the rationale for the choice of sample rate and chunk size?
7. It appears that AO is also defined as a recurring 5 msec pattern, though the rate and # of samples aren't clear. Hopefully it isn't 25000 samples at 5 MHz, since the board max is only 4 MHz for a single channel.
And if we can move forward by combining both AO signals into a single task, that max will decrease to 2.5 MHz.
8. The 25000 samples-per-line will apparently get processed into a line of pixels in an image. What's the image resolution? Is it 25000 pixels wide? If not, what purpose is served by the extra samples?
Also, the board has only 12 bits (4096 values) of theoretical input resolution, and the spec sheet calls out only 11 bits of "effective resolution". 25000 samples will be quantized into 2048 or 4096 distinct values.
9. Similar question on the AO side -- can the galvos handle the resolution and sweep speed you're using? How do the discrete LSB jumps in the sawtooth pattern compare to your signal noise? I.e., is all that pattern resolution helpful?
General thoughts on what I'd be wanting to do:
- sync AO and AI via sample clock, run the tasks at the same rate so that sample #'s correspond to one another. I tend to prefer syncing tasks via sample clock because it seems more direct and it extends more cleanly to apps where the tasks are on different boards.
- consider adding the 2nd AO channel to the buffered AO task, and making the tasks run in continuous mode rather than a sequence of finite runs that require stops and starts. This will eliminate the overhead, but may require writing continuously to the AO buffer, depending on speed and buffer size.
If running in continuous mode, you'll need to wire the # samples input to DAQmx Read to pull out data in 1-line chunks (rough sketch after this list). You'll know the # of samples per line from your waveform generation subVI.
- if you stick with finite tasks, don't forget to commit the tasks before entering the loop to speed up your subsequent stop / restarts.
- however fast the IMAQ display may be, I remain confident that data processing and gui indicator updates in the data acq loop cannot possibly be *helping* to speed up your loop execution times. If you stick with finite tasks, this is one more thing you have some control over.
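Here's a rough text sketch of the continuous, shared-sample-clock idea from the list above, using the nidaqmx Python API as a stand-in for the LabVIEW calls. The device name, the "/Dev1/ao/SampleClock" terminal, the rate, and the line length are assumptions.

import numpy as np
import nidaqmx
from nidaqmx.constants import AcquisitionType

RATE = 4_000_000
SAMPS_PER_LINE = 20_000            # known from the waveform generation subVI

ao = nidaqmx.Task()
ao.ao_channels.add_ao_voltage_chan("Dev1/ao0")
ao.timing.cfg_samp_clk_timing(RATE, sample_mode=AcquisitionType.CONTINUOUS,
                              samps_per_chan=SAMPS_PER_LINE)
# One line's worth of X sweep; with regeneration (the default) it repeats forever.
ao.write(np.linspace(-1.0, 1.0, SAMPS_PER_LINE), auto_start=False)

ai = nidaqmx.Task()
ai.ai_channels.add_ai_voltage_chan("Dev1/ai0:1")
# Slave AI to the AO sample clock so sample N of AI lines up with sample N of AO.
ai.timing.cfg_samp_clk_timing(RATE, source="/Dev1/ao/SampleClock",
                              sample_mode=AcquisitionType.CONTINUOUS,
                              samps_per_chan=SAMPS_PER_LINE * 10)  # buffer-size hint

ai.start()          # AI arms first and waits for the borrowed clock
ao.start()          # AO starts producing the shared clock
for _ in range(100):
    # Wire the # samples so each read pulls exactly one line's worth of data.
    line = ai.read(number_of_samples_per_channel=SAMPS_PER_LINE)
    # (hand `line` to the processing/display loop via a queue, as in the earlier sketch)

ai.stop()
ao.stop()
ai.close()
ao.close()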
-Kevin P
04-10-2017 04:41 PM
1. I think the best bet is to work with continuous mode (though correct me if this can be done otherwise), as the main culprit in introducing overhead is restarting the task inside a loop (I attached a modified version of your mod, including a case for AI Read and AO Write, to show this). The optimal situation would have both AI1 and AI2 retrigger on AO1 (the x galvo) and AO1 retrigger on AO2 (the y galvo).
6. 5 MS/s is the fastest the board can handle, and this is required in our application. 5 msec is just a test case; it can be any number, e.g. 10 msec, 500 msec, 1 msec, etc.
7. The sampling rate for AO is 4 MS/s, and the number of samples is defined by a msec input from the user, e.g. 20k samples for 5 ms.
8. The image resolution will depend on the user. The samples will be reshaped into a matrix on which an RMS is performed (rough sketch at the end of this post). In short, each pixel will represent V RMS (volts RMS). For example, if we want 500 pixels per line and take 25k samples, that would be 500 pixels with 50 samples each. An RMS is performed on each row, and the 500 resulting data points (pixels) are written into a line.
9. Yes, the galvo can handle the resolution and sweep speed.
There would need to be some way to stop the AI once the galvo moves on to the next line, or the image will appear skewed. This is the prime reason why I am running in finite mode: stop the process when the y galvo moves, then start again.
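A rough numpy sketch of the per-line reduction described in #8 (the numbers mirror the 25k-sample / 500-pixel example above; the input array is just a stand-in for one line read from the AI buffer):

import numpy as np

samples_per_line = 25_000
pixels_per_line = 500
samples_per_pixel = samples_per_line // pixels_per_line       # 50

line = np.random.randn(samples_per_line)                      # stand-in for one line of AI data
pixels = line.reshape(pixels_per_line, samples_per_pixel)     # 500 rows of 50 samples
line_of_pixels = np.sqrt(np.mean(pixels ** 2, axis=1))        # V RMS per pixel, length 500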
04-11-2017 07:14 AM
I don't know your field of study so it's hard to know what's going to work out best. So if you bear with me, here are some further thoughts:
- there's a version of approaching this where both AO are part of the same continuous hw-timed task. It'd be nice to predefine the whole AO buffer, but it may be too big. At 5 MHz sampling, you'll need to be pretty efficient with data streaming. Might want to consider the built-in DAQmx TDMS logging, then pull data from the file at the end of the run.
I imagine the X galvo would sweep back and forth left to right then right to left using a triangle waveform. In such a mode, I'd read 1 line worth of AI samples at a time and perform Reverse 1D Array on alternate lines to make the data line up for the image.
- another version is to continue iterating 1 line at a time with a software-controlled time gap between lines. In this version, the Y galvo AO can remain as an on-demand task. You could then drop the AI rate to match AO so they can share a sample clock.
AO and AI could be configured as continuous, but you'd control them by using a self-generated finite pulse train as a shared sample clock. For each raster line, you'd use a software call to update Y galvo AO then a software call to start the finite pulse train.
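A rough text sketch of that second variant, using the nidaqmx Python API as a stand-in for the LabVIEW calls. The device name, the Ctr0InternalOutput terminal, the rate, and the line length are assumptions; the buffered X-galvo AO task would be slaved to the same counter output the same way the AI task is here.

import nidaqmx
from nidaqmx.constants import AcquisitionType

RATE = 1_000_000
SAMPS_PER_LINE = 5_000

# Finite pulse train on a counter: each start() clocks out exactly one line.
clk = nidaqmx.Task()
clk.co_channels.add_co_pulse_chan_freq("Dev1/ctr0", freq=RATE, duty_cycle=0.5)
clk.timing.cfg_implicit_timing(sample_mode=AcquisitionType.FINITE,
                               samps_per_chan=SAMPS_PER_LINE)

# AI is configured continuous but only receives clock edges while the counter runs.
ai = nidaqmx.Task()
ai.ai_channels.add_ai_voltage_chan("Dev1/ai0:1")
ai.timing.cfg_samp_clk_timing(RATE, source="/Dev1/Ctr0InternalOutput",
                              sample_mode=AcquisitionType.CONTINUOUS,
                              samps_per_chan=SAMPS_PER_LINE * 10)

ao_y = nidaqmx.Task()                       # on-demand Y galvo output
ao_y.ao_channels.add_ao_voltage_chan("Dev1/ao1")

ai.start()                                  # arms and waits for counter edges
for line_idx in range(100):
    ao_y.write(line_idx * 0.01)             # software call: step the Y galvo (placeholder step)
    clk.start()                             # software call: clock out one raster line
    data = ai.read(number_of_samples_per_channel=SAMPS_PER_LINE)
    clk.wait_until_done()
    clk.stop()                              # re-arm the finite pulse train for the next line

ai.stop()
for t in (clk, ai, ao_y):
    t.close()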
-Kevin P
04-11-2017 06:47 PM
Today I ran the system in continuous mode and the overhead was practically non-existent. Unfortunately, the image was completely scrambled, as there was no synchronization between the on-demand AO and the continuous AO/AI. I did not go with a triangle waveform but rather a sawtooth; I just wanted to see the overhead on the production machine, as it's different from the simulation on my programming machine.
Processing the data into the correct lines is trivial; the problem is that in continuous mode the AI keeps taking in data while the y galvo moves (and, in the above case, while the x galvo snaps back nonlinearly on the sawtooth waveform). The original reason I went with triggered finite mode was to solve this problem by pausing the AI while the galvo(s) return to their intended position, then starting the AI along with the galvo(s). The disadvantage of this is the unaccounted-for overhead when restarting the task.
The requirement is that AI and AO (x galvo) must be continuous while AO (y galvo) remains on demand. They must be controlled via synchronization or the image will be scrambled. The solutions you came up with are exactly what I was thinking as well.
Is there a way to pause continuous AI with a software-controlled time gap while the galvo(s) return to their position? If I use a triangle waveform, I would have to split it into two parts as it would also have to pause while the y galvo moves on to the next line.
Summary of the process:
1. Y galvo moves (on-demand AO)
2. X galvo (continuous AO) starts, and the continuous AI (triggered by the x galvo) starts reading -> first half of triangle waveform
3. X galvo and AI pause after reading a certain number of samples
4. Y galvo moves to next line
5. X galvo and AI continue in sync -> 2nd half of triangle waveform
Is there a way to do this in software, or is the best bet retriggering with a finite pulse train as a shared clock source for AO/AI?
04-12-2017 07:00 AM
So a fraction of a msec (i.e., 1 sample interval) between raster lines is not enough, and the ~10 msec the thread started from is too much. The earlier suggestion in the thread about committing the finite task to speed up subsequent stop/start sequences seemed to bring the delay down to 1-2 msec. I've been supposing that was also too much, since the thread has continued.
I think the bottom line is that for software-controlled timing under Windows, you won't get a consistent and reliable improvement on that 1-2 msec. The 2nd idea in my previous msg might be slightly faster on average, but probably not by more than a factor of 2. If you need a consistent and reliable sub-msec time gap, it'll have to be based on hw timing from the tasks.
One example, supposing 5 msec sweep, 0.1 msec delay between lines: put both AO channels in one task. X data will ramp up for 5 msec, hold steady for 0.1, ramp down for 5 msec, hold steady for 0.1. Y data will hold steady for 5 msec, increment and hold for 5.1, increment and hold for 5.1, etc.
You'll then alternate your AI Reads. Read 5 msec worth of data and keep it. Read 0.1 msec of data and discard. Read 5 msec, reverse it and keep it. Read 0.1 msec and discard.
This kind of approach lets you tune that "0.1" msec Y galvo time to be whatever amount you actually need to adjust position. There's a small variation on this where the X galvo remains a sawtooth and you don't reverse alternate kept chunks of AI data.
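A rough sketch of that pattern in text form (numpy for the buffer construction plus a placeholder read loop; the 5 msec / 0.1 msec / 1 MS/s numbers and the voltage levels are just stand-ins):

import numpy as np

RATE = 1_000_000
SWEEP = int(0.005 * RATE)        # 5 msec of X sweep per line
GAP = int(0.0001 * RATE)         # 0.1 msec for the Y galvo to step and settle
N_LINES = 100
Y_STEP = 0.01                    # volts per line, placeholder

# Build the whole frame for a single two-channel AO task (rows = [X, Y]).
x_chunks, y_chunks = [], []
for line in range(N_LINES):
    sweep = np.linspace(-1.0, 1.0, SWEEP)
    if line % 2:                                     # triangle: alternate sweep direction
        sweep = sweep[::-1]
    x_chunks.append(np.concatenate([sweep, np.full(GAP, sweep[-1])]))    # hold X during the gap
    # Y holds during the sweep, then steps at the start of the gap so the galvo
    # only moves while the corresponding AI samples will be discarded.
    y_chunks.append(np.concatenate([np.full(SWEEP, line * Y_STEP),
                                    np.full(GAP, (line + 1) * Y_STEP)]))
ao_data = np.vstack([np.concatenate(x_chunks), np.concatenate(y_chunks)])
# ao_data would be written once to the combined two-channel AO task.

def read_frame(ai_task):
    """Alternate the AI reads: keep the sweep, discard the gap, un-reverse return sweeps."""
    image = []
    for line in range(N_LINES):
        keep = np.asarray(ai_task.read(number_of_samples_per_channel=SWEEP))  # 5 msec, keep
        ai_task.read(number_of_samples_per_channel=GAP)                       # 0.1 msec, discard
        image.append(keep if line % 2 == 0 else keep[..., ::-1])
    return image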
-Kevin P