07-23-2011 02:55 PM
Hello all,
In my application I'm trying to acquire 32 continuous analog channels at 32 kS/chan/sec on a PCI-6254. In the final system there will be 4 of these PCI cards bridged by an RTSI cable, for a total of 128 channels. The analog-in buffers need to be timestamped relative to other signals that aren't coming through NI cards (e.g. an older recording system is sampling the same data at the same rate, and we are taking in video frames on another computer). So we're keeping track of global time using a 10 kHz square wave generated by one of the old components, and every part of the system independently counts cycles of that same square wave to timestamp its respective buffers. For the NI card part, I'm collecting and processing one buffer every millisecond, so each buffer has 32 samples per channel. Those short buffers are a pretty critical part of the application (we're trying to do realtime).
The strategy with the M-Series card was to run the externally generated clock signal into a 32-bit counter on the card, then register a callback with EveryNSamplesEvent for my analog-in task - within the callback, the first thing I do is grab the count value from the counter, then process my data. At this point I noticed that about once per 100 buffers, the buffer timestamp is late by almost exactly 1 ms (one buffer in length), and the following buffer's timestamp arrives only a tiny fraction of a millisecond after it (e.g. timestamps: buffer1: 1.0 ms | buffer2: 2.0 ms | buffer3: 3.95 ms | buffer4: 4.0 ms).
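In outline, the callback looks something like this (a simplified sketch rather than my exact code; counter_task and processBuffer are stand-ins):
#include <NIDAQmx.h>
// Sketch: counter_task is the counter-input task counting the external 10 kHz clock,
// set up elsewhere; processBuffer stands in for the real data handling.
static TaskHandle counter_task;
static void processBuffer( uInt32 stamp, const int16 *data, int32 nSamps ) { /* real processing goes here */ }
int32 CVICALLBACK EveryNCallback( TaskHandle taskHandle, int32 everyNsamplesEventType, uInt32 nSamples, void *callbackData )
{
    uInt32 clockCount = 0;
    int16  data[32 * 32];   // 32 channels x 32 samples/channel = one 1 ms buffer
    int32  sampsRead = 0;
    // First thing in the callback: latch the external-clock count as this buffer's timestamp.
    DAQmxReadCounterScalarU32( counter_task, 1.0, &clockCount, NULL );
    // Then read the 1 ms analog buffer and hand it off.
    DAQmxReadBinaryI16( taskHandle, 32, 1.0, DAQmx_Val_GroupByChannel, data, 32 * 32, &sampsRead, NULL );
    processBuffer( clockCount, data, sampsRead );
    return 0;
}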
Here's the jitter, at 3 zoom levels. I hope the formatting comes out ok.
I could think of 4 possible reasons for this
1) There is jitter in the actual clock signal
2) My processing of the data buffer within EveryNCallback occasionally takes longer than 1 ms, and this blocks the execution of the subsequent call to that callback.
3) Reading the counter value is slow - it occasionally blocks the subsequent call to EveryNCallback, or the call has to wait while the PCI bus is busy
4) The jitter comes from the driver/API. Sometimes EveryNCallback is called late, perhaps due to background things going on in the kernel, or perhaps because the PCI bus is clogged up and waiting.
To try to troubleshoot, I did various things to remove possible sources of lag. I moved all of my data processing out to a separate thread, and eventually simply commented out all the data processing steps. I also removed the call to DAQmxReadBinaryI16, and I replaced DAQmxReadCounterScalarU32 with a call to gettimeofday(). None of these changes had any effect on the jitter. Finally I wrote a small C program (attached) that simply sets up and starts an analog-in task with a configurable number of channels and sampling rate, then writes the system time to a file every time the EveryNSamples callback gets called. The pattern of lags in this test output is exactly the same as in my full-scale program.
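For the record, the boiled-down test is roughly this (a sketch of the same idea, not the exact attached file; error checking is stripped, device/file names are placeholders, and the overwrite property is only there so the sketch can run without ever calling a DAQmx read):
#include <stdio.h>
#include <sys/time.h>
#include <NIDAQmx.h>
static FILE *logFile;
// Log the wall-clock time every time the Every N Samples callback fires.
static int32 CVICALLBACK TimeStampCallback( TaskHandle task, int32 type, uInt32 nSamples, void *data )
{
    struct timeval tv;
    gettimeofday( &tv, NULL );
    fprintf( logFile, "%ld.%06ld\n", (long)tv.tv_sec, (long)tv.tv_usec );
    return 0;
}
int main( void )
{
    TaskHandle aiTask;
    logFile = fopen( "callback_times.txt", "w" );
    DAQmxCreateTask( "", &aiTask );
    DAQmxCreateAIVoltageChan( aiTask, "Dev2/ai0:31", "", DAQmx_Val_Cfg_Default, -10.0, 10.0, DAQmx_Val_Volts, NULL );
    DAQmxCfgSampClkTiming( aiTask, "", 32000.0, DAQmx_Val_Rising, DAQmx_Val_ContSamps, 32000 );
    DAQmxSetReadOverWrite( aiTask, DAQmx_Val_OverwriteUnreadSamps );  // no read call in this sketch
    // Fire the callback after every 32 samples per channel (once per ms at 32 kS/s/chan).
    DAQmxRegisterEveryNSamplesEvent( aiTask, DAQmx_Val_Acquired_Into_Buffer, 32, 0, TimeStampCallback, NULL );
    DAQmxStartTask( aiTask );
    getchar();  // run until Enter is pressed
    DAQmxClearTask( aiTask );
    fclose( logFile );
    return 0;
}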
I tried turning the number of channels down from 32 to 1, and the sampling rate down from 32000 samp/sec/chan to 16000 samp/sec/chan (values that I can't use in the application - just for troubleshooting). This partially resolves the problem:
This suggests to me that the bandwidth of the PCI bus, or the time taken by the driver or on the card to process samples (although not the time to send the samples to my application - I removed the call to DAQmxReadAnalog...), is causing a 1-buffer-long delay on some calls from the API to my callback. The computer I'm running on is a brand-new Linux machine with a quad-core i7 processor, 2 GB of RAM, and an NVIDIA 7300GS graphics card in a PCIe x16 slot (the motherboard has several PCIe slots and two PCI slots).
Sorry to have taken up so much space explaining the situation. My question for the forum is: where could this jitter/lag be coming from? Should I be spending time on things like setting the CPU affinity for my application, and for the driver? Do I need a real-time kernel? Do I have things going on in the PCI bus that I could kill, freeing up time? Would the problem be solved by using a PCIe NI DAQ card - supposedly the express bus is faster?
Or on the other hand, is this kind of latency a fact of life when using callbacks?
If you have any suggestions or insights about things that will or will not work, or if there is any other info I should post about my system, the application, etc etc, please let me know. Any help is hugely appreciated!!
-Greg
07-29-2011 01:22 PM
Greg -
To address your questions :
where could this jitter/lag be coming from? Jitter is a fact of life for any clock-based measurement; to judge whether what you're seeing is realistic, you would want to know the ppm error on the clock. That alone could give you some insight into the typical error for your clock's frequency. From the information I have been able to find, I don't think PCI vs. PCIe is going to affect the amount of jitter, because it is directly related to the clock base you're referencing. PCIe allows you more bandwidth, but you are still subject to the clock you are running off of, regardless of bus type.
Should I be spending time trying to do things like setting the CPU affinity for my application, and for the driver?
Do I need a real-time kernel? A real-time system would let you prioritize certain loops so they run with consistent, predictable response times. Essentially, this could make a significant difference if your operating system is the culprit behind some of the latency, and would allow you to run critical code within a known execution time.
Do I have things going on in the PCI bus that I could kill, freeing up time? Would the problem be solved by using a PCIe NI DAQ card - supposedly the express bus is faster? With any OS, other running applications could certainly affect your latency...
Or on the other hand, is this kind of latency a fact of life when using callbacks? You are correct that there is always going to be some lag on these calls.
The most pertinent thing I can think of is that, regardless of your processing power and PC capability, there is always going to be latency and jitter to deal with. However if you do have a deterministic (real-time) system running, the jitter is a consistent (smaller) and known value. I hope this helps with your questions.
Regards,
Ben
National Instruments
07-29-2011 02:22 PM
Hi Greg,
From your numbered list of possible causes, I'd say the most likely is a combination of #2, #3, and #4 (i.e. the jitter is almost certainly coming from the software rather than from the external clock):
As Ben mentioned, you really can't guarantee determinism on a non-RT OS. One thing that you might try is to set the DAQmx Read "wait mode":
DAQmxSetReadWaitMode(TaskHandle, DAQmx_Val_Poll);
By default, DAQmx yields control of the thread to the OS for other tasks while it is waiting for data to be available. Presumably, since you are using the Every N Samples event, the data should already be present, but I still say it's worth a try.
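For clarity, the call just goes on the AI task alongside the rest of your setup, e.g. (sketch; aiTask and EveryNCallback are whatever you already have):
DAQmxSetReadWaitMode( aiTask, DAQmx_Val_Poll );   // poll instead of yielding the thread while waiting for data
DAQmxRegisterEveryNSamplesEvent( aiTask, DAQmx_Val_Acquired_Into_Buffer, 32, 0, EveryNCallback, NULL );
DAQmxStartTask( aiTask );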
The best solution, however, would be to implement everything in hardware. M Series DAQ cards have 2 counters available; I'd suggest using one of them to divide down your AI Sample Clock (in your case, by 32). Then, use this divided-down signal as the sample clock for your counter input task. This way, you don't rely on software events to synchronize each AI buffer with the external timestamp.
Best Regards,
07-30-2011 05:06 PM
Thanks a million for your answers Ben.
Yeah, when designing this thing I imagined there would almost definitely be _some_ jitter. What still strikes me as weird is that the jitter isn't normally distributed: it's almost always exactly one buffer-length (1 ms, 32 samples). I imagine that this is a pretty strong hint about the source of the jitter, and the key to its undoing, but I haven't been able to make anything of that hint. If you keep it on the back-burner in your mind.... or maybe it's already clear to you why OS jitter would manifest itself that way, and we can't really glean anything from that 'hint' that could be applied to the problem.
I seriously considered a real-time Linux kernel. But as I was reading about one from CERN, I got hung up on their assurance that 90% of the speed-up comes from optimizing interrupts, CPU affinities and so on; the actual RT kernel only gives you the last 10%. Maybe they just say this to keep people from getting ridiculously high expectations about their speed-up. Anyway, I'm still mulling over the costs and benefits. I'm already worried about the RHEL5 restriction for the driver, so I'm wary about adding more requirements (which could break other parts of my system)... but the jitter has to go, so as other attempts at fixing it fail, I may at some point bite the bullet and try to get the RT thing going (I don't have any experience with installing, running, or programming for those, so the learning curve is part of the utility calculation for me, unfortunately).
BUT I played around with running my application at different "nice" levels. It seems I can modulate the number of lag occurrences by running my process at high or low priority. And maybe interestingly, changing the priority modulates the likelihood of getting an exactly-1-ms lag, but it doesn't give me, say, 15 ms lag events. Really low priority settings can give me lags that stack on top of one another, but always in integer multiples of 1 ms. Fun! 🙂
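In case it helps anyone else, the in-process equivalent of the nice/affinity fiddling looks roughly like this (Linux-specific sketch; the SCHED_FIFO part needs root or CAP_SYS_NICE, and I'm not claiming it fixes the 1 ms lags):
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <sys/resource.h>
// Sketch: pin the acquisition thread to one core and raise its priority.
// setpriority() is the programmatic version of running under "nice";
// SCHED_FIFO goes further and gives the thread real-time scheduling.
static void pin_and_prioritize( int cpu )
{
    cpu_set_t mask;
    CPU_ZERO( &mask );
    CPU_SET( cpu, &mask );
    pthread_setaffinity_np( pthread_self(), sizeof(mask), &mask );
    struct sched_param sp = { .sched_priority = 80 };   // 1..99 for SCHED_FIFO
    pthread_setschedparam( pthread_self(), SCHED_FIFO, &sp );
    setpriority( PRIO_PROCESS, 0, -20 );                // only matters if SCHED_FIFO was refused
}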
From your number of posts it looks like you're new to the help forums? If so, keep up the good work! Thanks again for your thoughts.
-Greg
07-30-2011 06:16 PM
Hey John,
DAQmxSetReadWaitMode(TaskHandle, DAQmx_Val_Poll); <-- I was very excited to try this! Unfortunately it ran without a DAQmx error but gave the same pattern of lags. As I mentioned in my reply to Ben, it seems funny that the jitter isn't normally distributed around 0 ms... it's almost always 1 ms on the nose, and the thing that gets modulated is the likelihood of that 1 ms lateness. Just a moment ago I tried changing my Linux runlevel in /etc/inittab to 2, so that Linux didn't even have to give me a windowing environment, thinking that would take a lot of the baseline load off the computer. That seemed to make the lags more frequent!
I really like the hardware timed counter task idea. I tried to implement that in a little test program and in my bigger application, in both cases I used
// write a 1 kHz square wave (80 MHz timebase / (40000 + 40000) ticks = 1 kHz)
DAQmxCreateCOPulseChanTicks( cnt_generation_task, "Dev2/ctr0", "", "OnboardClock", DAQmx_Val_Low, 0, 40000, 40000 );
// Set that task up to write continuously? Grabbed from example code but not clear on why we use implicit timing
DAQmxCfgImplicitTiming ( cnt_generation_task, DAQmx_Val_ContSamps, 1000 );
// Set up counter in
DAQmxCreateCICountEdgesChan( count_in_task, "Dev2/ctr1" , "", DAQmx_Val_Rising, 0, DAQmx_Val_CountUp );
// Clock the counter-in task off PFI9. PFI9 is hardwired to ctr0's output because DAQmxConnectTerms seems to crash for me
DAQmxCfgSampClkTiming( count_in_task, "/Dev2/PFI9", 1000.0 , DAQmx_Val_Rising, DAQmx_Val_ContSamps, 1000 );
// Then I set up the analog in task.
// I DAQmxSetRefClkSrc on the AI task to "OnboardClock"
// b/c otherwise I get a DAQmx error about that resource being used by another task.
This works GREAT in a test application. When I replicate it in my bigger thing, it seems to accumulate about 3 seconds of slowness for every minute of run time. And after running for about 10 minutes I get an error from mx saying that I'm requesting old samples from the counter task... I'll be messing with the code more to try to figure out what I'm doing wrong. Hopefully I'll get it in the next few days, then I'll click "Accept as Solution" - or if I don't get it maybe I'll post again with better information about what's happening.
Thanks for reading and for the advice - you guys are the best!
-Greg
08-01-2011 01:14 PM - edited 08-01-2011 01:16 PM
Hi Greg,
The counter output and Analog Input are both derived from the same source in your implementation, but there would still be a delay between when each is started. I think you should instead do something like this:
//divide the AI sample clock down directly. Output is 50% duty cycle. First pulse occurs on the 32nd tick of AI.
DAQmxCreateCOPulseChanTicks( cnt_generation_task, "Dev2/ctr0", "", "ai/SampleClock", DAQmx_Val_Low, 32, 16, 16);
//Implicit timing means that the source (ai/SampleClock) determines when the next sample (a.k.a. period) begins.
DAQmxCfgImplicitTiming ( cnt_generation_task, DAQmx_Val_ContSamps, 1000 );
// Set up counter in
DAQmxCreateCICountEdgesChan( count_in_task, "Dev2/ctr1" , "", DAQmx_Val_Rising, 0, DAQmx_Val_CountUp );
// Sample the internal output of the other counter.
DAQmxCfgSampClkTiming( count_in_task, "Ctr0InternalOutput", 1000.0 , DAQmx_Val_Rising, DAQmx_Val_ContSamps, 1000 );
...
//set up analog in task, no need to call DAQmxSetRefClkSrc, OnboardClock is the default. You can't change it unless you set the same RefClkSrc for all tasks using an internal timebase.
...
//start counter tasks before ai task to ensure synchronization
DAQmxStartTask(count_in_task);
DAQmxStartTask(cnt_generation_task);
DAQmxStartTask(th);   // th = the AI task handle
...
You can read back the count value in your Every N Sample Event callback. Even if the callbacks don't occur exactly every 1ms, the count value is still sampled and buffered on every 32nd AI Sample Clock edge. However, for this to keep up over time, the code inside your callback needs to execute once per ms on average--I can't guarantee that this will necessarily be the case. Do you have to output something every ms, or is it OK to buffer up the data and run the loop less frequently?
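Reading the latched value back in the callback would look roughly like this (sketch; count_in_task is the counter input task configured above):
uInt32 stamp = 0;
int32  ctrRead = 0;
// One buffered counter sample is latched per 1 ms AI buffer; even if the
// callback runs late, the value was captured by hardware at the right moment.
DAQmxReadCounterU32( count_in_task, 1, 1.0, &stamp, 1, &ctrRead, NULL );
// 'stamp' is the timestamp for the AI buffer you read next in this callback.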
Best Regards,
08-04-2011 04:46 PM
Thanks for the explanation and the fix to my couple of example code lines. This works! I'm so happy! Our lab owes you a cheesecake!
To answer your question - no, it's not really essential that my program outputs data once per millisecond. If we get backed up by 20 or 30 cycles then it's bad (buffer overflows from nidaqmx and other problems in our code downstream), but 2 or 3 ms here and there is acceptable. The only real problem was with the timestamping, and it looks like the triggered count trick is working beautifully. Thanks!!
08-04-2011 04:50 PM - edited 08-04-2011 04:51 PM
It sounds like it's working then, but if you do run into issues then you can likely fix the problem by triggering the callback less frequently (e.g. after every 10 ms of data instead of every 1 ms). There is a certain overhead associated with implementing the callback and calling into DAQmx Read, so reading more samples per callback will give you less overhead. You can read more than one sample from the counter each loop if you still want to sample the external oscillator (for the timestamp) every 1 ms.
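Roughly like this (sketch; th and count_in_task are the task handles from the earlier post, and 320 samples per channel = 10 ms at 32 kS/s/chan):
// Fire the callback every 10 ms instead of every 1 ms...
DAQmxRegisterEveryNSamplesEvent( th, DAQmx_Val_Acquired_Into_Buffer, 320, 0, EveryNCallback, NULL );
...
// ...and inside the callback, still read one hardware timestamp per 1 ms sub-buffer:
uInt32 stamps[10];
int32  ctrRead = 0;
DAQmxReadCounterU32( count_in_task, 10, 1.0, stamps, 10, &ctrRead, NULL );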
Best Regards,