Counter stops sampling during several-minute acquisitions

ericson.15 · ‎07-15-2008

Hello All,
I am working on rewriting a data-acquisition program for DAQmx. The code was originally written for traditional DAQ, but ceased working when we upgraded to LabVIEW 8.5 and the corresponding version of DAQmx. The program is designed to sample many accelerometers using one AI Voltage task, sample a motor tach sensor, and record all this data. The program works fine right now, except that sometimes after 2 or 3 minutes of recording the counter suddenly stops gathering data for no apparent reason. It's not a timebase overflow error, because I've already dealt with that problem separately. I'm hoping someone has encountered similar behavior before and can help me out, because I'm not sure how to proceed with troubleshooting.
The counter task is a period measurement which runs continuously and takes all the samples from the buffer each time. This number varies with motor speed (more samples are taken per loop at higher speed) and also seems to vary 10 or 20% independent of motor speed (luckily this doesn't affect our post-processing programs). Interestingly, I recorded the number of samples taken at one of these "counter stopping events" and found the last few sample counts were: 947, 981, 964, 2(?!), 0, 0, 0... I'm hoping the presence of that bizarre 2 helps someone out.

Here is a picture of the relevant section of this code:
The sub-VI "Idler MX Config" contains the configuration for the voltage task, and the sub-VI "Idler MX Read" contains the voltage read block. These work fine, having been tested over many programs. It's a work in progress, so I apologise for the mess.

Any advice is greatly appreciated. The program does work, but this intermittent problem is hampering its usefulness at the moment.

Tristan Ericson

Ohio State University Mechanical Engineering
Dynamics and Vibrations
201 West 19th Ave Suite N350
Columbus, OH 43210-1142

Office: 614-292-9029
Lab: 614-247-8077
Fax: 614-292-3163

"No one is useless in this world who lightens the burden of another." - Charles Dickens

Kevin_Price · ‎07-16-2008

I've done a lot of counter apps and I don't believe you've stumbled on a known issue with the board or driver.

- Perhaps you have a noisy tach signal causing an error in your counter task? This could explain *part* of your observations. The other part, well, I would start by suspecting your app code.

- The 10-20% timing variability didn't immediately raise any alarms, but the fact that it's independent of motor speed *does* trigger some suspicions. It suggests that spinning your motor at different speeds causes your software timing variability to change inversely. Double the speed and a 10% variation in # of period samples represents only *half* as much time as the 10% variation meant at the original speed. Something about that relationship doesn't sit right with me. Dunno what exactly to point to, but it's, well, weird.

- My guess for why you get an instance of 2 samples before getting stuck at 0 is that for some reason, the *prior* iteration finished much faster than normal. Perhaps this is because an error was asserted causing a block of time-consuming code to be skipped? Whatever the cause, it's telling you that you looped back around so fast that the task didn't have time to buffer the normal ~900 samples, but only had time for 2. So I agree, the "2" should be a bigger clue than the sequence of 0's.

- comment: is that a 10000 second timeout to your counter task? Is that on purpose? Really?

- just curious -- I notice you don't specify a buffer size for the counter task. Try explicitly setting that to a fairly large size (or at least query the size with a property node). I don't know how well DAQmx guesses at a default buffer size for an "Implicit Timing" period measurement task. (It generally does well for explicitly timed tasks.)

- what does the rest of the loop code do?

-Kevin P.

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.

ericson.15 · ‎07-16-2008

Thanks for the reply Kevin, you've given me several new things to think about.

I've done a lot of counter apps and I don't believe you've stumbled on a known issue with the board or driver.

- Perhaps you have a noisy tach signal causing an error in your counter task? This could explain *part* of your observations. The other part, well, I would start by suspecting your app code.

> I've checked the tach signal with an oscilloscope and found it to perform well. Additionally, the old version of the code never had any problems with it.

- The 10-20% timing variability didn't immediately raise any alarms, but the fact that it's independent of motor speed *does* trigger some suspicions. It suggests that spinning your motor at different speeds causes your software timing variability to change inversely. Double the speed and a 10% variation in # of period samples represents only *half* as much time as the 10% variation meant at the original speed. Something about that relationship doesn't sit right with me. Dunno what exactly to point to, but it's, well, weird.

> Our motor has a little bit of speed variation (I think <1%) which explains some of it. I don't believe the software timing is changing much (is there a way to measure this?). The variability in number of samples with respect to motor speed makes sense to me given the number of pulses passing during a sampling interval: at 50 RPM it's 16 or 17, at 3000 RPM it's around 1000.

- My guess for why you get an instance of 2 samples before getting stuck at 0 is that for some reason, the *prior* iteration finished much faster than normal. Perhaps this is because an error was asserted causing a block of time-consuming code to be skipped? Whatever the cause, it's telling you that you looped back around so fast that the task didn't have time to buffer the normal ~900 samples, but only had time for 2. So I agree, the "2" should be a bigger clue than the sequence of 0's.

> This is interesting too, is there an easy way to measure the time each loop iteration takes and store it in an array?

- comment: is that a 10000 second timeout to your counter task? Is that on purpose? Really?

> Originally I thought the timeout might be to blame for the counter stopping after a while, so I put some arbitrarily huge number in there to rule it out. Can this have downsides?

- just curious -- I notice you don't specify a buffer size for the counter task. Try explicitly setting that to a fairly large size (or at least query the size with a property node). I don't know how well DAQmx guesses at a default buffer size for an "Implicit Timing" period measurement task. (It generally does well for explicitly timed tasks.)

> This is a good suggestion, I'll try some things along this avenue and get back to you.

- what does the rest of the loop code do?

> To the right-hand side of what I posted, there's only some file-writing code and the end of the loop. I neglected to mention it, and probably should have, but there's a second while loop in this program running concurrently. It sends an analog out voltage via our 6704 card every 50 ms to control motor speed. It's completely independent of the rest of the code, but might it be interfering?


Tristan Ericson


ericson.15 · ‎07-16-2008

UPDATE:

I set the buffer to a large number (100,000) and am still having the same problem. I realized I had been "hard stopping" the program whenever this problem occurred, so instead I tried stopping the loop so that the program terminated gracefully. This allowed the error reporter (outside the loop) to report, and it turns out my counter task is throwing error -200141, the data overwrite error. This seems strange to me, as the program is set to read all available samples, so this shouldn't be possible. What am I missing? I've been stumped by this error before in other applications and don't think I ever fixed it.

Tristan Ericson


Kevin_Price · ‎07-16-2008

> Our motor has a little bit of speed variation (I think <1%) which explains some of it. I don't believe the software timing is changing much (is there a way to measure this?). The variability in number of samples with respect to motor speed makes sense to me given the number of pulses passing during a sampling interval: at 50 RPM it's 16 or 17, at 3000 RPM it's around 1000.

Ok, I'm with you on this now. Sloppy thinking on my part. The nominal # samples per loop cycle varies directly with speed. So a 10% variability always represents a constant *time* variability. Yeah, makes sense.
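
To put rough numbers on that, here is a minimal Python sketch (text code, since a block diagram can't be pasted here). The pulses-per-revolution and loop period are assumptions, not values from this thread; they were picked only because their product reproduces the reported 16 or 17 samples at 50 RPM and roughly 1000 at 3000 RPM.

```python
# Hypothetical numbers: neither the tach's pulses per revolution nor the
# loop period is stated in this thread. Only their product matters here.
PULSES_PER_REV = 60      # assumed tooth count on the tach wheel
LOOP_PERIOD_S = 0.33     # assumed nominal loop time in seconds


def pulses_per_second(rpm):
    """Tach pulse rate for a given motor speed."""
    return rpm / 60.0 * PULSES_PER_REV


def samples_per_loop(rpm, loop_period_s=LOOP_PERIOD_S):
    """Period samples buffered during one loop iteration (one per pulse)."""
    return pulses_per_second(rpm) * loop_period_s


def time_jitter_for_count_variation(rpm, fraction):
    """Loop-time variation implied by a fractional change in sample count."""
    extra_samples = samples_per_loop(rpm) * fraction
    return extra_samples / pulses_per_second(rpm)


for rpm in (50, 3000):
    print(rpm,
          round(samples_per_loop(rpm), 1),
          round(time_jitter_for_count_variation(rpm, 0.10), 4))
```

Under these assumptions a 10% change in sample count works out to the same loop-time variation at both speeds, which is the point of the paragraph above.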

> is there an easy way to measure the time each loop iteration takes and store it in an array?

Yeah, pretty easy. There's a newish "Elapsed Time" express VI that should probably help with that. I've never explored it personally, though, and if it doesn't help enough there are pretty easy manual ways to do it. (Some of the express VIs are significantly sub-optimal, and I've developed a rule of thumb to just avoid them.)
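
One manual way, sketched in Python rather than LabVIEW: read a clock each iteration and store the difference in a preallocated array. `fake_read_and_log` is a made-up stand-in for the DAQ read plus file write, not anything from the posted VI.

```python
import time


def timed_loop(work, iterations):
    """Run work() repeatedly; return the duration of each iteration in seconds."""
    durations = [0.0] * iterations     # preallocated, so the log itself never grows
    previous = time.perf_counter()
    for i in range(iterations):
        work()
        now = time.perf_counter()
        durations[i] = now - previous
        previous = now
    return durations


def fake_read_and_log():
    # Hypothetical stand-in for the DAQ read and file write in the real loop.
    time.sleep(0.01)


durations = timed_loop(fake_read_and_log, 5)
print([round(d, 3) for d in durations])
```

In LabVIEW the same idea would presumably be a tick count carried in a shift register, with the differences written into an array initialized before the loop.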

> Originally I thought the timeout might be to blame for the counter stopping after a while, so I put some arbitrarily huge number in there to rule it out. Can this have downsides?

Well, if the motor stops or the signal wire comes loose, you'll be waiting a really *really* long time to get your timeout error and find out. And your GUI will likely be unresponsive during this time, judging by the visible code. I'm *very* wary of long or infinite timeouts. I occasionally use them in specific situations, but I usually prefer a short timeout value followed by a timeout-handling case structure. (During debug, the case may be simply a dialog that says something like, "Oops, hung.")
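
The short-timeout-plus-handler idea, as a minimal Python sketch. A `queue.Queue` stands in for the blocking counter read here, which is an assumption made purely so the example runs without hardware; the 0.05 s timeout is likewise arbitrary.

```python
import queue


def read_with_short_timeout(source, timeout_s=0.05):
    """Return (data, timed_out) instead of blocking for a very long time."""
    try:
        return source.get(timeout=timeout_s), False
    except queue.Empty:
        # Timeout-handling case: report it and let the loop keep running.
        return None, True


samples = queue.Queue()          # hypothetical stand-in for the counter read
samples.put([947, 981, 964])

print(read_with_short_timeout(samples))   # data is waiting, returned at once
print(read_with_short_timeout(samples))   # nothing arrives, gives up quickly
```

The caller gets control back after a fraction of a second either way, so the loop can update the GUI or flag the stall instead of hanging.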

> To the right-hand side of what I posted, there's only some file-writing code and the end of the loop. I neglected to mention it, and probably should have, but there's a second while loop in this program running concurrently. It sends an analog out voltage via our 6704 card every 50 ms to control motor speed. It's completely independent of the rest of the code, but might it be interfering?

Ok, that explains some stuff. File writing can be expected to have significant timing variability. It further sounds like this loop may be wanting/trying to burn 100% CPU all the time and it's just the file write time that throttles down the loop rate. That leads me to make a bit of an inductive leap.

Problems that develop after 2-3 minutes are somewhat commonly linked to uncontrolled memory consumption where something in code keeps demanding more and more memory. A very common culprit is an array whose size grows every cycle of a loop. Eventually, so much memory is needed that the memory manager gets hung trying to find / make itself a big enough chunk. I don't see an obvious suspect in the posted screenshot, but do you have a growing array over on the right someplace? Open your Task Manager and observe your memory usage as you start the app. See if it keeps consistently growing as the app runs.
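
For illustration only, here is the growing-array pattern next to a bounded alternative, sketched in Python with the standard `tracemalloc` module standing in for watching Task Manager. The block size and iteration count are made up.

```python
import tracemalloc
from collections import deque


def traced_growth(loop_body, iterations):
    """Return bytes of traced memory gained while calling loop_body repeatedly."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        loop_body()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


history = []                 # grows every iteration: the suspect pattern
latest = deque(maxlen=1)     # keeps only the newest block


def growing_body():
    history.append([0.0] * 1000)


def bounded_body():
    latest.append([0.0] * 1000)


print(traced_growth(growing_body, 200))   # climbs with the iteration count
print(traced_growth(bounded_body, 200))   # stays roughly flat
```

If memory use looks like the first case, something is accumulating data every cycle; if it looks like the second, this explanation can probably be ruled out.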

Can you post the actual code? I can't look at it easily (currently working under Linux without DAQ drivers), but maybe someone else can...

-Kevin P.


ericson.15 · ‎07-16-2008

Attached is the VI and the subVIs it depends on (I hope).

Kevin, did you see my update about error -200141? That seems to be the problem here. I ran the program and didn't see any increasing memory or processor usage during a 300 second test (which interestingly didn't include an error, but I think that's just a fluke).

Tristan Ericson


Kevin_Price · ‎07-16-2008

No, I hadn't seen your update about the -200141 error. (I worked on my reply slowly as I found a couple minutes here and there). Ok, now we're getting somewhere.

That error can *also* be thrown by a fast burst of edges coming into the counter. Most NI DAQ boards provide only a 1 or 2 sample FIFO on board for counter tasks. So the data has to get shipped off to system RAM almost immediately or else the FIFO will overflow. Anytime there's a rapid burst of edges at the counter gate, you're at risk for a FIFO overflow and a -200141 error. Note that *apparent* rapid bursts can be caused by poorly conditioned sensor outputs, "ringing" on transitions due to electrical interface issues, induced noise due to DC motor brushes, etc. I'm no expert on electrical noise, but have had multiple experiences where the counter circuitry produced similar errors when a scope probe showed an apparently clean signal. I'm told that the transient behavior of the signal during a transition can be affected by the presence / absence of a scope probe.
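
A toy model of that overflow, in Python, for anyone who wants to see the mechanism. It assumes each edge latches one sample into a 2-deep FIFO and that samples leave one at a time, each taking a fixed `transfer_time`; the 100 µs figure is invented, and real transfer latency depends on the bus and the system.

```python
def first_overflow(edge_times, fifo_depth=2, transfer_time=1e-4):
    """Return the time of the first edge that finds the FIFO full, else None.

    Each edge latches one sample into the FIFO. Samples are moved to system
    RAM one at a time, each taking transfer_time seconds (assumed value).
    """
    pending = []                     # completion times of samples still in the FIFO
    last_done = float("-inf")
    for t in sorted(edge_times):
        pending = [done for done in pending if done > t]   # already transferred
        if len(pending) >= fifo_depth:
            return t                 # no room left: this is the overflow
        last_done = max(t, last_done) + transfer_time
        pending.append(last_done)
    return None


clean = [i / 3000 for i in range(3000)]                      # steady 3000 edges/s
ringing = clean + [clean[100] + 1e-6, clean[100] + 2e-6]     # two extra edges

print(first_overflow(clean))      # steady signal never fills the FIFO
print(first_overflow(ringing))    # the second extra edge overflows it
```

The steady signal is far slower than the transfer rate and never overflows, while just two spurious edges a microsecond apart are enough to fill a 2-sample FIFO.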

Ok, so now what? Well, let's first diagnose. I like to turn the counter task inside out. Rather than doing period measurement where I buffer count values on active edges, I switch to edge counting where I merely increment the count on active edges. Then I sample the count at a constant rate. While the motor is at a near-constant speed, I can expect a predictable, near-constant change in count per sample interval. If any intervals show too large a change in counts, I can suspect there was a "burst" during that time interval, helping to provide evidence of this diagnosis.
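
The check on the sampled counts could look something like this Python sketch. The data and the 20% tolerance are made up, and the median is used as the "expected" increase only because it is a simple choice that a few bursts won't skew.

```python
def flag_bursts(counts, tolerance=0.2):
    """Return indices of sample intervals whose count increase is suspiciously large.

    counts: cumulative edge counts sampled at a constant rate.
    tolerance: allowed fractional excess over the median increase (assumed value).
    """
    deltas = [b - a for a, b in zip(counts, counts[1:])]
    if not deltas:
        return []
    nominal = sorted(deltas)[len(deltas) // 2]      # median increase per interval
    limit = nominal * (1.0 + tolerance)
    return [i for i, d in enumerate(deltas) if d > limit]


# Made-up data: about 30 edges per interval, with a burst in interval 3.
counts = [0, 30, 60, 90, 135, 165, 195]
print(flag_bursts(counts))   # → [3]
```

An interval that gets flagged while the motor speed was steady is evidence of extra edges reaching the counter.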

Well, ok, but so what? What do you *do* about it? Well, bottom line is that you should stomp out the noise or glitching or whatever it is. As long as you allow it the possibility of affecting your measurement, you can't fully trust your data. If the bursts are caused by ringing, you can likely fix the problem via software and your board's other counter. (Search on "retriggerable single pulse" for more info.) If the bursts happen at seemingly random times, as could happen with a DC motor brush, you'll probably need some electrical signal conditioning. If you're lucky, better shielding and wire routing may be enough too.

-Kevin P.


del-donno.1 · ‎07-16-2008

Kevin, I'm the original poster logged in under a different account (I'm not at my DAQ computer).

Your explanation of the -200141 error was good, and fits in line with other things I've read on this site. The seemingly random nature of this problem would fit the description of some sort of noise problem. I should take this time to say we're using 6602 cards, which have a puny 2-slot FIFO buffer.

I'm curious why we never saw this error with our traditional DAQ implementation, which as far as I know was used for years on this same hardware uneventfully. I recall reading another discussion post where someone had this problem with a DAQmx version, but not a trad-DAQ implementation. Unfortunately, since we've upgraded to LabVIEW 8.5 (from 7.0, 7.1) and to the latest DAQ drivers, these older programs no longer work. Next time I get a chance I'll play with the shielding and mag pickup airgap and see if I can improve reliability.

I read another post suggesting that this error can be handled, and the task restarted. Is this something that can be done quickly enough that there might be only one bad block of data (i.e. the error would be cleared and the task restarted before the next loop starts)? I think I could live with a once-in-a-while glitch if it didn't totally ruin the data set, like it is now with the counter stopping totally once this error happens.

Kevin_Price · ‎07-16-2008

I'll address the last question first since it's easiest. If I were going to count on a scheme where I reprogrammed the task every time I detected a fairly rare FIFO overflow error, here's what I'd do: To minimize my blind time, I'd spin my DAQ Read loop quite fast. To accommodate the file logging, I'd send the DAQ data off to a queue and let another loop pull the data out of the queue for logging. For more info, search on "producer consumer" pattern.
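
The shape of that pattern, as a minimal Python sketch with two threads and a queue. `fake_read` and the in-memory `log` list are stand-ins I made up for the DAQ read and the file write; in LabVIEW this would be two parallel while loops sharing a queue.

```python
import queue
import threading

STOP = object()   # sentinel telling the consumer to finish


def producer(read_block, n_blocks, q):
    """Fast loop: read data and hand it off without waiting on the disk."""
    for i in range(n_blocks):
        q.put(read_block(i))
    q.put(STOP)


def consumer(q, log):
    """Slow loop: pull blocks off the queue and log them at its own pace."""
    while True:
        block = q.get()
        if block is STOP:
            break
        log.append(block)            # stand-in for the file write


def run(n_blocks=5):
    q = queue.Queue()
    log = []

    def fake_read(i):
        return [i] * 3               # hypothetical stand-in for the DAQ read

    threads = [
        threading.Thread(target=producer, args=(fake_read, n_blocks, q)),
        threading.Thread(target=consumer, args=(q, log)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


print(run())
```

Because the read loop only enqueues, a slow file write delays the consumer but not the next read, which is what keeps the blind time short.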

Generally however, it's better to avoid / prevent the error than to react and recover from it.

Next, with the 6602, it's possible to configure input PFI lines for digital filtering of incoming pulses. This will essentially cause the task to ignore pulses shorter than a specified duration. Perhaps this was programmed into the old trad NI-DAQ app, hence the better behavior?
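
As a rough picture of what that filter does to the signal, here is a Python sketch that drops pulses shorter than a minimum width. It only models the effect; it is not how the board implements it, and the pulse times and the 5 µs threshold are invented.

```python
def filter_short_pulses(pulses, min_width):
    """Keep only pulses at least min_width long.

    pulses: list of (rise_time, fall_time) pairs in seconds.
    """
    return [(rise, fall) for rise, fall in pulses if fall - rise >= min_width]


# Made-up tach pulses with two sub-microsecond glitches mixed in.
pulses = [
    (0.000, 0.010),
    (0.0101, 0.0101005),     # glitch, about 0.5 us wide
    (0.020, 0.030),
    (0.0305, 0.0305002),     # glitch, about 0.2 us wide
]
print(filter_short_pulses(pulses, min_width=5e-6))
```

Only the two real tach pulses survive, so the counter never sees the closely spaced extra edges that would fill its FIFO.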

A similar effect can be obtained by configuring your other counter for retriggerable single pulse generation and then performing period measurement on its output. I made a brief reference to this in a previous post. I have a vague and uncertain notion that this method can support better time resolution on your period measurement. I'm pretty sure that under trad. NI-DAQ, the digital filter would quantize the timing of whatever it allowed through. I'm less sure but kinda think that digital filtering under DAQmx no longer injects that limitation. The former is based on memory of my own experience, the latter on memory of discussions here. So take it with a grain of salt.
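
The effect of the single-pulse approach can be modelled the same way, again as a hedged Python sketch: it assumes an incoming edge is ignored while the pulse started by an earlier edge is still being generated, and the 1 ms pulse width and edge times are made up.

```python
def holdoff_edges(edge_times, pulse_width):
    """Keep an edge only if the pulse started by the previous kept edge has finished."""
    kept = []
    busy_until = float("-inf")
    for t in sorted(edge_times):
        if t >= busy_until:
            kept.append(t)
            busy_until = t + pulse_width
    return kept


def periods(edge_times):
    """Differences between consecutive edges, i.e. what period measurement returns."""
    return [b - a for a, b in zip(edge_times, edge_times[1:])]


# Made-up tach edges every 20 ms, two of them followed by a ringing edge.
raw = [0.0, 0.000002, 0.02, 0.020003, 0.04]
clean = holdoff_edges(raw, pulse_width=0.001)
print(clean)
print(periods(clean))
```

The pulse width has to be longer than the ringing but shorter than the shortest real period, otherwise genuine edges get swallowed too.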

Finally, you *can* still install the trad NI-DAQ driver to run your old code and test it out. But there are some constraints on switching back and forth between drivers when accessing a given board -- you must perform a reset, which you can do in MAX. From experience, I have apps in LabVIEW 8.5 which interact with some boards via DAQmx while interacting with other boards via traditional NI-DAQ. So you do have some options here, but I would generally advise continuing with your conversion to DAQmx once you finish your troubleshooting.

Final thought: the same hardware isn't necessarily the same hardware. Cabling gets moved, components age, contacts oxidize and corrode, etc. Troubleshooting Law: trust no assumptions, especially your own.

-Kevin P.

Edit: clarification

Message Edited by Kevin Price on 07-16-2008 05:36 PM


ericson.15 · ‎07-17-2008

Kevin, your help has been invaluable. I implemented digital filtering as you suggested and haven't had that error all morning (after running a bunch of tests). So it looks like I'm all set with this problem. I'll keep your producer/consumer format suggestion in mind as well for another project we have going on here where we're sampling many encoders and many voltage tasks and doing a bunch of processing in real time. That loop runs really slow and causes problems. But that's an issue for another day; I'm thrilled to have this code working well now.

Thanks again!

Tristan Ericson

