uninstalling a callback from within the callback

menchar · ‎03-26-2010

More info -

Using the windows handle to the comport and getting comm properties with the Win32 SDK function GetCommProperties() is a waste of time.

The com properties do not show the max in q size (always reports 0) and the actual size always reports as 512.

I successfully set the inqueue length to 65536 using OpenComConfig ( ) and let it fill up and then probed the InQLen and saw it fill to the size I had set.

I changed the callback to uninstall itself whenever it was called for RXFLAG.

I changed the main thread logic to process system events every 100 milliseconds.

I instrumented the callback to tell me several things:

1. How many bytes were in the inqueue when it was called.

2. How many bytes it had to read with ComRdTerm to get the RXFLAG character (start of message character)

3. How many bytes it had to read with ComRdTerm to get the carriage return after the RXFLAG character (the size of the message body)

4. If the callback is ever invoked because the inqueue is half full and needs to be read out (half-flushed).

Then I ran it and dumped all of this whenever I got a CRC error.

As it turns out, every time I get a CRC error, the number of bytes I have read to get to the RXFLAG char is exactly 60 !?

The InQ length varies as would be expected, it usually has a few hundred bytes in it. I flush it every time before installing the callback. The slowest message is emitted every second, so it should never take long for the driver to see RXFLAG.

The callback is never invoked for the inqueue being half full.

I usually get some part of a valid message body when the CRC error occurs, usually the last part of it, with a gap of one or more chars missing at the start of the message body. Sometimes I get the body of a completely different message stuffed into what should be the body of the message I'm after; it seems the ComRdTerm for the carriage return winds up finding the end of the next message in the queue.

I see this behavior on multiple message types.

It seems I can affect the behavior by changing the length of time between when the callback is first entered, and when I make the ComRdTerm call to get the start of message char (the RXFLAG char). I'll experiment with this.

I'm running on a relatively fast quad core Xeon and XP Pro, so there's true concurrency happening here.

I can watch the serial message stream till the cows come home using Hyperterminal and the message stream never screws up ... this is something happening in my design or in the rs232 library I have to think.

I'm wondering if the callback is getting invoked for the serial driver having seen the RXFLAG character but the inqueue isn't fixed up with the right data?

I know NI made the serial driver multi-threaded, with threads doing queue management (I think NI explained how this was done once, I need to search the forum), I wonder if there's a race condition of some kind here.

I'll eventually bite the bullet and code a retry mechanism (the messages are cyclic, I just need to grab one of them) but something seems very wrong here.
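A retry wrapper of that kind could look roughly like the sketch below. It is plain C and makes no rs232 library calls: `CaptureFn`, `CaptureWithRetry` and `FlakyCapture` are hypothetical names made up for illustration, and the capture hook is assumed to return 0 only when the CRC checks out.

```c
#include <stddef.h>

/* Hypothetical capture hook: fills buf with one message and returns 0 if
 * its CRC checked out, nonzero otherwise. */
typedef int (*CaptureFn)(char *buf, size_t bufSize, void *ctx);

/* Calls capture until it succeeds or maxAttempts is used up.
 * Returns the number of attempts made on success, or -1 on failure. */
int CaptureWithRetry(CaptureFn capture, void *ctx,
                     char *buf, size_t bufSize, int maxAttempts)
{
    int attempt;

    for (attempt = 1; attempt <= maxAttempts; attempt++) {
        if (capture(buf, bufSize, ctx) == 0)
            return attempt;
    }
    return -1;
}

/* Stand-in capture for trying the wrapper out: fails a preset number of
 * times (counter passed through ctx), then succeeds with an empty message. */
static int FlakyCapture(char *buf, size_t bufSize, void *ctx)
{
    int *failuresLeft = (int *)ctx;

    if (*failuresLeft > 0) {
        (*failuresLeft)--;
        return 1;
    }
    if (bufSize > 0)
        buf[0] = '\0';
    return 0;
}
```

Since the messages are cyclic, a bounded attempt count is probably enough; the bound keeps a dead link from hanging the caller.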

Menchar

Message Edited by menchar on 03-26-2010 04:09 PM

menchar · ‎03-26-2010

More info -

Now I see that what is happening is that the error occurs whenever the message I'm interested in follows a particular message, let's say the 'x' message that is 60 bytes long, whenever it is perfectly fitted into the start of the inqueue after I flush the inqueue prior to installing the callback.

So, if I'm trying to capture a 'y' message I do this:

FlushInQ ()

InstallComCallback (RXFLAG == 'y')

and then if by chance the inqueue winds up being exactly this:

x <x msg body> <crg rtn> y ....

Then it screws up and what should be the y message body isn't, it's usually the first part of a y message body but then with either chars missing or the chars from some other message in it.
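For reference, here is a minimal sketch of what the two terminated reads would be expected to return in that scenario. It does not use the rs232 library: `SimLoad` and `SimReadTerm` are hypothetical stand-ins that read from an in-memory string, `CaptureMessage` is a made-up name, and the message contents are invented.

```c
#include <stddef.h>
#include <string.h>

/* In-memory stand-in for the input queue (an assumption for illustration). */
static const char *simData = "";
static size_t simPos = 0;

static void SimLoad(const char *data)
{
    simData = data;
    simPos = 0;
}

/* Hypothetical terminated read: copies bytes into buf until termByte is
 * consumed, count bytes are stored, or the data runs out. The terminator
 * is consumed but not stored. Returns the number of bytes stored. */
static int SimReadTerm(char *buf, size_t count, int termByte)
{
    size_t n = 0;

    while (n < count && simData[simPos] != '\0') {
        char c = simData[simPos++];
        if (c == (char)termByte)
            break;
        buf[n++] = c;
    }
    return (int)n;
}

/* Two-step capture: skip everything up to the start character, then read
 * the body up to the carriage return. *skipped receives the number of
 * bytes that sat in front of the start character. */
int CaptureMessage(int startChar, char *body, size_t bodySize, int *skipped)
{
    char scratch[256];
    int n;

    if (bodySize == 0)
        return -1;
    *skipped = SimReadTerm(scratch, sizeof scratch, startChar);
    n = SimReadTerm(body, bodySize - 1, '\r');
    body[n] = '\0';
    return n;
}
```

With a queue holding one complete 'x' message followed by a 'y' message, the skipped count should equal the length of the 'x' message and the body should be the intact 'y' body. That is the outcome the real reads fail to produce when the error occurs.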

The 'x' message is the most common message, more than half of the messages are 'x' messages so it may not be that it's an 'x' message but rather that some message has started at the exact beginning of the inqueue after an inqueue flush, so there's exactly one intact message in front of the 'y' message in the queue.

It's hard to figure how this would matter to the windows serial driver or the rs-232 library.

I've captured thousands of messages with hyperterminal and never see a problem.

I know the windows serial driver and/or the NI rs232 library does special handling for the CRG RTN looking for a trailing LF, maybe this has some implications.

Weird, huh.

menchar · ‎03-26-2010

So I tried putting various delays at the start of the callback, and a delay of about 100 msec is a sweet spot: the frequency of the error goes way down, to once every several minutes instead of once every 30 seconds or so.

I tried eliminating the InQ flush that I was doing right before I installed the callback for the message character (e.g. RXFLAG = 'y') and now I can't get it to fail. I've also eliminated the delay at the start of the callback, so the improved behavior is all a consequence of eliminating the flush of the inQ.

This could be due to something not quite right with the FlushInQ code, but then it also makes it extremely rare that the InQ will start with an 'x' message perfectly fitted into the start of the Q and with the 'y' message right behind it, though for the life of me I can't explain why that would matter.

Now the callback is called for LWRS_RECEIVE periodically (and reads out the first half of the InQ), but the InQ is never flushed with the FlushInQ function.

It's run solid for 25 minutes now without missing a beat.

There's nothing in the CVI help that would make you think that there are any constraints on the use of FlushInQ.

I should point out that I'm using a virtual com port, it's a USB port physically and I'm using a Silicon Laboratories USB to UART bridge, driver version 5.4.24.0, so maybe that's part of the problem when flushing the InQ. But as I said you can watch this same port HW/SW with Hyperterminal all day and it never fails (but Hyperterminal isn't flushing the InQ).

This is making an old man out of me and I surely don't need that. I'm going to declare victory and eliminate all of the FlushInQ calls.

Menchar

Message Edited by menchar on 03-26-2010 07:31 PM

Message Edited by menchar on 03-26-2010 07:32 PM

Mert_A. · ‎03-29-2010

My guess is that the reason not flushing the queue solves your problem is that it makes the first call to ComRdTerm take longer (because there are more bytes in the queue that need to be read through, byte-by-byte). It's essentially having the same effect as putting in an explicit delay, or zeroing out your 4096-byte szInputArray one byte at a time.

I noticed that in your posted code you don't error check (or return value check) the calls to ComRdTerm. It might be possible that the second call is timing out. The callback is being triggered when your event character is received, but since that character precedes the message body, it might be possible that the message body is not completely available before the timeout period expires. If you are not first zeroing out the array, and you don't append a nul character based on the number of bytes returned by ComRdTerm, you might be looking at stale data.
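As a rough illustration of that check, here is a sketch in plain C. It deliberately avoids the real library: `SimLoad` and `SimReadTerm` are hypothetical stand-ins backed by an in-memory string, `ReadBody` is a made-up name, and the stub signals a timeout by returning -99 directly, which is a simplification. How the real call reports a timeout should be confirmed against the CVI documentation.

```c
#include <stddef.h>
#include <string.h>

/* In-memory stand-in for the input queue (an assumption for illustration). */
static const char *simData = "";
static size_t simPos = 0;

static void SimLoad(const char *data)
{
    simData = data;
    simPos = 0;
}

/* Hypothetical terminated read: stores bytes up to, but not including,
 * termByte and returns the number stored. Returns -99 if the data runs
 * out before termByte is seen, mimicking a timeout. */
static int SimReadTerm(char *buf, size_t count, int termByte)
{
    size_t n = 0;

    while (n < count) {
        char c;
        if (simData[simPos] == '\0')
            return -99;
        c = simData[simPos++];
        if (c == (char)termByte)
            return (int)n;
        buf[n++] = c;
    }
    return (int)n;
}

/* Reads one message body. The buffer is cleared first and nul-terminated
 * from the returned byte count, so leftovers from an earlier read can
 * never be mistaken for fresh data. Returns the byte count or a negative
 * error code. */
int ReadBody(char *buf, size_t bufSize, int termByte)
{
    int n;

    if (bufSize == 0)
        return -1;
    memset(buf, 0, bufSize);
    n = SimReadTerm(buf, bufSize - 1, termByte);
    if (n < 0) {
        buf[0] = '\0';   /* discard whatever a failed read left behind */
        return n;
    }
    buf[n] = '\0';
    return n;
}
```

The caller then treats any negative return as "no message" instead of running the CRC over whatever happens to be in the array.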

I realize you've already found a way to avoid your problem, but this might be a possible explanation for what is happening.

Mert A.

National Instruments

menchar · ‎03-29-2010

Mert -

Thanks for the reply.

As it turns out, the same thought occurred to me on Saturday, maybe it's timing out, so I added a check for this and it turns out that I am not seeing a timeout.

I used the thread-safe routine (getrs232err I think it is) and checked for -99 and that never happens.

The thing is so solid without the FlushInQ call I do have to think that's the problem. I thought it also might be the special carriage return processing to strip a trailing LF - that's got to take some time to check even if there's no LF there. I don't know what level the LF strip is implemented at - is it in the rs232 library?

The fact that I'm running on a quad core Xeon this time makes me wonder if there isn't some subtle race condition on the flush - this same technique/design seems to have worked all OK three years ago with CVI 8.5 and a Hyperthreaded (single core) machine.

When you think about it, flushing the In Queue is rather drastic, and it's not hard to imagine that it may be hard to implement (much less prove correct) a scheme that can synchronously flush the inqueue without dropping any bytes at all out of a high speed data stream.

The scheme works even without the queue probe and read/discard (GetInQLen + ComRd) in lieu of the flush - it's just that the callback gets invoked for the inqueue being half full (I set LWRS_RECEIVE to half the size of the inqueue) and I can avoid this by draining the queue first.

I did manage to get the in queue size up to 65536 bytes and that affects the execution scenario as well.

Menchar

menchar · ‎04-01-2010

More info -

I've tested the code for many hours now, tens of thousands of iterations, and it's solid, not a single error with the new method of reading out and discarding all of the data in the inq prior to installing the callback. I only use the FlushInQ call when I first open and config the serial port.
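The read-and-discard idea can be sketched as below. This is not the code from my application and it makes no rs232 library calls: `SimReceive`, `SimGetInQLen` and `SimRead` are hypothetical stand-ins for the port, the queue length probe and the plain read, backed by an in-memory buffer, and `DrainInQ` is a made-up name.

```c
#include <stddef.h>
#include <string.h>

/* In-memory stand-in for the driver's input queue (an assumption). */
static char simQueue[1024];
static size_t simLen = 0;

/* Simulates bytes arriving on the port; bytes that do not fit are dropped. */
static void SimReceive(const char *data, size_t len)
{
    size_t room = sizeof simQueue - simLen;

    if (len > room)
        len = room;
    memcpy(simQueue + simLen, data, len);
    simLen += len;
}

/* Stand-in for a queue length probe. */
static int SimGetInQLen(void)
{
    return (int)simLen;
}

/* Stand-in for a plain read: removes up to count bytes from the front. */
static int SimRead(char *buf, size_t count)
{
    if (count > simLen)
        count = simLen;
    memcpy(buf, simQueue, count);
    memmove(simQueue, simQueue + count, simLen - count);
    simLen -= count;
    return (int)count;
}

/* Empties the queue by reading and discarding in small chunks until the
 * length probe reports nothing pending. Returns the bytes discarded. */
int DrainInQ(void)
{
    char scratch[64];
    int discarded = 0;
    int pending;

    while ((pending = SimGetInQLen()) > 0) {
        size_t chunk = (size_t)pending < sizeof scratch
                           ? (size_t)pending : sizeof scratch;
        int n = SimRead(scratch, chunk);
        if (n <= 0)
            break;
        discarded += n;
    }
    return discarded;
}
```

On a live stream new bytes can arrive right after the probe reports empty, so the queue is only approximately empty when the callback gets installed. That is fine here because the callback reads forward to the start character anyway.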

I was also using the technique to snag transient (one time) messages and I was getting a timeout on that. I increased the timeout and that problem disappeared. The effect of the timeout was that it looked like the message was not being emitted by the HW.

As for Mert's thought that a timeout could be causing the original error, that would be an explanation except that I wasn't getting a timeout error, and I was seeing characters towards the front of the message body that were missing, with the rest of the message still being captured. If a timeout on ComRdTerm is the same as with ComRd, I would expect a timeout return from ComRdTerm the first time it times out for receipt of any char, and that it wouldn't go back and keep looking for the Term character.

Menchar

Message Edited by menchar on 04-01-2010 10:32 AM
