I need a recommended way of recovering from Bus Off errors

Flump · ‎12-07-2006

I have write problems when a UUT is instructed to reset. When the UUT is in the reset state, I get Error Passive warnings and Bus Off errors after attempting to write extended CAN messages using the Frame API.

In brief, the test goes like this:

1. I send extended message 0x500 with 8 data bytes containing information to tell the UUT to go into reset.
2. I wait 400ms, hoping that the UUT gets at least 1 of the 3 possible messages. It always does, and it resets.
3. I then MUST send message 0x500 with updated data telling the UUT to come out of reset.

The problem is that the write fails with a Bus Off error (I can't remember the error code, as I am typing this at home). I can get this to work in a brute-force kind of way by repeating the steps below several times in a loop:

1. reopen the network object
2. reopen all periodic tx objects,
3. do a ncAction NC_OP_RESET on the network object,
4. do a ncAction NC_OP_RESET on all periodic tx objects,
5. do a ncAction NC_OP_START on the network object,
6. do a ncAction NC_OP_START on all the periodic tx objects,
(no warnings or errors so far from these calls, although it occasionally ends up with an exception and NI-CAN internal driver errors. I'm probably abusing the CAN standard and API with all the rapid opening and closing of all these handles and blindly ignoring errors.)
7. then do a ncWrite for all periodic tx objects (we usually get Error Passive warnings here; if the write is repeated, it frequently gets a Bus Off error).
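
For illustration, here is one possible shape for the loop above that checks each status instead of ignoring it, and gives up after a fixed number of attempts. It is only a sketch under assumptions: `can_port`, `write_with_recovery`, and the `fake_*` helpers are hypothetical names, not part of NI-CAN. The callbacks stand in for the `ncAction` and `ncWrite` calls listed in the steps, and the status convention (0 success, positive warning, negative error) is the NI-CAN one.

```c
/* Hypothetical stand-in for the driver calls; NOT part of NI-CAN.
   Each callback returns 0 on success, > 0 for a warning, < 0 for an error. */
typedef struct {
    int  (*reset)(void *ctx);                 /* e.g. ncAction(h, NC_OP_RESET, 0) */
    int  (*start)(void *ctx);                 /* e.g. ncAction(h, NC_OP_START, 0) */
    int  (*write)(void *ctx);                 /* e.g. ncWrite(h, size, &frame)    */
    void (*sleep_ms)(void *ctx, unsigned ms); /* pause between attempts           */
    void *ctx;
} can_port;

/* Reset, start, and write, up to max_attempts times.
   Returns the first non-negative write status, the first reset/start error,
   or the last write error once the attempts are used up
   (-1 if max_attempts is not positive). */
int write_with_recovery(const can_port *p, int max_attempts, unsigned backoff_ms)
{
    int status = -1;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        status = p->reset(p->ctx);
        if (status < 0) return status;   /* do not carry on past a failed reset */
        status = p->start(p->ctx);
        if (status < 0) return status;
        status = p->write(p->ctx);
        if (status >= 0) return status;  /* success or warning: stop retrying */
        p->sleep_ms(p->ctx, backoff_ms); /* give the UUT time before retrying */
    }
    return status;
}

/* Fake bus for demonstration only: the first failures_left writes fail. */
typedef struct { int failures_left; int writes; } fake_bus;

int fake_ok(void *ctx) { (void)ctx; return 0; }

int fake_write(void *ctx)
{
    fake_bus *b = ctx;
    b->writes++;
    if (b->failures_left > 0) { b->failures_left--; return -1; }
    return 0;
}

void fake_sleep(void *ctx, unsigned ms) { (void)ctx; (void)ms; }
```

With a fake bus whose first two writes fail, `write_with_recovery(&port, 5, 50)` returns 0 on the third write. The pause between attempts is a guess that would need tuning against the UUT's roughly 100ms reset cycle.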

When the UUT (by chance, in all honesty) gets the 0x500 message and comes out of reset, CAN operations are fine. The problem arises while the UUT is in reset: I can't send the updated 0x500 message to tell it to come out of reset, and I randomly get Error Passive warnings and Bus Off errors.

I found out today that this is what the UUT does when in reset (written in PDL):

while not received 0x500 with data indicating to come out of reset

   possibly repower most of the UUT circuitry (I can't remember)
   reset Bosch CANBUS controller circuitry on ASIC (takes 2us I'm told)
   do some unit reset processing, takes up to 100ms

wend

(yes, it resets the CANBUS controller roughly every 100ms!)

I need a sensible way of recovering from a Bus Off error and then retrying that 0x500 message.

Any thoughts, comments, solutions?

Regards.

AdamB · ‎12-12-2006

Hi Flump,

The idea here is that many CAN devices will "sleep" after some predetermined period of inactivity (not receiving a frame). In such cases, the device usually wakes up after seeing activity on the bus, and the time it takes to go from the "sleep" state to an "active" state will inevitably vary from device to device. Suppose the controller on a CAN network sends a frame to a device which is "sleeping," and the device takes, for argument's sake, 10 seconds to "wake up" and become active again. By definition in the CAN standard, frames which are not acknowledged will be retransmitted. The CAN standard also requires that a device or controller implement transmit and receive "error counters" so that an "errant" device or controller can be "silenced" if it continues to generate errors. There are 3 basic error states, the last (worst) of which is the Bus Off Error State, which occurs when the transmit error counter exceeds 255. Herein lies the problem: if a device takes a long time to wake up, then a controller will send, and subsequently resend, the frame while it attempts to communicate with the "sleeping" device. Since the controller's transmit error counter will increase by 8 for each frame which is sent and NOT acknowledged, and it will continue sending frames until acknowledged, the controller can actually reach a Bus Off Error state before the device fully "wakes up." This is usually undesirable, and can be prevented.
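
To put numbers on the counter arithmetic above, here is a small sketch in C. It follows the simplified model in this post (the counter rises by 8 per unacknowledged frame and never decreases); real controllers apply further counter rules that it ignores. The 127 threshold for Error Passive comes from the CAN specification rather than from this thread, and `bits_per_attempt` is an assumed figure that the caller has to supply.

```c
/* Thresholds for the transmit error counter. */
enum {
    TX_ERROR_STEP       = 8,   /* added per unacknowledged frame (from the post)   */
    ERROR_PASSIVE_LIMIT = 127, /* Error Passive once exceeded (CAN specification)  */
    BUS_OFF_LIMIT       = 255  /* Bus Off once exceeded (from the post)            */
};

/* Number of consecutive unacknowledged frames needed for the counter,
   starting from zero, to exceed the given limit. */
int frames_until_exceeds(int limit)
{
    int counter = 0;
    int frames = 0;
    while (counter <= limit) {
        counter += TX_ERROR_STEP;
        frames++;
    }
    return frames;
}

/* Rough time to Bus Off in milliseconds with back-to-back retransmissions.
   bits_per_attempt is an assumption: the frame itself plus whatever error
   signalling and idle time follows each failed attempt. */
double ms_until_bus_off(double bit_rate, double bits_per_attempt)
{
    return frames_until_exceeds(BUS_OFF_LIMIT) * bits_per_attempt / bit_rate * 1000.0;
}
```

Under this model the controller becomes Error Passive after 16 unacknowledged frames and goes Bus Off after 32. With an assumed 150 bits per attempt at 500 kbit/s, `ms_until_bus_off(500000.0, 150.0)` gives about 9.6 ms, which is far shorter than most wake-up or reset times.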

For more information about the CAN standard, see Appendix B of the NI-CAN Hardware and Software Manual linked in the Related Links section below.

The solution may be to send a single wake-up frame (just one time), then delay to allow the device to wake up, and then continue normal communication. It is important to realize that when a device "sleeps," it actually relies on the fact that a CAN controller will send frames multiple times. That is, the first frame received when a device is "sleeping" is NOT processed. The sudden voltage change on the bus caused by a frame transmission is sensed by a CAN device and will cause it to resume active operating conditions, but the frame which initiates the wake-up cannot be processed because the hardware was previously asleep (some of it literally not powered). Thus, if we have a mechanism for sending a single wake-up frame and then delaying until all devices (or at least the one we intend to communicate with) wake up, we can resume normal communications knowing that subsequent commands should be processed by the device with which we wish to communicate.

In the NI-CAN API, the way to transmit a single frame - one time only - is by setting the Single Shot Transmit attribute to 1 using the set attribute function (in LabVIEW, use ncSetAttr.vi for the Frame API and CAN Set Property.vi for the Channel API). For Frame API users, the Network Configuration object (programmed explicitly) can be used, where of course we must stop and start the task (using ncAction.vi) around the attribute setting. The sequence of events would generally be: Network Config (should have happened anyway at some point), Network Open, Stop, Set Attribute, Start, Write "wakeup frame," then proceed with the program after a sufficient delay. Please note that the required delay may be very small; the "10 second" wake-up time suggested for a device above is much, much longer than a normal device's wake-up period. Of course, the baud rate used on a given network will factor into how many frames can be sent by a controller in a given period, and therefore how fast the corresponding error counter will increment as a result of unacknowledged frames.
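
For readers working in C rather than LabVIEW, the order of operations above can be sketched as follows. This is not NI-CAN code: `frame_port`, `send_wakeup_once`, the `PORT_*` opcodes, and the `rec_*` helpers are hypothetical names used so the sequence can be compiled and checked without hardware. In a real program the callbacks would presumably map to `ncAction`, `ncSetAttribute`, and `ncWrite`; check the NI-CAN manual for the exact attribute ID.

```c
#include <string.h>

enum { PORT_STOP, PORT_START };  /* hypothetical opcodes, not NI-CAN constants */

/* Hypothetical driver interface; each call returns < 0 on error. */
typedef struct {
    int  (*action)(void *ctx, int opcode);
    int  (*set_single_shot)(void *ctx, int enable);
    int  (*write)(void *ctx, const unsigned char *data, unsigned len);
    void (*sleep_ms)(void *ctx, unsigned ms);
    void *ctx;
} frame_port;

/* Stop, enable single-shot transmit, start, write one frame, then wait.
   Returns the first error encountered, otherwise the status of the write. */
int send_wakeup_once(const frame_port *p, const unsigned char *data,
                     unsigned len, unsigned settle_ms)
{
    int status;
    if ((status = p->action(p->ctx, PORT_STOP)) < 0) return status;
    if ((status = p->set_single_shot(p->ctx, 1)) < 0) return status;
    if ((status = p->action(p->ctx, PORT_START)) < 0) return status;
    if ((status = p->write(p->ctx, data, len)) < 0) return status;
    p->sleep_ms(p->ctx, settle_ms);  /* let the device wake before real traffic */
    return status;
}

/* Recording fake for demonstration: appends each step to a trace string. */
typedef struct { char trace[64]; } recorder;

void note(recorder *r, const char *what)
{
    strncat(r->trace, what, sizeof r->trace - strlen(r->trace) - 1);
}

int rec_action(void *ctx, int opcode)
{
    note(ctx, opcode == PORT_STOP ? "stop " : "start ");
    return 0;
}

int rec_single_shot(void *ctx, int enable)
{
    note(ctx, enable ? "single-shot-on " : "single-shot-off ");
    return 0;
}

int rec_write(void *ctx, const unsigned char *data, unsigned len)
{
    (void)data; (void)len;
    note(ctx, "write ");
    return 0;
}

void rec_sleep(void *ctx, unsigned ms) { (void)ms; note(ctx, "delay"); }
```

Run against the recording fake, the trace reads `stop single-shot-on start write delay`, which matches the Stop, Set Attribute, Start, Write, delay order described above.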

Attached is an example, which will write a single "wake-up" frame using the technique described above, where the write will take place when a "Wake-Up" button is clicked.

Is this what you are looking for?

AdamB

Message Edited by AdamB on 12-12-2006 04:47 AM

Applications Engineering Team Leader | National Instruments | UK & Ireland

DirkW · ‎12-12-2006

Hi Flump,

Have a look at this post; perhaps it helps:

http://forums.ni.com/ni/board/message?board.id=30&message.id=2246

DirkW

Flump · ‎12-12-2006

Thanks for the lengthy reply, AdamB. We don't actually have LabVIEW, but I think I know what you are getting at. We'll try this in the next day or so; it gives us some food for thought.

Thanks for the link DirkW, I'll forward it on to work and read it there.
