LabVIEW


cRIO disconnects from Ethernet network and stops responding to pings

We are running a cRIO application that periodically disconnects from the Ethernet network even though the VI continues to run on the cRIO.

The application has two parallel loops. One loop reads data from the FPGA and writes it to an RT FIFO. The second loop opens a TCP connection to a remote host, reads data from the RT FIFO, and transmits it to the remote host. We have instrumented each loop to report its status over the serial port. When the cRIO disconnects, both loops continue to send status messages to the serial port. The first loop happily keeps reading data from the FPGA and writing it into the RT FIFO; when the FIFO fills up, it overwrites the oldest data rather than overflowing. Around the time the cRIO disconnects, the second loop reports a TCP Write timeout (error 56), after which it closes the existing TCP connection and repeatedly tries to open a new one (each attempt times out after 5 seconds).
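The data path described above (an acquisition loop feeding a bounded FIFO that overwrites its oldest element, plus a network loop that drops the socket on a write error and keeps retrying the connection) can be modeled on a desktop to reason about its behavior. The following is a minimal Python sketch, not LabVIEW RT code; `OverwritingFifo`, `connect_with_retry`, and the `retry_delay_s` pause are hypothetical names and choices made for illustration. The 5-second default mirrors the connect timeout mentioned in the post, and the pause between attempts reflects the point made later in the thread that a loop that never yields can starve lower-priority network threads.

```python
import collections
import socket
import threading
import time


class OverwritingFifo:
    """Bounded FIFO that drops the oldest element when full.

    Models the 'overwrite oldest data' behavior described for the RT FIFO.
    A lock keeps writes and bulk reads consistent between two threads.
    """

    def __init__(self, capacity):
        self._buf = collections.deque(maxlen=capacity)
        self._lock = threading.Lock()
        self.overwritten = 0  # number of elements discarded on overflow

    def write(self, item):
        with self._lock:
            if len(self._buf) == self._buf.maxlen:
                self.overwritten += 1  # append() below evicts the oldest item
            self._buf.append(item)

    def read_all(self):
        with self._lock:
            items = list(self._buf)
            self._buf.clear()
            return items


def connect_with_retry(host, port, timeout_s=5.0, max_attempts=None,
                       retry_delay_s=0.1, log=print):
    """Try to open a TCP connection until it succeeds or attempts run out.

    Each attempt blocks for at most timeout_s. The short sleep after a
    failed attempt keeps a fast-failing loop (e.g. connection refused)
    from spinning and hogging the CPU. Returns a socket, or None.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            return socket.create_connection((host, port), timeout=timeout_s)
        except OSError as exc:  # also covers socket.timeout
            log(f"connect attempt {attempt} failed: {exc}")
            time.sleep(retry_delay_s)
    return None
```

In this model a producer thread would call `fifo.write(...)` for each FPGA read, while the network thread drains the FIFO with `read_all()`, sends with `sock.sendall(...)`, and on an `OSError` closes the socket and calls `connect_with_retry` again.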

All that appears normal and usually the connection is automatically restored.  Occasionally, however, the cRIO is not able to re-open the TCP connection.  Furthermore, the cRIO stops responding to pings from the remote host.  On two occasions, the cRIO was finally able to open a new connection after 30-40 minutes.  Other than that, the only way we've found out of this dilemma is to reboot the cRIO.

The frequency of the disconnects and the time before they occur don't seem to depend on whether the cRIO is set to run the application at startup or the application is run from an interactive VI window on the remote computer.

The disconnects may have something to do with which computer the cRIO is connected to: they happen much more frequently (often within minutes) when the cRIO is sending data to a laptop running Windows XP, but it can run for hours when sending data to a rackmount server running Windows Server 2003. Everything else (the rate and size of the data being sent, the Ethernet hub and cables, the VI running on the remote computer) is the same.

Message 1 of 17

Hi JohnZ,

Which real time controller and chassis do you have?  What versions of LabVIEW Real Time and NI-RIO do you have?

Do you have any statistics on CPU and memory usage?  If you have anything in your loops which could cause the loop to run continually, it can starve lower priority threads which handle network communications.  Can Measurement & Automation Explorer see the cRIO after you have lost your TCP connection?

Regards,

Jeremy_B

Applications Engineer
National Instruments
Message 2 of 17

We've done most of the testing with a 9014 real-time controller and a 9104 chassis. However, we've also seen similar behavior with 9014/9103 and 9102/9101 combinations.

The 9014 cRIO normally runs at around 50% CPU usage and less than 40% memory usage when the data is flowing steadily.  The RT System Monitor did not indicate a spike in either before the cRIO disconnected.

Measurement and Automation Explorer is not able to connect to the cRIO after it disconnects.  We tried pinging the cRIO from a different computer, but that also failed.

After disconnecting, the cRIO sends error messages to the serial port while trying to open a new TCP connection. It reports error 56 (timeout) 3 or 4 times, for a total of 15 or 20 seconds, then reports error 42 (generic error) at least 10 times per second for 18 seconds, then switches back to 15 seconds of timeouts, and so on. The error 42 phase ALWAYS lasts 18 seconds straight, while the error 56 phase usually lasts 15 seconds (3 timeouts) but sometimes 20 seconds (4 timeouts).
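To characterize this alternating pattern more precisely, the serial-port log can be collapsed into runs of identical error codes, each with its first and last report time and a count. The following is a minimal Python sketch, assuming the log has already been parsed into hypothetical `(seconds, error_code)` tuples in time order; the actual serial message format is not shown in the thread.

```python
from typing import List, NamedTuple, Tuple


class ErrorRun(NamedTuple):
    code: int
    start_s: float  # time of the first report in the run
    end_s: float    # time of the last report in the run
    count: int      # number of reports in the run

    @property
    def duration_s(self) -> float:
        # Measured between the first and last report of the run.
        return self.end_s - self.start_s


def summarize_runs(events: List[Tuple[float, int]]) -> List[ErrorRun]:
    """Group consecutive events with the same error code into runs.

    `events` must be sorted by timestamp.
    """
    runs: List[ErrorRun] = []
    for t, code in events:
        if runs and runs[-1].code == code:
            last = runs[-1]
            runs[-1] = last._replace(end_s=t, count=last.count + 1)
        else:
            runs.append(ErrorRun(code, t, t, 1))
    return runs
```

Because each timeout is reported only when its 5-second attempt ends, the `count` field is the more direct measure for the error 56 runs (3 or 4 timeouts), while `duration_s` is the useful figure for the error 42 bursts.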

When the cRIO disconnects, no computer can contact it through the Ethernet port, yet the link lights still show that it is connected. If it's connected directly, both the green and orange link lights on the cRIO light up; when it is connected through the 10 Mbps hub, only the orange link light lights up, but the hub shows that the cRIO is connected to it. Either way, it is still impossible to ping the cRIO.

In addition to which computer the cRIO is connected to, the errors could be related to whether the cRIO is connected directly to a computer or through a hub. Today, we have only managed to get errors when the cRIO is directly connected to a laptop (one error after 1.5 minutes, one after 10 minutes, one after 7 minutes, etc.). There were no disconnects in more than half an hour with the cRIO connected through a 10 Mbps hub.

Do any of those symptoms give you clues about what is happening?

Thanks!
Message 3 of 17

Hi JohnZ,

What version of NI-RIO are you using?  If it is not the latest (2.4.1), you could try updating it in case there were any bug fixes. 

http://joule.ni.com/nidu/cds/view/p/id/1057/lang/en

Would it be possible for you to post your real-time VI for us to take a look at? Normally, disconnections are caused by either a hardware failure (which sounds unlikely, given that you've reproduced it with different hardware) or thread starvation. If you post your VI, we can try to reproduce the problem here. If there is a bug in the driver software itself, we would definitely want to investigate. If you aren't comfortable posting your files here, I can arrange for you to e-mail them to us.
Regards,

Jeremy_B

Applications Engineer
National Instruments
Message 4 of 17

It turns out the cRIOs have older versions of NI-RIO (2.3.0 and 2.3.1). We'll install the updates and see if that resolves the issue.

Thanks!
Message 5 of 17

Hi JohnZ,
 
Let us know if that helps. If it doesn't, I'll definitely want to investigate further.
Regards,

Jeremy_B

Applications Engineer
National Instruments
Message 6 of 17

We just upgraded the cRIO software to NI-RIO 2.4.1, and the error has changed.

Now it will crash once, recover, and then crash again. After the second crash the status LED blinks 4 times (indicating that the cRIO crashed twice without rebooting correctly in between). The time between crashes varies; it has crashed after only 7 minutes, but has also lasted almost a full hour on another run.

Neither the CPU nor memory usage spiked before either crash.

Unlike before, the cRIO completely stops the program it's running when it crashes (even the serial port stops generating output), but it now responds to pings and can be rebooted through MAX. The application on the cRIO cannot be opened in debug mode after the cRIO's second crash.

With the old NI-RIO software, the cRIO would keep running our application, but no data could pass through the Ethernet port in either direction. With the new NI-RIO software, the cRIO stops running our application, but the Ethernet connection stays up.

Jeremy_B, if you think it would be beneficial to see our code, we could email it to you.

Thanks!
Message 7 of 17

Hi JohnZ,

I would like to see it. I don't want you to have to post your e-mail address on the forum (unless you feel comfortable doing so), so if you would like to e-mail me, send a support e-mail through http://www.ni.com/support and reference service request number 1213457 in the text. They'll route the e-mail to me.

You are also welcome to post it here or to our ftp site at ftp://ftp.ni.com/incoming, if you are comfortable with that.
Regards,

Jeremy_B

Applications Engineer
National Instruments
Message 8 of 17

In case anyone is interested in the solution, we finally tracked it down to the firewall interfering with the communication between the host and the cRIO.  See this KB for a list of all ports you should add exceptions for in your firewall configuration to prevent it from interfering: What Ports Do I Need to Open on My Firewall for National Instruments Software Products?
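When setting up firewall exceptions, it can help to confirm from another machine which TCP ports on the far side actually accept connections (for example, from a second PC toward the host, to see whether the host firewall admits inbound connections on the port the cRIO uses). The following is a minimal Python sketch; the host address and port list are placeholders to be filled in from the KB article above, no specific NI port numbers are assumed, and a TCP connect test cannot verify UDP ports.

```python
import socket
from typing import Dict, Iterable


def check_tcp_ports(host: str, ports: Iterable[int],
                    timeout_s: float = 2.0) -> Dict[int, str]:
    """Attempt a TCP connection to each port and classify the result.

    'open'    - the connection was accepted
    'refused' - the host answered with a reset (nothing listening,
                or a firewall actively rejecting)
    'timeout' - no answer within timeout_s (often a firewall silently
                dropping packets)
    'error'   - any other socket error
    """
    results: Dict[int, str] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                results[port] = "open"
        except ConnectionRefusedError:
            results[port] = "refused"
        except socket.timeout:
            results[port] = "timeout"
        except OSError:
            results[port] = "error"
    return results
```

A "timeout" result on a port that should be open is the pattern most consistent with a firewall dropping traffic, whereas "refused" means the packets reached the machine.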
Regards,

Jeremy_B

Applications Engineer
National Instruments
Message 9 of 17

Hello,

I am having a similar issue (cRIO disconnects from the network and stops responding to pings).  I am using multiple cRIOs on a local network, and opening a Remote Panel Connection to each from another computer on the network.  There are multiple "intelligent" switches in the network.

The problem is INTERMITTENT, however. Some days there are no issues at all. Then the next day, multiple cRIO systems stop responding through the Ethernet port (no pings either) and must be powered off and on before anything can connect to them. Apart from that, the cRIO box seems to operate normally.

I have read the posts above about opening firewall ports. My question is: if the ports were not already open, how could the cRIO have connected through that switch in the first place?

I am trying to consistently recreate my problem of "locking up" the Ethernet port, and then fix it.
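To reproduce an intermittent lockup, it helps to log exactly when each cRIO stops and starts answering, so the events can be lined up against switch logs or other network activity. The following is a minimal Python sketch that records only state changes. It assumes a probe based on a TCP connect to a port the target is known to listen on (which port depends on the installed software, so none is assumed here); ICMP ping is left out because it needs raw-socket privileges or an external `ping` command.

```python
import socket
import time
from typing import Callable, Dict, List, Optional, Tuple

# (timestamp, host, now_reachable)
Transition = Tuple[float, str, bool]


def tcp_probe(port: int, timeout_s: float = 2.0) -> Callable[[str], bool]:
    """Build a probe that returns True if host accepts a TCP connection on port."""
    def probe(host: str) -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return True
        except OSError:
            return False
    return probe


def monitor(hosts: List[str], probe: Callable[[str], bool],
            interval_s: float = 5.0, cycles: Optional[int] = None,
            clock: Callable[[], float] = time.time,
            sleep: Callable[[float], None] = time.sleep,
            log: Callable[[str], None] = print) -> List[Transition]:
    """Poll every host and record each change in reachability.

    The first observation of each host is always recorded. With
    cycles=None the loop runs until interrupted, so only the log output
    is seen; clock, sleep, and log are injectable for testing.
    """
    last: Dict[str, bool] = {}
    transitions: List[Transition] = []
    done = 0
    while cycles is None or done < cycles:
        done += 1
        for host in hosts:
            up = probe(host)
            if last.get(host) != up:
                ts = clock()
                transitions.append((ts, host, up))
                stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts))
                log(f"{stamp} {host} {'UP' if up else 'DOWN'}")
                last[host] = up
        sleep(interval_s)
    return transitions
```

For example, `monitor(["192.168.1.10", "192.168.1.11"], tcp_probe(PORT))`, with a hypothetical address list and a `PORT` the targets listen on, would print a timestamped line each time one of the cRIOs drops off the network or comes back.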

I have also looked at the following KB, which describes issues with Cisco intelligent switches: "cRIO 900x or cFP-21xx Controller Hangs During Boot-Up Sequence of an Intelligent Ethernet Switch"

http://digital.ni.com/public.nsf/allkb/F6C7D14A7BF4541A8625713F005A20D8

FYI:


I am using a cRIO 9004 and the following:

LabVIEW RT 8.0.1
Datasocket for RT 4.3
NI-IrDA RT 1.0.2
NI-RIO FCF 2.2.0
NI-Serial RT 3.1.0
NI-VISA 4.0
NI-VISA Server 4.0
NI-Watchdog 2.1.5
Network Variable Eng. 1.0.0
Variable Client Support 1.0.0
 
There is no "NI-RIO" driver listed specifically, other than NI-RIO FCF 2.2.0, but I understand that is all that is required.

Thanks in advance!

Con

Message 10 of 17