cRIO-9073 stops running my rtexe

paljacob · ‎11-15-2012

I have a cRIO-9073 (SW: NI System Configuration 5.3.0, NI-RIO 12.0, NI-Serial RT 3.8.2, NI Watchdog 5.2.0, System State Publisher 3.0.0) running an rtexe built in LV2012 f3 (32-bit).

The rtexe is set as the startup application.
It writes a batch log every minute and does some analysis.
It has a watchdog that restarts the application if any of the five loops fails to increment its loop counter within 40 s.
The application has been tested running on its own for weeks without any issues.
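For readers who want the loop-counter check spelled out, here is a minimal sketch in Python rather than LabVIEW. It assumes each loop publishes an integer counter and that a supervisor samples those counters periodically; the class and method names are hypothetical and not part of the original application.

```python
import time
from typing import Callable, List


class LoopWatchdog:
    """Flags a stall when any monitored loop counter stops advancing."""

    def __init__(self, n_loops: int, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._timeout_s = timeout_s
        now = clock()
        self._last_count = [0] * n_loops      # last counter value seen per loop
        self._last_change = [now] * n_loops   # time that value last changed

    def report(self, counters: List[int]) -> None:
        """Record the latest counter value of every loop."""
        now = self._clock()
        for i, count in enumerate(counters):
            if count != self._last_count[i]:
                self._last_count[i] = count
                self._last_change[i] = now

    def stalled_loops(self) -> List[int]:
        """Return indices of loops whose counter has not changed within the timeout."""
        now = self._clock()
        return [i for i, changed in enumerate(self._last_change)
                if now - changed > self._timeout_s]
```

With `timeout_s=40.0` and five loops, the supervisor would stop whacking the hardware watchdog as soon as `stalled_loops()` returns a non-empty list. A check like this only helps while the code hosting it is still executing.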

Then I needed to add a log fetcher at a site. I used a cRIO-9012 that checks over FTP every 10 s whether a new log is available. If the log file has been updated, it downloads it to its USB stick. It closes the FTP connection every time. This logger seems to work for weeks without any issues.
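The polling pattern described here (connect, check whether the file changed, download if so, disconnect) can be sketched with Python's standard `ftplib`. This is only an illustration: it assumes the FTP server answers the `MDTM` command, which is not confirmed for the cRIO-9073, and `fetch_if_updated` and its parameters are hypothetical names.

```python
from ftplib import FTP
from typing import Callable, Optional


def fetch_if_updated(connect: Callable[[], FTP], remote_name: str,
                     local_path: str, last_stamp: Optional[str]) -> Optional[str]:
    """Download remote_name if its MDTM timestamp differs from last_stamp.

    Returns the timestamp currently reported by the server.
    """
    ftp = connect()
    try:
        # Assumption: the server supports MDTM and replies like "213 20121115103000".
        stamp = ftp.sendcmd("MDTM " + remote_name).split()[-1]
        if stamp != last_stamp:
            with open(local_path, "wb") as out:
                ftp.retrbinary("RETR " + remote_name, out.write)
        return stamp
    finally:
        ftp.quit()  # close the connection after every check
```

A caller would invoke this every 10 s, passing a `connect` function that opens and logs in a fresh `FTP` session, and feed the returned timestamp back in as `last_stamp` on the next call.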

After 1-2 days the cRIO-9073 goes into some kind of no-app mode (and the watchdog is not triggered either). The CPU load is at 2-3% and FTP access works fine, but my rtexe does not seem to be running.

As a test, I deliberately skipped whacking the watchdog once every hour to force a reset. Every hour my log file shows a new boot and the rtexe starts running just fine, but after 1-2 days it still went into the same no-app mode.
I also tried just pushing the reset button on the cRIO-9073, and that makes the system start again.

So does anyone have an explanation or a good guess about what's happening and what I can do to fix it?

Doesn't a watchdog reset do a full reset?

I will try using nisyscfg.lvlib:Restart.vi every hour instead of letting the watchdog expire, but if this works I would be scared to use the watchdog feature...

paljacob · ‎11-19-2012

I have now tested using nisyscfg.lvlib:Restart.vi every hour, and it gives the same result. It lasted 27.5 hours before it just stopped and went into idle mode (the CPU load is at 2-3% and FTP access works fine, but my rtexe is not running). Note that this stop came ~40 minutes after the last reset, so 20 minutes before the next scheduled restart, and the watchdog was not triggered.

The system is only logging noise at the moment, so it should not hit any exceptions...

Anyway, pressing the reset button makes it run for another 1-2 days.

paljacob · ‎11-20-2012

Found a log file on the cRIO that looks to be related to the problem.

It looks like an internal error 2 is occurring, as also reported in the thread "DWarn: Internal error 2 occurred". There they say the error is related to running out of memory. I did not see any memory problems in the first 27 hours it ran (between 14 and 22 MB free; see the log if you want), and since it is reset every hour it should not run into this problem.

Anyway, even if it runs out of memory, shouldn't the watchdog handle an internal error 2 so the system can recover from it?

lvrt.out_12.0__cur.txt:

####
#Date: SAT, NOV 17, 2012 06:33:07 PM
#OSName: VxWorks
#OSVers: 6.3
#OSBuild: Jun 14 2012, 08:46:35
#AppName: /c/ni-rt/system/lvrt.out
#Version: 12.0
#AppKind: AppLib

<DEBUG_OUTPUT>
11/17/12 07:11:08.797 PM
DWarn 0xC1EAEA9C: Internal error 2 occurred. The top-level VI "cRIOmain.vi" was stopped at unknown "" on the block diagram of "Determination.vi".
source/server/RTEmbEditor.cpp(105) : DWarn 0xC1EAEA9C: Internal error 2 occurred. The top-level VI "cRIOmain.vi" was stopped at unknown "" on the block diagram of "Determination.vi".

</DEBUG_OUTPUT>

christian_w · ‎11-20-2012

So it's probably not a RAM problem; maybe your SSD is filled up with measurement data. Use MAX to have a look at your system resources.

Hope it helps

Christian

paljacob · ‎11-20-2012

The disk is barely used. The application rewrites one log file (2 MB) every minute and appends some strings to a few per-day log files, which are compressed when new ones are started.

Total Disk Space 119MB

Free Disk Space 90.6MB

paljacob · ‎11-23-2012

I have tried to provoke this error by running "Determination.vi" 1 million times per minute instead of once. I also recompiled the VI, but it still took close to a day to fail (23.5 hours).

I will now disable this VI and see if it helps.

No matter what, I will now move my watchdog down to the FPGA, since the cRIO watchdog fails when such an exception happens.

paljacob · ‎11-26-2012

With "Determination.vi" disabled, the .rtexe no longer seems to fail as before. So I will re-enable it and remove parts of it.

This VI is quite simple, except that it uses NI_AALPro.lvlib:1D Cross Correlation (DBL).vi, which calls the function Rxy80 in lvanlys.*

Does anyone think that this could be what makes the .rtexe fail?
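For reference, the operation such a VI performs can be written out as a direct-form 1D cross-correlation. The sketch below uses the textbook definition r[lag] = sum over k of x[k] * y[k + lag], with lags running from -(N-1) to M-1. It is not the lvanlys implementation of Rxy80, and the lag ordering of the output is an assumption.

```python
from typing import List, Sequence


def cross_correlation(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Direct-form cross-correlation; output has len(x) + len(y) - 1 samples."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        return []
    result = []
    for lag in range(-(n - 1), m):
        total = 0.0
        for k in range(n):
            j = k + lag
            if 0 <= j < m:  # samples outside y are treated as zero
                total += x[k] * y[j]
        result.append(total)
    return result
```

The output buffer grows with the sum of both input lengths, so the sizes of the arrays passed into the correlation are worth checking when memory on the target is tight.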

paljacob · ‎12-10-2012

Adding a watchdog down in the FPGA handles the problem. This should not be the solution, but it works for me since I do not have any time-critical code running on my cRIO.

This scares me a bit and keeps me from using a cRIO in control applications where the FPGA cannot do everything I need.
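The idea behind an FPGA-side watchdog can be modeled as a small tick-driven state machine: the RT application keeps changing a heartbeat value, and the supervisor requests a reset once the value has stayed the same for too many ticks. The Python sketch below only illustrates that logic. It is not FPGA code, and the class name, the tick period, and the way a reset would actually be issued are all assumptions.

```python
from typing import Optional


class HeartbeatSupervisor:
    """Requests a reset when the observed heartbeat value stops changing."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self._last_heartbeat: Optional[int] = None
        self._ticks_since_change = 0

    def tick(self, heartbeat: int) -> bool:
        """Advance one tick; return True when a reset should be requested."""
        if heartbeat != self._last_heartbeat:
            # The RT side is alive: remember the new value and restart the count.
            self._last_heartbeat = heartbeat
            self._ticks_since_change = 0
            return False
        self._ticks_since_change += 1
        return self._ticks_since_change >= self.timeout_ticks
```

Because this check runs independently of the RT application, it keeps working even when the application is stopped without the RT watchdog firing, which is the failure seen in this thread.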
