11-15-2012 07:37 AM
I have a cRIO-9073 (SW: NI System Configuration 5.3.0, NI-RIO 12.0, NI-Serial RT 3.8.2, NI Watchdog 5.2.0, System State Publisher 3.0.0) running an rtexe built in LV2012 f3 (32-bit).
Then I needed to add a log fetcher at a site. I used a cRIO-9012 that checks over FTP every 10 seconds whether a new log is available. If the log file has been updated, it downloads it to its USB stick. It closes the FTP connection every time. This logger seems to work for weeks without any issues.
So does anyone have an explanation or a good guess about what's happening and what I can do to fix this?
Doesn't a watchdog reset do a full reset?
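For anyone who wants to reproduce the fetcher's FTP load from a PC, here is a rough sketch of the same polling pattern in Python (the real fetcher is a LabVIEW application on the cRIO-9012, so this is not its code). It assumes the server answers the MDTM command and that "updated" means a newer modification time; the function names, parameters and the 10-second default are my own placeholders.

```python
from datetime import datetime
from ftplib import FTP, all_errors
import time


def parse_mdtm(response):
    """Turn an MDTM reply such as '213 20121115073700' into a datetime."""
    code, _, stamp = response.partition(" ")
    if code != "213":
        raise ValueError("unexpected MDTM reply: " + response)
    # Drop optional fractional seconds before parsing.
    return datetime.strptime(stamp.strip().split(".")[0], "%Y%m%d%H%M%S")


def fetch_if_updated(host, user, password, remote_name, local_path, last_seen):
    """Download remote_name if it is newer than last_seen.

    Returns the newest timestamp seen. The connection is opened and
    closed on every call, as the fetcher in the post does.
    """
    with FTP(host, timeout=10) as ftp:
        ftp.login(user, password)
        modified = parse_mdtm(ftp.sendcmd("MDTM " + remote_name))
        if last_seen is None or modified > last_seen:
            with open(local_path, "wb") as out:
                ftp.retrbinary("RETR " + remote_name, out.write)
            return modified
    return last_seen


def poll(host, user, password, remote_name, local_path, interval_s=10):
    """Check for a new log every interval_s seconds (hypothetical helper)."""
    last_seen = None
    while True:
        try:
            last_seen = fetch_if_updated(
                host, user, password, remote_name, local_path, last_seen
            )
        except all_errors as exc:
            # Keep polling even if one check fails.
            print("FTP check failed:", exc)
        time.sleep(interval_s)
```

Calling `poll(...)` with the address and credentials of the cRIO-9073 would then open one short FTP session every 10 seconds.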
11-19-2012 06:18 AM
Now I have tested using nisyscfg.lvlib:Restart.vi every hour, and it gives the same result. It lasted 27.5 hours before it just stopped and went into idle mode (the CPU load is at 2-3% and FTP access works fine, but my rtexe is not running). Note that this stop came ~40 minutes after the last reset, so 20 minutes before the next scheduled restart, and the watchdog was not triggered.
The system is only logging noise at the moment, so it should not hit any exceptions...
Anyway, pressing the reset button makes it run for another 1-2 days.
11-20-2012 03:05 AM - edited 11-20-2012 03:06 AM
Found a log file on the cRIO that looks to be related to the problem.
It looks like an internal error 2 is occurring, as also reported in the thread "DWarn: Internal error 2 occurred". They say the error is related to running out of memory. I did not see any problems in the first 27 hours it ran (between 14 and 22 MB free, see the log if you want), and since it is reset every hour it should not run into this problem.
Anyway, if it does run out of memory, shouldn't the watchdog handle an internal error 2 so the system can recover from it?
lvrt.out_12.0__cur.txt:
####
#Date: SAT, NOV 17, 2012 06:33:07 PM
#OSName: VxWorks
#OSVers: 6.3
#OSBuild: Jun 14 2012, 08:46:35
#AppName: /c/ni-rt/system/lvrt.out
#Version: 12.0
#AppKind: AppLib
<DEBUG_OUTPUT>
11/17/12 07:11:08.797 PM
DWarn 0xC1EAEA9C: Internal error 2 occurred. The top-level VI "cRIOmain.vi" was stopped at unknown "" on the block diagram of "Determination.vi".
source/server/RTEmbEditor.cpp(105) : DWarn 0xC1EAEA9C: Internal error 2 occurred. The top-level VI "cRIOmain.vi" was stopped at unknown "" on the block diagram of "Determination.vi".
</DEBUG_OUTPUT>
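If the log has to be checked repeatedly (for example after pulling it off the target by FTP), a small script can extract the relevant fields. The following is only a sketch in Python, with a regular expression written against the two DWarn lines shown above; other lvrt log entries may be formatted differently, and the function name is my own.

```python
import re

# Pattern modelled on the DWarn lines in the log excerpt above.
DWARN = re.compile(
    r'DWarn (0x[0-9A-Fa-f]+): Internal error (\d+) occurred\. '
    r'The top-level VI "([^"]*)" was stopped at .*? '
    r'on the block diagram of "([^"]*)"'
)


def find_internal_errors(text):
    """Return unique (error number, top-level VI, failing VI) tuples."""
    seen = set()
    results = []
    for match in DWARN.finditer(text):
        entry = (int(match.group(2)), match.group(3), match.group(4))
        # The log repeats each warning with a source-file prefix,
        # so report every distinct entry only once.
        if entry not in seen:
            seen.add(entry)
            results.append(entry)
    return results
```

Run on the excerpt above, this reports internal error 2 with top-level VI "cRIOmain.vi" and block diagram "Determination.vi" once, even though the warning appears on two lines.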
11-20-2012 03:28 AM
So it's probably not a RAM problem; maybe your SSD is filled up with measurement data. Use MAX to have a look at your system resources.
Hope it helps
Christian
11-20-2012 06:53 AM - edited 11-20-2012 06:55 AM
The disk is barely used. The application rewrites a 2 MB log file every minute and appends some strings to a few per-day log files, which are compressed when new ones are started.
Total Disk Space 119MB
Free Disk Space 90.6MB
11-23-2012 05:25 AM - edited 11-23-2012 05:25 AM
I have tried to provoke this error by running "Determination.vi" 1 million times per minute instead of once. I also recompiled the VI, but it still took close to a day to fail (23.5 hours).
I will now disable this VI and see if it helps.
No matter what, I will now move my watchdog down to the FPGA, since the cRIO watchdog fails when such an exception happens.
11-26-2012 02:56 AM
With "Determination.vi" disabled, it looks like the .rtexe no longer fails as before. So I will re-enable it and remove parts of it.
This VI is quite simple, except that it uses NI_AALPro.lvlib:1D Cross Correlation (DBL).vi, which calls the function Rxy80 in lvanlys.*
Does anyone think that this could be what makes the .rtexe fail?
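One way to test this would be to swap the library call for a plain implementation and see whether the crash goes away. As a reference for what such a replacement has to compute, here is a sketch in Python of the textbook 1D cross-correlation, R[lag] = sum over k of x[k] * y[k + lag]. That the NI VI uses exactly this index ordering and no normalization is an assumption on my part and should be checked against its documentation.

```python
def cross_correlation(x, y):
    """Textbook cross-correlation of two sequences.

    Returns len(x) + len(y) - 1 values for lags -(len(x) - 1)
    up to len(y) - 1, with R[lag] = sum_k x[k] * y[k + lag].
    """
    n, m = len(x), len(y)
    result = []
    for lag in range(-(n - 1), m):
        total = 0.0
        for k in range(n):
            j = k + lag
            # Samples outside y are treated as zero.
            if 0 <= j < m:
                total += x[k] * y[j]
        result.append(total)
    return result
```

For example, `cross_correlation([1, 2, 3], [1, 2, 3])` peaks at zero lag, which is the middle element of the result.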
12-10-2012 04:08 AM
By adding a watchdog in the FPGA, the problem is handled. This should not have to be the solution, but it works for me since I do not have anything time-critical running on my cRIO.
This scares me a bit and keeps me from using a cRIO in control applications where the FPGA cannot do everything I need.
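For readers wondering what an FPGA-side watchdog could look like, below is a minimal model of a heartbeat watchdog, written in Python only to show the logic (on the target it would be LabVIEW FPGA code). It is a sketch under my own assumptions, not the exact implementation used here: the RT application is assumed to write a changing heartbeat value to the FPGA, and the FPGA requests a reset when that value has not changed for a set number of loop iterations. The class and parameter names are hypothetical.

```python
class HeartbeatWatchdog:
    """Model of a watchdog that expects a changing heartbeat value."""

    def __init__(self, timeout_ticks):
        if timeout_ticks < 1:
            raise ValueError("timeout_ticks must be at least 1")
        self.timeout_ticks = timeout_ticks
        self._last = None   # last heartbeat value seen
        self._stale = 0     # consecutive ticks without a change

    def tick(self, heartbeat):
        """Call once per FPGA loop iteration.

        Returns True when the heartbeat has been unchanged for
        timeout_ticks consecutive iterations, i.e. a reset is due.
        """
        if heartbeat != self._last:
            self._last = heartbeat
            self._stale = 0
            return False
        self._stale += 1
        return self._stale >= self.timeout_ticks
```

The point of the design is that the check does not depend on the RT side at all: if the rtexe stops for any reason, the heartbeat stops changing and the FPGA can trigger the reset on its own.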