Real-Time Measurement and Control

RT EXE Crashes in 2019, not in 2018

I have a relatively large application that was first developed in 2017, then updated to 2018, and now I'm looking to get it working in 2019.  The application runs on a cDAQ-9136 running Linux RT.  It primarily uses XNet, plus a small amount of DAQmx for slow single-point acquisition, several RS-232 links to instruments, and Ethernet for other instrument control and for reporting to the host.  It isn't OO heavy, though it does have a few classes; most of the application is broken up into libraries.  It uses Queues, DVRs, and VIMs, but I don't think it uses any Channel Wires, and it uses nothing new to 2019 like Maps or Sets.

 

The application ran fine in 2018, but now that I have updated to 2019 and rebuilt it, it crashes at times during testing.  I run the application on RT startup, and it seems to work fine, with Network Stream communication to the host showing it is connected.  FTP works, and the device is seen and responds in MAX.  I also have LEDs blinking on the controller to show it is running, and I've set the tester to write a log file whenever it shuts down gracefully.

 

Left idle on the RT target, the application seems to be stable.  It has run without any issue for days just sitting idle waiting for a command.  But if I command the software to start performing a test, the application crashes after a few hours.  By crash I mean the application is no longer running.  The LEDs stop blinking, Network Streams stop responding, and logging stops.  The controller still has FTP access, but in MAX it often just hangs and doesn't respond.  Sometimes I can command it to restart, at which point the application launches on startup and works fine until a test is commanded again.  As I mentioned, I can tell the software isn't going through a graceful shutdown by the way devices are left and by the absence of a log.

 

The View Error Log in MAX shows an entry at the time of the crash, which I have attached.  I also attached the MAX report for the controller.  In LabVIEW 2016 I experienced similar crashing on RT.  In that case RT had a major bug where trying to access a reference that had become invalid would cause RT to just stop executing with little output.

 

To keep testing going I'm likely going to have to revert these testers to 2018, so my ability to debug this is limited.  However, I was able to cause a crash on a stand without any device under test, so it is possible to send the project to NI.  This post is really just to ask: how can I help isolate the issue, or get more information to NI so they can investigate it?  I'll likely keep disabling parts of the application until the issue is no longer seen, but since each crash takes hours to appear, that may take a while.

Message 1 of 2

We worked with Hooovahh on this via a service request, but I want to provide a summary of the issue here as well.

 

We were able to reproduce the reported hang. The application was starving some OS-specific threads on core 0. This is something that NI has seen in previous versions of the software. There is a chance that it has become more likely due to kernel upgrades and specific driver changes. At the moment, we do not plan to take any further action on this. We will monitor this hang root cause, and if we see an increase in reports, we can investigate further.
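
For anyone who wants to check whether core 0 is saturated while a test runs, here is a minimal sketch in Python rather than LabVIEW. It assumes only the standard Linux /proc/stat format; the function names are made up for illustration and are not part of any NI tool. A core 0 figure that sits near 1.0 for long stretches would be consistent with the starvation described above, though it is not proof on its own.

```python
def parse_cpu_times(stat_text):
    """Map 'cpuN' -> (busy_jiffies, total_jiffies) from the text of /proc/stat."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines look like 'cpu0 ...'; skip the aggregate 'cpu' line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Field order: user nice system idle iowait irq softirq steal ...
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values[:8])  # guest time is already counted in user
        times[fields[0]] = (total - idle, total)
    return times


def busy_fraction(before, after):
    """Per-core busy fraction between two parse_cpu_times() snapshots."""
    result = {}
    for core, (busy1, total1) in after.items():
        busy0, total0 = before.get(core, (0, 0))
        elapsed = total1 - total0
        result[core] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result
```

On the target this would be used by reading /proc/stat twice, a few seconds apart, and passing both snapshots through parse_cpu_times and then busy_fraction.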

 

Our recommendation is to move Timed Structures and other high-priority tasks off of core 0 using the RT SMP CPU Utilities. This is a best practice to make sure that OS processes that run on core 0 are not starved.
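
In LabVIEW this is done with the RT SMP CPU Utilities VIs and the processor assignment on each Timed Structure, so there is no text code to show for it. Purely as an illustration of the same idea at the Linux level, the sketch below pins the calling thread to every allowed core except core 0 using Python's standard os.sched_setaffinity. The helper names are hypothetical, and this is not how the LabVIEW utilities are implemented.

```python
import os


def cores_excluding_zero(available):
    """Return the cores to use for high-priority work, leaving core 0 to the OS."""
    preferred = {core for core in available if core != 0}
    # On a single-core target there is nowhere else to go, so keep what we have.
    return preferred or set(available)


def pin_current_thread_off_core_zero():
    """Restrict the calling thread to all allowed cores except core 0 (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # not available on this platform; nothing to do in this sketch
    allowed = os.sched_getaffinity(0)   # pid 0 means the calling thread
    target = cores_excluding_zero(allowed)
    os.sched_setaffinity(0, target)
    return target
```

The fallback to the full core set is a deliberate choice: on a target with only one core, an empty affinity mask would be rejected by the kernel, so the sketch leaves the thread where it is.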

 

Thanks,

Andy 

Staff PSE

Message 2 of 2