11-03-2021 07:45 PM
I have a small AF project with 6 actors, and one appears to crash after a minute or so of running. By crash I mean that it just stops running. It hasn’t received a stop message, it doesn’t send a Last Ack message, it just stops. There is no error on the error out of the method that last ran before it stops. Further attempts by its parent actor to send it messages result in error 1556, as all actor queues have been released. This can be seen in the dett output below. The trace also shows that the actor is just stopping, and not shutting down, as none of the queue references are closed properly.
The actor that stops is called Event Logger. It is a child actor of the actor called Event Manager. Event Manager has a ‘ProcessEvent’ time-delayed message that it receives every 500ms, which flushes a queue containing event objects and then sends a ‘New Events To Log’ message to Event Logger with these objects (during testing I have verified that this queue never holds more than 2 or 3 items before being flushed). The ‘New Event To Log.vi’ generates a string array from the event objects and logs it to disk.
The Event Logger just receives the same ‘New Events To Log’ message every 500ms, and nothing else. All actors start up and shut down without any problem (when the program is shut down before Event Logger crashes… and even after it crashes, all other actors shut down properly). But if I let the program run, after about 55 seconds, the Event Logger just stops. It is happily processing messages up until then, and then abruptly, after completing the Receive Message.vi, the Actor.vi stops execution (see dett trace).
If I change the log rate to 200ms it crashes after 25 seconds, every time. With a log rate of 1000ms it crashes every 115 seconds. So it seems related to message count, as if there is a queue or something filling up. But I read all the actor messages, and there aren’t any other queues or user events, or anything associated with this actor to fill up.
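As a quick sanity check on those numbers, the observed times can be converted into approximate message counts. This is only a sketch in Python (the project itself is LabVIEW); the helper name `messages_before_stop` is made up for illustration, and the times are the approximate values quoted above.

```python
# (log period in ms, observed seconds until Event Logger stopped), as quoted in the post
observations = [(200, 25), (500, 55), (1000, 115)]


def messages_before_stop(period_ms, seconds_to_stop):
    """Approximate number of 'New Events To Log' messages sent before the stop."""
    return round(seconds_to_stop * 1000 / period_ms)


counts = [messages_before_stop(p, s) for p, s in observations]

for (period_ms, seconds), count in zip(observations, counts):
    print(f"{period_ms} ms period, {seconds} s to stop: ~{count} messages")
```

The three counts come out at 125, 110 and 115, so the stop happens after a roughly constant number of messages rather than after a constant time.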
I have commented out the code that processes and logs the event data, and still have the same behaviour, so it doesn’t seem to be related to file issues.
If I comment out the code in the ProcessEvents.vi that sends the ‘New Event To Log’ message, then Event Logger does not crash. I just ran the code for over an hour, and Event Logger didn’t stop (until requested to). So it seems after Event Logger receives the New Events To Log message about 100 times, it just stops.
I am puzzled… anybody have any ideas?
Windows 10 Pro, LV2020 SP1 32bit
11-03-2021 11:27 PM
A couple of vague guesses/questions (numbered for ease of reference):
11-04-2021 09:49 AM
Does the event logger have an override of handle error? What does it do? An actor can't stop without handle error being called, so something is triggering handle error to be called - if you don't have an override, any handle error call will stop the actor. If you're looking for a quick debug option, just throw a dialog into your handle error override that displays the error received.
11-04-2021 10:49 AM
MGI's Monitored Actor toolkit has a function that lets you log messages sent to the actor. It might be helpful in getting an exact count.
Any chance you could upload a minimum working example?
11-04-2021 01:06 PM
Thanks everyone for the ideas... as I lay awake in bed last night unable to stop thinking about this issue, I finally figured it out. Unfortunately I'm realizing now that the key piece of information was not in my original post.
cbutcher - time to crash did not change after removing logging. I believe it was around 100 messages and 200 events to crash.
paul.r.r - that was my thought too... can't stop without the error handler being called. But yet, somehow it did. Just stopped executing without any cleanup.
BertMcMahan - thanks for the tip... I will look into that. I wasn't aware of this tool.
In case you are curious, the issue was related to the events. There are several different types of events that are created (Actor Started/Stopped, Error, CustomDebug), much like dett, but accessible to the application for testing/debug/and some control. There are also Send and Receive events created each time a message is sent and received (this is done with a slight mod to ActorFramework.lvlib). I was clever enough to realize that creating a send Event for the message that is sent to the Event Logger would cause a feedback issue, so I prevented those events from being created. But didn't think this through enough to realize that this also applied to the Receive Event.
Essentially, what was happening was that when the Event Logger received the 'New Events To Log' message, it created a Receive Event object that contains the received message (as all actors in the system do). This is then added to the Event Queue (like all other event objects), which is later read by the Event Manager, which generates another 'New Events To Log' message that is sent to the Event Logger. The Receive Event created this time will now also contain the Receive Event created last time, and so on and so forth, building a larger and larger nested structure each time through this multi-actor loop.
It wasn't a memory problem, as I was monitoring memory use and it didn't move much, so I'm guessing there is just some internal limit in LabVIEW on how deeply nested a structure it can handle. It seems to be about 100 levels, based on when the actor stopped. It is annoying that no error was generated, and the process just shut down.
I have implemented the fix to not put the Receive Event on the Event queue for these messages, and have successfully run for several minutes without the actor crashing.
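For anyone trying to picture the feedback loop, here is a minimal sketch of it in Python (the real code is LabVIEW, so this is only a model). The class names `Message` and `ReceiveEvent`, the function `run_cycles` and the flag `log_receive_of_log_message` are all invented for illustration, and each cycle is assumed to carry exactly one Receive Event.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    name: str
    payload: List["ReceiveEvent"] = field(default_factory=list)


@dataclass
class ReceiveEvent:
    message: Message  # the Receive Event keeps a copy of the received message


def nesting_depth(message: Message) -> int:
    """Count how many Receive Events are nested inside one another."""
    depth = 0
    current = message
    while current.payload:
        depth += 1
        current = current.payload[0].message
    return depth


def run_cycles(cycles: int, log_receive_of_log_message: bool) -> int:
    """Model the Event Manager -> Event Logger loop; return the final nesting depth."""
    event_queue: List[ReceiveEvent] = []
    last = Message("New Events To Log")
    for _ in range(cycles):
        # Event Manager: flush the event queue into a new message.
        message = Message("New Events To Log", payload=event_queue)
        event_queue = []
        # Event Logger: receiving the message creates a Receive Event that
        # wraps it, and that event goes back onto the event queue.
        if log_receive_of_log_message:
            event_queue.append(ReceiveEvent(message))
        last = message
    return nesting_depth(last)


print(run_cycles(100, log_receive_of_log_message=True))   # depth grows by one per cycle
print(run_cycles(100, log_receive_of_log_message=False))  # with the fix, depth stays at 0
```

In this model the nesting depth after 100 cycles is 99 with the Receive Event fed back, and 0 once the Receive Event for this message is kept off the queue, which matches the fix described above.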
11-04-2021 06:06 PM
For interest, I did a quick test of how many Objects inside Objects I could do (using Messenger-Library Messages), and it failed at 227, although I got a "stack overflow" error dialog. I don't know why you didn't get this message.
11-09-2021 09:15 AM
Well, eating my words here as yes, you are correct an actor can stop without handle error being called if LabVIEW 'crashes' the actor, for lack of a better word. In my experience when LabVIEW crashes, it brings down the whole application, but that's not what's happening here. It seems this issue just brings down the vi hierarchy where the nesting occurred - I hadn't seen this before. This behavior persists in an exe as well - this is somewhat concerning and I think someone from NI should take a look. No notification of any sort means actors in my application could be crashing without me knowing about it. Attached is a simple project that replicates the issue.
11-09-2021 12:02 PM
@paul.r.r wrote:
In my experience when LabVIEW crashes, it brings down the whole application, but that's not what's happening here. It seems this issue just brings down the vi hierarchy where the nesting occurred - I hadn't seen this before.
If the whole application is one VI hierarchy (no async stuff) then the whole application is stopped. But it has always been the VI hierarchy that stops.
I don't understand why you aren't getting an error dialog, though.
Note that the "leaked" references can be used to tell when an async VI has stopped for any reason, allowing one to act on that info (log, restart actor, shutdown app).
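As a loose analogy only, the same "notice that it died and act on it" idea can be sketched with Python threads. This is not how LabVIEW references behave internally; the class `MonitoredWorker`, its `clean_shutdown` flag and the two example bodies are hypothetical names made up for this sketch. The worker sets a flag when it shuts down properly, and the caller treats "thread ended but flag never set" as an unexpected stop.

```python
import threading


class MonitoredWorker:
    """Runs a body in a thread and lets the caller tell a clean stop from a silent one."""

    def __init__(self, name, body):
        self.name = name
        self.clean_shutdown = threading.Event()  # set by the body during normal cleanup
        self._thread = threading.Thread(target=body, args=(self,), daemon=True)

    def start(self):
        self._thread.start()

    def wait_and_classify(self, timeout=None):
        """Wait for the thread to end and report how it ended."""
        self._thread.join(timeout)
        if self._thread.is_alive():
            return "running"
        if self.clean_shutdown.is_set():
            return "stopped cleanly"
        return "stopped unexpectedly"


def well_behaved_body(worker):
    # ... normal work would go here ...
    worker.clean_shutdown.set()  # cleanup ran, so the stop was intentional


def silently_dying_body(worker):
    # Returns without running any cleanup, like the actor in this thread.
    return


for body in (well_behaved_body, silently_dying_body):
    worker = MonitoredWorker("Event Logger", body)
    worker.start()
    print(body.__name__, "->", worker.wait_and_classify(timeout=5))
```

A caller that sees "stopped unexpectedly" could then log it, restart the worker, or shut the application down, which are the options mentioned above.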