Actor Crashing?

JohnG3k · ‎11-03-2021

I have a small AF project with 6 actors, and one appears to crash after a minute or so of running. By crash I mean that it just stops running. It hasn’t received a stop message, it doesn’t send a Last Ack message, it just stops. There is no error on the error out of the method that last ran before it stops. Further attempts by its parent actor to send messages result in error 1556, as all actor queues have been released. This can be seen in the dett output below. This also shows that the actor is just stopping, and not shutting down, as none of the queue references are closed properly.

The actor that stops is called Event Logger. It is a child actor of the actor called Event Manager. Event Manager has a ‘ProcessEvent’ time-delayed message that it receives every 500ms, which flushes a queue containing event objects and then sends a ‘New Events To Log’ message to Event Logger with these objects (during testing I have verified that this queue never gets more than 2 or 3 items before getting flushed). The ‘New Event To Log.vi’ generates a string array from the event objects and logs it to disk.

The Event Logger just receives the same ‘New Events To Log’ message every 500ms, and nothing else. All actors start up and shut down without any problem (when the program is shut down before Event Logger crashes… and even after it crashes, all other actors shut down properly). But if I let the program run, after about 55 seconds, the Event Logger just stops. It is happily processing messages up until then, and then abruptly, after completing the Receive Message.vi, the Actor.vi stops execution (see dett trace).

If I change the log rate to 200ms it crashes after 25 seconds, every time. With a log rate of 1000ms it crashes every 115 seconds. So it seems related to message count, as if there is a queue or something filling up. But I read all the actor messages, and there aren’t any other queues or user events, or anything associated with this actor to fill up.
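As a quick sanity check on the message-count theory, the three timings above can be converted to approximate message counts. This is only arithmetic on the figures quoted in this post; the function and variable names are illustrative.

```python
# (send period in ms, observed time in s until Event Logger stops),
# taken from the figures quoted above.
observations = [(200, 25), (500, 55), (1000, 115)]


def messages_before_stop(period_ms: int, seconds_to_stop: int) -> int:
    """Approximate number of messages sent before the actor stopped."""
    return (seconds_to_stop * 1000) // period_ms


counts = [messages_before_stop(p, s) for p, s in observations]

for (period_ms, seconds), count in zip(observations, counts):
    print(f"{period_ms} ms period, stops after {seconds} s: ~{count} messages")
```

The counts come out at 125, 110 and 115, so the number of messages stays roughly constant while the elapsed time varies by more than a factor of four. That is consistent with a per-message cause rather than a timer.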

I have commented out the code that processes and logs the event data, and still have the same behaviour, so it doesn’t seem to be related to file issues.

If I comment out the code in the ProcessEvents.vi that sends the ‘New Event To Log’ message, then Event Logger does not crash. I just ran the code for over an hour, and Event Logger didn’t stop (until requested to). So it seems after Event Logger receives the New Events To Log message about 100 times, it just stops.

I am puzzled… anybody have any ideas?

Windows 10 Pro, LV2020 SP1 32bit

cbutcher · ‎11-03-2021

A couple of vague guesses/questions (numbered for ease of reference):

1. Are the "New Events To Log" messages sent with Low priority? (Not saying you should, just curious)
2. Does the time (/number of messages) to crash change when you remove the file I/O and logging?
3. Is it easy to check the actual number of Messages and Events (I guess ~2-3x the number of Messages, based on your description) that are handled before it stops?
4. Can you verify by checking the output file the behaviour of the logger? That is, does the file just stop growing? What size does the file have at this point? (I suppose not that large, unless you're making really really long strings for each entry).

paul.r.r · ‎11-04-2021

Does the event logger have an override of handle error? What does it do? An actor can't stop without handle error being called, so something is triggering handle error to be called - if you don't have an override, any handle error call will stop the actor. If you're looking for a quick debug option, just throw a dialog into your handle error override that displays the error received.

BertMcMahan · ‎11-04-2021

MGI's Monitored Actor toolkit has a function that lets you log messages sent to the actor. It might be helpful in getting an exact count.

Any chance you could upload a minimum working example?

JohnG3k · ‎11-04-2021

Thanks everyone for the ideas... as I lay awake in bed last night unable to stop thinking about this issue, I finally figured it out. Unfortunately I'm realizing now that the key piece of information was not in my original post.

cbutcher - time to crash did not change after removing logging. I believe it was around 100 messages and 200 events to crash.

paul.r.r - that was my thought too... it can't stop without the error handler being called. And yet, somehow it did. It just stopped executing without any cleanup.

BertMcMahan - thanks for the tip... I will look into that. I wasn't aware of this tool.

In case you are curious, the issue was related to the events. There are several different types of events that are created (Actor Started/Stopped, Error, CustomDebug), much like dett, but accessible to the application for testing, debug, and some control. There are also Send and Receive events created each time a message is sent and received (this is done with a slight mod to ActorFramework.lvlib). I was clever enough to realize that creating a Send Event for the message that is sent to the Event Logger would cause a feedback issue, so I prevented those events from being created. But I didn't think this through enough to realize that this also applied to the Receive Event.

Essentially what was happening was that when the Event Logger received the 'New Events To Log' message, it created a Receive Event object that contains the received message (as all actors in the system do). This is then added to the Event Queue (like all other event objects), which is later read by the Event Manager, which generates another 'New Events To Log' message that is sent to the Event Logger. The Receive Event created this time will now also contain the Receive Event created last time, and so on and so forth, building a larger and larger nested structure each time through this multi-actor loop.
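The feedback loop described above can be sketched in a few lines. This is a Python analogy, not the actual LabVIEW code: `Message`, `ReceiveEvent`, `run_cycles` and the flag `log_receive_of_log_message` are hypothetical names, and the real Event Queue also carries other event types that are omitted here.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Message:
    name: str
    payload: List[Any] = field(default_factory=list)


@dataclass
class ReceiveEvent:
    message: Message  # the event carries the full received message


def nesting_depth(message: Message) -> int:
    """Count how many messages are nested inside one another (iteratively)."""
    depth = 0
    current = message
    while current is not None:
        depth += 1
        inner = None
        for item in current.payload:
            if isinstance(item, ReceiveEvent):
                inner = item.message
                break
        current = inner
    return depth


def run_cycles(cycles: int, log_receive_of_log_message: bool) -> int:
    """Simulate Event Manager -> Event Logger round trips; return final nesting depth."""
    event_queue: List[Any] = []
    last = Message("New Events To Log")
    for _ in range(cycles):
        # Event Manager: flush the event queue into a new message.
        last = Message("New Events To Log", payload=list(event_queue))
        event_queue.clear()
        # Event Logger: receiving the message creates a Receive Event that
        # wraps it, and that event goes back onto the event queue.
        if log_receive_of_log_message:
            event_queue.append(ReceiveEvent(last))
    return nesting_depth(last)


print(run_cycles(100, log_receive_of_log_message=True))   # 100
print(run_cycles(100, log_receive_of_log_message=False))  # 1
```

With the Receive Event enabled, the nesting depth grows by one on every cycle; with it suppressed for this one message, the depth stays at 1 no matter how long the loop runs.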

It wasn't a memory problem, as I was monitoring this and it didn't move much, so I'm guessing there is just some internal limit in LabVIEW on the depth of a nested structure it can handle. It seems to be about 100, based on when the actor stopped. It is annoying that no error was generated, and the process just shut down.

I have implemented the fix to not put the Receive Event on the Event queue for these messages, and have successfully run for several minutes without the actor crashing.

drjdpowell · ‎11-04-2021

For interest, I did a quick test of how many Objects inside Objects I could do (using Messenger-Library Messages), and it failed at 227, although I got a "stack overflow" error dialog. I don't know why you didn't get this message.

paul.r.r · ‎11-09-2021

Well, eating my words here: yes, you are correct, an actor can stop without handle error being called if LabVIEW 'crashes' the actor, for lack of a better word. In my experience when LabVIEW crashes, it brings down the whole application, but that's not what's happening here. It seems this issue just brings down the VI hierarchy where the nesting occurred - I hadn't seen this before. This behavior persists in an exe as well - this is somewhat concerning and I think someone from NI should take a look. No notification of any sort means actors in my application could be crashing without me knowing about it. Attached is a simple project that replicates the issue.

drjdpowell · ‎11-09-2021

@paul.r.r wrote:

In my experience when LabVIEW crashes, it brings down the whole application, but that's not what's happening here. It seems this issue just brings down the VI hierarchy where the nesting occurred - I hadn't seen this before.

If the whole application is one VI hierarchy (no async stuff) then the whole application is stopped. But it has always been the VI hierarchy that stops.

I don't understand why you aren't getting an error dialog, though.

Note that the "leaked" references can be used to tell when an async VI has stopped for any reason, allowing one to act on that info (log, restart actor, shutdown app).
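The general idea of watching for an asynchronous worker that has stopped without going through its normal shutdown path can be sketched as follows. This is a Python analogy using threads, not LabVIEW code; `Worker`, `watch`, `STOP` and `CRASH` are hypothetical names. In LabVIEW the signal would instead be a reference owned by the async VI becoming invalid, which is what the parent in the original post saw as error 1556.

```python
import queue
import threading

STOP = object()   # normal shutdown request
CRASH = object()  # stands in for whatever abruptly halts the actor


class Worker(threading.Thread):
    def __init__(self) -> None:
        super().__init__(daemon=True)
        self.inbox: "queue.Queue[object]" = queue.Queue()
        self.stopped_cleanly = False

    def run(self) -> None:
        while True:
            msg = self.inbox.get()
            if msg is CRASH:
                return  # abrupt exit: no cleanup, no last message
            if msg is STOP:
                self.stopped_cleanly = True  # normal shutdown path
                return


def watch(worker: Worker, timeout_s: float = 5.0) -> str:
    """Wait for the worker to end and report how it ended."""
    worker.join(timeout_s)
    if worker.is_alive():
        return "running"
    return "stopped cleanly" if worker.stopped_cleanly else "stopped unexpectedly"


w = Worker()
w.start()
w.inbox.put(CRASH)
print(watch(w))  # stopped unexpectedly
```

A parent that gets "stopped unexpectedly" can then choose to log the event, restart the worker, or shut the application down.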
