‎02-12-2013 12:12 PM
(RFC means "request for comments". This is a brainstorm discussion on a topic I consider to be an open question with no recommended design pattern at this time.)
This is a (long) question about systems with multiple actors that are more than just command processors -- actors that have their own state machine behaviors that they work on when not receiving messages. These are actors that have overridden Actor Core.vi so they can do work independent of the directives of their caller. This is a large dump of my knowledge of the problem space and questions about best practices in this arena. If I've missed anything (because I have not had to write systems like these but once or twice in my life), please chime in to correct me.
First Situation: You have an actor to control a robot (there are various nested actors for its subsystems, but those are internal to the robot's operation so we'll ignore them and assume the main actor is coordinating all internal operations). The robot grabs a widget from a conveyor belt whenever one appears (no regular schedule). The widget is missing parts. The robot analyzes the widget, goes to a wall of available part bins, finds the missing part and adds it to the widget, then puts the widget back on the belt. Exactly which part is missing is different for each widget, and sometimes there's more than one part missing.
Commentary: This is a straightforward actor to write. I'm definitely not saying that control systems are easy to write. I am saying that the state machine for the robot is straightforward to conceive and the set of external messages is small: "widget arrives" and "stop" are about the only messages that have to be sent to the actor. It's output is likewise small, limited to things like "need to order more parts".
Second Situation: Two such robots, each with its own conveyor belt and with its own wall of part bins.
Commentary: Programming this is the same as the first... you launch two copies of the same actor, one for each robot. There's no additional complexity.
Third Situation (the interesting case): Two such robots, but grabbing widgets from the same conveyor belt and taking parts from the same wall.
Commentary: Now we have a more interesting messaging situation. The external messages remain the same, although the external system probably needs to handle the possibility of both robots sending the "order more parts" message at the same time to avoid double ordering parts. But the internal situation is massively more complex. To begin with, they need to negotiate which one will grab a given part off of the conveyor belt. This is likely easy ... if the first robot is working on a widget already, let the part go to the second robot in line -- a kind of priority ordering. But then they need to not collide when going to the wall of parts. Each robot needs to the part it needs efficiently. The exact part that each robot needs isn't known in advance, and sometimes a robot needs more than one part, so its path to the wall may involve moving along the wall before it returns to the conveyor belt.
Although it might not seem like it at first, this is a resource allocation problem, where the resource in question is "empty space". This problem is identical to problems of claiming other system resources, like exclusively using a particular DAQ channel or broadcasting on a particular frequency without others being on the same frequency. It can even be mapped to data manipulation problems, like claiming a table in a database, a file on disk, or a region of a graph data structure. Somehow these actors must coordinate their activities.
The strategies that I know of divide into two broad categories: the strategies that avoid collision and the strategies that allow collision and deal with it.
Allow Collision And Deal With It
Most famous of these strategies is the most common solution to the "two wireless devices both need to talk on the same frequency". This strategy is called the "Exponential Backoff Algorithm". Device A starts talking on the channel. If no cross-talk occurs during an initial test broadcast, it assumes everyone else heard the test broadcast and won't start talking themselves, so Device A proceeds with the main content of its message. But suppose Device B starts talking at the same time as Device A? Now both devices hear cross-talk. The solution? All other devices heard the cross-talk, so they know to stay silent until the all-clear is signaled. The two* colliding devices both stop sending and pick a random number of milliseconds to wait, and then try again. If one of them rolls a lower random number than the other, it will start broadcasting and the other will stay quiet. If they collide again, they each pick a new random number, but this time they square the range. So if the first roll was 1 through 10, the second is 1 through 100. This increases the probability that they will not roll the same number. Repeat until a collision is avoided.
If you've ever seen two people try to walk through the same door or try to dip their chips in the same bowl of salsa, you've probably seen this algorithm in action.
The list of collision detection algorithms is long. But from an actor perspective, these are quite straightforward -- the only new message that gets added to the system is "collision occurred" and then the actor deals with it. There may be some messages to work out the details between the colliding actors, but these are nicely scoped and do not represent a continuous message-passing burden on the system as a whole. You don't even have to let the robots actually physically touch -- seeing that a robot has entered a zone around another robot is the same.
But consider this: the impact sensors or vision sensors that allow one robot to know it hit another robot are just another form of message passing. The message may be passed by photons or vibrations, but it is passed. And when you don't have sensors that can see the impact and generate messages that the collision occured, you have to write that messaging system yourself. An easy example is two actors, each wanting to pick a unique identifier for itself. How does one actor "see" that another actor has picked the same identifier? You have to introduce some system where an actor announces "I picked this identifier". As soon as you start building that, your problem is now functionally equivalent to the collision avoidance algorithms.
* various amendments to this strategy handle three or more devices colliding.
Avoid The Collision Entirely
Here's where things get more interesting. To have collision avoidance (which is functionally equivalent to computing collision detection), you need some way to allocate resources among actors. And this is where the open questions start, for me. I have two actors. Is there a most efficient strategy for resource allocation? Is it the same strategy if I have three or more actors to coordinate? If there is not a general efficient strategy, what are the strategy options and what use cases would lend themselves best to which strategies? If there are two co-equal strategies, is one easier to program than another?
Here are the strategies that I know about for resource allocation:
There are variants that combine several of these strategies, but this list covers all the basic strategies I can recall learning about. Do you know more that could be added to the list? Do you have recomendations? Pros/cons?
‎02-13-2013 04:55 AM
AristosQueue wrote:
Third Situation (the interesting case): Two such robots, but grabbing widgets from the same conveyor belt and taking parts from the same wall.
Commentary: Now we have a more interesting messaging situation. The external messages remain the same, although the external system probably needs to handle the possibility of both robots sending the "order more parts" message at the same time to avoid double ordering parts. But the internal situation is massively more complex. To begin with, they need to negotiate which one will grab a given part off of the conveyor belt. This is likely easy ... if the first robot is working on a widget already, let the part go to the second robot in line -- a kind of priority ordering. But then they need to not collide when going to the wall of parts. Each robot needs to the part it needs efficiently. The exact part that each robot needs isn't known in advance, and sometimes a robot needs more than one part, so its path to the wall may involve moving along the wall before it returns to the conveyor belt.
My initial thought on reading this is that one should have "Widget Conveyor" and "Part Wall" actors. The two robots would negotiate with those two actors to get permission to go to those resources. "Part Wall" could include "air-traffic control" to allow two robots to go to the wall at the same time.
Looking at your list, this seems most like "Central actor as resource allocator", though with separate actors for each resource.
‎02-13-2013 12:35 PM
The particulars of the example are kind of irrelevant to the question at hand... I put that up so that it was clear how we were getting into this situation. I'm not actually looking to solve that use case. More, I'm interested in all the possible answers for this and any similar coordination challenge.
Now, having said that, if we consider the exact problem, you say you would have gone for "Central actor as resource allocator". My first thought was Chain of Command. One robot is primary and drives whatever route it wants between the conveyor belt and the wall of parts and just assumes nothing gets in its way. The second asks permission for any given route and if told "no", picks a different route and asks permission for that. What pros/cons would you see between the two solutions in this case?
‎02-13-2013 12:49 PM
Most of my work is on embedded industrial systems where a 'collision' is very bad, so my bias is strongly towards slower solutions that are utterly paranoid about avoiding collisions.
I see the Chain of Command case as being somewhat dangerous, especially this part: "This model allows the high-priority actor to quit and the other actors keep going." I worry about the corner case where we lose communication to the higher-level actor, at which point we could have multiple actors who each think they are the highest priority in the system, and the ensuing chaos.
I would personally default to option 1 or 2 unless other requirements made those unfeasible (such as performance requirements, scalability concerns, fault tolerance if the central resource allocator/air traffic control had a problem, etc). In my mind, they seem much simpler to test and debug as well, since you remove several of the concurrency considerations.
Essentially, in a paranoid system, I want each actor to assume that by default it does not have control of a resource, rather than assuming that it does have control unless told by an overriding actor that it does not.
To be fair, I have never tried implementing most of these strategies on a real system.
‎02-13-2013 07:42 PM
MattP wrote:
I worry about the corner case where we lose communication to the higher-level actor, at which point we could have multiple actors who each think they are the highest priority in the system, and the ensuing chaos.
Surely a disconnect is easy to tell apart from a proper hand-shake goodbye. I'd be surprised if that was an issue. Still, it's worth keeping in mind as an issue if you have a network connection. I don't generally have those crop up in most of my projects -- all actors in my systems are, for the most part, operating on the same machine in the same process. Reason #4553 to avoid using real hardware. 😉
MattP wrote:
I would personally default to option 1 or 2 ... In my mind, they seem much simpler to test and debug as well, since you remove several of the concurrency considerations.
Let's take a closer look at that statement. I would have figured that having the chain of command would be less complex than a central coordinator. You have N actors needing scheduling, with each one talking only to the one above it and the one below it, compared with N+1 actors all talking to the +1. On the flip side, the latency is longer in chain because a lowest level request has to pass through N-1 layers of approval, as opposed to everyone's request being handled by the central implementor. Not sure which one ends up more complex.
‎02-14-2013 04:29 AM
AristosQueue wrote:
The particulars of the example are kind of irrelevant to the question at hand... I put that up so that it was clear how we were getting into this situation. I'm not actually looking to solve that use case. More, I'm interested in all the possible answers for this and any similar coordination challenge.
The general pattern I applied was to identify all the "things" involved and assign a dedicated actor to each one. Two robots, one conveyor and one wall equals four actors. And only one "thing" per actor, so a central repository of multiple resources would, if used, have to be an additional actor.
AristosQueue wrote:
Now, having said that, if we consider the exact problem, you say you would have gone for "Central actor as resource allocator". My first thought was Chain of Command. One robot is primary and drives whatever route it wants between the conveyor belt and the wall of parts and just assumes nothing gets in its way. The second asks permission for any given route and if told "no", picks a different route and asks permission for that. What pros/cons would you see between the two solutions in this case?
One could mix things a bit by having the primary robot create and own the resource actors, and pass thier addresses down the chain of robots. If the primary robot shuts down (taking the resource actors down with it) the new primary creates new resourse actors.
Like MattP, I was thinking of network communication being involved somewhere, and that introduces a potential safety issue in "Chain of Command", depending on the application.
‎02-14-2013 12:37 PM
AristosQueue wrote:
MattP wrote:
I would personally default to option 1 or 2 ... In my mind, they seem much simpler to test and debug as well, since you remove several of the concurrency considerations.
Let's take a closer look at that statement. I would have figured that having the chain of command would be less complex than a central coordinator. You have N actors needing scheduling, with each one talking only to the one above it and the one below it, compared with N+1 actors all talking to the +1. On the flip side, the latency is longer in chain because a lowest level request has to pass through N-1 layers of approval, as opposed to everyone's request being handled by the central implementor. Not sure which one ends up more complex.
The use case I was thinking of was not having a singular central resource coordinator that handles all resources everywhere, but rather a single resource coordinator per resource (or collection of very similar resources). I also had in mind the type of application where resource requests are relatively sparse (such that the resource coordinator is not constantly inundated with messages - this would be something that would cause me to not use this approach). Scaling it out to N, I completely agree it does not make as much sense as Chain of Command. For smaller or well-constrained values of N, however, I see it giving an implementation advantage.
In the applications I work with, high reliability first, low latency second are the constraints we usually work under.
I also see it being relatively easier with a central implementor to get a debug view of which items are trying to get access to a resource at a given moment in time, which is something I like having as an indication of system health. Sure, it could be done with any of the other systems, but implementation seems most straightforward with this approach.
‎02-14-2013 03:15 PM
drjdpowell wrote:
The general pattern I applied was to identify all the "things" involved and assign a dedicated actor to each one. Two robots, one conveyor and one wall equals four actors. And only one "thing" per actor, so a central repository of multiple resources would, if used, have to be an additional actor.
I added this as a new pattern in my original post (#7). I like the idea for some things, like use of DAQ channels. How do you establish an "actor" for driving across a floor where empty space is the resource? Is there an actor for every tile square?
‎02-14-2013 04:00 PM
AristosQueue wrote:
How do you establish an "actor" for driving across a floor where empty space is the resource? Is there an actor for every tile square?
Well, one has to make judgements about what a counts as an independent thing, applying your #8 strategy in some cases.
‎02-14-2013 04:10 PM
AristosQueue wrote:
drjdpowell wrote:
The general pattern I applied was to identify all the "things" involved and assign a dedicated actor to each one. Two robots, one conveyor and one wall equals four actors. And only one "thing" per actor, so a central repository of multiple resources would, if used, have to be an additional actor.
I added this as a new pattern in my original post (#7). I like the idea for some things, like use of DAQ channels. How do you establish an "actor" for driving across a floor where empty space is the resource? Is there an actor for every tile square?
Maybe the air traffic control method could work in a literal sense. Actors file their movement intentions, air (floor) traffic control replies with amendments, assuming that this controller is aware of the grid of resources (floor tiles). Some notion of impending collision detection (is ref locked, is space full, etc) when moving to next 'unit' as well.