What’s Wrong with the Actor Model

Although it has existed for decades, the Actor Model only started gaining momentum in the mid-2000s. It grew out of Carl Hewitt’s work in the early 1970s and was formalized by Hewitt and Henry Baker in 1977 in “Laws for Communicating Parallel Processes”. Its popularization was partly due to the weakening of Moore’s Law: processors were no longer getting faster, so we needed to use multiple cores and start writing parallel software… or at least so it seemed. The industry first turned to existing implementations like Erlang, and more recently to newer ones like Akka.

So what is this Actor Model anyway? It’s actually pretty simple: it is the good old message passing paradigm with a few additions. You can read the original paper, but Wikipedia provides a nice summary of what actors are:

An actor is a computational entity that, in response to a message it receives, can concurrently:

  • send a finite number of messages to other actors;
  • create a finite number of new actors;
  • designate the behavior to be used for the next message it receives.

The Actor Model is meant to simplify the notoriously difficult world of concurrent programming. Notice that these three operations can be performed “concurrently”. In practical terms this may reintroduce the problems that the Actor Model tries to solve. But that is not the problem I want to talk about; most frameworks circumvent it by using a sequential execution model inside the actor.

A much bigger problem is the misunderstanding of the Actor Model caused by all the marketing. You will often find actors sold as a high-availability, deadlock-free solution that makes it really easy to write safe distributed and concurrent applications. It is enough to write a simple implementation of the dining philosophers to discover how far this is from the truth. The Actor Model is actually a very powerful, experts-only model where you really need to know what you’re doing. It is not a safe environment: it is fairly easy to write deadlocks, and it promotes spaghetti code. If you want a safe, concurrent and fault-tolerant programming model, you should use EJBs with container-managed transactions.

So what is the Actor Model good for then? Precisely for building things like EJB container-managed transactions: if you want to implement your own clustering algorithm with distributed transactions, or any other kind of distributed algorithm, you should consider using the Actor Model as the underlying mechanism. If you are familiar with model checking, temporal logic or any other technique for proving the correctness of concurrent or distributed algorithms, you might notice that the definition of the Actor Model is a very fitting mathematical abstraction for such proofs.

When an actor receives a message it will: 1) transition into a different (consistent) state, 2) possibly create other actors and 3) send messages to other actors. We will call this an actor transition. An actor model implementation must ensure that these operations are performed atomically or at least as three steps in the sequence described. It is also important that the actor receives one message at a time and will not receive the next message until it has performed all of these steps. The actors should also behave in a deterministic way, which means that given a state and a message they should always make the same actor transition. Actors make transitions only in response to messages. Unlike a generic process, an actor will not spontaneously change its state and nothing outside the actor will change its state.
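To make the idea of an actor transition concrete, here is a minimal sketch in Python. It is not taken from any framework: the `Transition` record, the `counter_behavior` actor and the `run_mailbox` helper are all hypothetical names invented for this illustration. The point is only that a behavior can be written as a deterministic function from (state, message) to (new state, actors to create, messages to send), handled one message at a time.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Transition:
    """Result of handling one message (hypothetical record for this sketch)."""
    new_state: Any        # state the actor holds after the message
    spawned: tuple = ()   # behaviors of actors to create
    outbox: tuple = ()    # (recipient, message) pairs to send


def counter_behavior(state: int, message: tuple) -> Transition:
    """Hypothetical counter actor: a pure function of (state, message)."""
    kind, payload = message
    if kind == "add":
        return Transition(new_state=state + payload)
    if kind == "get":
        # payload is the address of the actor asking for the value
        return Transition(new_state=state, outbox=((payload, ("value", state)),))
    return Transition(new_state=state)  # unknown messages change nothing


def run_mailbox(behavior, state, mailbox):
    """Handle messages strictly one at a time, collecting everything sent."""
    sent = []
    for message in mailbox:
        transition = behavior(state, message)
        state = transition.new_state
        sent.extend(transition.outbox)
    return state, sent


final_state, sent = run_mailbox(
    counter_behavior, 0, [("add", 2), ("add", 3), ("get", "client")]
)
```

Because the behavior never blocks and never touches anything but its arguments, calling it twice with the same state and message yields the same transition, which is exactly the determinism requirement described above.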

Now we can describe the whole distributed system as a collection of actors, each with its own state, and a collection of circulating messages.

A system transition occurs on any of the following:

  • an actor transition;
  • an actor crash;
  • a message loss;
  • an external message introduced into the system (an event).

Given a system state, we always know which system transitions are possible from there. This allows us to reason mathematically about which states are reachable and which are not. Given a specific algorithm that we want to implement, we will typically define what constitutes an incorrect state and ideally prove that no such state is reachable. Alternatively, we can prove that a certain incorrect state is reachable and thus that our algorithm is flawed.
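As a rough illustration of this kind of reasoning, the sketch below enumerates every reachable state of a toy system by breadth-first search and looks for one that violates an invariant. Everything in it is an assumption made for the example: two hypothetical account actors "A" and "B", a "pay" message that makes A send a "deposit" to B, and the invariant that money is never created or destroyed. Only two kinds of system transition are modeled (message delivery and message loss); crashes and external events are left out to keep it short.

```python
from collections import deque

TOTAL = 2  # money in the system at the start (assumption of this example)


def actor_transition(balances, message):
    """Deterministic transition: returns (new balances, messages sent)."""
    target, kind, amount = message
    a, b = balances
    if target == "A" and kind == "pay" and a >= amount:
        return (a - amount, b), (("B", "deposit", amount),)
    if target == "B" and kind == "deposit":
        return (a, b + amount), ()
    return balances, ()


def system_transitions(state, allow_loss):
    """Yield every system state reachable in one step."""
    balances, in_flight = state
    for i, message in enumerate(in_flight):
        rest = in_flight[:i] + in_flight[i + 1:]
        new_balances, sent = actor_transition(balances, message)
        yield new_balances, tuple(sorted(rest + sent))  # message delivered
        if allow_loss:
            yield balances, rest                         # message lost


def money_is_conserved(state):
    balances, in_flight = state
    in_transit = sum(m[2] for m in in_flight if m[1] == "deposit")
    return sum(balances) + in_transit == TOTAL


def find_violation(initial, invariant, allow_loss):
    """Breadth-first search; returns a reachable bad state, or None."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for nxt in system_transitions(state, allow_loss):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


initial = ((2, 0), (("A", "pay", 1), ("A", "pay", 1)))
safe = find_violation(initial, money_is_conserved, allow_loss=False)
flawed = find_violation(initial, money_is_conserved, allow_loss=True)
```

With reliable delivery the search exhausts the state space without finding a bad state; once message loss is allowed it finds one, which tells us this naive transfer protocol is flawed under that failure model. A real model checker does far more than this, but the shape of the argument is the same.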

In general, testing concurrent and distributed software is difficult and unreliable. Having a programming model that closely resembles the mathematical proof makes it much easier to test things. First of all, since each actor transition is single-threaded, deterministic and terminating, it is much easier to test the actors in isolation and check whether they correctly implement actor transitions. Second, it is possible to simulate certain scenarios of the whole system, including processes crashing and messages being lost. This doesn’t guarantee that our concurrency logic is correct, but it can detect some faulty scenarios, and we can write tests to ensure that changes to the system don’t break these properties. If we are really cool, we can also use property-based testing and do the model checking directly on the implementation.
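Here is a small sketch of both kinds of test, under assumptions of my own: a hypothetical `sender` that resends a payload on every timer tick until it is acknowledged, a hypothetical `receiver` that applies each payload once, and a `simulate` function that plays the role of an unreliable network. The random generator is seeded, so any failing scenario can be replayed exactly. This is plain seeded simulation, not a property-based testing library.

```python
import random


def sender(state, message):
    """Resend the payload on every tick until an ack arrives."""
    if message == ("ack",):
        return {"acked": True}, []
    if message == ("tick",) and not state["acked"]:
        return state, [("receiver", ("data", 42))]
    return state, []


def receiver(state, message):
    """Apply each payload once, but acknowledge every copy."""
    _kind, value = message
    applied = state["applied"]
    if value not in applied:
        applied = applied + [value]
    return {"applied": applied}, [("sender", ("ack",))]


def simulate(seed, loss_rate, ticks=50):
    """Run the whole system with random delivery order and message loss."""
    rng = random.Random(seed)  # seeded, so every run is reproducible
    behaviors = {"sender": sender, "receiver": receiver}
    states = {"sender": {"acked": False}, "receiver": {"applied": []}}
    network = []
    for _ in range(ticks):
        network.append(("sender", ("tick",)))  # external event (local timer)
        rng.shuffle(network)                   # arbitrary delivery order
        pending, network = network, []
        for target, message in pending:
            is_timer = message == ("tick",)
            if not is_timer and rng.random() < loss_rate:
                continue                       # message lost in transit
            states[target], outgoing = behaviors[target](states[target], message)
            network.extend(outgoing)
    return states


# Isolation test: one transition, no threads, no network.
isolated = sender({"acked": False}, ("tick",))

# System test: the payload is never applied twice, whatever gets lost.
results = [simulate(seed, loss_rate=0.3) for seed in range(100)]
never_duplicated = all(r["receiver"]["applied"] in ([], [42]) for r in results)
```

The isolation test needs nothing but a function call, and the system test checks a safety property across a hundred different loss patterns. Neither proves the protocol correct, but both are cheap to write precisely because each actor only ever makes complete, deterministic transitions.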

In the real world there is one big obstacle to things working this way: the receive() statement implemented by almost every major actor framework. It completely breaks the atomicity of actor transitions, gives actors the ability to block, makes them difficult to test, and makes it difficult to map the entire system to a mathematical model that can be used in correctness proofs. Receive should be a method implemented by actors, not invoked by them.
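The difference between the two styles can be sketched in a few lines of Python. Neither half mirrors a specific framework’s API: `blocking_greeter` and `Greeter` are hypothetical, and a plain `queue.Queue` stands in for the mailbox. In the first style the actor calls receive in the middle of its logic, so a single piece of code spans two messages and can block between them. In the second the actor implements `receive`, the runtime calls it once per message, and the actor designates the behavior for the next message instead of waiting for it.

```python
import queue


# Style 1: the actor *invokes* receive and blocks in the middle of its logic.
def blocking_greeter(mailbox: queue.Queue, outbox: list):
    name = mailbox.get()             # blocks until a message arrives
    outbox.append("hello " + name)
    reply = mailbox.get()            # blocks again: one "transition", two messages
    outbox.append("got " + reply)


# Style 2: the actor *implements* receive; the runtime calls it per message.
class Greeter:
    def __init__(self):
        self.behavior = self.awaiting_name  # behavior for the next message

    def receive(self, message):
        """One complete, non-blocking transition; returns messages to send."""
        return self.behavior(message)

    def awaiting_name(self, message):
        self.behavior = self.awaiting_reply
        return ["hello " + message]

    def awaiting_reply(self, message):
        self.behavior = self.awaiting_name
        return ["got " + message]


# The second style can be driven step by step, with no threads or queues.
greeter = Greeter()
first = greeter.receive("Ann")
second = greeter.receive("ok")
```

The second version is clumsier to read, since the conversation is split across two methods, but every call to `receive` is a complete transition that can be tested and reasoned about on its own.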

The truth is that for a lot of algorithms, code that uses threads and blocking requests is far more readable than an actor-based implementation. Actors are inherently hard to read because the logic is spread out over different places. Adding the receive() statement slightly alleviates the readability problem, at the expense of depriving us of all the nice features that the Actor Model, as described in the original paper, gave us. The current actor frameworks are a compromise that has the disadvantages of both programming models: threads and actors.

In conclusion, there is nothing wrong with the Actor Model if it is implemented and used correctly, which very rarely happens to be the case.


