Tuesday 15 November 2011

Akka 1.2 Made it Simple to Get Started



Dear Junior

The release of Akka 1.2 is about a month old, but it contains a change that makes it much easier to try out Akka. It is not some revolutionary new feature, quite the contrary. They have simplified the package structure of Akka so much that the core Akka actor package now has no external dependencies. Correct: no, none, zip external dependencies - it stands on its own.

Earlier, if you wanted to try out Akka, you had the problem that Akka relied on a ton of other open source libraries. So, if you just wanted to try writing a simple actor, you still had to download a long list of other jars. Well, that can be handled by Maven, or by the wonderful little tool "sbt" (Simple Build Tool), but it is still a threshold to get over before getting started. And that threshold has nothing to do with "understanding actors".

In Akka 1.2 they have done a wonderful job of splitting out dependencies and managed to refine the "core actor" parts to be self-contained, i e they do not require any other libraries in place to work.

To get started writing your first Akka actor, simply download Akka 1.2 and put "akka-actor-1.2.jar" on your classpath. Ready to start hakking.

An Example

In IntelliJ, I create a new project "akka12" in a directory with the same name:

dajob05:akka12 danbj$ pwd
/Users/danbj/var/tmp/akka12
dajob05:akka12 danbj$ ls -l
total 8
-rw-r--r-- 1 danbj staff 796 15 Nov 15:55 akka12.iml
drwxr-xr-x 2 danbj staff 68 15 Nov 15:55 src

Well, I happen to use IntelliJ, but any other modern IDE would work as well.



Now, I create a "lib" directory for third party libraries (i e Akka), download Akka and put the jar in "lib".

mkdir lib
curl http://akka.io/downloads/akka-actors-1.2.zip >akka-1.2.zip
unzip akka-1.2.zip akka-actors-1.2/lib/akka/akka-actor-1.2.jar
mv akka-actors-1.2/lib/akka/akka-actor-1.2.jar lib/

Now the self-contained Akka jar is in the lib directory.

dajob05:akka12 danbj$ ls lib
akka-actor-1.2.jar

I add the jar as a project library in my IntelliJ project, and now I can create an actor class together with a small script that sends a message to that actor.

import akka.actor.Actor

object helloworld extends App {
  val worlder = Actor.actorOf[Worlder]
  worlder.start()
  worlder ! "hello"
}

class Worlder extends Actor {
  def receive = {
    case msg => println(msg + " world")
  }
}

Running it (from inside the IDE) gives the expected

hello world

I just love what the Typesafe crowd have done to make it so easy to get started.

Happy hakking

Yours

   Dan

Thursday 13 October 2011

Scala Actors and Estragon's Bad Memory


Dear Junior

The event-based actor model in Scala is obviously so much more memory-efficient than the thread-based one, so there must be a catch somewhere. There sure is: the price we pay is that the code does not always behave in the way we intuitively expect. And unfortunately the coding model does not shield us completely from those situations.

The reason I think this is important is that I want to walk the road towards Akka. Once we get there it will be obvious why Akka’s design gives us both an efficient runtime model and a carefully guiding programming model.

Unintuitive Effects of Event-Based Model

Let us return to the programming model of event-based actors in Scala standard library, and the things that might baffle us.

One of the things that might happen is that inside one actor instance, consecutive lines of code might be executed by different threads. That is not what we expect.

Let us pick apart an event-based actor to see what is going on, and the same with a thread-based actor – just for reference.

Dissecting Thread-Based Vladimir

To see how that can happen, let us start with a simple thread-based actor – our dear friend Vladimir. In a small script, we create an actor, start it and send it the message (Godot) that it is waiting for. OK, in the play Waiting for Godot, the point is that Godot never shows up, but let us be nice to poor Vladimir.

object vladimirwaitandrun extends Application {
  def T() = "T" + Thread.currentThread().getId
  println("Creating actor " + T)
  val vlad = new Vladimir(42)
  println("Starting actor " + T)
  vlad.start()
  println("Sending Godot message " + T)
  vlad ! Godot
  println("Finished script " + T())
}

The Godot-waiting actor Vladimir is implemented as a subclass to “actors.Actor” of the Scala standard library.

class Vladimir(id : Int) extends Actor {
  def T = "T" + Thread.currentThread.getId
  def name: String = "Vladimir" + id

  def act() {
    println(name + " is waiting " + T)
    receive {
      case Godot =>
        println(name + " saw Godot arrive! " + T)
    }
  }
}

All set up – let us run the script.

danbj$ scala godot.vladimirwaitandrun
Creating actor T1
Starting actor T1
Vladimir42 is waiting T11
Sending Godot message T1
Finished script T1
Vladimir42 saw Godot arrive! T11

First we note that the script and the actor are run by different threads: T1 and T11 respectively – no surprise. We also note that both lines of Vladimir were run by the same thread. No surprise there either: the thread-based model gives each actor a thread, and when the actor pauses (waiting for Godot), the thread pauses.
But let us walk through this just to see the difference when we come to the event-based model.
  • An actor of the class Vladimir is created and started – its “act” method starts executing in a thread of its own (T11), starting with a print.
  • The actor executes the method “receive” (still in thread 11). The call to receive takes an argument, which is the so-called “message handler” – a code-block to be used when matching the incoming message. To be precise, the argument is an anonymously defined function, which is passed as the argument to “receive”.  
  • As there is no message in the actor’s mailbox, “receive” puts the thread (T11) to sleep. 
  • Message “Godot” is dropped into Vladimir’s mailbox (by T1). Waiting thread T11 is notified. Thereafter the script finishes.
  • T11 wakes up and starts applying the message-handler to the arrived message. There is a match in the (only) case-clause and T11 prints “Vladimir42 saw Godot arrive! T11”
  • Execution of message handler is finished. As there is no more code in “act” the thread returns.

OK, there are more subtleties going on, but this will do for now.
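
For readers more at home in Java, the walk-through above can be approximated with a plain thread blocking on a queue. This is only a minimal sketch under my own assumptions: the class ThreadBasedWaiter, its mailbox field and the thread name "vladimir-thread" are made up for illustration and are not how the Scala library is implemented. It records which thread runs the code before and after the wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a thread-based actor: one dedicated thread per actor.
class ThreadBasedWaiter {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private final List<String> threadsSeen = new ArrayList<>();

    private final Thread worker = new Thread(() -> {
        threadsSeen.add(Thread.currentThread().getName());   // "is waiting"
        try {
            mailbox.take();                                   // the thread itself sleeps here
            threadsSeen.add(Thread.currentThread().getName()); // "saw Godot arrive"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }, "vladimir-thread");

    void start() { worker.start(); }

    // Drops a message into the mailbox; this wakes the waiting thread.
    void send(String msg) { mailbox.add(msg); }

    // Waits for the actor thread to finish, then returns the recorded thread names.
    List<String> awaitThreadsSeen() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return threadsSeen;
    }

    // Runs the whole scenario: create, start, send Godot, wait for the end.
    static List<String> run() {
        ThreadBasedWaiter vladimir = new ThreadBasedWaiter();
        vladimir.start();
        vladimir.send("Godot");
        return vladimir.awaitThreadsSeen();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [vladimir-thread, vladimir-thread]
    }
}
```

Both entries are the same thread, which is the point: the thread that starts waiting is the one that handles the message.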

Dissecting Event-Based Estragon

Now let us look at the same thing happening with an event-based actor: our dear friend Estragon.

object estragonwaitandrun extends Application {
  def T() = "T" + Thread.currentThread().getId
  println("Creating actor " + T)
  val estragon = new Estragon(42)
  println("Starting actor " + T)
  estragon.start()
  println("Sending Godot message " + T)
  estragon ! Godot
  println("Finished script " + T)
}

The Godot-waiting actor Estragon is also implemented as a subclass to “actors.Actor”, and the only difference is that it uses another method for awaiting a message in the mailbox. Instead of “receive” it uses “react”.

class Estragon(id : Int) extends Actor {
  def T = "T" + Thread.currentThread.getId
  def name: String = "Estragon" + id

  def act() {
    println(name + " is waiting " + T)
    react {
      case Godot =>
        println(name + " saw Godot arrive! " + T)
    }
  }
}

All set up. Let us run this.

danbj$ scala godot.estragonwaitandrun
Creating actor T1
Starting actor T1
Estragon42 is waiting T10
Sending Godot message T1
Finished script T1
Estragon42 saw Godot arrive! T13

Now, let us walk through this run and see what actually happened.
  • An actor of the class Estragon is created and started. A thread (T10) is taken from the thread pool and connected to the actor. The thread starts running the “act” method, which starts with a print.
  • The actor executes the method “react” (still in T10). The call to react takes an argument, which is the so-called “message handler” – a code-block to be used when matching the incoming message. To be precise, the argument is an anonymously defined function, which is passed as the argument to “react”. 
  • As there is no message in the actor’s mailbox, “react” disconnects the thread from the actor. The message-handler is registered at the event handler. The thread goes back to the thread pool.
  • Message “Godot” is dropped into Estragon’s mailbox from the script (by T1). The event-handler is notified. The script thereafter finishes.
  • The event-handler picks an available thread (T13) from the thread pool and connects it with the actor. The thread is given the message-handler to run and starts applying the message-handler to the arrived message. There is a match in the case-clause and T13 prints “Estragon42 saw Godot …”
  • Execution of message handler is finished. Thread is disconnected and goes back to pool.

Note that there are two different threads (T10 and T13) involved in executing the actor Estragon42. Having a pool of threads does not only mean that one thread can serve many actors (thus conserving resources). It does also mean that different threads might be involved in running the same actor during its lifespan. 

To be honest, I must admit that I did run the script a few times before I got a run with different threads in “is waiting” and “saw Godot arrive”. The allocation of threads is non-deterministic from the point of view of the program, and the first few runs happened to reuse the same thread in both phases – but that would not serve to make my point.
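
The same effect can be provoked in plain Java with a thread pool. The following is a rough sketch, not the scala.actors implementation: the class EventBasedWaiter and its start/send methods are hypothetical names of my own. The first task plays the part of "act" up to "react" (it registers a handler and gives the thread back), and the second task runs the registered handler when the message arrives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical stand-in for an event-based actor served by a shared pool.
class EventBasedWaiter {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final List<String> threadsSeen =
            Collections.synchronizedList(new ArrayList<>());
    private volatile Consumer<String> handler;

    // Like "act" up to and including "react": note the thread, register the handler, return.
    Future<?> start() {
        return pool.submit(() -> {
            threadsSeen.add(Thread.currentThread().getName());   // "is waiting"
            handler = msg -> threadsSeen.add(Thread.currentThread().getName()); // "saw Godot arrive"
        });
    }

    // Like "!": hands the registered handler and the message to the pool.
    Future<?> send(String msg) {
        Consumer<String> registered = handler;
        return pool.submit(() -> registered.accept(msg));
    }

    static List<String> run() {
        EventBasedWaiter estragon = new EventBasedWaiter();
        try {
            estragon.start().get();        // wait until the handler is registered
            estragon.send("Godot").get();  // wait until the handler has run
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            estragon.pool.shutdown();
        }
        return new ArrayList<>(estragon.threadsSeen);
    }

    public static void main(String[] args) {
        List<String> threads = run();
        System.out.println("is waiting:       " + threads.get(0));
        System.out.println("saw Godot arrive: " + threads.get(1));
    }
}
```

With this particular setup the two phases land on different pool threads, because a ThreadPoolExecutor starts a new worker for each submitted task until its core size is reached. With a smaller or busier pool the same thread could just as well be reused, which matches my first few runs of the Scala script.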

The Difference in Short

Let us focus on the core difference between these two examples; first thread-based Vladimir.

    println(name + " is waiting " + T)
    receive {
      case Godot =>
        println(name + " saw Godot arrive! " + T)
    }

Here the same thread (T11 in the run above) executes the consecutive lines. It executes the println and the receive, makes a small pause, and finishes with the matching case clause and the last println.

This is execution of code as we learned in Programming 101.

Now, time for event-based Estragon

    println(name + " is waiting " + T)
    react {
      case Godot =>
        println(name + " saw Godot arrive! " + T)
    }

Here two threads are involved. The first one (T10 in the run above) executes println and react, ending with registering the message-handler. Then at a later point in time some other thread (T13 in the run above) executes the case-clause and the last println.

The result is that consecutive lines of code inside the same actor object are executed by different threads.

Not what we learned in Programming 101.

Does it Matter?

OK, but does it matter? Unfortunately there are situations where this makes a difference for us as programmers. For example, many security frameworks take the credentials of the authenticated user and stuff them into thread locals. In that way the credentials need not be passed around explicitly but can be fetched from the thread when needed. Alas, that does not work if the code “suddenly changes horses midrace”.
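
To make the thread-local problem concrete, here is a small Java sketch. The names ThreadLocalHorseSwap and CURRENT_USER are hypothetical and do not come from any real security framework; the two pool tasks stand in for the code before and after a "react".

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalHorseSwap {
    // Hypothetical "security context": the current user is kept per thread.
    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Returns what each phase sees in the thread local: [first phase, second phase].
    static String[] run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        String[] seen = new String[2];
        try {
            // First phase: store the credentials on whichever thread runs this task.
            pool.submit(() -> {
                CURRENT_USER.set("estragon");
                seen[0] = CURRENT_USER.get();
            }).get();
            // Second phase: the pool starts a second worker thread for this task,
            // and that thread has never seen the credentials.
            pool.submit(() -> {
                seen[1] = CURRENT_USER.get();
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("first phase sees:  " + seen[0]); // prints estragon
        System.out.println("second phase sees: " + seen[1]); // prints null
    }
}
```

The second phase finds nothing, even though it is logically "the same code" continuing. Passing the credentials explicitly as part of the message avoids depending on which thread happens to run.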

There are also other situations where the code behaves in unintuitive ways due to this “thread switching” and where the programming model does not shield us programmers from strange effects and risk of making errors. More on that later. 

Estragon’s Bad Memory

On a side note, the difference in execution model can also explain one aspect of the play Waiting for Godot. In the play, Estragon suffers from a severe memory condition. When Act II starts on the morning of the second day, it seems like Estragon has no recollection at all of what happened in Act I the previous day. This is enormously frustrating for his companion-in-waiting Vladimir, who clearly remembers how they waited in vain for Godot to show up.

Now, given the knowledge about threading it is obvious why Vladimir remembers the previous day. Thread-based Vladimir is in Act II connected to the same thread of execution as in Act I – so the events in Act I happened to “the same memory-line” as Act II. Event-based Estragon on the other hand is not necessarily connected to the same thread in Act II as he was in Act I, so Estragon in Act II is not “the same memory-line” as Estragon in Act I. In a way Estragon experiences the same situation as a person with multiple-personality disorder. Even if it is the same Estragon-body (object, actor), it is not the same Estragon-mind (thread) from time to time.

That might be an explanation for Estragon’s bad memory. It would be interesting to diagnose it more closely, like if we could probe his mind for what is going on from time to time.

Yours
   Dan

Friday 16 September 2011

Public final and data encapsulation


Dear Junior

We were discussing immutable value objects and using "public final" data-fields for their representation. In that discussion I started off mixing up immutability and encapsulation. Now that we have covered immutability, let us return to encapsulation.

Obviously, declaring a field as public will break data encapsulation. You lose your freedom to change data representation without having to bother the clients.

Let us have a look at a name-class with some actual usage.

public class Name {
    public final String fullname;

    public Name(String fullname) {
        if(!fullname.matches("[a-zA-Z\\ ]+"))
            throw new IllegalArgumentException();
        this.fullname = fullname;
    }

    public String[] names() {
        return fullname.split(" ");
    }
}

The public API of this class consists of three parts: the construction of a name from a string, the attribute “fullname” (accessible through the public data field), and the attribute “names” (accessible through an accessor method).

An example of what client code looks like we can find in the tests.

public class NameTest {

    private final String danbjson = "Dan Bergh Johnsson";

    @Test
    public void shouldHaveFullNameAsAttribute() {
      Assert.assertEquals(danbjson, new Name(danbjson).fullname);
    }

    @Test(expected = IllegalArgumentException.class)
    public void shouldNotAllowWeiredCharsInName() {
      new Name("#€%&/");
    }

    @Test
    public void shouldSplitIntoNamesAtSpaces()  {
      Assert.assertArrayEquals(
        new String[] {"Dan", "Bergh", "Johnsson"},
        new Name(danbjson).names());
    }

}

Imagine that we want to change the internal data representation to using a char-array instead. Doing so will break all the clients as they rely on having that public data-field “fullname”. Thus, it is a bad design. Or?

I would argue that it is still a good design. Using modern tools it is easy to add the encapsulation when needed. Applying "Encapsulate Field" in e g IntelliJ yields.

public class Name {
    private final String fullname;

    public Name(String fullname) {
        if(!fullname.matches("[a-zA-Z\\ ]+"))
            throw new IllegalArgumentException();
        this.fullname = fullname;
    }

    public String[] names() {
        return fullname.split(" ");
    }

    public String fullname() {
        return fullname;
    }
}

And of course the client code has been changed accordingly.

    @Test
    public void shouldHaveFullNameAsAttribute() {
        Assert.assertEquals(danbjson, new Name(danbjson).fullname());
    }


Now, I could have chosen "getFullname" as the method name for the new method. However, I have always found that naming convention a little bit awkward: the property is “full name”, and adding a boilerplate “get” does not add any value in my opinion. By the way, JavaBeans is just one naming convention in Java; the naming convention for CORBA predates JavaBeans. In the CORBA convention, if you have a property “fullname” of type String, then the way to access it is to call a method “String fullname()” and the way to change the property is to call a method “void fullname(String)”. So the convention I use is not new to Java at all.

In some languages there is no distinction in syntax between accessing a field and calling a no-arg method. Had the code been written in Eiffel or Scala, the syntax would have been the same and there would be no change to the client code at all.

Now that we have encapsulated the usage via “fullname()” we can change the internal representation to a char array.

public class Name {
    private final char[] chars;

    public Name(String name) {
        if(!name.matches("[a-zA-Z\\ ]+"))
            throw new IllegalArgumentException();
        this.chars = name.toCharArray();
    }

    public String[] names() {
        return new String(chars).split(" ");
    }

    public String fullname() {
        return new String(chars);
    }
}

Just for reference, in Scala the corresponding change would start with this "final public" representation – here represented by the keyword “val”.

class Name(val fullname: String)
{
  if(!fullname.matches("[a-zA-Z\\ ]+")) throw new IllegalArgumentException();

  def names = fullname.split(' ')
}

The change would take us to the slightly more verbose char array representation. Here we have no “val” in external class declaration, but a private val-field hidden inside the class-block instead.

class Name(name: String)
{
  if(!name.matches("[a-zA-Z\\ ]+")) throw new IllegalArgumentException();
  private val chars = name.toCharArray

  def fullname = new String(chars)

  def names = fullname.split(' ')
}


In conclusion: As the public final field is accessible by the clients, it is also a part of the API for Name: This breaks data encapsulation. If you want to change data representation, you will need to create an accessor method to which you direct the client.

In Scala or Eiffel this is a non-issue, as the new accessor could transparently have the same name as the old datafield ("fullname") and be accessed with exactly the same syntax ("name.fullname"). However, in Java you have to include an empty pair of parentheses to call the accessor method - thus the client code must be changed.

Now, I consider this a small issue, as there is excellent refactoring support in modern IDEs that reduces the change to a fully automated four-click, one-minute exercise. Thus, there is no point in designing for that change up front.

So, even if we *do* break data encapsulation, I would say that it is not much of a problem.

Now, there are other kinds of encapsulations that are more interesting than data encapsulation, but that analysis will have to wait for some other letter.

Yours

   Dan

Monday 12 September 2011

Public final is also Immutable, Again

Dear Junior

I must apologise as in my last letter I mixed two separate discussions into one: one about immutability, and one about encapsulations. What I was after was immutability, even though encapsulation is also interesting.

To make things clear, let us return to the Name class for representing value objects for a name, e g "Dan Bergh Johnsson". To focus on immutability, let us simplify it.

public class Name {
    public final String fullname;

    public Name(String fullname) {
        this.fullname = fullname;
    }
}

Now, if I create an object of this class, then no client can mutate the state of that object. This is because

  1. The datafield is final so the client cannot make the reference point to some other String 
  2. Strings are immutable, so the referred object cannot be changed 
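
Both conditions are needed. As a contrast, here is a minimal Java sketch where the field is final but the referred object is mutable; the classes MutableReferentDemo and LeakyName are hypothetical and exist only for this illustration.

```java
class MutableReferentDemo {
    // The reference is final, but a StringBuilder can be changed in place.
    static final class LeakyName {
        public final StringBuilder fullname;

        LeakyName(String fullname) {
            this.fullname = new StringBuilder(fullname);
        }
    }

    // A client mutates the state without ever reassigning the final field.
    static String mutateThroughFinalField() {
        LeakyName name = new LeakyName("Dan");
        // name.fullname = new StringBuilder("x"); // would not compile: the field is final
        name.fullname.append(" Voldemort");        // compiles fine: the referent is mutable
        return name.fullname.toString();
    }

    public static void main(String[] args) {
        System.out.println(mutateThroughFinalField()); // prints Dan Voldemort
    }
}
```

With a String in the field there is no corresponding way in, as String offers no methods that change its content.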

Just for reference. I mentioned that Scala has a very elegant way of defining "public immutable datafield properties". The corresponding Scala class would be a one-liner.

class Name(val fullname: String)

Elegant, isn't it? "Name is a class with the attribute value 'fullname' of type String". Scala promotes the use of immutable constructs by making it simple to declare them.

Now, as my friend Tommy Malmström pointed out to me, I should also make my class final. This is a very valid and interesting point, so let me dig into it.

The problem in this case is not the objects we construct ourselves, but objects that are sent to us. We should rightfully be able to assume that such objects are also immutable. However, someone could wittingly and deviously create a mutable subclass of Name.

Back in Java land such a subclass could for example override toString

package names;

class EvilName extends Name {
    public EvilName(String fullname) {
        super(fullname);
    }

    public String toString() {
        return "Voldemort";
    }

    public static void main(String[] args) {
        Name name = new EvilName("Harry Potter");
        System.out.println(name);
        System.out.println(name.fullname);
    }
}

danbj$ java names.EvilName
Voldemort
Harry Potter

You see how confusing it could be when toString is used to render the name object "Harry Potter".

So, forbidding such overrides by declaring the Name class as final is really a good idea.
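
EvilName above only overrides toString with a constant. To show a subclass that is actually mutable, here is a small sketch with hypothetical names (OpenName stands in for a non-final Name, and ShiftyName is the devious subclass). Had OpenName been declared final, ShiftyName would not compile.

```java
class SubclassMutabilityDemo {
    // Stand-in for a non-final Name class with a public final field.
    static class OpenName {
        public final String fullname;

        OpenName(String fullname) {
            this.fullname = fullname;
        }

        @Override
        public String toString() {
            return fullname;
        }
    }

    // The subclass adds mutable state and lets toString depend on it.
    static class ShiftyName extends OpenName {
        private String shown;

        ShiftyName(String fullname) {
            super(fullname);
            this.shown = fullname;
        }

        void rename(String newShown) {
            this.shown = newShown;
        }

        @Override
        public String toString() {
            return shown;
        }
    }

    // Returns toString before the change, toString after it, and the final field.
    static String[] run() {
        ShiftyName shifty = new ShiftyName("Harry Potter");
        OpenName name = shifty;              // the receiver only sees an OpenName
        String before = name.toString();
        shifty.rename("Voldemort");          // whoever kept the subclass reference mutates it
        String after = name.toString();
        return new String[] { before, after, name.fullname };
    }

    public static void main(String[] args) {
        for (String s : run()) {
            System.out.println(s); // prints Harry Potter, Voldemort, Harry Potter
        }
    }
}
```

The final field itself never changes, but the object as a whole no longer behaves as an immutable value.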

On a side note, this is the reason why String is declared final. The String class is for example used to represent class and package information when loading code dynamically over the network - so imagine the consequences had it been possible to make a mutable phoney String.

Well, that was about immutability.

On the issue of encapsulation it can be argued whether "public final String fullname" breaks encapsulation or not - and that discussion also depends on what encapsulation we mean. Any way, that is a separate discussion.

Yours

   Dan

Friday 9 September 2011

"public final" is also Immutable

Dear Junior

Immutable value objects are one of my favourite programming idioms. I really like how they aid and ease the burden of the rest of the code by taking care of small pieces of complexity. When they are based on the concepts of the domain they become yet another magnitude more valuable. 

Implementation-wise they are most often a primitive type wrapped up in a protecting box. So it is pretty natural that a "name" is stored with a String containing the full name. Wrapped in the same box there is probably some complexity as well, like the validation of the name and some interpretations of the data - in this case the logic to split a full name into its parts. At the end of the day, it is not so interesting to encapsulate the data as such - it is encapsulating the interpretation of the data that is crucial. In Java such a name class would look like this.

public class Name {
    public final String fullname;

    public Name(String fullname) {
        if(!fullname.matches("[a-zA-Z\\ ]+"))
            throw new IllegalArgumentException();
        this.fullname = fullname;
    }

    public String[] names() {
        return fullname.split(" ");
    }
}

Now, one thing worth noting is the datafield "fullname". It plays a double role, both as data storage and as an attribute. Should we not have a private field and an accessor method instead?

Well, the integrity of the object is still guaranteed as

  • the datafield is final so the referred object cannot be exchanged 
  • the referred object (String) is immutable so the referred object cannot be changed 

So, yes, people can get to the field from the outside, but they cannot break anything. Of course there is the question that if you change the data representation, then you will break the clients.

However, that is no big deal. If the situation should arise, we can apply the refactoring "encapsulate field" to introduce a new method "String fullname()" and replace every access to the field with a call to that method instead. Checking the entire codebase for accesses to "fullname" might be a large task. But, guess what, using a modern IDE there will be a menu item in the "Refactoring" menu that will do exactly that - fully automated.

The alternative would be to have that code in from the start. However, I cannot see that there is a point in paying the overhead of more lines of code in the meantime.

Making a field public does not break encapsulation. The important encapsulation is the interpretation and constraints of the data that is found in the constructor validation and the logic of the method "names()".

By the way: in Scala you would not be able to see the difference between a public field and a method with the same name. Nice.

Yours
   Dan

Sunday 7 August 2011

Scala Actors are Just Code - No Magic

[Note: This letter was originally written about Scala actors when Akka was still a separate framework. Nowadays (fall 2015), Akka has become the standard actor framework for Scala, as well as close to a de-facto standard for Java. Syntax looks different when defining an actor, but the basic idea still holds - there is no language magic, just a library of very clever code.]

Dear Junior

The first time I tried out actors in Scala I thought: "OK, there is a fair amount of magic going on here". I have later realised that there is actually no magic at all involved, but I would like to share with you my misconception and how it cleared up.

I am talking about syntax like
vlad ! Godot
and like
receive {
    case Godot => ...
}
The good part is that once I had my misconceptions cleared up, it was much easier to understand some weird parts of the Scala standard actors, and the path to understanding the design of Akka became much clearer.

What I Mean with Magic

Let me for a brief moment clarify what I mean with "magic" in this context. A program consists of two parts - those I can build myself, and those where magic occurs. And when there is magic, there are often some magic words involved.

Take for example the thread model of Java (and the JVM) where objects can be "locked". Down in the runtime, each object has a lock associated with it and this lock can be "obtained" using the magic formula
synchronized (objectToLock) { … }
Here “synchronized” is a special construct to handle those locks. There is no way for me to create a lock with the same functionality through "ordinary programming". I have to wave my wand and utter that precise magic phrase - then I conjure the object-locks down in the runtime to do my bidding.

In language design these constructions are sometimes referred to as "special forms" because they often have special syntax.

Another example of special forms is the for construct in Scala
for(vlad <- vladimirs) { vlad.start() }
There is no way for me as a programmer to build a construct that behaves the same way. The special syntax is built into the language, and reserved for some special behaviour decided by the language designers. For me as a programmer – some magic occurs. I can understand its effect, but not reproduce how it works.

The actor syntax in Scala definitely looked like this kind of magic to me.


The Stage

Let me start with what confused me. To simplify the discussion slightly, there are two places where there seem to be "magic" involved about actors: on the outside, and on the inside.

Allow me to reuse my example of the actor "Vladimir" with inspiration from Waiting for Godot.

On the "outside" of the actors we have the way one actor sends a message to another actor. A really simplified example is a script that creates an actor and then sends a message to it.
object vladimirwaitandrun extends Application {
    val vlad = new Vladimir(42)
    vlad.start()
    vlad ! Godot
}
In this example we assume that the class "Vladimir" is defined as an actor, and that "Godot" is a simple message type (defined as a case class). Now what looks very "magic" to me is the line where the message is sent.
vlad ! Godot
This does not look like ordinary Scala syntax to me. It looks like one of those special forms with its associated magic going on.

We will later see that I was mistaken here.

On the "inside" we have how the actor reacts on messages that are passed to it. In Scala standard actors this happens inside the "act" method of a class that extends "actors.Actor"
class Vladimir (id : Int) extends actors.Actor {

    def name: String = {
        "Vladimir" + id
    }

    def act() = {
        println(name + " is waiting ")
        receive {
            case Godot =>
                println(name + " saw Godot arrive! ")
        }
        println(name + "'s wait is over ")
    }
}
In this code the "receive-block" matches arrived messages and runs the appropriate code for the message.
receive {
    case Godot =>
        println(name + " saw Godot arrive! ")
}
Well, this is the thread-based model. There is also the event-based model, where "receive" is replaced with "react" and does something very similar. The difference is beside the point for this discussion, but this discussion is crucial for understanding the difference and its subtleties.

To me, this receive-block looks like a "special form" that invokes some magic. If anything, it reminds me of the synchronized-block mentioned earlier.

We will later see that I was mistaken here as well.

To make the setting of the stage complete, let us run the script that creates an actor and sends a message to it, a message upon which the actor reacts.
danbj$ scala godot.vladimirwaitandrun
Vladimir42 is waiting
Vladimir42 saw Godot arrive! 
Vladimir42's wait is over


My Erlang Heritage 

One way to explain my misconception is of course my preconceived notion based on my previous experience. When I first learned actor programming some fifteen years ago, I did so using Erlang.

In Erlang actors are called "processes" and they interact by dropping messages to each other. The receiving process reacts on the message and does something, very often to send other messages to other processes. The syntax for this message passing and handling is very tightly woven into the heart of the language.

The syntax for dropping a message to another process looks like
NetlistenProcess ! terminateSignal
and the syntax for receiving and reacting upon an incoming message looks like
receive
    onhook ->
        disconnect(),
        idle()
end

Looks familiar, does it not?

Now, in Erlang both the exclamation mark (!) and "receive" are special forms in the language - syntax with a special meaning that unleashes some magic going on deep down in the runtime. 

It seemed not too far-fetched to assume the same thing was going on in Scala.

However, I was mistaken. In Scala actors are just plain old objects with perfectly normal method calls. There is not even any special syntax involved.

No Magic on the Outside 

Now let us pick apart the "special syntax" for passing messages to see that there is actually no magic at all involved. We take a fresh look at the message-passing script.
object vladimirwaitandrun extends Application {
    val vlad = new Vladimir(42)
    vlad.start()
    vlad ! Godot
}
We start with noting that Vladimir is in fact an object. It is an object that is instantiated from the class Vladimir and that object is referred to with the reference "vlad".
val vlad = new Vladimir(42)
Nothing magic here really.

What makes "vlad" an actor is that the class Vladimir is a subclass of Scala actors.Actor.
class Vladimir(id : Int) extends actors.Actor {
This means that all the methods that are defined in actors.Actor are inherited by Vladimir, and thus are available to call using the reference "vlad". And one of those methods is named "!" (pronounced “bang”).

As you are as curious as I am, we can even have a look at the definition of that method. We find it in ReplyReactor which is a trait that Actor makes use of.
override def !(msg: Any) {
    send(msg, …)
}
So, “bang” is just a method, which we call on the actor to receive the message. We now realise that the strange syntax is just one way to write an ordinary method call.
vlad ! Godot
To make it totally obvious that this is just a method call, we can use the alternative syntax – the one that looks a little bit more familiar to people used to the syntax of C++/Java/C#.
vlad.!(Godot)
Oh, yes, that means exactly the same thing. It compiles and runs the same way.
danbj$ scala godot.vladimirwaitandrun
Vladimir42 is waiting
Vladimir42 saw Godot arrive! 
Vladimir42's wait is over 
Now, I do agree that the exclamation mark feels like a weird method name. But remember that in ReplyReactor all that the method “bang” did was to delegate to the method “send”.
override def !(msg: Any) {
    send(msg, …)
}
To make things more “object-readable” we can inline the body of “bang” and use “send” instead. The only difference is that “send” takes another argument, but for the purpose of this discussion that does not make any difference: we can send in null there in this case.

The “send” method in its turn is just an ordinary method. It takes the argument (the message) and stuffs it into the mailbox – which is just some kind of list. There the message will sit waiting for processing, but that is not anything that concerns the “send” method. In other words – it is a very ordinary method.

There is no need to dig into details, but there is no magic going on. Now we can change the script to use the alternative syntax and "send" as method.
object vladimirwaitandrun extends Application {
    val vlad = new Vladimir(42)
    vlad.start()
    vlad.send(Godot, null);
}
Now, this looks like plain use of an object, does it not? No magic, just a method call.

No Magic on the Inside Either 

Now, on the inside of the actor we have the magic "act" method, which contains
receive {
    case Godot =>
        println(name + " saw Godot arrive! " + threadid)
}
Now, surely this must be some "special form" syntax with associated magic?

To start with, there is nothing magic about the method "act". It is just the name chosen for the place where the actor keeps its code for "this is what you do". In a way, it is just like the "run" method in the Java interface "Runnable" - just a way to point out what code to run in a separate thread when the actor is started.

When it comes to the receive-block, it turns out there is no magic there either. In fact, "receive" is just a utility method in the framework class "actors.Actor". Remember that we created the actor by letting the class Vladimir subclass "actors.Actor".
class Vladimir(id : Int) extends actors.Actor {
Thus our class Vladimir contains all the methods that are in Actor. And one of those methods is "receive". In the code-block it looks like "receive" is a keyword, but it is just a call to the method "receive" defined in the super-class. We can make this a little bit clearer by writing that explicitly.
this.receive {
    case Godot =>
        println(name + " saw Godot arrive! ")
}
Now the code-block with the case clause seems to be hanging in thin air. But it is not. According to the syntax of Scala it is the argument to the method. We can make that clearer by putting the argument inside parentheses.
this.receive ({
    case Godot =>
        println(name + " saw Godot arrive! ")
})
Still looks a little bit weird?

Well, remember that in Scala a function (like "sqr") is also a value. And, as such, it can be passed as an argument to a method (like the list-method "map" for example). Now, the code-block we see is just a function defined "in-line". We can make things a little bit less convoluted by giving that function a name through a "val"-declaration.
val messagehandler: PartialFunction[Any, Unit] = {
    case Godot =>
        println(name + " saw Godot arrive! ")
}
this.receive (messagehandler)
I agree that the type of “messagehandler” becomes pretty weird – those partial functions are not always trivial to wrap your head around. But the important point is that the “messagehandler” is just a Scala value.
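A small self-contained sketch may make the "just a value" point more concrete. The names "HandlerDemo", "Pozzo" and "describe" are my own inventions for the illustration, and the handler returns a String instead of printing so that the result is easy to inspect.

```scala
object HandlerDemo {
  case object Godot
  case object Pozzo

  // A partial function literal stored in a val, much like "messagehandler".
  val handler: PartialFunction[Any, String] = {
    case Godot => "saw Godot arrive!"
  }

  // Being a value, the handler can be passed around and asked what it accepts.
  def describe(msg: Any): String =
    if (handler.isDefinedAt(msg)) handler(msg) else "no match"
}
```

Here HandlerDemo.describe(HandlerDemo.Godot) applies the handler, while HandlerDemo.describe(HandlerDemo.Pozzo) falls through to "no match", because the partial function is simply not defined for that message.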

Let us revise the last line to have a look at what has happened to the “receive-block” that looked like a special form.
this.receive (messagehandler)
Viewed this way we see that "receive" is just a method of the actor class. That method scans the mailbox for messages and does something when it finds a match. Now, to do this, the receive method must be told what to match and what to do about matches - and that is exactly what the (partial) function "messagehandler" does.

So the “special form” suspect seems to be just an ordinary method call. Just to ensure there is no magic, let us sketch how we could implement it ourselves.

Looping through the mailbox (a list) and searching for matches does not sound too complicated. Basically we just try to apply the message-handler and let it do its job. Of course there might be tricky corners, but there does not need to be any magic involved.

The only remaining magic is what "receive" does when there is no match. In that case it waits until a message arrives, and then it tries to match the message-handler against the newly arrived message.

Well, we can build even that part ourselves. If the mailbox is empty (or there are no matches) we can suspend the thread in a wait-state where we poll the mailbox at regular intervals. Even better, we can use the "notify" mechanism in the JVM where "receive" can call "mailbox.wait()" using the Java API. To wake up the thread we can let "send" (i e method "!") contain a "mailbox.notify()" so that the actor will resume its thread and rescan the mailbox.

And, basically, this is what the "thread-based actors" do. It is just someone else that has written the code for us.
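The wait/notify idea can be sketched in a few lines. This is my own hand-rolled illustration under the assumptions above, not the source of scala.actors; the names "TinyActor" and "TinyActorDemo" are hypothetical.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical hand-rolled actor base; a sketch, not the scala.actors code.
class TinyActor {
  private val mailbox = ListBuffer.empty[Any]

  // "send": put the message in the mailbox and wake up a waiting receiver.
  def !(msg: Any): Unit = mailbox.synchronized {
    mailbox += msg
    mailbox.notifyAll()
  }

  // "receive": find the first message the handler accepts; if there is none,
  // wait until "!" notifies us and scan the mailbox again.
  def receive[R](handler: PartialFunction[Any, R]): R = {
    val msg = mailbox.synchronized {
      var idx = mailbox.indexWhere(m => handler.isDefinedAt(m))
      while (idx < 0) {
        mailbox.wait() // gives up the lock while waiting
        idx = mailbox.indexWhere(m => handler.isDefinedAt(m))
      }
      mailbox.remove(idx)
    }
    handler(msg) // run the handler outside the lock
  }
}

object TinyActorDemo {
  case object Godot

  def run(): String = {
    val vlad = new TinyActor
    // Godot arrives a little later, sent from another thread.
    val sender = new Thread(new Runnable {
      def run(): Unit = { Thread.sleep(50); vlad ! "noise"; vlad ! Godot }
    })
    sender.start()
    // Blocks the calling thread until a matching message is in the mailbox.
    vlad.receive { case Godot => "saw Godot arrive!" }
  }
}
```

The calling thread is parked inside "receive" until Godot shows up, and the unmatched "noise" message simply stays in the mailbox. That is the thread-based way of waiting in miniature.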


No Magic at All 

The syntax for sending messages ("!") and for receiving and reacting upon them ("receive {…}") looked like a lot of magic. However, it turned out that they were just ordinary methods defined on an ordinary class - the class "actors.Actor".

This is what is meant when people say that "Scala actors is not a language feature, it is a library". In other words: Scala actors are just ordinary code. The only thing that is special about that code is that it is pre-packaged together with the Scala download. However, it has no special status over any code written by you and me.

This design is actually very fortunate for Scala. It has opened the way for alternative actor frameworks to be written. The most prominent is Akka (championed by Jonas Bonér), and it has become so successful that the plan is to replace the standard actor library with Akka.

Of course, Akka is not magical either. It is just code which is very well designed and written. But in principle it is not different from code that could have been written by you and me. No magic involved.

Yours

Dan

ps Even if the Scala actors are not magic they are pretty cool anyway; cool enough to make it worth understanding the execution models.

Friday 27 May 2011

How Heavy is Estragon - Event-Based Scala Actor

Dear Junior

As we have seen, thread-based actors are quite an intuitive model, but not a very efficient one when it comes to conserving resources. The main drawback is that each actor needs a thread of its own, and each thread takes some memory - about 60-70kB.

Also, this number of threads seems unnecessary, as each thread is only used a portion of the time - so the threads could be pooled instead.

So, let us create the Estragon version - an actor that drowses off and takes a nap whenever he is not actively needed. Thus the thread can be used by some other actor that wants to be active at the moment.

class Estragon(id : Int) extends Actor {
  def threadid = {
    "T" + Thread.currentThread.getId
  }

  def name: String = {
    "Estragon" + id
  }

  def act() = {
    println(name + " is waiting " + threadid)
    react {
      case Godot =>
        println(name + " saw Godot arrive! " + threadid)
    }
    println(name + "'s wait is over " + threadid)
  }
}

In the code the difference from the thread-based Vladimir is really small - we use the method "react" instead of "receive". The rest of the API is the same: we subclass from Actor and we define an "act"-method. The only change needed to get event-based actors (using a thread pool) instead of thread-based actors (having one thread each) is to switch from "receive" to "react".

Simple? Yes, until we get into details and subtleties later.

Now let's run the script for the play putting Estragon and Vladimir on stage and starting them.

object waitingforgodot extends Application {
  println("Setting the stage " +
    Thread.currentThread.getId)
  val estragon = new Estragon(1)
  val vladimir = new Vladimir(2)
  println("Starting the play")
  estragon.start
  vladimir.start
  println("Main script is over")
}


danbj$ scala godot.waitingforgodot
Setting the stage 1
Starting the play
Main script is over
Estragon1 is waiting T10
Vladimir2 is waiting T11
^C

Well, not very exciting. Both Vladimir and Estragon start and enter a waiting state, waiting for Godot. The only difference is their respective ways of waiting.

Vladimir keeps his thread, just putting it into a waiting state. Deep down, this is implemented through some "object.wait()". So whenever he gets out of his waiting state (when the message Godot finally arrives), the same thread can process the message-handler, i e the receive-block.

Estragon, on the other hand, discards his thread when going into "react". As he has no immediate use for the thread, it is given back to the pool to serve some other actor that needs to run. So whenever Godot arrives at Estragon, the message-handler will be activated and run by some thread, not necessarily the same one as earlier.

Nevertheless, the point is that when Estragon enters "react", the thread is no longer occupied but can be used by other actors. So even if we have a lot of actors we can still manage with just a few threads.
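The idea of handing the thread back can be sketched roughly as follows. Take it as an illustration only: "TinyReactor" and "TinyReactorDemo" are hypothetical names of mine, the pool size of two is an arbitrary choice, and the real library differs in several details.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch, ExecutorService, Executors, TimeUnit}

// Hypothetical sketch of the event-based idea; not the scala.actors code.
class TinyReactor(pool: ExecutorService) {
  private var mailbox = List.empty[Any] // pending messages, oldest first
  private var handler: Option[PartialFunction[Any, Unit]] = None

  // "react": only registers the handler and returns; no thread is parked.
  def react(h: PartialFunction[Any, Unit]): Unit = synchronized {
    handler = Some(h)
    dispatch()
  }

  // "!": store the message and check whether the registered handler matches.
  def !(msg: Any): Unit = synchronized {
    mailbox = mailbox :+ msg
    dispatch()
  }

  // If a handler is registered and a message matches, hand both to the pool.
  private def dispatch(): Unit =
    for (h <- handler; msg <- mailbox.find(m => h.isDefinedAt(m))) {
      mailbox = mailbox.diff(List(msg)) // drop one occurrence of the message
      handler = None
      pool.execute(new Runnable { def run(): Unit = h(msg) })
    }
}

object TinyReactorDemo {
  case object Godot

  // Returns how many distinct threads ran the handlers of n reactors.
  def run(n: Int): Int = {
    val pool = Executors.newFixedThreadPool(2)
    val threads = ConcurrentHashMap.newKeySet[String]()
    val done = new CountDownLatch(n)
    val estragons = List.fill(n)(new TinyReactor(pool))
    for (e <- estragons) e.react {
      case Godot =>
        threads.add(Thread.currentThread().getName)
        done.countDown()
    }
    for (e <- estragons) e ! Godot
    done.await(5, TimeUnit.SECONDS)
    pool.shutdown()
    threads.size()
  }
}
```

However many reactors we pass to TinyReactorDemo.run, the handlers are run by at most the two pool threads, because a waiting reactor is nothing more than a stored handler and a mailbox.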

This becomes more obvious if we create lots of Estragons

object estragongalore extends Application {
  override def main(args: Array[String]) {
    val actors = Integer.parseInt(args(0));
    val ids = 0 until (actors)
    val estragons = ids map (id => new Estragon(id))
    println((actors) + " actors on stage")
    for(estr <- estragons) { estr.start() }
  }
}

scala godot.estragongalore 10
10 actors on stage
Estragon0 is waiting T10
Estragon1 is waiting T11
Estragon3 is waiting T12
Estragon2 is waiting T13
Estragon5 is waiting T10
Estragon7 is waiting T10
Estragon8 is waiting T10
Estragon9 is waiting T10
Estragon6 is waiting T11
Estragon4 is waiting T12
^C

Estragon 0, 1, 2, and 3 were started in separate threads. But the next actor to start (Estragon5) could use thread T10, which had been used by Estragon0 and returned to the pool. So to create 10 actors we only needed four actor threads.

That should conserve a lot of resources.

Remember that my poor laptop cringed when we put 2500 Vladimirs on stage? Let us see how many Estragons we can put on stage. It ought to be more, as this model reuses the threads. What about 10 000 actors?

danbj$ scala godot.estragongalore 10000 
10000 actors on stage
Estragon3 is waiting T12
Estragon2 is waiting T13
Estragon0 is waiting T10
Estragon1 is waiting T11
Estragon4 is waiting T11
…
Estragon9998 is waiting T12
Estragon9935 is waiting T11
Estragon9989 is waiting T13
Estragon9956 is waiting T10
Estragon9999 is waiting T12
^C
10 000? No problem. Let us increase that tenfold.
danbj$ scala godot.estragongalore 100000 
100000 actors on stage
Estragon0 is waiting T10
Estragon3 is waiting T13
Estragon2 is waiting T12
…
Estragon99991 is waiting T10
Estragon99999 is waiting T11
Estragon99998 is waiting T12
Estragon99996 is waiting T13
^C
100 000 worked fine. What about a million actors on stage?
danbj$ scala godot.estragongalore 1000000 
1000000 actors on stage
Estragon1 is waiting T13
Estragon0 is waiting T10
Estragon3 is waiting T12
…
Estragon163350 is waiting T10
Estragon163351 is waiting T10
Estragon163352 is waiting T10
java.lang.OutOfMemoryError: Java heap space
 at scala.concurrent.forkjoin.LinkedTransferQueue.xfer(LinkedTransferQueue.java:187)
 at ...
 at scala.actors.Scheduler$.execute(Scheduler.scala:21)
 at scala.actors.Reactor$class.dostart(Reactor.scala:222)
 at ...
 at godot.Estragon.start(waitingforgodot.scala:42)
 at ...
At last we got an OutOfMemoryError. This time we did not get it when trying to start a new thread, but somewhere in the scheduler instead. It turns out that 900 000 actors is just short of what a 256M heap can handle.
danbj$ scala godot.estragongalore 900000
900000 actors on stage
Estragon0 is waiting T11
Estragon2 is waiting T10
…

danbj$ ps -m -O rss
  PID    RSS   TT  STAT      TIME COMMAND
18186 345524 s001  S+     0:37.07 /usr/bin/java -Xmx256M …
RSS is "real memory" in kB, and we note again that the scala startup script restricts the heap to 256M by default. Let us tabulate memory use for some different numbers of actors.
actors   RSS (kB)
     1      61516
  1000      62044
  2500      61460
 10000      71452
100000     101588
900000     345524
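The cost per actor can be read off the table by subtracting the one-actor baseline and dividing by the number of actors. A small sketch of that arithmetic follows; "MemoryPerActor" is just a name I made up, and treating the one-actor row as the fixed startup cost is an assumption.

```scala
object MemoryPerActor {
  // (actors, RSS in kB), copied from the table above.
  val samples: List[(Int, Int)] = List(
    1 -> 61516, 1000 -> 62044, 2500 -> 61460,
    10000 -> 71452, 100000 -> 101588, 900000 -> 345524)

  // RSS with a single actor, taken as the fixed startup cost (an assumption).
  val baselineKb = 61516

  // Extra RSS per actor over the baseline, in kB.
  def perActorKb(actors: Int, rssKb: Int): Double =
    (rssKb - baselineKb).toDouble / actors
}
```

For the 900 000 row, MemoryPerActor.perActorKb(900000, 345524) comes to about 0.32 kB per actor, and for the 100 000 row about 0.40 kB. The rows with few actors are too noisy to say much.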
So memory consumption is roughly 60MB at startup plus about 0.3kB per actor (some 284MB extra for 900 000 actors). That is a lot better than 60kB per actor for thread-based actors. Now we can structure our systems using a lot of actors, because the overhead of using an actor is not overwhelming. This is, by the way, an area where the framework Akka excels.

In short: switching from thread-based actors to event-based actors is no more complicated than changing from "receive" to "react". Not at a syntactic code level, at least.

However, we also change execution model. We are no longer guaranteed that our actor is run by the same thread. In fact, the thread that registers the message-handler by running "react" might not be the same thread that later runs the message-handler itself. And that might give us a clue to Estragon's bad memory.

Yours

Dan

ps Estragon's bad memory might be explained if we dissect how the two execution models work.