Tuesday, July 01, 2008

What else you need to consider for designing well-behaved Erlang APIs

When you design a framework or a library in an implementation language, you need to play by the rules of the game. When implementing a library for a functional language, you need to design functions that are referentially transparent, side-effect-free and easily composable with the existing platform. Similarly, for an object-oriented implementation, there are well-published idioms and design patterns (factories, builders etc.) that the language espouses, and which your contracts need to honor. For example, when designing a module in a statically typed language like Java, you need to base your abstractions on interfaces that can be easily injected using dependency injection. DI is not nearly as forceful a consideration when implementing the same functionality in Ruby, which offers classes as soft abstractions that can be gutted out very easily at runtime.

The more succinct a language is, the more subtle it is to design usable APIs, conforming to the best practices of the language. And it becomes increasingly difficult when the language starts offering more and more intellectual surface area along multiple orthogonal axes of variability. You need to think in terms of adapting your APIs to all the axes and try to make them extensible across all of them.

Erlang is one language that makes you think in at least one additional dimension while designing new functionality. Since it is a dynamically typed functional language, you need to think of all the usual stuff (as mentioned above) to come up with an extensible design of your module. As in other functional languages, functions are the basic units of abstraction in Erlang. But there is also a dynamic view of how functions live within processes in Erlang. Functions are static entities that come to life when they are passed around in graphs of linked processes spawned by the Erlang virtual machine. Just as in an OO language like Java you have objects as the means to encapsulate the state and identity of an abstraction, in Erlang you design processes that encapsulate state behind published message interfaces. And processes can run anywhere - on your local machine, in another virtual node on the same machine, on another machine in the same subnet / domain, or anywhere else on the internet.
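To make the object / process analogy concrete, here is a minimal sketch of a process that encapsulates a piece of state behind a message interface. The module name counter and the messages increment and value are hypothetical, chosen only for illustration.

```erlang
-module(counter).
-export([start/0, increment/1, value/1, loop/1]).

%% Spawn a process that owns the counter state (initially 0).
start() ->
    spawn(?MODULE, loop, [0]).

%% Asynchronous message: ask the process to bump its counter.
increment(Pid) ->
    Pid ! increment,
    ok.

%% Synchronous request / reply: ask the process for its current value.
%% The unique reference ties the reply to this particular request.
value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive
        {Ref, Count} -> Count
    after 1000 ->
        timeout
    end.

%% The state lives only in the argument of this loop;
%% no other process can touch it except by sending a message.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From, Ref} ->
            From ! {Ref, Count},
            loop(Count)
    end.
```

The callers only ever hold a Pid. Whether that Pid refers to a process on the local node or on a remote one makes no difference to increment/1 and value/1, which is the distribution transparency this post is about.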

When you design APIs in Erlang, you need to think of distribution and concurrency as well. This is because transparency of distribution is baked into the language itself. And your module needs to be compliant with all the characteristics of a distributed ecosystem.

Designing APIs in Erlang requires you to additionally consider transparent distribution.

I was going through this thread on the Erlang mailing list over the last weekend. The thread discusses alternatives for designing a mocking framework in Erlang to improve the testability of Erlang programs.

In Erlang the basic unit of execution is a process, which is supposed to be an ultra-lightweight abstraction that can be spawned by the millions on commodity hardware. A typical program spawns a process like this ..

Pid = spawn_link(fun banana:loop/0).

Erlang supports hot code swapping, whereby you can load a newly compiled version of a module into a running system and immediately test the impact of your code changes, without restarting anything and without a millisecond of server downtime. The running process keeps receiving messages, switches over to the new code on the fly, and the VM keeps the current and the previous version of a module loaded simultaneously - a feature of the Erlang virtual machine that is fairly unique, and will remain so, till we see the OSGi stuff maturing into a comparable offering.

But the problem with the above is that the function name is hardcoded at the point where the process is spawned. Here the function name acts as the interface that scaffolds native Erlang message processing within it. Hot swapping of code works while the process keeps receiving messages, because the VM dynamically loads the new code to perform the new task. But since the interface module hides the message processing, you cannot change the function name at the spawn site and replace it with a mock.
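For reference, a hot-swappable receive loop might look roughly like the sketch below. Only the module name banana and the function loop/0 come from the spawn_link example above; the body of the loop and the messages peel and stop are assumptions made up for illustration.

```erlang
-module(banana).
-export([start/0, loop/0]).

%% Same spawn call as in the post: the function name is fixed here.
start() ->
    spawn_link(fun banana:loop/0).

loop() ->
    receive
        {peel, From} ->
            From ! {self(), peeled},
            %% Fully qualified call: it always jumps to the newest
            %% loaded version of banana, which is what lets the
            %% process pick up new code between two messages.
            banana:loop();
        stop ->
            ok
    end.
```

A plain local call loop() would keep the process in the version of the module it started in. Either way the name banana:loop/0 stays baked into the caller, which is exactly what gets in the way of mocking.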

One approach discussed is to purge the existing real banana module and replace it with the mock using code:load_abs/1. However, that does not allow a generic enough mock, since the mock will have to be another module named banana. The approach also has the downside that tests can't be run safely in parallel with the real module in the same node, since the module namespace is node-wide. Finally, Christian suggested implementing a module with an interface similar to error_handler, which redefines undefined_function and undefined_lambda to set the mock module into the process dictionary and load it on error (similar to method_missing in Ruby). Have a look at the thread for the gory details ..
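For contrast, when the spawning code is under your own control, the hardcoded function name can be avoided altogether by passing the loop in as a parameter. The sketch below is not one of the approaches from the thread; the module fruit and the functions start/1, ask/2, real_loop/0 and mock_loop/0 are all hypothetical names.

```erlang
-module(fruit).
-export([start/1, ask/2, real_loop/0, mock_loop/0]).

%% Spawn whichever zero-arity loop the caller supplies,
%% instead of hardcoding the function name at the spawn site.
start(LoopFun) when is_function(LoopFun, 0) ->
    spawn_link(LoopFun).

%% Send a request and wait for the tagged reply.
ask(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {Request, self(), Ref},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        timeout
    end.

%% The "real" behaviour: only understands peel.
real_loop() ->
    receive
        {peel, From, Ref} ->
            From ! {Ref, peeled},
            fruit:real_loop()
    end.

%% A generic mock: answers every request with a canned reply.
mock_loop() ->
    receive
        {_Request, From, Ref} ->
            From ! {Ref, mocked},
            fruit:mock_loop()
    end.
```

A test can then call fruit:start(fun fruit:mock_loop/0) while production code calls fruit:start(fun fruit:real_loop/0), and both can live in the same node. This only helps for code you are able to change, which is why the thread looks at module replacement and error handlers for the general case.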

Anyway, this post is not about hot code swapping in Erlang. The interesting bit about the above-mentioned thread is how distribution transparency features as a concern in a discussion about mocking abstractions to improve testability, a topic that apparently has nothing to do with distribution.

In languages like Java, distribution is a totally extraneous concern to the language features and is addressed as a separate deployment concern using grid computing frameworks. In some respects, the POJO model of Java development shines in the fact that you need to consider only the domain problem at hand, viz. mocking in this case. Additional frameworks take care of distribution and concurrency issues - a good example of separation of concerns. But, at the same time, introducing new frameworks has its own downsides: they bring an impedance mismatch and make distribution concerns look like they are bolted on from outside. With Erlang, it is a unified thought process, since the language is designed for distribution and reliability, and with a little thinking ahead, you can design robust, distributable frameworks *only* with the help of the native language features and libraries.

Still thinking, which one is better ..


Cedric said...

Java is based on the idea that "everything is local", Erlang, "everything is remote".

Both concepts are flawed, but at least, Java can evolve into a world that is a mix of local and remote, while Erlang will be forever stuck in its "all remote" mode, which, according to Joe Armstrong himself, makes it impossible to write code that performs well (http://is.gd/KIl).

Unknown said...

Cedric -

The difference between the 2 paradigms is that in case of Erlang, extreme remoting becomes feasible and mathematically viable, because of the inherent statelessness of the architecture. There are no shared states that we need to replicate across servers. This is the innate strength of the message passing paradigm. While in the Java world, we are still looking for the holy grail of clustering solution that will give me performant and scalable replication of states across JVMs. The combination of Scala actors and Terracotta clustering may point towards the right direction, but still all of existing Java libraries never guarantee immutability and that is where the combination falls flat. We need the OTP equivalent platform for the JVM.