Monday, April 20, 2009

Towards Combinator based API design

I was listening to a presentation that Alex Payne, the Twitter API lead, delivered at Stanford very recently. A nice talk that sums up the philosophies and practices they followed in evolving the Twitter APIs. In case you are interested in API design, Josh Bloch also has a great video up as part of a Javapolis interview that discusses in detail all the nuances of designing good and robust APIs.

These days we are getting used to more and more interesting programming languages, which, apart from being powerful in themselves, mostly happen to share the wonderful ecosystem of common runtimes. As Ola Bini noted sometime back on his blog, it is a time when we can design the various architectural layers of our application in different languages, depending on the mix of robustness, type-safety and expressiveness that each layer demands. Client-facing APIs can focus more on expressiveness, being humane in nature and a pleasure to use, while offering only modest error handling and recovery abilities. But whatever the layer, APIs need to be consistent, both in signature and in return values.

One of the most important aspects of API design is the level of abstraction it should offer to the user. The right level emerges only after a series of exploratory evolutions, refactorings and real user implementations, and getting there can often break backward compatibility for the existing user community.

One of the very powerful approaches to API design that many of today's languages offer is the use of combinators. I have blogged on uses of combinators in concatenative languages like Joy - it is truly a great experience as a user to build your application design out of such compositional APIs. Here is one from my earlier post on Joy combinators. It finds the arithmetic mean of a list of numbers ..

dup  0  [+]  fold  swap  size  /    (* duplicate the list, fold (+) over it for the sum, then divide by its size *)


The API is beautiful in the sense that it grows the intention from the ground up, using smaller combinators to build the whole. This is composition at its best.

In one of the comments to my earlier post, James Iry mentioned "I remain unconvinced that concatenative languages are really buying much over applicative languages, as interesting as they are". Since then I have been dabbling a bit with Haskell, a pure functional language that does many things right with static typing, offering powerful point-free capabilities along with rich combinator composition ..

A simple pointfree sum ..

sum = foldr (+) 0


and a more complex map fusion ..

foldr f e . map g == foldr (f . g) e


The main point is to seek the beauty of API design in the expressiveness of the contract, through effective composition of smaller combinators. The biggest advantage of combinators is that they offer composability, i.e. the value of a bigger abstraction is given by combining the values of its sub-abstractions. And the power of composability comes from higher order functions and your programming language's ability to combine them, just as you would in mathematics.
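
The same idea carries over to an applicative language. Here is a minimal sketch in Scala (my own illustration, not from the earlier post) that builds the arithmetic mean by composing the smaller combinators sum and size ..

// the smaller combinators ..
val sum: List[Double] => Double  = _.foldRight(0.0)(_ + _)
val size: List[Double] => Double = _.size.toDouble

// .. composed into the bigger abstraction
def mean(xs: List[Double]): Double = sum(xs) / size(xs)

// mean(List(1.0, 2.0, 3.0)) evaluates to 2.0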

Object orientation is not so blessed in this respect. Composition in OOP is mostly confined to designing fluent interfaces, which make for expressive APIs but are useful only in a limited context, and of course without the purity that functional abstractions espouse. The Builder design pattern is possibly the most famous compositional construct in object oriented languages, and it often leads to sleek APIs. Here is a great example from the Google Collections MapMaker API ..

ConcurrentMap<Key, Graph> graphs = 
  new MapMaker()
    .concurrencyLevel(32)
    .softKeys()
    .weakValues()
    .expiration(30, TimeUnit.MINUTES)
    .makeComputingMap(
      new Function<Key, Graph>() {
        public Graph apply(Key key) {
          return createExpensiveGraph(key);
        }
      }
    );



But in Java, a language not known for offering the best of functional features, it is often quite clumsy to compose abstractions fluently into consistent, rich and robust APIs that match the elegance of functional combinators.
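
With first-class functions the same API feels much lighter. Here is a hedged sketch in Scala against the same Google Collections classes (Key, Graph and createExpensiveGraph are the same assumed names as in the Java example, stubbed here) - an implicit conversion adapts a Scala function literal to Google's Function interface, and the anonymous inner class disappears ..

import java.util.concurrent.TimeUnit
import com.google.common.base.Function
import com.google.common.collect.MapMaker

// stubs for the names assumed in the Java example above
class Key
class Graph
def createExpensiveGraph(key: Key): Graph = new Graph

// adapt any Scala function to Google Collections' Function interface
implicit def asGoogleFunction[A, B](f: A => B): Function[A, B] =
  new Function[A, B] { def apply(a: A): B = f(a) }

val compute: Function[Key, Graph] = (key: Key) => createExpensiveGraph(key)

val graphs: java.util.concurrent.ConcurrentMap[Key, Graph] =
  new MapMaker()
    .concurrencyLevel(32)
    .softKeys()
    .weakValues()
    .expiration(30, TimeUnit.MINUTES)
    .makeComputingMap(compute)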

Scala is not a particularly rich language for pointfree programming, but it offers great library support for combinators, the secret sauce being its rich support of functional programming. The parser combinators that come with the Scala standard library help design external DSLs with ease and convenience. Quite some time back I blogged on designing a combinator based DSL for trading systems using Scala parser combinators.
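
To give a flavor, here is a minimal sketch (a toy grammar of my own, not the trading DSL of that post) - each production is a small parser, and the combinators ~, | and ^^ compose them into bigger ones ..

import scala.util.parsing.combinator.JavaTokenParsers

object OrderDsl extends JavaTokenParsers {
  def side: Parser[String]   = "buy" | "sell"
  def qty: Parser[Int]       = wholeNumber ^^ (_.toInt)
  def ticker: Parser[String] = ident

  // an order like "buy 100 IBM", composed from the smaller parsers
  def order: Parser[(String, Int, String)] =
    side ~ qty ~ ticker ^^ { case s ~ q ~ t => (s, q, t) }
}

// OrderDsl.parseAll(OrderDsl.order, "buy 100 IBM") yields ("buy", 100, "IBM")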

The main power of combinators comes from the fact that they are compositional, and it is the presence of non-composable features that makes combinators hard in some languages. Chief among them is shared mutable state. Paul Johnson had it absolutely right when he said "Mutable state is actually another form of manual memory management: every time you over-write a value you are making a decision that the old value is now garbage, regardless of what other part of the program might have been using it". Languages like Erlang enforce confinement of mutable state within individual processes, Scala encourages the same through programming practices, while Haskell, Clojure etc. offer managed environments for manipulating shared state. Hence we have composability in these languages, and they encourage combinator based API design.
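
A small Scala illustration of the point - the pure pipeline means the same thing wherever it appears, while the composition involving shared mutable state silently changes meaning when somebody else writes to the variable ..

// pure combinators compose: the whole is a function of its parts
val inc: Int => Int    = _ + 1
val double: Int => Int = _ * 2
val pipeline = inc andThen double   // always 2 * (n + 1)

// shared mutable state breaks the equation
var factor = 2
val scale: Int => Int = _ * factor
val brittle = inc andThen scale     // meaning depends on the current factor

factor = 10
// brittle(1) is now 20, not 4 - the composition is no longer stable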

Combinator based API design is nothing new. It has been quite a common practice in Haskell and other functional languages for a long time. Simon Peyton Jones described at ICFP 2000 his experience in applying combinator based API design while implementing a financial system for derivatives trading. It was one of those trend setting applications, in the sense that "the ability to define new combinators, and use them just as if they were built in, is quite routine for functional programmers, but not for financial engineers".
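
The heart of that design translates readily into today's languages. Here is a loose Scala sketch in the spirit of the composing contracts idea (names and structure simplified by me, not the paper's actual library) - small contract constructors combine into bigger financial instruments ..

sealed trait Contract
case object Zero                           extends Contract
case class One(currency: String)           extends Contract
case class Give(c: Contract)               extends Contract
case class And(c1: Contract, c2: Contract) extends Contract
case class Scale(k: Double, c: Contract)   extends Contract
case class When(date: String, c: Contract) extends Contract

// a zero coupon bond is not a primitive - it is a composition
def zcb(maturity: String, notional: Double, ccy: String): Contract =
  When(maturity, Scale(notional, One(ccy)))

// e.g. zcb("2010-12-01", 100, "USD") pays USD 100 at maturity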

Monday, April 13, 2009

Objects as Actors ?

Tony Arcieri, creator of Reia, recently brought up an interesting topic on unifying actors and objects. Talking about Scala and his dislike of Scala's implementation of actors as an additional entity on top of objects, he says it would have been a more useful abstraction to model all objects as actors. Doing it that way would do away with many of the overlapping functions that the object and actor semantics both implement today. In Reia, which is supposed to run on top of BEAM (the Erlang VM), he has decided to make all objects actors.

The way I look at it, this is mostly a decision of language design philosophy. Scala is targeted to be a general purpose programming language, where concurrency and distribution are not the central concerns that the core language design addresses. The entire actor model has hence been implemented as a library that integrates seamlessly with the rest of Scala's core object/functional engineering. This is a design decision the language designers took upfront - hence objects in Scala, by default, bind to local invocation semantics, which lets them take advantage of all the optimizations and efficiencies of being collocated in the same process.

The actor model was designed primarily to address the concerns of distributed programming. As Jonas Boner recently said on Twitter - "The main benefit of the Actor model is not simpler concurrency but fault-tolerance and reliability". And for fault tolerance you need at least two machines running your programs. We all know the awesome fault tolerance capabilities that the Erlang actor model offers through supervisors, linked actors and transparent restarts. Hence languages like Erlang, which address the concerns of concurrency and distribution in the core, have decided to implement actors as their basic building block of abstraction. This was done with the vision that the Erlang programming style would be based on the simple primitives of process spawning and message passing, both implemented as low overhead primitives in the virtual machine. The philosophy of Scala is, however, a bit different. Still, it is not that difficult to implement the Active Object pattern on top of the Scala actors platform.
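
As a rough sketch (a toy example of my own on the scala.actors library of the day), an active object is just a plain API whose methods are served by an actor behind the scenes ..

import scala.actors.Actor._

class ActiveCounter {
  private case object Incr
  private case object Get

  // the actor serializes all access to the mutable count
  private val server = actor {
    var count = 0
    loop {
      react {
        case Incr => count += 1
        case Get  => reply(count)
      }
    }
  }

  def incr(): Unit = server ! Incr                      // asynchronous send
  def get(): Int   = (server !? Get).asInstanceOf[Int]  // synchronous ask
}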

Erlang allows you to write programs that will run without any change in a regular non-distributed Erlang session, on two different Erlang nodes running on the same computer, as well as on Erlang nodes running on two physically separated computers, either on the same LAN or over the internet. It can do this because the language designers decided to map the concurrency model naturally to distributed deployments, extending the actor model beyond VM boundaries.
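
Scala's actor library offers a similar, if less battle-tested, flavor of location transparency through scala.actors.remote. A minimal sketch (the port number and registered name are my own choices) - the client code stays the same whether the registered actor lives in the same process, in another JVM, or on another machine ..

import scala.actors.Actor._
import scala.actors.remote.Node
import scala.actors.remote.RemoteActor._

// server: make this VM reachable on a TCP port and register an actor by name
actor {
  alive(9010)
  register('echo, self)
  loop {
    react { case msg => reply(msg) }
  }
}

// client: select the peer by node and name, then just send messages
val peer = select(Node("localhost", 9010), 'echo)
println(peer !? "hello")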

Another language, Clojure, which also has strong concurrency support, decided to go the Scala way in addressing distribution concerns. Distribution is not something that Rich Hickey decided to hardwire into the core of the language. Here is what he says about it ..

"In Erlang the concurrency model is (always) a distributed one and in Clojure it is not. I have some reservations about unifying the distributed and non-distributed models [..], and have decided not to do so in Clojure, but I think Erlang, in doing so, does the right thing in forcing programmers to work as if the processes are distributed even when they are not, in order to allow the possibility of transparent distribution later, e.g. in the failure modes, the messaging system etc. However, issues related to latency, bandwidth, timeouts, chattiness, and costs of certain data structures etc remain."

And finally, on the JVM, there is a host of options for distributing your programs, which is yet another reason not to go for language specific solutions. If you are implementing your language on top of the Erlang VM, it is only natural to leverage the awesome cross virtual machine distribution capabilities that it offers. On the JVM, distribution is better left to specialized frameworks.

Sunday, April 05, 2009

Framework Inertia, CouchDB and the case of the missing R

There is nothing wrong in using frameworks, so long as I can justify the cause. A language like Java leaves a lot for frameworks to implement. While Java offers idioms that promote separation of interface from implementation, we still need dependency injection frameworks to get around the problem of wiring concrete implementations within the application. This is an encouraged practice in the Java world that leads to a codebase which is more flexible and unit testable. Nothing wrong with it .. Similarly there are lots of other frameworks which are seen as essential components of your toolbox so long as you continue your adventures within the Java world.

However, the real problem with using frameworks is the inertia that your favorite framework brings upon you. You almost wish you had it in every programming language that you use. You like the conciseness of the framework, you like the style it offers in composing abstractions, and you love the way it allows you to develop applications. In short, you get virtually married to it, so much so that you become myopic to the fact that a new programming paradigm may actually make your pet framework irrelevant.

A classic example is the inertia that dependency injection frameworks bring upon us. While DI is an integral part of your development ecosystem in Java, it becomes virtually useless in languages like Ruby that offer the power of open classes, dynamically changeable at runtime, with class names being just labels posted on objects. Jim Weirich gave a great presentation at OSCON 2005 that answers this question in greater detail. Jamis Buck gave an interesting presentation at RubyConf 2008, where he described his initial passion for dependency injection and how it took time and experience to move away from patterns and habits formed in Java, embrace Ruby's idioms, and rediscover the joy of programming. Both presentations are worth watching to get a feel of the inertia that frameworks can bring upon us.

But I digress .. The real intention of this post is to add my 2 cents to the ongoing debate on the propensity of ActiveRecord-like frameworks being used to abstract CouchDB based applications, which I think is yet another instance of framework inertia.

Object Relational Mapping is a layer of abstraction that is supposed to bridge the impedance mismatch between your object oriented domain model and the relational persistence model. This post is not about whether ORMs actually offer the benefits they claim and allow truly parallel evolution of the domain and the persistence model. It is more about how we developers tend to carry the inertia of our favorite ORMs into every paradigm that we implement, irrespective of its applicability.

ActiveRecord is one of our favorite ORM frameworks, a household name in every Rails shop. It is terse, it is intuitive, and it makes explosive use of Ruby meta-programming, resulting in magically concise domain models. How we love it when we can write the following three lines of ActiveRecord code and find the entire Product model, along with its business validations, woven in the magic of Rails goodness ..

class Product < ActiveRecord::Base
  validates_presence_of :title, :description, :image_url
end


The more we use ActiveRecord, the more infatuated we become with its goodness and convenience. And finally we lose the ability to judge whether the solution offered by the framework correctly maps to the problem at hand. Steve Vinoski once wrote about Convenience over Correctness in one of his columns on Internet Computing. He was referring to our urge to force-fit the fundamentally flawed RPC oriented paradigm simply because those abstractions are easily available in popular general purpose programming languages. He mentions in his blog .. "Making a function or method call to a remote or distributed function, object, or service appear just like any other function or method call allows such developers to stay within the comfortable confines of their language. Those who choose this approach essentially decide that developer convenience and comfort is more important than dealing with hard distribution issues like latency, concurrency, reliability, scalability, and partial failure".

Recently there has been a debate on the applicability of ActiveRecord-like frameworks as frontends to CouchDB. Alexander Lang gave a great presentation at Scotland On Rails that dealt with the applicability of Ruby frameworks to CouchDB. The debate started with his views, where he recommended against frameworks that think in terms of records and associations, and encouraged thinking in terms of documents and views, which is absolutely in line with the philosophy that CouchDB espouses. We developers have been too strongly indoctrinated in the teachings of relational databases, third normal form, joins, constraints and ACID based principles. It is extremely difficult to come out of such a mindset, particularly when we are dealing with disruptive technologies like CouchDB or MongoDB. But unless we are able to get into the train of thought that these systems encourage, we will never be able to derive the full benefits that they offer. Along the lines of the above discussion, we cannot afford to choose convenience over correctness, or surrender to the temptation of staying within the confines of our known paradigms.

There is really no *R* in the ORM layer that can possibly sit between CouchDB and your application. CouchDB stores data as schemaless JSON documents - hence the mapping layer should really be a thin wrapper that converts between JSON documents and your language's objects. Want to include helper methods for writing validation logic within your framework ? Cool stuff, use Ruby's meta-programming abilities and allow users to write validates_presence_of. This one absolutely belongs to the domain, and has no strings attached to the relational paradigm whatsoever. But want to handle lazy loading of associations, or to do caching like ActiveRecord within the framework ? CouchDB offers a REST interface that comes with all the caching abilities of HTTP - handling it within the framework would be playing against the rules of the game. Playing along the rules of REST, CouchDB documents and views represent resources - all queries are mapped to CouchDB views, which can be generated lazily and updated incrementally using the map/reduce paradigm. Instead of restricting queries to a group of ActiveRecord-style finder methods, allow users the power to write custom views in the same map/reduce style that the CouchDB way encourages ..

Here is an example custom view from the CouchRest distribution ..

word_count = {
  :map => 'function(doc){
    var words = doc.text.split(/\W/);
    words.forEach(function(word){
      if (word.length > 0) emit([word,doc.title],1);
    });
  }',
  :reduce => 'function(key,combine){
    return sum(combine);
  }'
}


For a 1..n association there is only one relational way of doing things, while CouchDB allows programmers to model data closer to the domain, much like the way the data will be used in the actual use case. This I discuss next in the context of managing associations. The main point here is that if data organization is best left to the user, fetch strategies should not be bound within canned finder methods either. This is where the programmer needs the flexibility to organize his fetch logic around custom maps and reduces. It is simply the practice of empowering users to harness the full power of the CouchDB engine and philosophy.

And the Associations ..

Some of the ActiveRecord-like frameworks for CouchDB allow developers to model associations as follows ..

class BlogPost < CouchXXX::Base  
  has_many :comments  
end


While this is the only way of modeling a 1..n association in ActiveRecord, it has a definite rationale in a relational database system, where it maps optimally to two tables with a foreign key relationship. In CouchDB, we have several options for modeling this association :

  1. Inline comments, where the comments are stored in the same document as the post

  2. Separate documents for comments, one document per comment, each having a backlink to the post it belongs to

  3. Use the power of view collation where the blog post and the associated comments are grouped together in a view by using complex keys with arbitrary JSON values. This is best explained in this post.


Which option the user adopts depends upon the usage in the domain. Unlike the relational paradigm, we have multiple storage options here, and the best way to handle this is to leave the strategy and implementation to the user. Hence has_many makes sense in the ActiveRecord world, not in the CouchDB paradigm. In the CouchDB world, associations are best modeled outside your framework, using native CouchDB practices.
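
To make option 3 above concrete, here is a hedged sketch (the database, design document and field names are my own inventions) of a design document, held here in a Scala string, whose map function emits complex keys - [doc._id, 0] for a post and [post_id, 1] for each of its comments - so that CouchDB's key collation groups every post with its comments ..

val designDoc = """
{
  "_id": "_design/posts",
  "views": {
    "with_comments": {
      "map": "function(doc) { if (doc.type == 'post') emit([doc._id, 0], doc); if (doc.type == 'comment') emit([doc.post_id, 1], doc); }"
    }
  }
}
"""

Installing it is a single PUT to CouchDB's REST interface (e.g. PUT http://localhost:5984/blog/_design/posts), and querying the view with startkey=["<post-id>"] and endkey=["<post-id>", {}] returns the post followed by all its comments in one round trip - no has_many required.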

I am all for liberating programmers with more power, if that helps, instead of restricting them within the confines of convenience. Unfortunately, frameworks often prove too limiting for technologies that attempt to do things differently ..