Friday, August 27, 2010

Random thoughts on Clojure Protocols

Great languages are those that offer orthogonality in design. Stated simply, it means that the language core offers a minimal set of non-overlapping ways to compose abstractions. In an earlier article, "A Case for Orthogonality in Design", I discussed some features of languages like Haskell, C++ and Scala that help you compose higher-order abstractions from smaller ones.

In this post I discuss protocols, the new feature that made its way into the recently released Clojure 1.2. I am not going into what protocols are; there are quite a few nice articles that introduce Clojure protocols and the associated defrecord and deftype forms. This post is a set of random rants about how protocols encourage non-intrusive extension of abstractions without muddling inheritance into polymorphism. I also discuss some of my realizations about what protocols aren't, which I feel is as important as understanding what they are.

Let's start with the familiar Show type class of Haskell ..

> :t show
show :: (Show a) => a -> String

show takes a value of any type a that is an instance of Show and renders it as a String. You get show for your type once you make it an instance of the Show type class, and the instance declaration adds this behavior to your abstraction without touching the type's own definition. We can do the same thing using protocols in Clojure ..

(defprotocol SHOW 
  (show [val]))

The protocol definition just declares the contract, without any concrete implementation. Under the covers it generates a Java interface, which you can use from your Java code as well. But a protocol is more than an interface: it can be extended to existing types that never implemented that interface.
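To peek at the interface that defprotocol generates, here is a minimal REPL sketch, assuming a recent Clojure where the protocol map stores that interface under :on-interface. The name show-interface is just an illustrative var, not part of any API.

```clojure
(defprotocol SHOW
  (show [val]))

;; The protocol var holds a map; :on-interface is the generated Java interface.
(def show-interface (:on-interface SHOW))

(.isInterface show-interface)                    ;; true

;; One interface method per protocol function. The protocol's first argument
;; becomes `this`, so from Java the method is show() with no parameters.
(map #(.getName %) (.getMethods show-interface)) ;; ("show")
```

A Java caller holding an object of a type that implements this interface directly could call show() on it, but that only covers types that implement the interface, which is exactly the limitation protocol extension removes.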

Adding behaviors non-invasively ..

I can extend an existing type with the behaviors of this protocol. And I don't need the source code of the type to do it. This is one of the benefits of the ad hoc polymorphism that type classes offer: type classes (and Clojure protocols) are open. Contrast this with the compile-time coupling of Java interfaces and inheritance, where a class gets an interface's behavior only if its author declared that interface in the class definition.

Extending java.lang.Integer with SHOW ..

(extend-type Integer
  SHOW
  (show [i] (.toString i)))
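A quick REPL check of what extend-type did and did not do. This sketch assumes Clojure 1.3 or later, where integer literals read as java.lang.Long rather than Integer, hence the explicit (int 42):

```clojure
(defprotocol SHOW
  (show [val]))

(extend-type Integer
  SHOW
  (show [i] (.toString i)))

;; Coerce to get a java.lang.Integer; a bare literal would be a Long.
(show (int 42))                             ;; "42"

;; Integer now satisfies the protocol ...
(satisfies? SHOW (int 42))                  ;; true
;; ... yet java.lang.Integer was not made to implement the generated interface.
(instance? (:on-interface SHOW) (int 42))   ;; false
;; Long has no SHOW implementation, so a plain literal does not satisfy it.
(satisfies? SHOW 42)                        ;; false
```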

We can also extend the protocol to an interface, and get the added behavior in *any* of its implementations. Here's SHOW extended to clojure.lang.IPersistentVector ..

(extend-type clojure.lang.IPersistentVector
  SHOW
  (show [v] (.toString v)))

(show [12 1 4 15 2 4 67])
> "[12 1 4 15 2 4 67]"
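To see that the extension really reaches every implementation, here is a sketch (assuming a recent Clojure) using subvec, which returns a different concrete class than a vector literal, with a list as the negative case:

```clojure
(defprotocol SHOW
  (show [val]))

(extend-type clojure.lang.IPersistentVector
  SHOW
  (show [v] (.toString v)))

(show [12 1 4 15 2 4 67])                       ;; "[12 1 4 15 2 4 67]"

;; subvec returns a different concrete class that also implements the interface.
(not= (class [1 2]) (class (subvec [1 2 3] 1))) ;; true
(show (subvec [1 2 3 4] 1 3))                   ;; "[2 3]"

;; Lists do not implement IPersistentVector, so they do not satisfy SHOW.
(satisfies? SHOW '(1 2 3))                      ;; false
```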

And of course I can extend my own abstractions with the new behavior ..

(defrecord Name [last first])

(defn name-desc [name]
  (str (:last name) " " (:first name)))

(name-desc (Name. "ghosh" "debasish")) ;; "ghosh debasish"

(extend-type Name
  SHOW
  (show [n]
    (name-desc n)))

(show (Name. "ghosh" "debasish")) ;; "ghosh debasish"
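extend-type is a convenience macro over extend, which takes a plain map from keywordized method names to functions. A sketch of the same Name extension written directly with extend, which lets the existing name-desc serve as the implementation without a wrapper fn:

```clojure
(defprotocol SHOW
  (show [val]))

(defrecord Name [last first])

(defn name-desc [name]
  (str (:last name) " " (:first name)))

;; Map of method keyword -> function; name-desc is reused as-is.
(extend Name
  SHOW
  {:show name-desc})

(show (Name. "ghosh" "debasish"))   ;; "ghosh debasish"
(extends? SHOW Name)                ;; true
```

Because the implementations are ordinary maps of functions, they can also be built, merged and reused as data before being handed to extend.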

No Inheritance

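Protocols help you wire together abstractions that are in no way related to each other, and they do this non-invasively. A type conforms to a protocol only if the protocol has been extended to it (or the type implements the protocol's interface directly). Dispatch does respect the host's class and interface hierarchy, which is why extending IPersistentVector covered every vector above, but as I mentioned before, there's no inheritance built into this form of polymorphism: one protocol cannot extend another, and satisfying one protocol says nothing about any other.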
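A small sketch of two unrelated types wired to one protocol, and of protocols staying independent of each other. SIZE and size are hypothetical names made up for this example:

```clojure
(defprotocol SHOW
  (show [val]))

;; A second, hypothetical protocol with no relation to SHOW.
(defprotocol SIZE
  (size [val]))

;; String and persistent vectors share no supertype we declared here.
(extend-protocol SIZE
  String
  (size [s] (count s))

  clojure.lang.IPersistentVector
  (size [v] (count v)))

(size "hello")             ;; 5
(size [1 2 3])             ;; 3

;; Satisfying SIZE implies nothing about SHOW.
(satisfies? SIZE "hello")  ;; true
(satisfies? SHOW "hello")  ;; false
```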

No object bloat, no monkey patching

And there's no object bloat going on here. You can invoke show on any abstraction to which you have extended the protocol, but show is never added as a method of that object's class. As an example, try the following after extending SHOW to Integer ..

(filter #(= "show" (.getName %)) (.getMethods Integer))

returns an empty sequence. Hence there is no risk of *accidentally* overriding someone else's monkey patch on some shared class: show lives in the namespace that defines the protocol, not in Integer.
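For contrast, a sketch (assuming a recent Clojure) comparing extend-type with implementing the protocol inline in a defrecord. The inline form does compile a show method into the record class, the fast path for types you define yourself, while extend-type leaves Integer untouched. Tag and method-names are hypothetical helpers for this example:

```clojure
(defprotocol SHOW
  (show [val]))

(extend-type Integer
  SHOW
  (show [i] (.toString i)))

;; Hypothetical record implementing SHOW inline, for contrast.
(defrecord Tag [label]
  SHOW
  (show [_] label))

(defn method-names
  "Set of the public method names of class c, via Java reflection."
  [c]
  (set (map #(.getName %) (.getMethods c))))

(contains? (method-names Integer) "show")  ;; false: extend-type added nothing to Integer
(contains? (method-names Tag) "show")      ;; true: the inline impl is compiled into Tag
(show (int 7))                             ;; "7"
(show (Tag. "hello"))                      ;; "hello"
```

Both paths are reached through the same show function, so callers never need to know which one a given type used.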

Not really a type class

Clojure protocols dispatch on the type of the first argument of their methods. This keeps them from offering the full power of Haskell / Scala type classes. Consider the counterpart of Show in Haskell, the Read type class ..

> :t read  
read :: (Read a) => String -> a

If your type is an instance of Read, the instance of read that gets invoked is selected by the expected return type, e.g.

> [1,2,3] ++ read "[4,5,6]"
=> [1,2,3,4,5,6]

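The instance of read that returns a list of integers is picked automatically here: from the ++, type inference concludes that read must return a list of the same type as [1,2,3] (defaulted to [Integer]), and the compiler statically supplies the matching Read instance dictionary.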

We cannot do this with Clojure protocols, since they cannot dispatch on the return type. Protocols dispatch only on the type of the first argument of the function.
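One workaround, sketched under assumptions: since a protocol only sees its first argument, the caller can pass a "witness" value whose type stands in for the return type that Haskell infers. READ and read-like are hypothetical names, and the sketch assumes Clojure 1.5+ for clojure.edn. Note that this is explicit selection by the caller, not return-type dispatch.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical protocol: the witness's type picks the parser.
(defprotocol READ
  (read-like [witness s] "Parse s into a value of the same type as witness."))

(extend-protocol READ
  clojure.lang.IPersistentVector
  (read-like [_ s] (vec (edn/read-string s)))  ; EDN treats commas as whitespace

  Long
  (read-like [_ s] (Long/parseLong s)))

(into [1 2 3] (read-like [] "[4,5,6]"))  ;; [1 2 3 4 5 6]
(+ 1 (read-like 0 "41"))                 ;; 42
```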


6 comments:

Anonymous said...

Excellent article.

It's worth noting that Scala implicits and Haskell typeclass instances are dispatched statically - that is to say the compiler statically picks an implicit/typeclass instance based on static analysis of static types.

Clojure is dynamically typed so it must dispatch to a particular type extension based on a dynamic type tag associated with a value. Things like Haskell's read are out of the question because read creates a value of the target type rather than taking a value of that type.

Perhaps some day a dynamically typed language will be able to expose the full power of a typeclass-like mechanism, but I don't think anybody has figured out a way to do it efficiently.

Anonymous said...

Note that Clojure added protocols and types to make Clojure-in-Clojure possible. If you want multiple dispatch you have to use multimethods, but those are too slow to implement Clojure in itself (meaning mainly the data structures).

Meikel said...

I can extend an existing type with the behaviors of this protocol.

I think this is a common misunderstanding. It's actually the other way around! The protocol is extended to the type. Seeing it this way also makes it easier to understand the behaviour, e.g. when a protocol is redefined.

Unknown said...

@meikel Good catch. I actually mentioned it in the section No Object Bloat. Strictly speaking, the phrase that a protocol extends an existing type is not true. The protocol methods are never installed as methods of the class / record / type to which it's applied.

Vsevolod Dyomkin said...

Clojure protocols are just a poor twist on Lisp's generic functions. It seems that nobody ended up using defmulti (which, I suppose, was meant as an extension of and replacement for Lisp's defgeneric): it proved too general-purpose. Yet defgeneric offers even more freedom than Haskell's typeclasses, because it also supports inheritance, before/after/around methods, custom method combinations and eql specialization. The only thing missing out of the box is the ability to unite multiple generic functions under one roof. But it's just a macro away...

Olivier said...

I think you may have misunderstood Meikel's comment: he did not say that the protocol extends a type, but that it is extended to a type. It seems to me (based on your own post) that one can think of the protocol as a kind of switch statement that is initially empty and is then progressively (one declaration at a time) populated with cases for the types on which the protocol's operations make sense.