Friday, September 03, 2010

Towards generic APIs for the open world

In my last post on how Clojure protocols encourage open abstractions, I did some quick rounds between type classes in Haskell and protocols in Clojure. At the end in the section titled "Not really a type class", I mentioned about the read function of Haskell's Read type class. read takes a String and returns a type - hence it doesn't dispatch on the function argument, but rather on the return type. Clojure protocols can't do this, I am not aware of any dynamic language that can do this. Check out James Iry's insightful comment on this subject on the post.

With type classes all dispatch is static - the dispatch map is passed as a dictionary of types and inferred by the compiler. What benefit does this bring on to us ? Do we really get anything special when the language supports APIs like the read method of Haskell's Read type class ?

In this post I try to explore how type classes help design generic APIs that are open and can work seamlessly with abstractions that you implement much later in timeline than the type class itself. This is in contrast to subtype polymorphism where all subtypes are bound by the contracts that the super type exposes. In this sense subtype polymorphism is closed.

This post is inspired in part by the excellent article Generalizing APIs by Edward Z. Yang. For this post I will use Scala, my current language of choice for most of the things I do today.

My generic API

I want to implement a read API like the one in Haskell encoded in a Scala type class .. Let's make it generic in the type that it returns ..

// type class
// reads a string, returns a T
trait Read[T] {
  def read(s: String): T

For the open world

We can define instances of this type class by instantiating the trait as objects. Type classes are implemented in Scala using implicits. In case you're not familiar with the concept, here's what I wrote about them some time back.

// instance for Int
implicit object IntRead extends Read[Int] {
  def read(s: String) = s.toInt

// instance for Float
implicit object FloatRead extends Read[Float] {
  def read(s: String) = s.toFloat

These are very much like what you would do with type class instances in Haskell. You can even create instances for your own abstractions ..

case class Name(last: String, first: String)

object NameDescription {
  def unapply(s: String): Option[(String, String)] = {
    val a = s.split("/")
    Some((a(1), a(0)))

// instance for Name
import NameDescription._
implicit object NameRead extends Read[Name] {
  def read(s: String) = s match {             
    case NameDescription(l, f) => Name(l, f)
    case _ => error("invalid")

So the Read type class in Scala is generic enough to be instantiated for all kinds of abstractions. Note that unlike interfaces in Java, the polymorphism is not coupled with inheritance hierarchies. With interface, your abstraction needs to implement the interface statically, which means that the interface has to exist before you design your abstraction. With type classes, the abstractions for Int and Float existed well before we define the Read type class.

Now if we have a generic function that takes a String, we can make it return an instance of the type it is generic on.

def foo[: Read](s: String) = implicitly[Read[T]].read(s)

foo[Int]("123") // 123
foo[Float]("123.0") // 123.0
foo[Name]("debasish/ghosh") // Name("ghosh", "debasish")

Ok .. so that was our generic read API adapting violently to already existing abstractions. In this case it's exactly the Scala variant of how simple type class instances behave in Haskell. The authors of Real World Haskell uses the term open world assumption to describe this feature of the type class system.

Context for selecting the API instance

When the function foo is invoked, the compiler needs to find out the exact instance of the Read type class from the method dictionary in case of Haskell and from the list of available implicit conversions in case of Scala. For this we specify the context bound of the generic type T as T : Read. This is same as the context of the type class that we have in Haskell.  It specifies that the method foo can return any type T provided the type is an instance of the type class Read. Apart from using the context bound, in Scala you can also use view bounds to implement context of a type class. The Haskell equivalent is ..

foo :: Read a => String -> a

Irrespective of Haskell or Scala, our API becomes hugely expressive through such constraints that the static type system allows us to write. And all these constraints are checked during compile time.

Context in implementing specific instances

When defining a generic API, you can also set up a context for specific instances of the type class. Consider our read method for a List datatype in Scala. Haskell defines the instance as ..

instance Read a => Read [a] where ..

Note the context Read a following the instance keyword. This is called the context of the type class instance which says that we can read a List of a only if all individual a's also implement the Read type class. 

We do this in Scala using conditional implicits as ..

implicit def ListRead[A](implicit r: Read[A]) = 
  new Read[List[A]] {
    def read(s: String) = {
      val es = s.split(" ").toList

The implicit definition itself takes another implicit argument to validate during compile time that the individual elements of the List also are instances of the type class. This is similar to what the context does in case of Haskell's type class instantiation.

foo[List[Int]]("12 234 45 678") // List(12, 234, 45, 678)
foo[List[Float]]("12.0 234.0 45.0 678.0") // List(12.0, 234.0, 45.0, 678.0)
foo[List[Name]]("debasish/ghosh maulindu/chatterjee nilanjan/das")
  // List(Name("ghosh", "debasish"), Name("chatterjee", "maulindu"), Name("das", "nilanjan"))

As part of common extensions of GHCI, Haskell also provides support for overlapping instances of type classes ..

instance Read a => Read [a] where ..
instance Read [Int] where ..

In such cases although there are two possible matches for [Int], the compiler can make an unambiguous decision and select the most specific instance. With Scala, there is no such ambiguity to be resolved since Scala anyway allows multiple implementations of the same type class and it's up to the user to import the specific one into the module.

In this post I discussed the power that you get with type class based generic API design. In functional languages like Haskell, type classes are the most potent way to implement extensible APIs for the open world. Of course in object functional languages like Scala, you also have the power of subtyping, which comes good in many circumstances. It will be interesting to come up with a comparative analysis of situations when we prefer one to the other. But that's up for some other day, some other post ..


Henrik Huttunen said...

Just an interesting remark, since you mentioned dynamic languages. Gilad Bracha has a belief that static stage should not affect the runtime behaviour of the program, as in type classes case it does. (I think he said this in Microsoft's Channel 9 video discussion). If I understood correctly, he has no obligation against static verification but I guess he doesn't like that what you get is not what you see.

Have you opinion on that?

Pepe Iborra said...

I am not sure that your remark about overlapping instances in Scala. Is it really the case that Scala has something equivalent?

How would you express the following in Scala: a generic Read instance for lists, and a particular instance for lists of Chars, which is how we model Strings in Haskell:

Instance Read a => Read [a] where read =

instance Read [Char] where read = <special case for Haskell Strings]

Unknown said...

@Pepe Iborra ..

Anonymous said...

Your claim that "I am not aware of any dynamic language that can do this," is off the mark.

Now that I've finally worked out what you mean by dispatch on the return type (which really means dispatch on the requested returnType which you've explicitly or implicitly via some type inference provided to the function call) I think I'm right in assuming that what you feel is missing in most dynamic languages is the ability to define a generic function

foo( TypeVar: t, String: s)

which will return something of type t and select different implmentations based on the type t. However, this can be done fairly straightforwardly with generic functions in lisp as they can specialize either on the classes of the arguments or particular values. Thus one can specialize on every particular value of the type you wish to define the function for. Alternatively you could just pass an object of that type rather than the type itself for the first arg.

Indeed, this is even more flexible than Haskell's approach as you could do something silly like this:

progGreeting( employeeID, Time)

which prints out the personalized greeting from employee with ID employeeID each of which is defined independently of any other programmer's greeting function.

I agree, however, that most dynamic languages don't have the optimal namespace primitives to allow easy after the fact specialization without risking collisions.


Ultimately though it's indisputable that Haskell's type system and features derived from it provide both advantages for code reusability and disadvantages. Whatever else you might say dealing with types takes time and sometimes it impedes reuse or problem solving when thorny type issues need to be mulled out.

The lesson from this is that you simply can't establish Haskell's approach as better or worse even in a particular domain from the armchair. Only empirical results about what actually increases productivity the most can really shed light on it.