Sunday, January 04, 2009

Higher Order abstractions in Scala with Type Constructor Polymorphism

Abstractions at a higher level through type constructor polymorphism. Good type systems are expressive enough to conceal the implementation complexity, and expose *only* what it matters to the developer. People often cringe about the complexity of Scala's type system and how it serves as a barrier to the entry point in mainstream programming. As Michael Feathers recently noted on Tweeter, the unfortunate fact is that people often jump at the esoteric parts of a language before looking at the simpler subset, which he will be using 90% of the time. And, I think Scala has that sweet spot, where you need not mess around too much with variances, implicits and existentials and yet come up with a nice, concise and functional codebase.

In this post, I discuss the already introduced intimidating phrase "Type Constructor Polymorphism" through a series of code snippets ranging from toys to some real-world stuff. The idea is, once again, not to evangelize type theory intricacies, but share some of the experiences of how this feature in Scala's type system can help write idiomatic code, while staying away from the complexities of its underlying implementation.

Jump on ..

We have a list of Option[String] that we need to abstract over and compute some value. Say, for the sake of keeping the example simple, we will calculate the sum of lengths of all the strings present ..

val employees: List[Option[String]] =
  List(Some("dave"), None, Some("john"), Some("sam"))

val n: List[Int] =
  employees.map { x =>
    x match {
      case Some(name) => name.length
      case None => 0
    }
  }.elements.reduceLeft[Int](+ _)
println(n)


Let us take another problem that needs to abstract over a different List structure, a List of List of Strings, and compute the same result as before, i.e. the sum of lengths of all the strings encountered in the collection ..

val brs: List[List[String]] =
  List(List("dave", "john", "sam"), List("peter", "robin", "david"), List("pras", "srim"))

val m: List[Int] =
  brs.flatMap {=> x.map {_.length}}
     .elements.reduceLeft[Int](+ _)
println(m)


Do you see any commonality in the solution structure in the above snippets ? After all, the problem space has a common generic structure ..

  1. we have a List with some String values abstracted in different forms

  2. need to iterate over the list

  3. do some stuff with elements in the list and compute an Int value


Unfortunately the actual solution structures look quite different and have to deal a lot digging into the abstractions of the underlying representations within the collection itself. And this is because, we cannot abstract over the type constructor (the List in this case) that takes another type constructor as an argument (Option[String] in the first case and List[String] in the second case).

Enter type constructor polymorphism.

Sounds intimidating ? Maybe, but ask the Haskellers .. they have been using typeclasses ever since so successfully in comprehensions, parser combinators and embedded DSLs and programming at a different level of abstraction.

Scala supports type constructor polymorphism since 2.5, and the details have been discussed in a recent paper by Adriaan Moors et al in OOPSLA 2008.

Here is a snippet of the Scala code that works seamlessly for both of the above cases ..

val l: List[Int] = employees.flatMapTo[Int, List]{_.map{_.length}}
val sum: Int = l.elements.reduceLeft[Int](+ _)
println(sum)


The above code abstracts over List through higher order parametric polymorphism, i.e. independent of whether the List parameter is an Option[] or another List[]. Incidentally both of them (List and Option) are monads and flatMapTo abstracts a monadic computation, hiding all details of type constructor polymorphism from the developer.

Now here is some real life example (elided for simplicity) ..

Here are the simple domain models for Customer, Instrument and Trade, used for modeling a use case where a Customer can order for the Trade of an Instrument in an exchange.

case class Customer(id: Int, name: String)
case object nomura extends Customer(1, "NOMURA")
case object meryll extends Customer(2, "MERYLL")
case object lehman extends Customer(3, "LEHMAN")

case class Instrument(id: Int)
case object ibm extends Instrument(1)
case object sun extends Instrument(2)
case object google extends Instrument(3)

case class Trade(ref: Int, ins: Instrument, qty: Int, price: Int)


And we fetch the following list through a query from the database. It is a List of tuples where each tuple consists of a Customer and a trade that has been executed based on the Order he placed at the exchange. And here is the snippet of the code that computes the sum total of the values of all trades executed in the day for all customers.

val trades: List[(Customer, Option[Trade])] =
  List((nomura, Some(Trade(100, ibm, 20, 12))),
       (meryll, None), (lehman, Some(Trade(200, google, 10, 10))))

val ts: List[Option[Trade]] = trades.map(_._2)
val t: List[Int] = ts.flatMapTo[Int, List]{_.map{=> x.qty * x.price}}
val value = t.elements.reduceLeft[Int](+ _)
println(value)


Really not much different from the above simple cases where we were dealing with toy examples - isn't it ? The structure of the solution is the same irrespective of the complexity of data being stored within the collections. The iteration is being done at a much higher level of abstraction, independent of the types stored within the container. And as I mentioned above, flatMapTo is the secret sauce in the above solution structure that abstracts the building of the new container hiding the inner details. To get more into the innards of flatMapTo and similar higher order abstractions, including the new form of Iterable[+T], have a look at the OOPSLA paper referenced above.

Postscript: In all the snippets above, I have been explicit about all type signatures, just for the sake of easier understanding of my passionate reader. In reality, almost all of these will be inferred by Scala.

12 comments:

Fanf said...

I believe I never let a comment here before, so this is a global "thank you" for all your posts, they are always really pleasant to read.

Arrgh said...

Note that the map/match in the first example could be done more concisely like this:

val lengths = for (Some(name) <- employees) yield name.length

I'm sure we could find a way to get rid of the reduce too, but I'll leave that as an Exercise For The Reader. :)

Debasish said...

Arrgh -

True, but that does not help abstracting over the type constructor parameter of List[] in both the cases. You still need to write separate plumbing code for dealing with Option[] and List[] in the 2 cases.
With tcploy, you achieve a higher level of abstraction.

Cheers.

davidsunglee said...

As an aside....

In your first two examples, you can use the same (non-tc polymorphic) code to deal with List[Option[String]] and List[List[String]]:

val l = x.flatMap(_.map (_.length)).elements.reduceLeft(_ + _)

where x is either type of List.

davidsunglee said...

Whoops--no need for the redundant elements call either:

val l = x.flatMap(_.map(_.length)).reduceLeft(_ + _)

Debasish said...

davidsunglee -

That's the precise point of the post. You need not use different plumbing code when you are dealing with type constructors as arguments to another type constructor. The flatMap pattern will achieve this. And the reason you can use Option[] seamlessly with List[] is because of the following definition ..

object Option {
  /** An implicit conversion that converts an option to an iterable value
  */
  implicit def option2Iterable[A](xo: Option[A]): Iterable[A] = xo.toList
}

It has an implicit conversion from Option to Iterable. You can also design your own custom higher order data structures parameterized over polymorphic data types. Scala examples distribution contains a sample implementation of HOSeq.

I have used flatMapTo, since it is the abstraction that I found in the OOPSLA paper and thought people can relate to it while reading the paper. It very much works with flatMap from Scala 2.5.

Cheers.

davidsunglee said...

I only skimmed the beginning of the linked paper, so apologies in advance for silly questions :)

What about the motivating example requires higher kinds for abstraction? As you point out, an implementation that relies on flatMap needs to leverage implicits, but isn't this orthogonal to higher kinds?

Put another way, does anything prevent one from expressing a similar abstracted solution in a language with only first-order parametric polymorphism, provided it supports a facility similar to implicits?

Your flatMapTo solution is cool and clearly does require higher kinds. However, (and final questions, scout's honor): doesn't the utility of the higher kinds in flatMapTo rest in the ability to decouple the type of the initial collection from the resulting one? Since the initial collection in this case is List[_] and the resulting one is List[_], is anything gained over flatMap? Would flatMapTo-ing a List[_] to a Set[_] be more provocative?

Type jibber-jabber mumbo-jumbo aside, kudos for another of your usual interesting posts.

Debasish said...

davidsunglee -

As you point out, an implementation that relies on flatMap needs to leverage implicits, but isn't this orthogonal to higher kinds?

Put another way, does anything prevent one from expressing a similar abstracted solution in a language with only first-order parametric polymorphism, provided it supports a facility similar to implicits?


I was not talking about the current implementation of flatMap that is there in Scala. I was only suggesting using the same name and have an implementation that abstracts on the Container type. Have a look at HOSeq, where flatMap is defined as

def flatMap[resColl[x] <: Iterable[resColl, x], s](f: t => resColl[s])(implicit buf: Accumulator[resColl, s]): resColl[s] = {
//..
}

This takes an Accumulator[], which abstracts the source container type. And it is very much similar to flatMapTo that I used in the code snippet. The detailed implementation of flatMapTo can be found in Adriaan Moor's page attahced with the OOPSLA paper.

doesn't the utility of the higher kinds in flatMapTo rest in the ability to decouple the type of the initial collection from the resulting one?

Sure .. that's the purpose of having the Container as an additional argument to flatMap, which can, however be defaulted away using the tricks of implicits in Scala.

Cheers.

Debasish said...

davidsunglee -

Would flatMapTo-ing a List[_] to a Set[_] be more provocative?

Sure .. here it is ..

import scala.collection.Set

implicit object SetIsBuildable extends Buildable[Set] {
  def build[El]: Builder[Set, El] = new Builder[Set, El] {
    var res = scala.collection.mutable.Set.empty[El]
    def +=(el: El): Unit = { println("in += " + el); res += el }
    def finalise(): Set[El] = res.readOnly
  }
}

val s = employees.flatMapTo[Int, Set]{_.map{_.length}(OptionIsBuildable)}(SetIsBuildable)
val ssum = s.elements.reduceLeft[Int](_ + _)

Debasish said...

Actually you do not need the implicit arguments ..

val s = employees.flatMapTo[Int, Set]{_.map{_.length}}

I showed it just for clarity of the use case.

Steven Shaw said...

flatMapTo has gone now (Scala 2.9.2). Any idea how to do it with the new collections api?

irishjava said...

val employees: List[Option[String]] =
List(Some("dave"), None, Some("john"), Some("sam"))

val a = employees.map { x =>
x match {
case Some(name) => name.length
case None => 0
}
}.iterator

val n: Int = a.reduceLeft[Int](_ + _) ///(a:Int,b:Int) => a + b
println(n)

val brs: List[List[String]] =
List(List("dave", "john", "sam"), List("peter", "robin", "david"), List("pras", "srim"))

val m: Int =
brs.flatMap { x => x.map { _.length } }
.iterator.reduceLeft[Int](_ + _)
println(m)

The reduceLeft's don't return List[Int], they return Int.

Perhaps I'm missing something.