Monday, January 10, 2011

Iteration in Scala - effectful yet functional

One of the papers that influenced me a lot in 2010 was The Essence of the Iterator Pattern by Jeremy Gibbons and Bruno C. d. S. Oliveira. It builds upon where McBride and Paterson left in their treatise on Applicative Functors. Gibbons' paper discusses the various aspects of building traversal structures in the presence of effects.

In this post I look at some of the traversal patterns' functional implementations using scalaz. In the paper on applicative functors, McBride and Paterson defines traverse as an applicative mapping operation ..

traverse :: Applicative f => (-> f b) -> [a] -> f [b]

Gibbons et. al. uses this abstraction to study various traversal structures in the presence of effects. The paper starts with a C# code snippet that uses the syntax sugar of foreach to traverse over a collection of elements ..

public static int loop<MyObj> (IEnumerable<MyObj> coll){
  int n = 0;
  foreach (MyObj obj in coll){
    n = n+1;
  return n;

In the above loop method, we do two things simultaneously :-

  1. mapping - doing some operation touch() on the elements of coll with the expectation that we get the modified collection at the end of the loop
  2. accumulating - counting the elements, which is a stateful operation for each iteration and which is independent of the operation which we do on the elements
And in the presence of mutation, the two concerns are quite conflated. Gibbons et. al. uses McBride and Paterson’s applicative functors, the traverse operator which they discuss in the same paper, to come up with some of the special cases of effectful traversals where the mapping aspect is independent of accumulation and vice versa.

Over the last weekend I was exploring how much of these effectful functional traversals can be done using scalaz, the closest to Haskell you can get with Scala. Section 4.2 of the original paper talks about two definite patterns of effectful traversal. Both of these patterns combine mapping and accumulation (like the C# code above) but separates the concerns skillfully using functional techniques. Let's see how much of that we can manage with scalaz functors.

Pattern #1

The first pattern of traversal accumulates elements effectfully, but modifies the elements of the collection purely and independently of this accumulation. Here's the scalaz implementation of collect (see the original paper for the haskell implementation) ..

def collect[T[_]:Traverse, A, B, S](f: A => B, t: T[A], g: S => S) =
  t.traverse[({type λ[x] = State[S,x]}), B](=> state((s: S) => (g(s), f(a))))

To the uninitiated, the type annotation in traverse looks ugly - it's there because scalac cannot infer partial application of type constructors, a problem which will be rectified once Adriaan fixes issue 2712 on the Scala Trac.

Traverse is one of the typeclasses in scalaz similar to the model of Data.Traversable in Haskell.

trait Traverse[T[_]] extends Functor[T] {
  def traverse[F[_] : Applicative, A, B](f: A => F[B], t: T[A]): F[T[B]]

  import Scalaz._

  override def fmap[A, B](k: T[A], f: A => B) = traverse[Identity, A, B](f(_), k)

and scalaz defines implementations of the Traverse typeclass for a host of classes on which you can invoke traverse.

The above implmentation uses the State monad to handle effectful computations. For an introduction to the State monad in scalaz, have a look at this post from Tony Morris.

Note, f is the pure function that maps on the elements of the collection, g is the function that does the effectful accumulation through the State monad. Using collect, here's a version of the C# loop method that we did at the beginning ..

val loop = collect((a: Int) => 2 * a, List(10, 20, 30, 40), (i: Int) => i + 1)
loop(0) should equal((4, List(20, 40, 60, 80)))

Now we have the effectful iteration without using any mutable variables.

Pattern #2

The second pattern of traversal modifies elements purely but dependent on some state that evolves independently of the elements. Gibbons et. al. calls this abstraction disperse, whose scalaz implementation can be as follows ..

def disperse[T[_]: Traverse, A, S, B](t: T[A], s: A => State[S, B]) =
  t.traverse[({type λ[x] = State[S,x]}), B](s)

Note how the elements of the collection are being modified through the State monad. Using disperse, we can write a labeling function that labels every element with its position in order of traversal ..

def label[T[_]: Traverse, A](t: T[A]) = 
  disperse(t, ((a: A) => state((i: Int) => (i+1, i)))) ! 0

label(List(10, 20, 30, 40)) should equal(List(0, 1, 2, 3)) 

disperse can also be used to implement the wordCount example that ships with scalaz distribution. Actually it counts the number of characters and lines in a stream.

def charLineCount[T[_]:Traverse](t: T[Char]) =
  disperse(t, ((a: Char) => state((counts: (Int, Int)) =>
    ((counts._1 + 1, counts._2 + (if (== '\n') 1 else 0)), (counts._1, counts._2))))) ! (1,1)

charLineCount("the cat in the hat\n sat on the mat\n".toList).last should equal((35, 2))


Jason Zaugg said...

s/Traverse is one of the functors/Traverse is one of the type classes/

Unknown said...

Thanks for pointing out .. Fixed!

omd said...

Thank you my friend for this blog. I found it enlightening and it helped me in finishing my code for the upcoming post.