Showing posts with label algorithm.

Monday, October 13, 2008

Map Composition in Scala (or The Virtues of Laziness)

I have been dabbling with some lazy optimization techniques in designing functional data structures. I am using Scala to implement some of the functional data structures of Okasaki .. hence I thought it would be appropriate to think aloud about some of the lazy optimizations that Scala offers.

Scala is a big language with myriads of features and some unfortunate syntactic warts that will definitely rub you the wrong way till you get comfortable with them. Some time back, a popular joke on twitter was related to the number of meanings and interpretations that the character "_" has in Scala. Indeed there are many, and not all of them are intuitive enough .. :-( ..

Now, on to some real Scala snippets ..

Do you know all the differences in interpretation of the following variants of function definition and application ?


// call by value
def foo(x:Bar) = {
  val a = x
  val b = x
}

// call by name
def foo(x: => Bar) = {
  val a = x
  val b = x
}

// function argument passed as an explicit thunk
def foo(x: () => Bar) = {
  val a = x     // binds the thunk itself, not its result
  val b = x
  val c = x()   // invokes the thunk
}

// call by name to call by need
def foo(x: => Bar) = {
  lazy val y = x
  //...
}



This post is not about discussing the details of the above 4 cases .. the Scala wiki has an excellent FAQ entry that describes each of them with sufficient rigor and detail .. go get it ..
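Still, the distinction becomes concrete if we count evaluations. Here is a minimal sketch (EvalDemo, bar and count are my own names, not from the FAQ) that shows how many times the argument expression runs under each strategy:

```scala
object EvalDemo {
  var count = 0                          // counts how many times bar() runs
  def bar(): Int = { count += 1; 42 }

  // call by value: the argument is evaluated exactly once, at the call site
  def byValue(x: Int) = x + x

  // call by name: the argument expression is re-evaluated at every use
  def byName(x: => Int) = x + x

  // call by name converted to call by need: evaluated once, on first use
  def byNeed(x: => Int) = { lazy val y = x; y + y }

  def main(args: Array[String]): Unit = {
    count = 0; byValue(bar()); println(count)   // 1
    count = 0; byName(bar());  println(count)   // 2
    count = 0; byNeed(bar());  println(count)   // 1
  }
}
```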

Lazy evaluation is one of the techniques of performance optimization when dealing with large functional data structures. In Scala, Seq[A].Projection is an abstraction that makes operations lazy.


val l = List(...) // a big list
l.map(_ * 4).map(_ + 2).filter(_ % 6 == 0)



In the above snippet, every operation on the list creates a separate list, which gets chained on to the next operation in line. Hence if you are dealing with large collections, performance always suffers. Go lazy and you have savings both in memory requirements and in elapsed time ..


l.projection.map(_ * 4).map(_ + 2).filter(_ % 6 == 0)



This will return a Stream, which will do a lazy evaluation on demand. I first came across the term Stream in SICP, where Abelson and Sussman introduce it as a data structure for delayed evaluation, which enables us to represent very large (even infinite) sequences.

In Scala, a Stream is a lazy list, and follows the semantics of SICP streams, where elements are evaluated only when they are needed.
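As a quick illustration of that semantics, here is a small sketch of an infinite stream in the SICP style, using the Stream API of this vintage of Scala (later versions renamed it LazyList) - nothing beyond the head is evaluated until somebody asks for it:

```scala
// the integers from n onward, as an infinite lazy Stream
def from(n: Int): Stream[Int] = Stream.cons(n, from(n + 1))

val naturals = from(1)
// map is lazy on a Stream; only the 5 demanded elements are ever computed
println(naturals.map(_ * 2).take(5).toList)   // List(2, 4, 6, 8, 10)
```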

Making your custom collection lazy ..

I was recently working with a custom recursive tree-like data structure in Scala, which, for simplicity, let us assume is a binary tree. And since I was fetching records from a database and then loading up data structures in memory, I was working with a really big collection. Let us see how we can implement Projection on my custom data structure and make things lazy on my own. Scala, unlike Haskell, is not an inherently lazy language, and abstractions like Projection help implement laziness in evaluation. Eric Kidd wrote a great post on Haskell rewrite rules to implement declarative fusion of maps using compiler directives. This post draws some inspiration from it, through the looking glass of Scala, an admittedly more verbose language than Haskell.


trait Tree[+A] {
  def map[B](f: A => B): Tree[B]
}
/**
 * Non empty tree node, with a left subtree and a right subtree
 */
case class Node[+A](data: A, left: Tree[A], right: Tree[A]) extends Tree[A] {
  override def map[B](f: A => B): Tree[B] = Node(f(data), left.map(f), right.map(f))
}
/**
 * Leaf node
 */
case class Leaf[+A](data: A) extends Tree[A] {
  override def map[B](f: A => B): Tree[B] = Leaf(f(data))
}
/**
 * Empty tree object
 */
case object E extends Tree[Nothing] {
  // mapping over an empty tree is a no-op, not an error
  override def map[B](f: Nothing => B): Tree[B] = E
}



We have a map operation defined for the tree that uses a recursive implementation to map over all tree nodes. The map operation is a strict/eager one, much like List.map ..


val t = Node(7, Node(8, Leaf(9), Leaf(10)), Node(11, Leaf(12), Leaf(13)))
t.map(_ * 2).map(_ + 1)



will result in a new tree that has both the map operations done in succession, generating an intermediate tree structure for each map operation in the chain. Needless to say, for a large collection, both space and time will hit you.

Getting rid of the intermediate trees ..

Implement laziness .. make evaluations lazy, so that effectively we have one final tree that evaluates its nodes only when asked for. In other words, lift the operation from the collection to an iterator, which gets evaluated only when asked for.

Here is a sample bare bone unoptimized iterator implemented via scala.collection.mutable.Stack ..


class TreeIterator[A](it: Tree[A]) extends Iterator[A] {
  import scala.collection.mutable.Stack
  val st = new Stack[Tree[A]]
  st push it

  override def hasNext = !st.isEmpty
  override def next: A = st.pop match {
    case Node(d, l, r) =>
      st push r
      st push l
      d
    case Leaf(d) =>
      d
  }
}



Using this iterator, we define the Projection for Tree with the lazy map implementation and integrate it with the main data structure through a projection method ..

Here is the modified Tree[+A] ..


trait Tree[+A] {
  def elements: Iterator[A] = new TreeIterator(this)
  def map[B](f: A => B): Tree[B]
  def projection : Tree.Projection[A] = new Tree.Projection[A] {
    override def elements = Tree.this.elements
  }
}



and the Projection trait in an accompanying singleton for Tree ..


object Tree {
  trait Projection[+A] extends Tree[A] {
    override def map[B](f: A => B): Projection[B] = new Projection[B] {
      override def elements = Projection.this.elements.map(f)
    }
  }
}



Now I can use my data structure to implement lazy evaluations and fusion of operations ..


val t = Node(7, Node(8, Leaf(9), Leaf(10)), Node(11, Leaf(12), Leaf(13)))
t.projection.map(_ * 2).map(_ + 1)



Eric Kidd reported on making Haskell maps 225% faster through fusion and rewrite rules. In Scala, implementing laziness through delayed evaluation (or Projection) can also lead to substantial reduction in memory usage and elapsed time.

Friday, October 03, 2008

Functional List with fast Random Access

I needed a List with fast random access capabilities. Standard implementations of a List take O(i) to access the ith element. I am using Scala, and arrays do not really cut it as a well-behaved functional data structure. Besides, I needed dynamic resizing, persistence and good worst case complexity. Anyway, I was trying to justify implementing Okasaki's Purely Functional Random Access Lists ..

The data structure provides O(log n) random access along with O(1) list primitives (head, tail and cons). The data structure uses a collection of complete binary trees as the representation of the list. Access is through preorder traversal, allowing O(1) head. And the number of trees is determined by a skew binary decomposition of integer n, for n being the size of the list. A skew binary decomposition is a collection of skew binary terms, where a skew binary term is of the form (2^k - 1). e.g. the skew binary decomposition of 43 is (1 + 1 + 3 + 7 + 31). If the list has 43 elements, we will represent it using a collection of 5 binary trees of sizes 1, 1, 3, 7 and 31.
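The decomposition can be computed greedily, always peeling off the largest skew binary term that fits. A small sketch (skewDecompose is my own helper, not from the paper or the pfds project):

```scala
// Greedy skew binary decomposition: repeatedly peel off the largest
// term of the form 2^k - 1 that still fits in n.
def skewDecompose(n: Int): List[Int] = {
  def largestSkewTerm(n: Int): Int = {
    var t = 1
    while (2 * t + 1 <= n) t = 2 * t + 1   // next term: 2^(k+1)-1 = 2*(2^k-1)+1
    t
  }
  if (n == 0) Nil
  else {
    val t = largestSkewTerm(n)
    t :: skewDecompose(n - t)              // tree sizes, largest first
  }
}

println(skewDecompose(43))   // List(31, 7, 3, 1, 1)
```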

For a random lookup, we first need to locate the tree, a log n operation and then search within the tree, another log n operation. Okasaki's paper has all the details of analysis. It is straightforward, yet elegant ..
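The within-tree search exploits the fact that a complete tree of size 2^k - 1 splits into two subtrees holding (size - 1) / 2 elements each, halving the search space at every step. A rough sketch of the idea, using my own minimal tree type (not the pfds implementation):

```scala
sealed trait CTree[+A]
case class CLeaf[+A](data: A) extends CTree[A]
case class CNode[+A](data: A, left: CTree[A], right: CTree[A]) extends CTree[A]

// i is the preorder index within a complete tree holding `size` elements,
// where size is of the form 2^k - 1
def lookup[A](size: Int, t: CTree[A], i: Int): A = t match {
  case CLeaf(x) =>
    if (i == 0) x else throw new IndexOutOfBoundsException(i.toString)
  case CNode(x, l, r) =>
    if (i == 0) x
    else {
      val half = size / 2                  // each subtree holds size / 2 elements
      if (i <= half) lookup(half, l, i - 1)           // descend left
      else lookup(half, r, i - 1 - half)              // descend right
    }
}
```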

Two implementations are given ..

the first one uses a list of tuples (List[(Int, Tree[T])]) to implement RAList[T]


object RAList {
  def empty[T]: RAList[T] = RAList.Nil

  private [immutable] case object Nil extends RAList[Nothing] {
    override def equals(that : Any) = this eq that.asInstanceOf[AnyRef]
  }

  private [immutable] case class Root[+T](trees: List[(Int, Tree[T])])
    extends RAList[T] {
  }
}



and

the second one uses a recursive data structure to implement the same. This is a better implementation than the first one and proved to be faster as well ..


object RandomAccessList {
  def empty[T]: RandomAccessList[T] = RandomAccessList.Nil

  private [immutable] case object Nil extends RandomAccessList[Nothing] {
    override def equals(that : Any) = this eq that.asInstanceOf[AnyRef]
  }

  private [immutable] case class Root[+T](size: Int, tree: Tree[T], rest: RandomAccessList[T])
    extends RandomAccessList[T] {
  }
}



I have setup a Google project, which I named pfds (Purely Functional Data Structures), containing both the implementations. The implementations come with some unit tests using specs. Everything is quite rudimentary at this stage, but I plan to enrich the project with more functional data structures .. Meanwhile you can browse through the source code, critique the implementation with suggestions to make them more Scalaesque ..

Monday, September 22, 2008

More on Functional Queues in Scala .. More Memoization

In my last post, I talked about a Functional Queue implementation in Scala that used the laziness of Streams to improve upon the worst case bound of the dequeue operation. In reality, with the initial benchmarks, the algorithm didn't perform as well, and was, in fact, in some cases, slower than the one that used the strict List based implementation. However, the figures that came out of the benchmarks indicated that the strict List based implementation suffered from rough edges due to the O(n) reverse operations performed on the List during dequeue. The performance curves of the lazy implementation were much smoother.

However, as I mentioned in the post, the lazy implementation also has overheads in the form of hidden constants, which showed up in benchmarking with large numbers. This is mainly due to increased heap usage from thunk creation, which can be quite large. And this, in turn, can eat up the heap and kick off garbage collection, thereby affecting the overall runtime of the algorithm.

Still the lazy implementation, being O(log n) worst case and O(1) amortized for dequeue proved to have more predictable numbers.

In the same paper that I referred to in the last post, Okasaki goes on with an improved version of the lazy algorithm taking more advantage of memoization. The trick is to ensure that whenever we have a call of the form rotate(xs.tail, ..), the referred tail should have already been pre-evaluated and memoized. This leads to an O(1) worst case for the dequeue operation as well.

Incrementally .. Again ..

Once again we turn back to incremental operation. We introduce one more Stream into the implementation ..


class MemoizedQueue[+A] private (front: Stream[A], rear: List[A], pending: Stream[A]) {
  //..
  //..
}



pending points to the portion of front whose tails are yet to be evaluated. That is, when pending is empty, every tail of front has been evaluated and memoized. How do we take advantage of this ?

Assume we have just had a rotate ..

  • front points to a non-empty Stream

  • rear points to an empty List (Nil)

  • pending points to the same Stream as front


Till the next rotate is called, every time we call makeQ, we walk down front, evaluate each tail incrementally and advance pending, thereby ensuring that every tail of front has been pre-evaluated and memoized by the time of the next rotate. Here is the modified makeQ ..


class MemoizedQueue[+A] private (front: Stream[A], rear: List[A], pending: Stream[A]) {

  //..

  protected def makeQ[A](f: Stream[A], r: List[A], p: Stream[A]): MemoizedQueue[A] = {
    if (p isEmpty) {

      // pending is empty, i.e. all tails evaluated
      // time for the next rotate

      val fr = rotate(f, r, Stream.empty)
      new MemoizedQueue[A](fr, Nil, fr)
    } else {

      // evaluate one tail in pending
      new MemoizedQueue[A](f, r, p.tail)
    }
  }
  //..
  //..
}



An O(1) worst case Functional Queue in Scala

Since front.tail is incrementally evaluated and memoized, the cost of dequeue comes down to O(1) worst case. Here is the final version of the implementation that does every queue operation in worst case O(1) ..


object MemoizedQueue {
  val Empty: MemoizedQueue[Nothing] =
    new MemoizedQueue(Stream.empty, Nil, Stream.empty)
}

class MemoizedQueue[+A] private (front: Stream[A], rear: List[A], pending: Stream[A]) {

  def this() {
    this(Stream.empty, Nil, Stream.empty)
  }

  private def rotate[A](xs: Stream[A], ys: List[A], rys: Stream[A]): Stream[A] = ys match {
    case y :: ys1 =>
      if (xs isEmpty) Stream.cons(y, rys)
      else
        Stream.cons(xs.head, rotate(xs.tail, ys.tail, Stream.cons(y, rys)))
    case Nil =>
      // unreachable: the invariant |ys| = |xs| + 1 holds on every call
      throw new IllegalStateException("rotate called with empty rear")
  }

  protected def makeQ[A](f: Stream[A], r: List[A], p: Stream[A]): MemoizedQueue[A] = {
    if (p isEmpty) {

      // pending is empty, i.e. all tails evaluated
      // time for the next rotate

      val fr = rotate(f, r, Stream.empty)
      new MemoizedQueue[A](fr, Nil, fr)

    } else {

      // evaluate one tail in pending
      new MemoizedQueue[A](f, r, p.tail)
    }
  }

  def isEmpty: Boolean = front isEmpty

  def enqueue[B >: A](elem: B) = {
    makeQ(front, elem :: rear, pending)
  }

  def dequeue: (A, MemoizedQueue[A]) = {
    if (front isEmpty) throw new NoSuchElementException("empty queue")
    (front.head, makeQ(front.tail, rear, pending))
  }

  def length = (pending.length + 2 * rear.length)
}

Monday, September 15, 2008

Lazy Functional Queues in Scala

Scala has a nice functional Queue implementation as part of scala.collection.immutable. It implements the standard technique of using a pair of lists [L, R], where L is the first portion of the queue and R is the rear portion in reverse order. As a result, the head of L is the first element of the queue and head of R points to the last element of the queue. Hence both can be fetched in O(1) time. Items are enqueued from the rear end, i.e. a O(1) insert in R, while they are dequeued from the front. Again a (supposedly) O(1) removal from L.

A nice trick indeed .. Here is the snippet from the Scala library ..


class Queue[+A](elem: A*) extends Seq[A] {

  protected val in: List[A] = Nil
  protected val out: List[A] = elem.elements.toList

  //..
  //..

  def enqueue[B >: A](elem: B) = mkQueue(elem :: in, out)

  def dequeue: (A, Queue[A]) = {
    val (newOut, newIn) =
      if (out.isEmpty) (in.reverse, Nil)
      else (out, in)
    if (newOut.isEmpty) throw new NoSuchElementException("queue empty")
    else (newOut.head, mkQueue(newIn, newOut.tail))
  }
  //..
  //..
}



Now have a close look at dequeue. When we try to do a dequeue, with out being empty, in needs to be reversed, a definite O(n) operation. But, for a list of length n, this occurs only when there has been n insertions since last reversal. Hence dequeue still remains an amortized O(1) operation. The point is whether you can afford an occasional O(n) ..
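A quick usage sketch against today's scala.collection.immutable.Queue makes the amortization visible - the reverse is paid only on the dequeue that finds out empty:

```scala
import scala.collection.immutable.Queue

// five O(1) enqueues build up `in`; the first dequeue then pays the
// one-off O(n) reverse, and subsequent dequeues are O(1) again
var q = Queue.empty[Int]
for (i <- 1 to 5) q = q.enqueue(i)

val (first, q1) = q.dequeue    // reverse happens here: in is flipped into out
val (second, _) = q1.dequeue   // plain O(1) head of the already-reversed list
println((first, second))       // (1,2)
```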

In his paper "Simple and Efficient Purely Functional Queues and Deques", Okasaki discusses a technique that uses lazy lists and incremental computation to improve the above worst case complexity of dequeue to O(log n). In fact, later in the paper, he also implements an improvement of this version using full memoization and lazy evaluation to achieve a worst case O(1) dequeue. Okasaki has written a beautiful thesis on functional data structures and their analysis techniques, which was later published as a book. A highly recommended piece for the interested ones ..

Being Lazy

The main trick of Okasaki's O(log n) dequeue algorithm is to make L a lazy list. Instead of doing the above reverse in a single step as a strict operation, we can have [L, R] incrementally changed to [L ++ reverse(R), []], so that when the actual need for the reverse comes, we already have R reversed and appended to L. The meat lies in the reverse operation being done incrementally and the ++ operation being done lazily. Okasaki's paper describes the algorithm and its analysis in great detail.

Using Scala Streams

Scala offers lazy lists in the form of Stream[A], where Stream.cons is defined as ..


object Stream {
  object cons {
    def apply[A](hd: A, tl: => Stream[A]) = new Stream[A] {
      //..
    }
    //..
  }
  //..
}



and offers a lazy concatenation ..

How about using Stream[A] for the list L and try out Okasaki's algorithm to get a faster functional persistent Queue in Scala ? Name it OQueue[+A] ..


class OQueue[+A] {
  protected val front: Stream[A] = Stream.empty
  protected val sizef: Int = 0
  protected val sizer: Int = 0
  protected val rear: List[A] = Nil
  //..
  //..
}



Okasaki calls the incremental adjustment of [L, R] a rotate. Here is the implementation of rotate in OQueue, which gets called during construction of the OQueue. Note that OQueue (and Scala's Queue), being functional, is also persistent - hence every mutating operation returns a new version of the data structure. This is unlike imperative data structures that implement destructive updates and maintain only the latest version. Hence rotate gets called as part of OQueue construction. However, rotate is not called for every call of the constructor - it is called only when |R| = |L| + 1 and we need to start the process of incremental reversal. And the trick is to start reversing R just at the moment R gets longer than L and to interleave the append to L with the reverse of R. The details are too gory for a blog post and can be found with all their complexities in the original paper.


class OQueue[+A] {
  //..
  //..
  protected def rotate[A](xs: Stream[A], ys: List[A], rys: Stream[A]): Stream[A] = ys match {
    case y :: ys1 =>
      if (xs isEmpty) Stream.cons(y, rys)
      else
        Stream.cons(xs.head, rotate(xs.tail, ys.tail, Stream.cons(y, rys)))
    case Nil =>
      //..
  }
  //..
}



and now the constructor / factory method for constructing an OQueue[A] ..


class OQueue[+A] {
  //..
  //..
  protected def makeQ[A](f: Stream[A], sf: Int, r: List[A], sr: Int): OQueue[A] = {
    if (sr <= sf) {
      new OQueue[A]() {
        override protected val front = f
        override protected val sizef = sf
        override protected val rear = r
        override protected val sizer = sr
      }
    } else {
      new OQueue[A]() {
        override protected val front = rotate(f, r, Stream.empty)
        override protected val sizef = sf + sr
        override protected val rear = Nil
        override protected val sizer = 0
      }
    }
  }
  //..
}



The other methods are fairly trivial and nicely fall in place ..


class OQueue[+A] {
  //..
  //..
  def isEmpty: Boolean = sizef == 0

  def enqueue[B >: A](elem: B) = {
    makeQ(front, sizef, elem :: rear, sizer + 1)
  }

  def dequeue: (A, OQueue[A]) = {
    (front.head, makeQ(front.tail, sizef - 1, rear, sizer))
  }

  def elements: Iterator[A] = (front.force ::: rear.reverse).elements

  def length = (sizef + sizer)
  //..
  //..
}



Fast ?

Analyzing lazy, persistent functional data structures is hard and so is benchmarking them. I did not do an elaborate benchmarking of OQueue against scala.collection.immutable.Queue. But from whatever I did, I could not get any speedup over the Scala library implementation. Scala, unlike Haskell, is not a lazy language. And lazy languages offer memoization of function arguments i.e. arguments are evaluated only when they are needed and, once evaluated, are cached for future reuse. In the above implementation of OQueue, we need to evaluate front.tail a number of times - an effectively lazy language can make this operation O(1) through memoization. Also, Scala, being implemented on top of the JVM, is not the best platform for heavy recursion, and the above implementation of rotate is not tail recursive, so Scala cannot optimize it. Okasaki implemented the algorithm in Standard ML (a strict language, extended with explicit support for lazy evaluation and memoization) and possibly got effective speedups over the traditional paired list implementation of queues.

I am not a Scala expert, any suggestions or critiques regarding the above implementation will be most welcome. Of course, I plan to do some more micro-benchmarking. But I think there may be dominant constants that offset the asymptotic complexity figures of the above analysis. Besides, the above implementation does not take any advantage of memoization, which is yet another secret sauce behind speedups of functional implementations. Anyway, it was, I feel, a worthwhile exercise trying my hands on implementing a functional data structure.

Postscript

I just discovered that Rich Hickey addresses this problem in a different way in Clojure. He has used the traditional paired list implementation, but without storing the rear end R in reverse. Instead he uses a PersistentVector as the rear - and his PersistentVector is based upon a trie based implementation that gives near constant time access to elements.
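For illustration, here is my own toy rendition of that idea in Scala (not Hickey's code): since the rear is kept in insertion order inside a Vector, dequeue never needs a reverse:

```scala
// front is a plain List, rear is a Vector kept in insertion order.
// Vector's trie representation gives effectively constant time
// head, tail and append, so no O(n) reverse is ever paid.
class VQueue[+A] private (front: List[A], rear: Vector[A]) {
  def enqueue[B >: A](e: B): VQueue[B] = new VQueue(front, rear :+ e)
  def dequeue: (A, VQueue[A]) = front match {
    case h :: t               => (h, new VQueue(t, rear))
    case Nil if rear.nonEmpty => (rear.head, new VQueue(Nil, rear.tail))
    case _                    => throw new NoSuchElementException("empty queue")
  }
}
object VQueue { def empty[A]: VQueue[A] = new VQueue(Nil, Vector.empty) }
```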