Monday, September 01, 2008

Breadth-First Numbering : Okasaki's Beautiful Algorithm

Over the weekend I was grokking functional data structures and bumped into a functional pearl by Chris Okasaki from the publications of ICFP 2000. The paper is titled Breadth-First Numbering: Lessons from a Small Exercise in Algorithm Design - it indeed unfolds a beautiful algorithm and explains in detail the philosophy and thought process that went into designing it the unconventional way that he did. The part which intrigued me most was that this ingenuous (and yet simple) design did seem to him a mostly straightforward answer, while almost all other functional programmers to whom he presented the problem over the next year, did come up with a baroque solution approach, in a completely different direction than his own. Okasaki calls this the communal blind spot, that was steering programmers away from what seemed to be a very natural solution.

The Problem

Quoting it straight from the paper ..

Given a tree T, create a new tree of the same shape, but with the values at the nodes replaced by the numbers 1 .. |T| in breadth first order

For example, given the following tree ..

breadth-first numbering should yield the tree ..

Solve It !

Traditional solution approach would be one based on an explicit level based traversal of the tree that makes multiple passes over each level computing the length, collecting the children and finally rebuilding the level. Nothing wrong with the approach .. only slightly baroque .. and uses the plain old lists as the data structure for processing the tree. So, nothing to be excited about, no aha! moments and nothing instrinsically beautiful to write about.

Okasaki's approach was based on the fact that there already exists a breadth-first search algorithm based on functional queues, that allows O(1) queue operations. Hence using the queue as the sequencing ADT allows constant time addition and removal of trees from the front and back of the queue. He simply extended it to compute the breadth-first numbering. And he started with a more generalized version of the problem of computing the breadth-first numbering for a forest and using it to solve the same for a single tree.

It actually took me very little time to implement the functional algorithm in Scala. In fact it took much more time to implement the traditional level based traversal algorithm, since it involved complex list manipulations. Once again a lesson that proper choice of data structures always lead to more elegant solutions.

Here is the Scala implementation ..

trait Tree[+T]
case class Node[+T](data: T, left: Tree[T], right: Tree[T]) extends Tree[T]
case object E extends Tree[Nothing]

def bfsNumForest[T](i: Int, trees: Queue[Tree[T]]): Queue[Tree[Int]] = {
  if (trees.isEmpty) Queue.Empty
  else {
    trees.dequeue match {
      case (E, ts) =>
        bfsNumForest(i, ts).enqueue[Tree[Int]](E)
      case (Node(d, l, r), ts) =>
        val q = ts.enqueue(l, r)
        val qq = bfsNumForest(i+1, q)
        val (bb, qqq) = qq.dequeue
        val (aa, tss) = qqq.dequeue
        tss.enqueue[org.dg.collection.BFSNumber.Tree[Int]](Node(i, aa, bb))

def bfsNumTree[T](t: Tree[T]): Tree[Int] = {
  val q = Queue.Empty.enqueue[Tree[T]](t)
  val qq = bfsNumForest(1, q)

The beauty of the algorithm is that the implementation looks exactly the same as the specification, so long you are implementing in a functional language. In a recent blog post, Okasaki has also come up with a visual representation of this algorithm. Take it side by side with the implementation .. and you have the "quality without a name" feeling.


Germán said...

It will take me some effort to grok this... but I gave it a go before looking at this solution. I'm glad that even as a functional newbie I managed to traverse the tree only once and avoided using a list. This is what I came up with:

case class Tree
case object NullNode extends Tree
case class TreeNode[T](value:T, left:Tree, right:Tree) extends Tree

def breadthNode(t:Tree, curVal:Int, nextLevelVal:Int):(Tree,Int) = t match {
case NullNode => (NullNode, curVal)
case tn:TreeNode[_] => {
val lVal = nextLevelVal
val rVal = tn.left match {
case rn:TreeNode[_] => lVal + 1
case _ => lVal
val (leftNode, nextRight) = breadthNode(tn.left, lVal, rVal + 1)
val (rightNode, nextAvail) = breadthNode(tn.right, rVal, nextRight)
(TreeNode[Int](curVal, leftNode, rightNode), nextAvail)

val (bfnTree, v) = breadthNode(tree, 1, 2)

Blue said...

I wish I could remember where I saw the link to your post. (Very nice.) I opened it (your post) Friday, but didn't get to it until this morning.

The algorithm you implement reminds me of the discussion in a paper on software abstraction that I gave last Spring in Leipzig. The essence of the idea is that breadth-first and depth-first are identical except in that with depth first one pushes subsidiary elements onto the front of a list (in effect making it a stack) and with breadth-first one tacks them on to the back (in effect making it a queue).

If you're interested, the paper ("Abstraction abstracted") is here: