Thursday, October 09, 2008

To Tail Recurse or Not

Today I am going to talk about maps, not the java.util.Map, but, map as in map-reduce or map as in scala.List.map. Of course all of us know what map is and map does, and how this powerful concept has been used in all functional languages that we use on a regular basis. I will talk maps in the context of its implementation, as we find in all the languages, which brings out some of the important principles of using tail recursion.

A couple of months back, there was a thread in the Erlang discussion list, where someone wondered why the implementation of map in Erlang stdlib Lists.erl is NOT tail recursive. Here it is, faithfully copied from Erlang stdlib ..


map(F, [H|T]) ->
    [F(H)|map(F, T)];
map(F, []) when is_function(F, 1) -> [].



Clearly not a tail recursive one ..

The thread of discussion explains the rationale behind such implementation. And it has a lot to do with the compiler optimizations that have been done in R12B. Here is a quote from the Erlang efficiency guide, which explains the myth that tail-recursive functions are ALWAYS much faster than body-recursive ones ..

"In R12B, there is a new optimization that will in many cases reduces the number of words used on the stack in body-recursive calls, so that a body-recursive list function and tail-recursive function that calls lists:reverse/1 at the end will use exactly the same amount of memory. lists:map/2, lists:filter/2, list comprehensions, and many other recursive functions now use the same amount of space as their tail-recursive equivalents."

Since a tail recursive map needs to do a reverse ..

  • the incremental space that it needs to keep both the lists makes it equally space consuming with the body-recursive version

  • it puts pressure on the garbage collector, since the space used by the temporary list cannot be reclaimed immediately



The general advice is that you need to measure the timings of your use case and then decide whether to tail recurse or not.

I was curious enough to check what the Scala library does for map implementation. Here is the snippet from scala.List ..


final override def map[B](f: A => B): List[B] = {
  val b = new ListBuffer[B]
  var these = this
  while (!these.isEmpty) {
    b += f(these.head)
    these = these.tail
  }
  b.toList
}



It is not a functional implementation at all. In fact it adopts a clever use of localized mutation to achieve performance. Using mutation locally is a perfectly valid technique and a definite area where hybrid non-pure languages score over the purer ones. The contract for map is purely functional, does not have any side-effect, yet it uses localized side-effects for performance. This would not have been possible in Erlang. Neat!

Just for fun, I cooked up a tail recursive version of map in Scala, as a standalone function ..


def tr_map[B](f: A => B, l: List[A]): List[B] = {
  def iter_map(curr: List[B], l: List[A]): List[B] = l match {
    case Nil => curr.reverse
    case _ => iter_map(f(l.head) :: curr, l.tail)
  }
  iter_map(Nil, l)
}



It performed very slow compared to the native librray version.

Scala also offers a reverseMap function which does not need the additional reverse, which the tail-recursive map would require. And not surprisingly, the implementation is based on tail recursion and pattern matching ..


def reverseMap[B](f: A => B): List[B] = {
  def loop(l: List[A], res: List[B]): List[B] = l match {
    case Nil => res
    case head :: tail => loop(tail, f(head) :: res)
  }
  loop(this, Nil)
}



So, how does the story end ? Well, as usual, benchmark hard, and then decide ..

13 comments:

alex said...

If tail call optimisation could be efficiently done in the JVM, Clojure would have it.

Ricardo Lima said...

Does the Scala compiler optimize for tail recursion? Because if it doesn't, using recursion may not be a good idea actually

Ricky Clarkson said...

The Scala compiler optimises out tail recursion if it's a tail call to the same method. Most other cases are unsafe because of separate compilation, so JVM support for it is important.

Chaddaï said...

>It is not a functional implementation at all. In fact it adopts a clever use of localized
> mutation to achieve performance. Using mutation locally is a perfectly valid technique and a
> definite area where hybrid non-pure languages score over the purer ones.


All pure languages have generally ways to do that too. For instance Haskell allows you to do mutation and local state within the ST monad, a clever use of the type system ensure that the result is always externally pure (contrary to non-pure language where the proof of that is left for the developer). If you need to do really unsavory stuff, you can always revert to unsafe IO to have unrestricted side-effect (of course in this case you have to prove the external purity of your code by yourself).

It could also be noted that the map in Haskell isn't written in tail-recursive style but still execute in constant stack space thanks to the evaluation model (lazy evaluation). This is true of a lot of other interesting functions.

Debasish said...

"All pure languages have generally ways to do that too."

Sure .. but it does not feel idiomatic to do side-effected programming in purer languages like Haskell. OTOH in non-pure languages like Scala, you don't feel out of the way to do localized mutation, as Scala.List does for most of its functions.

"map in Haskell isn't written in tail-recursive style but still execute in constant stack space thanks to the evaluation model (lazy evaluation)."

In Scala we have scala.Stream, which is a lzy List implementation. But Scala, unlike Haskell, is NOT lazy, by default.

Chaddaï said...

"but it does not feel idiomatic to do side-effected programming in purer languages like Haskell."

Sure, that's the point !! We want to encourage keeping your style pure.

That said, I think you're overestimating the cost of using something like ST monad : it's just a matter of putting "runST" before monadic code that looks very much like imperative code in a traditional language.

There are some algorithms that are best written with mutable state, but they mostly use localized state that can be restricted to a ST function properly, getting the performance benefits without compromising the purity of the rest of your code.

"But Scala, unlike Haskell, is NOT lazy, by default."

I didn't say anything about the lazyness of Scala, just that one other way to get the same memory performance as tail-recursion in situation where you can't use tail-recursion can be lazy evaluation without the need for local mutation.

Don't you prefer this version of map to the first Scala version :

map f [] = []
map f (x:xs) = f x : map f xs

Debasish said...

"I think you're overestimating the cost of using something like ST monad"

I think it is more of a mental setup working with Haskell - I have seen purists in Haskell world, frowning upon monad usage. But I do realize that the implementation cost is not much, and is statically verifiable.

"Don't you prefer this version of map to the first Scala version"

Sure I do and I like the terseness. Here is the corresponding Stream.map in Scala, not as terse, but still enough:

def map[B](f: A => B): Stream[B] =
  if (isEmpty) Stream.empty
  else Stream.cons(f(head), tail map f)

Ricky Clarkson said...

From my observation, only poor Haskell programmers frown at monadic code. That said, when there's no benefit from writing the imperative version, there's no benefit from writing the imperative version.

Debasish said...

"when there's no benefit from writing the imperative version"

Sure .. for Scala implementation, there were 2 forces at play :
1. JVM is not the best of the platforms for functional programming (at least not yet .. tail recursion .. will it come with Java 7 ?)
2. Extracting the max of performance with the container classes, for which they had to go the imperative way.

In that process however, they have lost on the concurrency issue. Functional implementations are more easily parallelized, e.g. pmap implementation of Joe in Erlang. Mutating ones are not. List.map in Scala uses ListBuffer - hence u cannot parallelize map easily in Scala.

Ricky Clarkson said...

As a parallel map will do things in a different order than a sequential map, it's not generally safe to take a possibly-side-effecting mapping and make it work in parallel.

Also, the fact that List.map uses a ListBuffer has no bearing on what ParallelList.map (or List.parallelMap) might use.

Debasish said...

"As a parallel map will do things in a different order than a sequential map, it's not generally safe to take a possibly-side-effecting mapping and make it work in parallel."

Are you talking about the input function of map being a side-effected one ? I think it is not a common practice - the function which you use in a map is strongly recommended to be side-effect-free. Otherwise map-reduce will not work.

"Also, the fact that List.map uses a ListBuffer has no bearing on what ParallelList.map (or List.parallelMap) might use."

The point I was trying to make is that had the implementation been functional (without any mutating data structure), I could have forked the function over the collection elements in parallel, possibly using actors. And then reduced them to a new collection. Refer to pmap implementation in Erlang. In the standard Scala List.map it is not possible, since it uses ListBuffer. It could have been possible if we used an Array instead, but in that case we would lose out on the performance, since Array.toList is O(n), as opposed to ListBuffer.toList, which is O(1).

Ricky Clarkson said...

From what I can gather, you can already do that forking. Unless the side effect escapes the implementation of map, it's quite safe to run it in parallel, i.e., it's reentrant.

Debasish said...

Ricky -
Only one point .. map is an order preserving transformation, i.e. the result collection needs to preserve the order of elements according to the original collection. With Scala library List.map implementation that uses ListBuffer, you cannot guarantee this order efficiently during the gathering phase of the parallel map implementation. This can be done more efficiently using Arrays. I had blogged on this before.