Monday, May 10, 2010

Laziness in Clojure - Some Thoughts

Some readers commented on my earlier post on Thrush implementation in Clojure that the functional way of doing stuff may seem to create additional overhead through unnecessary iterations and intermediate structures that could have been avoided using traditional imperative loops. Consider the following thrush for the collection of accounts from the earlier post ..

(defn savings-balance
  [& accounts]
  (->> accounts
       (filter #(= (:type %) ::savings))
       (map :balance)
       (apply +)))


Is it one iteration of the collection for the filter and another for the map ?

It's not. Clojure offers lazy sequences and the functions that operate on them all return lazy sequences only. So in the above snippet Clojure actually produces a composition of the filter and the map that act on the collection accounts in a lazy fashion. Of course with apply, everything is computed since we need to realize the full list to compute the sum. Let's look at the following example without the sum to see how Clojure sequences differ from a language with eager evaluation semantics ..

user> (def lazy-balance
      (->> accounts
           (filter #(= (:type %) ::savings))
           (map #(do (println "getting balance") (:balance %)))))
#'user/lazy-balance


lazy-balance has not been evaluated - we don't yet have the printlns. Only when we force the evaluation we have it computed ..

user> lazy-balance
(getting balance
getting balance
200 300)


Had Clojure been a strict language it would have been stupid to follow the above strategy for a large list of accounts. We would have been doing multiple iterations over the list generating lots of intermediate structures to arrive at the final result. An imperative loop would have rested the case much more cheaply.

Laziness improves compositionality. With laziness, Clojure sequences and the higher order functions on them essentially reify loops so that you can transform them all at once. As Cale Gibbard defends laziness in Haskell with his comments on this LtU thread .. "It's laziness that allows you to think of data structures like control structures."

Clojure is not as lazy as Haskell. And hence the benefits are also not as pervasive. Haskell being lazy by default, the compiler can afford to make aggressive optimizations like reordering of operations and transformations that Clojure can't. With Haskell's purity that guarantees absence of side-effects, deforestation optimizations like stream fusion generates tight loops and minimizes heap allocations. But I hear that Clojure 1.2 will have some more compiler level optimizations centered around laziness of its sequences.

Laziness makes you think differently. I had written an earlier post on this context with Haskell as the reference language. I have been doing some Clojure programming of late. Many of my observations with Haskell apply to Clojure as well. You need to keep in mind the idioms and best practices that laziness demands. And at many times they may not seem obvious to you. In fact with Clojure you need to know the implementation of the abstraction in order to ensure that you get the benefits of lazy sequences.

You need to know that destructuring's & uses nthnext function which uses next that needs to know the future to determine the present. In short, next doesn't fit in the lazy paradigm.

The other day I was working on a generic walker that traverses some recursive data structures for some crap processing. I used walk from clojure.walk, but later realized that for seqs it does a doall that realizes the sequence - another lazy gotcha that caught me unawares. But I actually needed to peek into the implementation to get to know it.

Being interoperable with Java is one of the benefits of Clojure. However you need to be aware of the pitfalls of using Java's data structures with the lazy paradigm of Clojure. Consider the following example where I put all accounts in a java.util.concurrent.LinkedBlockingQueue.

(import '(java.util.concurrent LinkedBlockingQueue))
(def myq (new LinkedBlockingQueue))
(doto myq (.put acc-1) (.put acc-2) (.put acc-3))


Now consider the following snippet that does some stuff on the queue ..

(let [savings-accounts (filter #(= (:type %) ::savings) myq)]
     (.clear myq)
     (.addAll myq savings-accounts))


Should work .. right ? Doesn't ! filter is lazy and hence savings-accounts is empty within the let-block. Then we clear myq and when we do an addAll, it fails since savings-accounts is still empty. The solution is to use a doall, that blows away the laziness and realizes the filtered sequence ..

(let [savings-accounts (doall (filter #(= (:type %) ::savings) myq))]
     (.clear myq)
     (.addAll myq savings-accounts))


Of course laziness in Clojure sequences is something that adds power to your abstractions. However you need to be careful on two grounds :
  • Clojure as a language is not lazy by default in totality (unlike Haskell) and hence laziness may get mixed up with strict evaluation leading to surprising and unoptimized consequences.
  • Clojure interoperates with Java, which has mutable data structures and strict evaluation. Like the situation I described above with LinkedBlockingQueue, sometimes it's always safe to bite the bullet and do things the Java way.

8 comments:

peropaal said...

That was a nice read. Thanks!

fogus said...

"But I actually needed to peek into the implementation to get to know it."

One of the strong points in Clojure's core functions is that when a function is non-lazy its docs says so. You might have been able to save some time by running (doc walk) instead of looking into its implementation. By and large, relying on docs in this way is not reliable, but the core functions are always accurate. :f

Debasish said...

@fogus - yeah .. (doc walk) could have saved some time. But even after seeing the doc I would have peeked into the implementation anyway :) .. But that's only me ..

Bill Smith said...

Thank you for the article. I might have titled it, "Laziness in Clojure is a Leaky Abstraction."

Debasish said...

@Bill - I like your title. Laziness is an orthogonal feature of a language which crosscuts across all features. If it's not pervasive, then you will have some leaky abstractions. Haskell is lazy by default which is pervasive. Hence in Haskell you get the benefits of full compositionality, unlike Clojure. Glad that you liked the post.

Alex Miller said...

I am still learning when things are lazy and when they're not. Sometimes I wish there was a tool (in Clojure or maybe in IDEs, etc) that highlighted where things are eager or lazy or where a head is held, etc.

I very much appreciate that looking at the source (source walk) is just as easy as looking at the docs (doc walk).

Debasish said...

@alex - I also love looking around in the source code. After all it gives pleasure to my eyes looking at idiomatic Clojure code :) .. But we need to do this because laziness is not all pervasive in Clojure unlike Haskell. Often it creates problems when u try to mix lazy with strict evaluation or inter-operate with mutable Java data structures. I am not saying that's outright bad. But it makes you think in a different way when u need to peek at an implementation to know its external behaviors.

naedyr said...

scala's lazy keyword helps to keep laziness more distinct