Ruminations of a Programmer: Collector Idiom in Scheme

Monday, August 13, 2007

Collector Idiom in Scheme - Is this a Design Pattern ?

As a newbie into the world of functional programming (one of my new year resolutions for 2007), I started my venture to learn the skills of constructing recursive processes and manipulating recursive data structures with Friedman and Felleisen's The Little Schemer. I am now into Chapter 8, Lambda The Ultimate, sailing through The Ninth Commandment (Abstract Common Patterns with a new function) and into Page 137 of the text. And here I meet multirember&co(), destined to be my thought-process-companion for the last one week. It took me quite some cycles of stepping through the debugger of DrScheme to figure out the actual computation process going on inside the stacks of lambdas that the function defines.

Here is a copy of the function for those uninitiated to the amazing text ..

(define multirember&co
  (lambda (a lat col)
    (cond
      ((null? lat)
       (col (quote()) (quote())))
      ((eq? (car lat) a)
       (multirember&co a
                       (cdr lat)
                       (lambda (newlat seen)
                         (col newlat
                              (cons (car lat) seen)))))
      (else
       (multirember&co a
                       (cdr lat)
                       (lambda (newlat seen)
                         (col (cons (car lat) newlat)
                              seen)))))))

What does this function call (multirember&co a lat f) do ?

In the words of the authors ..

It looks at every atom of the lat to see whether it is eq? to a. Those atoms that are not are collected in one list ls1; the others for which the answer is true are collected in a second list ls2. Finally it determines the value of (f ls1 ls2).

The statement of work is quite straightforward. I tried implementing the same in another functional language, Erlang, and it did not take me too much time to come up with a supposedly meaningful Erlangy implementation ..

filter_atom(A, L, Col) -> filter_atom(A, L, [], [], Col).

filter_atom(A, [H|T], Has, NotHas, Col) ->
  case (H =:= A) of
    true    -> filter_atom(A, T, [H|Has], NotHas, Col);
    false   -> filter_atom(A, T, Has, [H|NotHas], Col)
  end;

filter_atom(A, [], Has, NotHas, Col) ->
  Col(Has, NotHas).

The Erlang implementation looks simple enough and does not have the accidental complexity that we find in the Scheme implementation.

Honest confession : I am also a newbie in Erlang - any suggestions for improvements towards a more Erlangy modeling is welcome!

Why does the Scheme implementation look so complex ?

Coming back to the Scheme implementation, I am not sure if this is the most idiomatic implementation of the solution. But it definitely is an ample demonstration of the power of closures (lambdas). The inner lambdas build the new collectors by getting stuff from the enclosing scope and constructs values that are handed over to the next collector in the stack. This is precisely what the Tenth Commandment preaches - Build functions to collect more than one value at a time.

Is this a Design Pattern ?

The idiom of using the Collector for building up values has been used in subsequent function implementations as well - refer to multiinsertLR&co() and evens-only*&co() in the same chapter of the book. Given a list, and some predicate, the Collector idiom uses continuation passing style (CPS) to create closures that successively collects multiple values from the list. Is there a design pattern lurking around ?

I am not qualified enough to comment on the virtues of the above Scheme implementation. One great thing about it is that the above implementation is tail recursive. I can come up with an accumulator based implementation, which is not tail recursive, but possibly has less of an accidental complexity.

What do the experts have to say about the collector based idiom ?

10 comments:

tof said...: In fact, there is a more concise way, using lists:partition/2 from the stdlib of OTP:

filter_atom(A, L, Col) ->
{Has, NotHas } = lists:partition(fun(X) -> X == A end, L),
Col(Has, NotHas).

But, lists:partition/2 does exactly the same thing as your code, except it reverses the lists at the end of the partitioning, so the order is respected.; Monday, August 13, 2007 at 5:49:00 PM GMT+5:30
JeremyHussell said...: There are two bugs in your Erlang rewrite of the Scheme funtion. Has and NotHas need to be reversed, as has already been pointed out, and Col is supposed to take the list of unequal atoms as its first argument.

An improved version might look like this:

filter_atom(A, L, Col) ->
filter_atom(A, L, Col, [], []).

filter_atom(_, [], Col, NotHas, Has) ->
Col(lists:reverse(NotHas), lists:reverse(Has));

filter_atom(A, [A|T], Col, NotHas, Has) ->
filter_atom(A, T, Col, NotHas, [A|Has]);

filter_atom(A, [H|T], Col, NotHas, Has) ->
filter_atom(A, T, Col, [H|NotHas], Has).

However, if you really want to match what the Scheme implementation is doing, this would be even better:

filter_atom(_, [], Col) ->
Col([], []);

filter_atom(A, [A|T], Col) ->
filter_atom(A, T, fun(NotHas, Has) -> Col(NotHas, [A|Has]) end);

filter_atom(A, [H|T], Col) ->
filter_atom(A, T, fun(NotHas, Has) -> Col([H|NotHas], Has) end).; Monday, August 13, 2007 at 8:03:00 PM GMT+5:30
Unknown said...: Thanks a lot for all your suggestions .. as I mentioned in the blog, I have just started learning Erlang. Hence all your suggestions will help me a lot.

Reddit has also picked up this post .. the comments thread is here.; Tuesday, August 14, 2007 at 11:09:00 AM GMT+5:30
Anonymous said...: Ignoring the continuation passing style that is the point of this example, here is how you can do this using the iterate package in Common Lisp

(defun remove-multiple-members (a list)
(iter (for x in list)
   (if (eq a x)
    (collect x into matched)
    (collect x into unmatched))
   (finally (return (values matched unmatched))))); Tuesday, August 14, 2007 at 5:06:00 PM GMT+5:30
matthias said...: Here is the response from the author:

First, TLL/TLS uses 'poems' to discuss insights into program design and the programmng language per se. It is not about expressing things concisely, using some library.

Second, TLL/TLS does not use concise Scheme/Lisp notation for historical reasons. If you were to use such a notation in DrScheme, it might look like this:

[please pardon the mess, your blog doesn't accept pre and code tags]

(define (multirember&co a L col)
(cond
((null? L)
(col (quote()) (quote())))
((eq? (car L) a)
(multirember&co a (cdr L) (λ (newlat seen) (col newlat `(,(car L) . ,seen)))))
(else
(multirember&co a (cdr L) (λ (newlat seen) (col `(,(car L) . ,newlat) seen))))))

or perhaps like that:

(require (lib "match.ss"))

(define (multirember&co.v2 a L col)
(match L
(()
(col (quote()) (quote())))
(((? (λ (x) (eq? a x)) H) . T)
(multirember&co a T (λ (notHas has) (col notHas `(,H . ,has)))))
((H . T)
(multirember&co a T (λ (notHas has) (col `(,H . ,notHas) has))))))

Now it starts looking like Erlang or any other FPL with pattern matching except that the clauses of the function are surrounded by a pair of parens, which I very much like. Of course, real programming language reserachers never discusss syntax, only semantics and pragmatics.

Third, if you asked me to write this function in a concise manner, using standard libraries, I'd come up with this in 20 seconds:

(define (multirember&co a l col)
(define (pred x) (eq? x a))
(col (filter (compose not pred) l) (filter pred l)))

If you gave me another minute, I'd look in Olin Shivers's library whether it provides listpartition.

Fourth if you wanted to save a factor of 2, you would write a local iterative function (using let loop) that accumulates two lists (with tc), and applies col via another tc to the reverse of the two accumulators.

Finally, the example introduces the idea of "fourth" via CPS. If you follow some more language lore from the 1980s, you can derive the "fourth" solution from the poem, which is the point of these books.; Tuesday, August 14, 2007 at 9:37:00 PM GMT+5:30
Unknown said...: Hi Matthias -

Thanks a lot for your enlightening comment. I feel flattered! I am an absolute newbie trying to learn Lisp and functional programming coming from a 10+ years background of Java and C++. Your explanation of the philosophy behind the design of multirember&co() as in TLS, will indeed help me in a deeper understanding of the poem that Lisp is.
Thanks and Regards.; Tuesday, August 14, 2007 at 11:38:00 PM GMT+5:30
Anonymous said...: In Haskell:

multirememberandco :: Eq a => a -> [a] -> ([a] -> [a] -> b) -> b
multirememberandco a ls f = uncurry f $ parition (==a) ls

easy as pie :); Wednesday, August 15, 2007 at 3:51:00 AM GMT+5:30
matthias said...: You're welcome.

And don't trust the Haskell guy. He didn't even bother to run the program that he posts; otherwise he would have discovered the typo. If he actually read the discusssion, he would have realized that it is not about libraries :-); Wednesday, August 15, 2007 at 4:33:00 AM GMT+5:30
Genji said...: In the last few weeks I've actually had the need to do just this kind of collection in a non-toy program I'm re-implementing in Scheme in an attempt to get my programming brain a bit more functionalised.

I'd read TLS a few months before and my brain exploded somewhere in the middle of this chapter (what was left didn't survive the Y-combinator stuff later on!)

Anyway a bell rang later when I need to collect values and I went back to TLS and worked through some multirember&co examples by hand until I got it... so I munged the code around to make it do what I needed and everything was fine.

But ever since, I've had this nagging doubt about whether or not this was the canonical pattern for doing this in Scheme... having been doing some Lisp reading, I was more than sure that 'there is more than one way to do it'... so I googled and found your blog.

So a belated thanks to you and Matthias for formulating the question better than I could have and answering it (which I obviously could not have:))!; Monday, June 9, 2008 at 9:00:00 PM GMT+5:30
Genji said...: There is little I can add to Matthias' exposition and I wouldn't presume to do so... but for fellow Scheme neophytes who might google this at a later date, it's worth stressing the point that in addition to TLS being all about the poetry of computation, both *let and named let are not introduced or used in this book* - because TLS is all about recursion and functions... it's turtles (oops lambdas!) all the way down.

Once you have let and named let, you can recur on your list of input data *and* update other list(s) of collected stuff at the same time - without the mind-bending collector function.

I'm glad to have now dimly grasped that on some meta level this ties in with continuation passing.... and whilst lambdas all the way down might be beautiful, I'm going to be using let/named let for until further enlightenment is achieved.; Tuesday, June 10, 2008 at 1:49:00 PM GMT+5:30