Thursday, January 22, 2009

Seeking Scala equivalent for this Haskell snippet

I was going through an old post of Tom Moertel, where he defines a cute function in Haskell, clusterBy, that clusters an aggregate based on a signature function. Here are some examples of its usage from his post ..

*Main> let antwords = words "the tan ant gets some fat"

*Main> clusterBy length antwords

*Main> clusterBy head antwords

*Main> clusterBy last antwords

The Haskell implementation uses Arrows and is super elegant and concise ..

import Control.Arrow ((&&&))
import qualified Data.Map as M

clusterBy :: Ord b => (-> b) -> [a] -> [[a]]
clusterBy f = M.elems . reverse . M.fromListWith (++) . map (&&& return)

I was trying to implement the same in Scala, but could only proceed to the extent of having separate functions clusterByLength, clusterByHead, clusterByLast etc. ..

val l: List[String] = List("the tan ant gets some fat".split(" "): _*)

def clusterByLength = {
  val m = new HashMap[Int, List[String]] {=> (x.length, x)}.foreach {=>
    m.put(v._1, v._2 :: m.get(v._1).getOrElse(List[String]()))
  m.keys.toList.sort((e1, e2) => e1 < e2).map {=> m.get(x).get.reverse}

def clusterByHead = {
  val m = new HashMap[String, List[String]] {=> (x.substring(0, 1), x)}.foreach {=>
    m.put(v._1, v._2 :: m.get(v._1).getOrElse(List[String]()))
  m.keys.toList.sort((e1, e2) => e1 < e2).map {=> m.get(x).get.reverse}

def clusterByLast = {
  val m = new HashMap[String, List[String]] {=> (x.substring(x.length - 1), x)}.foreach {=>
    m.put(v._1, v._2 :: m.get(v._1).getOrElse(List[String]()))
  m.keys.toList.sort((e1, e2) => e1 < e2).map {=> m.get(x).get.reverse}

Is there any way that I can raise the level of abstraction for the above implementation, and have a Scala equivalent of the Haskell clusterBy ?


Anonymous said...

Something like ?

Jorge Ortiz said...

Here's my contribution:

Jorge Ortiz said...

Oh, yuck. The inner sets are mutable. That's no good.

Change .map(m) to either



.map(m.andThen(collection.immutable.Set.empty ++ _))

Neither is ideal, but there ya go.

Anonymous said...

I don't know if your looking for this arrow but maybe it helps:

Dave said...

It may be out-of-date, but this looks useful:

Johannes said...

My try:

Actually almost all the needed functionality is Haskell's fromListWith function so I implemented that one first.

Alexey Naidyonov said...

Obvious solution with mutable hashmap:

Tracy Harms said...

Not what you were asking for, I understand, but here's a solution in J:

cluster=: 1 :'] <@#~ [:(="_ 0 ~.)u&.>'

Eric said...

A solution using arrows as defined here:

Note: I had to define "new Function1" for type inference to work properly when composing the arrows. I don't know if there a simpler way to do it.

Vassil Dichev said...

Reading the answers was an interesting exercise in itself, the common theme being folding into a map, where the function result value is used as a key for the map.

But the fact that there's an arrows implementation in Scala is even better.