Thursday, January 22, 2009

Seeking Scala equivalent for this Haskell snippet

I was going through an old post of Tom Moertel, where he defines a cute function in Haskell, clusterBy, that clusters an aggregate based on a signature function. Here are some examples of its usage from his post ..

*Main> let antwords = words "the tan ant gets some fat"

*Main> clusterBy length antwords
[["the","tan","ant","fat"],["gets","some"]]

*Main> clusterBy head antwords
[["ant"],["fat"],["gets"],["some"],["the","tan"]]

*Main> clusterBy last antwords
[["the","some"],["tan"],["gets"],["ant","fat"]]


The Haskell implementation uses Arrows and is super elegant and concise ..

import Control.Arrow ((&&&))
import qualified Data.Map as M

clusterBy :: Ord b => (-> b) -> [a] -> [[a]]
clusterBy f = M.elems . M.map reverse . M.fromListWith (++) . map (&&& return)


I was trying to implement the same in Scala, but could only proceed to the extent of having separate functions clusterByLength, clusterByHead, clusterByLast etc. ..

val l: List[String] = List("the tan ant gets some fat".split(" "): _*)

def clusterByLength = {
  val m = new HashMap[Int, List[String]]
  l.map {=> (x.length, x)}.foreach {=>
    m.put(v._1, v._2 :: m.get(v._1).getOrElse(List[String]()))
  }
  m.keys.toList.sort((e1, e2) => e1 < e2).map {=> m.get(x).get.reverse}
}

def clusterByHead = {
  val m = new HashMap[String, List[String]]
  l.map {=> (x.substring(0, 1), x)}.foreach {=>
    m.put(v._1, v._2 :: m.get(v._1).getOrElse(List[String]()))
  }
  m.keys.toList.sort((e1, e2) => e1 < e2).map {=> m.get(x).get.reverse}
}

def clusterByLast = {
  val m = new HashMap[String, List[String]]
  l.map {=> (x.substring(x.length - 1), x)}.foreach {=>
    m.put(v._1, v._2 :: m.get(v._1).getOrElse(List[String]()))
  }
  m.keys.toList.sort((e1, e2) => e1 < e2).map {=> m.get(x).get.reverse}
}


Is there any way that I can raise the level of abstraction for the above implementation, and have a Scala equivalent of the Haskell clusterBy ?

10 comments:

Anonymous said...

Something like http://paste.pocoo.org/show/100767/ ?

Jorge Ortiz said...

Here's my contribution:

http://paste.pocoo.org/show/100774/

Jorge Ortiz said...

Oh, yuck. The inner sets are mutable. That's no good.

Change .map(m) to either

.map(m.andThen(_.readOnly))

or

.map(m.andThen(collection.immutable.Set.empty ++ _))

Neither is ideal, but there ya go.

Anonymous said...

Hello!
I don't know if your looking for this arrow but maybe it helps:
http://lucdup.blogspot.com/2008/11/scala-monads-and-arrows.html

Dave said...

It may be out-of-date, but this looks useful:
http://scala.sygneca.com/code/arrows

Johannes said...

My try:

http://gist.github.com/50572

Actually almost all the needed functionality is Haskell's fromListWith function so I implemented that one first.

Alexey Naidyonov said...

Obvious solution with mutable hashmap:

http://paste.pocoo.org/show/100848/

Tracy Harms said...

Not what you were asking for, I understand, but here's a solution in J:

cluster=: 1 :'] <@#~ [:(="_ 0 ~.)u&.>'

Eric said...

A solution using arrows as defined here:

http://paste.pocoo.org/show/100931

Note: I had to define "new Function1" for type inference to work properly when composing the arrows. I don't know if there a simpler way to do it.

Vassil Dichev said...

Reading the answers was an interesting exercise in itself, the common theme being folding into a map, where the function result value is used as a key for the map.

But the fact that there's an arrows implementation in Scala is even better.