Thursday, January 22, 2009

Seeking Scala equivalent for this Haskell snippet

I was going through an old post of Tom Moertel, where he defines a cute function in Haskell, clusterBy, that clusters an aggregate based on a signature function. Here are some examples of its usage from his post ..

*Main> let antwords = words "the tan ant gets some fat"

*Main> clusterBy length antwords
[["the","tan","ant","fat"],["gets","some"]]

*Main> clusterBy head antwords
[["ant"],["fat"],["gets"],["some"],["the","tan"]]

*Main> clusterBy last antwords
[["the","some"],["tan"],["gets"],["ant","fat"]]


The Haskell implementation uses Arrows and is super elegant and concise ..

import Control.Arrow ((&&&))
import qualified Data.Map as M

clusterBy :: Ord b => (-> b) -> [a] -> [[a]]
clusterBy f = M.elems . M.map reverse . M.fromListWith (++) . map (&&& return)


I was trying to implement the same in Scala, but could only proceed to the extent of having separate functions clusterByLength, clusterByHead, clusterByLast etc. ..

val l: List[String] = List("the tan ant gets some fat".split(" "): _*)

def clusterByLength = {
  val m = new HashMap[Int, List[String]]
  l.map {=> (x.length, x)}.foreach {=>
    m.put(v._1, v._2 :: m.get(v._1).getOrElse(List[String]()))
  }
  m.keys.toList.sort((e1, e2) => e1 < e2).map {=> m.get(x).get.reverse}
}

def clusterByHead = {
  val m = new HashMap[String, List[String]]
  l.map {=> (x.substring(0, 1), x)}.foreach {=>
    m.put(v._1, v._2 :: m.get(v._1).getOrElse(List[String]()))
  }
  m.keys.toList.sort((e1, e2) => e1 < e2).map {=> m.get(x).get.reverse}
}

def clusterByLast = {
  val m = new HashMap[String, List[String]]
  l.map {=> (x.substring(x.length - 1), x)}.foreach {=>
    m.put(v._1, v._2 :: m.get(v._1).getOrElse(List[String]()))
  }
  m.keys.toList.sort((e1, e2) => e1 < e2).map {=> m.get(x).get.reverse}
}


Is there any way that I can raise the level of abstraction for the above implementation, and have a Scala equivalent of the Haskell clusterBy ?

10 comments:

  1. Something like http://paste.pocoo.org/show/100767/ ?

    ReplyDelete
  2. Here's my contribution:

    http://paste.pocoo.org/show/100774/

    ReplyDelete
  3. Oh, yuck. The inner sets are mutable. That's no good.

    Change .map(m) to either

    .map(m.andThen(_.readOnly))

    or

    .map(m.andThen(collection.immutable.Set.empty ++ _))

    Neither is ideal, but there ya go.

    ReplyDelete
  4. Hello!
    I don't know if your looking for this arrow but maybe it helps:
    http://lucdup.blogspot.com/2008/11/scala-monads-and-arrows.html

    ReplyDelete
  5. It may be out-of-date, but this looks useful:
    http://scala.sygneca.com/code/arrows

    ReplyDelete
  6. My try:

    http://gist.github.com/50572

    Actually almost all the needed functionality is Haskell's fromListWith function so I implemented that one first.

    ReplyDelete
  7. Obvious solution with mutable hashmap:

    http://paste.pocoo.org/show/100848/

    ReplyDelete
  8. Not what you were asking for, I understand, but here's a solution in J:

    cluster=: 1 :'] <@#~ [:(="_ 0 ~.)u&.>'

    ReplyDelete
  9. A solution using arrows as defined here:

    http://paste.pocoo.org/show/100931

    Note: I had to define "new Function1" for type inference to work properly when composing the arrows. I don't know if there a simpler way to do it.

    ReplyDelete
  10. Reading the answers was an interesting exercise in itself, the common theme being folding into a map, where the function result value is used as a key for the map.

    But the fact that there's an arrows implementation in Scala is even better.

    ReplyDelete