Sunday, May 03, 2009

Hacking with Scala and CouchDB

For the last couple of weekends I have been hacking with CouchDB and Scala as a part-time project. CouchDB is a REST-based document store, implemented in Erlang, that draws its power from the map/reduce paradigm. Objects are stored in CouchDB as JSON documents, a format that is quite disruptive for a community so indoctrinated in the constraints of the relational database paradigm. This is not to predict the demise of the relational world: the use cases of CouchDB are somewhat orthogonal, but it fits like a glove in cases where we have for so long been trying to force-fit the fangs of SQL backed by a heavily normalized schema.

I wanted some of my Scala objects to reside in a CouchDB database. The API should look like a normal persistence API, and the primary precondition was non-invasiveness: I do not want my Scala objects to be CouchDB-aware. Incorporating CouchDB-specific attributes like _id and _rev takes away a lot of reusability goodness from domain objects and ties them to one specific platform.

Suppose I have a Scala class used to record item prices in various stores ..

case class ItemPrice(store: String, item: String, price: Number)

and I would like to store it in CouchDB through an API that converts it to JSON under the covers and issues a PUT/POST to the CouchDB server.

Here is a sample session that does this for a local CouchDB server running on localhost and port 5984 ..

// specification of the db server running
val couch = Couch("127.0.0.1")
val test = Db("test_db")

// create the database
couch(test create)

// create the Scala object
val s = ItemPrice("Best Buy", "mac book pro", 3000)

// create a document for the database with an id
val doc = Doc(test, "best_buy")

// add
couch(doc add s)

// query by id to get the id and revision of the document
val id_rev = couch(test by_id "best_buy")

// query by id to get back the object
// returns a tuple3 of (id, rev, object)
val sh = couch(test by_id("best_buy", classOf[ItemPrice]))

// got back the original object
sh._3.item should equal(s.item)
sh._3.price should equal(s.price)
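
To give a feel for what travels over the wire in a session like this, here is a minimal sketch of the JSON body and document URL that such an add could boil down to. It is not the library's code: WireSketch, toJson, docUrl and the hand-rolled escaping are assumptions made up for illustration, whereas the framework derives the JSON through reflection.

```scala
// Stand-alone copy of the case class from the post.
case class ItemPrice(store: String, item: String, price: Number)

object WireSketch {
  // Hypothetical helper: escapes only backslashes and quotes (enough for this sketch).
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Hand-written JSON rendering of an ItemPrice.
  def toJson(p: ItemPrice): String =
    "{" + quote("store") + ":" + quote(p.store) + "," +
      quote("item") + ":" + quote(p.item) + "," +
      quote("price") + ":" + p.price + "}"

  // CouchDB addresses a document as /<database>/<document id>.
  def docUrl(host: String, db: String, id: String): String =
    "http://" + host + ":5984/" + db + "/" + id
}
```

Note that the document id lives in the URL and not in the object, which is what keeps ItemPrice free of CouchDB-specific fields.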


Suppose the price of a mac book pro has changed in Best Buy and I get a new ItemPrice. I need to update the document that I have in CouchDB with the new ItemPrice object. For updates, I need to pass in the original revision that I would like to update ..

val new_itemPrice = //..
couch(doc update(new_itemPrice, sh._2))
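
Why the revision has to be passed in: CouchDB accepts an update only if it names the current revision of the document, and rejects a stale one with a 409 Conflict. The toy in-memory store below is only a sketch of that rule. ToyDb and its integer revisions are assumptions for illustration; real CouchDB revisions are opaque strings handed out by the server.

```scala
object RevisionSketch {
  // Toy stand-in for a database: id -> (revision counter, body).
  final class ToyDb {
    private var docs = Map.empty[String, (Int, String)]

    // Create a document and return its first revision.
    def add(id: String, body: String): Int = {
      docs += (id -> ((1, body)))
      1
    }

    // The update succeeds only when the caller supplies the current revision.
    def update(id: String, body: String, rev: Int): Either[String, Int] =
      docs.get(id) match {
        case Some((current, _)) if current == rev =>
          docs += (id -> ((current + 1, body)))
          Right(current + 1)
        case Some(_) => Left("conflict")  // a stale revision; CouchDB answers 409 here
        case None    => Left("not_found") // toy behaviour for an unknown id
      }
  }
}
```

Two writers that both read revision 1 cannot both win: the second update comes back as a conflict and has to re-read the document first.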


The Scala client is at a very early stage. Everything shown above works now; a lot more is planned and is on the roadmap. The main focus has been on the non-intrusiveness of the framework, so that the Scala objects remain pure and can be used freely in other contexts of the application. The library builds on Nathan Hamblen's dispatch library, which provides elegant Scala wrappers over the Apache Commons Java HTTP client and a great JSON parser with a set of extractors.

Very often we need property names in the JSON document that differ from the ones in the Scala object. Sometimes we may also want to filter out some properties while persisting to the data store. The framework uses annotations to achieve both (much like the ones used by jcouchdb, a Java client for CouchDB) ..

case class Trade(
  @JSONProperty("Reference No")
  val ref: String,

  @JSONProperty("Instrument"){val ignoreIfNull = true}
  val ins: Instrument,
  val amount: Number
)


When this class is serialized to JSON and stored in CouchDB, the properties are renamed as specified by the annotations. Selective filtering is also possible through additional annotation properties such as ignoreIfNull, as shown above.
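
The effect of the two annotation features can be sketched without any annotations at all. In the snippet below the renaming rules are handed over as a plain Map instead of being read off the class, so Rule and rename are hypothetical names and this is not how the framework is implemented; it only shows the transformation applied to the properties.

```scala
object RenameSketch {
  // Hypothetical stand-in for one annotated property: the JSON name to emit
  // and whether a null value drops the property altogether.
  case class Rule(jsonName: String, ignoreIfNull: Boolean = false)

  // Apply the rules to (Scala field name -> value) pairs.
  // Fields without a rule keep their Scala name.
  def rename(fields: Seq[(String, Any)], rules: Map[String, Rule]): Seq[(String, Any)] =
    fields.flatMap { case (name, value) =>
      val rule = rules.getOrElse(name, Rule(name))
      if (value == null && rule.ignoreIfNull) None
      else Some(rule.jsonName -> value)
    }
}
```

For a Trade with a null ins, the rules for ref and ins from the class above would leave only "Reference No" and "amount" in the output.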

Handling aggregate data members for JSON serialization is tricky, since type erasure removes the information about the element types contained in the aggregates, e.g.

case class Person(
  lastName: String,
  firstName: String,

  @JSONTypeHint(classOf[Address])
  addresses: List[Address]
)


Using the annotation makes it possible to get the proper types during runtime and generate the proper serialization format.
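
The erasure problem is easy to see in isolation. The sketch below shows that two lists with different element types share one runtime class, and how a class token passed as a value can stand in for the lost type. ErasureSketch and conformsTo are hypothetical names for illustration, not part of the framework.

```scala
object ErasureSketch {
  // After erasure both lists have the same runtime class; the element type is gone.
  val sameRuntimeClass: Boolean =
    List("a").getClass == List(1).getClass

  // A hint supplies what erasure removed: the element class, carried as a value.
  // Checks untyped elements (as a JSON parser might yield them) against the hint.
  def conformsTo(hint: Class[_], elements: List[Any]): Boolean =
    elements.forall(e => hint.isInstance(e))
}
```

This is the role classOf[Address] plays in the annotation: it is ordinary data that survives to runtime, unlike the type parameter of List[Address].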

One of the biggest draws of CouchDB is the view engine, which uses the power of MapReduce to fetch data for users. The current version of the framework does not offer much for view creation beyond basic abstractions that allow plugging "map" and "reduce" functions written in JavaScript into the design document. There are some plans to make this more Scala-ish, with little languages that will enable generating map and reduce functions from Scala objects.

But what it offers today is a small DSL that enables building up view queries along with the sea of options that CouchDB server offers ..

// fetches records from the view named least_cost_lunch
couch(test view(Views.builder("lunch/least_cost_lunch").build))

// fetches records from the view named least_cost_lunch 
// using key and limit options
couch(test view(
  Views.builder("lunch/least_cost_lunch")
       .options(optionBuilder key(List("apple", 0.79)) limit(10) build)
       .build))

// fetches records from the view named least_cost_lunch 
// using specific keys and other options for deciding output filters
couch(test view(
  Views.builder("lunch/least_cost_lunch")
       .options(optionBuilder descending(true) limit(10) build)
       .keys(List(List("apple", 0.79), List("banana", 0.79)))
       .build))
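
On the server side these options end up as query parameters on the view URL, with the key sent as JSON. Here is a rough sketch of that last step, assuming a hypothetical ViewOptions holder rather than the framework's own optionBuilder; key, limit and descending are real CouchDB view parameters, everything else is made up for illustration.

```scala
import java.net.URLEncoder

object ViewQuerySketch {
  // Hypothetical option holder; CouchDB expects `key` to be JSON-encoded.
  case class ViewOptions(
    keyJson: Option[String] = None,
    limit: Option[Int] = None,
    descending: Option[Boolean] = None
  )

  private def enc(s: String): String = URLEncoder.encode(s, "UTF-8")

  // Render only the options that were set, as a URL query string.
  def queryString(o: ViewOptions): String = {
    val params = Seq(
      o.keyJson.map(k => "key=" + enc(k)),
      o.limit.map(n => "limit=" + n),
      o.descending.map(d => "descending=" + d)
    ).flatten
    if (params.isEmpty) "" else params.mkString("?", "&", "")
  }
}
```

Keeping every option as an Option means unset ones simply vanish from the URL, which is roughly what a builder-style DSL has to do underneath.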



Reflection warts!

The framework is based on introspecting Scala objects for serialization and deserialization. This brings in some of the usual warts, like requiring a default constructor for the class. This does not mean that the properties need to be mutable; the default constructor is only used so that the framework can set the properties through reflection after a newInstance(). I am still thinking of ways to get around this. I need to look at some third-party frameworks that do bytecode instrumentation to preserve constructor parameter names .. but I guess this can wait .. and having the default constructor is not necessarily a constraint, so long as it does not invade the immutability guarantees of the abstraction with public setters.
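
The general trick looks roughly like the sketch below: instantiate through the no-argument constructor, then fill the private fields by name. Book and build are hypothetical names and this is not the framework's code. It assumes that the compiler names the backing field after the val and that the JVM allows reflective writes to those fields.

```scala
object ReflectionSketch {
  // Immutable from the outside: vals only, no setters.
  // The no-argument constructor exists purely for reflection.
  class Book(val title: String, val pages: Int) {
    def this() = this(null, 0)
  }

  // Instantiate through the no-arg constructor, then fill private fields by name.
  def build[T](cls: Class[T], values: Map[String, AnyRef]): T = {
    val obj = cls.getDeclaredConstructor().newInstance()
    for ((name, value) <- values) {
      val field = cls.getDeclaredField(name)
      field.setAccessible(true) // bypass private access for this field
      field.set(obj, value)     // boxed values are unboxed for primitive fields
    }
    obj
  }
}
```

Callers of Book still see only vals, so the extra constructor costs little as long as nobody outside the framework uses it.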

Test It Early

The framework is very much a work in progress, as things are for a typical side project. It does not yet handle lots of things like attachments, compaction, bulk document creation etc. I have been working on some of these, and hopefully they will see the light of day pretty soon.

The current snapshot of the source code is available in the google-code repository for scouchdb. No formal release has been made so far. However, there is a test suite that accompanies the project. It is not a unit test suite per se, in the sense that it actually requires a CouchDB server running on localhost on port 5984. Still, the intention is to give an idea of the API set that it exposes today. Treat the code as very much pre-alpha, with no API compatibility guarantees, as it is expected to evolve.

Have fun .. and let me know your feedback on the API ..

4 comments:

Dustin Ted Whitney said...

This is fantastic! I've been working with CouchDB quite a bit lately myself. I have some code for field validations with annotations. I'll show you some code when I've cleaned it up. The two things could work well together.

Unknown said...

Sure .. keep an eye on my blog and the project hosted on google-code (http://code.google.com/p/scouchdb). We can surely collaborate.

n8han said...

Strange but true: back when Dispatch's basic CouchDB support was in a separate project, it was called 'scouch'! I'm glad to see the name revived.

h4ckem.blogspot.com said...

cool