Monday, May 11, 2009

CouchDB and Scala - Updates on scouchdb

A couple of posts back, I introduced scouchdb, the Scala driver for CouchDB persistence. The primary goal of the framework is to offer non-intrusiveness in persistence, in the sense that the Scala objects can be absolutely oblivious to the underlying CouchDB existence. The last post discussed how Scala objects can be added, updated or deleted from CouchDB with the underlying JSON representation carefully veneered away from client APIs. Here is an example of the fetch API in scouchdb ..

val sh = couch(test by_id(s_id, classOf[Shop]))

The document is fetched as an instance of the Scala class Shop, which can then be manipulated using usual Scala machinery. The return type is a Tuple3, where the first two components are the id and revision that may be useful for doing future updates of the document, while sh._3 is the object retrieved from the data store. Returning tuples from a method is a typical Scala idiom that can give rise to some nice pattern matching code capsules ..

couch(test by_id(s_id, classOf[Shop])) match {
  case (id, rev, obj) =>
    //..
  //..
}
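
For readers who want to try the tuple idiom without a running CouchDB, here is a minimal sketch. It does not use scouchdb at all: `fetchStub` is a hypothetical stand-in for `couch(test by_id(...))`, the `Shop` field names are guesses, and the revision string is made up.

```scala
// Hypothetical stand-in for the Shop class used in the specs; field names are assumed.
case class Shop(store: String, item: String, price: Int)

object TupleMatchSketch {
  // Plays the role of couch(test by_id(...)): returns (id, revision, object).
  // No database is involved; the revision value is invented for illustration.
  def fetchStub(id: String): (String, String, Shop) =
    (id, "1-abc", Shop("Sears", "refrigerator", 12500))

  // Destructure the id, the revision and the object's fields in one pattern.
  def describe(id: String): String = fetchStub(id) match {
    case (docId, rev, Shop(store, _, price)) =>
      s"$docId@$rev: $store sells at $price"
  }
}
```

The nested pattern pulls fields out of the retrieved object in the same step that names the id and revision, which is what keeps such code capsules short.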


The last post also discussed the View APIs and the little builder syntax for View queries.

Over the weekend, scouchdb got some more features, hence a brief post introducing the new additions ..

Temporary Views

No frills here: temporary views share the same builder interface as ordinary views, with the addition of specifying the map and reduce functions. Here is the spec for querying temporary views ..


describe("fetch from temporary views") {
  it("should fetch 3 rows with group option and 1 row without group option") {
    val mf = 
      """function(doc) {
           var store, price;
           if (doc.item && doc.prices) {
             for (store in doc.prices) {
               price = doc.prices[store];
               emit(doc.item, price);
             }
           }
         }"""
      
    val rf = 
      """function(key, values, rereduce) {
           return(sum(values))
         }"""
      
    // with grouping
    val aq = 
      Views.adhocBuilder(View(mf, rf))
           .options(optionBuilder group(true) build)
           .build
    val s = couch(
      test adhocView(aq))
    s.size should equal(3)
      
    // without grouping
    val aq_1 = 
      Views.adhocBuilder(View(mf, rf))
           .build
    val s_1 = couch(
      test adhocView(aq_1))
    s_1.size should equal(1)
  }
}
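
The difference between the two row counts comes from how the reduce function is applied. The following sketch mimics the map and reduce steps with plain Scala collections, so it runs without CouchDB. The sample documents and all names in it are made up for illustration; it only assumes that the test database holds three distinct items, as the expected row count suggests.

```scala
object GroupingSketch {
  // Made-up sample document: an item name plus a price per store.
  case class ItemDoc(item: String, prices: Map[String, Int])

  val docs: List[ItemDoc] = List(
    ItemDoc("apple", Map("storeA" -> 10, "storeB" -> 20)),
    ItemDoc("orange", Map("storeA" -> 30)),
    ItemDoc("banana", Map("storeA" -> 5, "storeB" -> 15))
  )

  // Same logic as the JavaScript map function: emit one (item, price) row per store.
  def mapRows(ds: List[ItemDoc]): List[(String, Int)] =
    ds.flatMap(d => d.prices.values.toList.map(price => (d.item, price)))

  // With group(true): the reduce (a sum) runs once per distinct key.
  def reduceGrouped(rows: List[(String, Int)]): Map[String, Int] =
    rows.groupBy(_._1).map { case (key, vs) => key -> vs.map(_._2).sum }

  // Without grouping: the reduce runs over all rows and yields a single value.
  def reduceAll(rows: List[(String, Int)]): Int =
    rows.map(_._2).sum
}
```

With this data `reduceGrouped` returns one entry per item (three in total), while `reduceAll` collapses everything into one number, which matches the 3 rows versus 1 row that the spec expects.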




Attachment Handling

With each document, CouchDB allows attachments, much like emails. Along with creating a document, I can have a separate attachment associated with the document. However, when the document is retrieved, the attachment is not fetched by default. It has to be fetched using a special URI. All of this is now encapsulated in Scala APIs in scouchdb. Have a look at the following spec ..


describe("create a document and make an attachment") {
  val att = "The quick brown fox jumps over the lazy dog."
    
  val s = Shop("Sears", "refrigerator", 12500)
  val d = Doc(test, "sears")
  var ir:(String, String) = null
  var ii:(String, String) = null
    
  it("document creation should be successful") {
    couch(d add s)
    ir = couch(d >%(Id._id, Id._rev))
    ir._1 should equal("sears")
  }
  it("query by id should fetch a row") {
    ii = couch(test by_id ir._1)
    ii._1 should equal("sears")
  }
  it("sticking an attachment should be successful") {
    couch(d attach("foo", "text/plain", att.getBytes, Some(ii._2)))
  }
  it("retrieving the attachment should equal to att") {
    val air = couch(d >%(Id._id, Id._rev))
    air._1 should equal("sears")
    couch(d.getAttachment("foo") as_str) should equal(att)
  }
}




CouchDB also allows adding attachments to documents that do not exist yet. Adding the attachment creates the document too, and scouchdb supports this. Have a look at the BDD specs in the test folder for details of the usage.
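
As background on the special URI: CouchDB's HTTP API addresses an attachment as a sub-resource of its document, and takes the current revision as a query parameter when attaching to an existing document. The sketch below only builds such a URI as a string. The object and method names are hypothetical, the host and port are assumptions, no escaping is done (names are assumed to be URL-safe), and scouchdb's own internals may well differ.

```scala
object AttachmentUriSketch {
  // Builds <base>/<db>/<docId>/<attachmentName>, optionally with ?rev=<revision>.
  // Assumes every part is already URL-safe; a real client would escape them.
  def attachmentUri(base: String,
                    db: String,
                    docId: String,
                    name: String,
                    rev: Option[String] = None): String = {
    val path = s"$base/$db/$docId/$name"
    rev.fold(path)(r => s"$path?rev=$r")
  }
}
```

For the spec above, fetching would target something like `attachmentUri("http://localhost:5984", "test", "sears", "foo")`, while attaching to the existing document would pass the revision as the last argument.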

Bulk Documents

CouchDB has separate REST interfaces for handling editing of multiple documents at the same time. I can have multiple documents, some of which need to be added as new, some to be updated with specific revision information and some to be deleted from the existing database. And all these can be done using a single POST. scouchdb uses a small DSL for handling such requests. Here is how ..


describe("bulk updates of documents") {
  it("should create 4 documents with 1 post") {
    val cnt = couch(test all_docs).filter(_.startsWith("_design") == false).size 
      
    val s1 = Shop("cc", "refrigerator", 12500)
    val s2 = Shop("best buy", "macpro", 1500)
    val a1 = Address("Survey Park", "Kolkata", "700075")
    val a2 = Address("Salt Lake", "Kolkata", "700091")
      
    couch(test docs(List(s1, s2, a1, a2), false)).size should equal(4)
    couch(test all_docs).filter(_.startsWith("_design") == false).size should equal(cnt + 4)
  }
  it("should insert 2 new documents, update 1 existing document and delete 1 - all in 1 post") {
    val sz = couch(test all_docs).filter(_.startsWith("_design") == false).size
    val s = Shop("Shoppers Stop", "refrigerator", 12500)
    val d = Doc(test, "ss")
      
    val t = Address("Monroe Street", "Denver, CO", "987651")
    val ad = Doc(test, "add1")
      
    var ir:(String, String) = null
    var ir1:(String, String) = null
    
    couch(d add s)
    ir = couch(d >%(Id._id, Id._rev))
    ir._1 should equal("ss")
      
    couch(ad add t)
    ir1 = couch(ad >%(Id._id, Id._rev))
    ir1._1 should equal("add1")
      
    val s1 = Shop("cc", "refrigerator", 12500)
    val s2 = Shop("best buy", "macpro", 1500)
    val a1 = Address("Survey Park", "Kolkata", "700075")
      
    val d1 = bulkBuilder(Some(s1)).id("a").build 
    val d2 = bulkBuilder(Some(s2)).id("b").build
    val d3 = bulkBuilder(Some(s)).id("ss").rev(ir._2).build
    val d4 = bulkBuilder(None).id("add1").rev(ir1._2).deleted(true).build

    couch(test bulkDocs(List(d1, d2, d3, d4), false)).size should equal(4)
    couch(test all_docs).filter(_.startsWith("_design") == false).size should equal(sz + 3)
  }
}




As the specs above show, there are 2 levels of API for bulk updates. scouchdb already has an API for creating a document from a Scala object with automatic id generation:

def doc[T <: AnyRef](obj: T) = { //..

As an extension, I introduce the following, which lets users add multiple new documents through a single API. Note that all of the documents are added as new ones ..

def docs(objs: List[_ <: AnyRef], allOrNothing: Boolean) = { //..

and the objects can be of any type, not necessarily the same. This is illustrated in the first of the 2 specs above.

But in case you need the full feature set of bulk uploads and editing of multiple documents, I offer a builder-based interface, which is illustrated in the second spec above. Here 2 new documents are added, 1 is updated and 1 is deleted, all through a single API call.
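
To make the single POST concrete: CouchDB's bulk endpoint takes a JSON body with a `docs` array, where an update carries `_id` and `_rev` and a deletion additionally carries `"_deleted": true`. The sketch below assembles such a body with plain string handling. It is not scouchdb's `bulkBuilder`; `BulkEntry` and the other names are hypothetical, field values are expected as already-serialized JSON, and no string escaping is done.

```scala
object BulkDocsSketch {
  // One entry of the bulk request. `body` maps field names to JSON-encoded values.
  case class BulkEntry(id: Option[String] = None,
                       rev: Option[String] = None,
                       deleted: Boolean = false,
                       body: Map[String, String] = Map.empty)

  // Wraps a string in double quotes (no escaping; inputs are assumed to be simple).
  private def quote(s: String): String = "\"" + s + "\""

  // Serializes one entry: metadata fields first, then the body sorted by name.
  def entryJson(e: BulkEntry): String = {
    val meta =
      e.id.map(i => quote("_id") + ":" + quote(i)).toList ++
      e.rev.map(r => quote("_rev") + ":" + quote(r)).toList ++
      (if (e.deleted) List(quote("_deleted") + ":true") else Nil)
    val fields = e.body.toList.sortBy(_._1).map { case (k, v) => quote(k) + ":" + v }
    (meta ++ fields).mkString("{", ",", "}")
  }

  // Wraps all entries in the {"docs": [...]} envelope.
  def payload(entries: List[BulkEntry]): String =
    entries.map(entryJson).mkString("{\"docs\":[", ",", "]}")
}
```

An entry with neither a revision nor the deleted flag is an insert, one with a revision is an update, and one with the flag set is a delete, which mirrors the three kinds of entries in the second spec.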

In case you are doing CouchDB and Scala stuff, give scouchdb a spin and post your feedback in the comments. I am yet to write a meaningful application using scouchdb, so any feedback will be immensely helpful.

8 comments:

Dustin Ted Whitney said...

I had been writing my own ad hoc CouchDB interface but it's not nearly as complete as this. I'll give it a try shortly.

One thing that I was doing in my own libraries was returning an HTTP response code from the result of the operation. It was nice to match against it, like

... match {
case (201, _) => ...
case (409, _) => ...
}

-Dustin

Debasish said...

Actually this is one thing I have kept open till date. I am still not sure what will be the best way to handle exceptions. Currently I am happy to keep it at the level that dbDispatch offers - raising dispatch.StatusCode to the client. I am slightly hesitant to create another hierarchy of exceptions on top of it. And does it make a lot of sense to just wrap dispatch.StatusCode into another wrapper class? When I hit a 404 in a fetch, should I throw a custom NotFoundException to the client? Or modify the return type of the by_id method to an Option[] type? Would love to hear what people think of the approaches to handle exceptions.

- Debasish

nathan_h said...

These concerns were the motivation for one of the dozen recent Dispatch refactors; a StatusCode exception shouldn't be the only way to handle non-OK responses when using >> or ># or any operator. So those now return Handler[T], which is compatible not just with Http#apply in anticipation of OK responses, but also Http#when to set your own response-code expectations, or Http#also to use two handlers, like a predefined one and a block of your own design that can pattern-match against a response code, HttpResponse, and Option[HttpEntity]. (current Http x-ray)

My hunch is that Http#also is almost right for error handling but not quite, because you probably don't want to run both handlers in case of failure. And Http#when is just a nice way to throw StatusCode exceptions. But the good news is Handler[T] gives us the ability to define request-response handlers and apply them various ways, so the perfect Http#flexible-error-method can be added without breaking everything else.

Debasish said...

Nathan -

Http#when and Http#also look good for canned error handling and Handler[T] looks perfect as an extension point to define custom request/response handlers. I will let you know how these work out in my case.

- Debasish

Dustin Ted Whitney said...

I'd have to look closer at dispatch to understand your comments better, but I'll add that I don't like the idea of having to catch an exception if a 404 is returned, because a 404 might be something desirable (an exception, IMHO, implies undesirable behavior).

Imagine that below, I expect a 404 or a 409 to be common, then this sort of thing is ugly and doesn't really describe what I'm trying to accomplish:

try {
  ... match {
    case (200, _) => ...
    case (304, _) => ...
  }
} catch {
  case n: Exception404 => ...
  case f: Exception409 => ...
  case e: Exception => ...
}

Debasish said...

Dustin:

That's precisely my point. Whether 404 is an exception or not depends on the use case and the framework cannot throw indiscriminately. The 0.3 of dbDispatch has some interesting extensions (see my last comment and nathan's comment) which can be of help in providing meaningful contract. Let me see if I can get some time to work on it in scouchdb.

- Debasish

Martin Kleppmann said...

Thanks for sharing scouchdb, I am finding it very useful. I have been hacking on it a bit today and have just sent you a patch (via the Google Code issue tracker) for a new feature -- storing the type name in the JSON so that you can automatically create beans of the right type when you pull a document out of the database. I hope you find it useful.

Cheers, Martin

Debasish said...

Martin -
Thanks a lot for contributing the patch. Looks like a very useful feature. I will definitely include it in the trunk when I have some time.

Cheers.
- Debasish