Sunday, May 17, 2009

scouchdb gets View Server in Scala

CouchDB views are the real wings of the datastore: they go into every document and pull out exactly the data you have asked for through your queries. The queries are different from the ones you run in an RDBMS using SQL - here you have state-of-the-art map/reduce being exercised across the cores that your server may have. One very good part of views in CouchDB is that the view server is a separate abstraction from the data store. Computation of views is delegated to an external server process that communicates with the main process over standard input/output using a simple line-based protocol. You can find more details about this protocol in the CouchDB wiki.

The default implementation of the query server in CouchDB uses JavaScript running via Mozilla SpiderMonkey. However, language aficionados always find a way to push their own favorite into any accessible option. People have developed query servers for Ruby, PHP, Python and Common Lisp.

scouchdb now provides one for Scala. You can write map functions for CouchDB views in Scala .. the reduce part is not yet ready, but the map functions already work in the repository. Here is a typical session using ScalaTest ..


// create some records in the store
couch(test doc Js("""{"item":"banana","prices":{"Fresh Mart":1.99,"Price Max":0.79,"Banana Montana":4.22}}"""))
couch(test doc Js("""{"item":"apple","prices":{"Fresh Mart":1.59,"Price Max":5.99,"Apples Express":0.79}}"""))
couch(test doc Js("""{"item":"orange","prices":{"Fresh Mart":1.99,"Price Max":3.19,"Citrus Circus":1.09}}"""))

// create a design document
val d = DesignDocument("power", null, Map[String, View]())
d.language = "scala"

// a sample map function in Scala
val mapfn1 = 
  """(doc: dispatch.json.JsValue) => {
    val it = couch.json.JsBean.toBean(doc, classOf[couch.json.TestBeans.Item_1])._3; 
    for (st <- it.prices)
      yield(List(it.item, st._2))
  }"""
    
// another map function
val mapfn2 = """(doc: dispatch.json.JsValue) => {
    import dispatch.json.Js._; 
    val x = Symbol("item") ? dispatch.json.Js.str;
    val x(x_) = doc; 
    val i = Symbol("_id") ? dispatch.json.Js.str;
    val i(i_) = doc;
    List(List(i_, x_)) ;
  }"""
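To make the shape of the output concrete, here is a minimal plain-Scala sketch of what a function like mapfn1 produces, with no CouchDB or scouchdb involved. The `Item` case class is a hypothetical stand-in for the `Item_1` test bean (only the `item` and `prices` fields used above are assumed), and rows are modelled as (key, value) pairs rather than two-element lists.

```scala
object MapFnSketch {
  // Hypothetical stand-in for the Item_1 test bean; field names follow the JSON above.
  final case class Item(item: String, prices: Map[String, Double])

  // Same idea as mapfn1: emit one (item, price) row per store.
  // Converting to a List first matters here: mapping a Map directly to pairs
  // would build another Map and collapse rows that share the same key.
  def mapfn1(it: Item): List[(String, Double)] =
    it.prices.toList.map { case (_, price) => (it.item, price) }

  // The three sample documents created at the start of the session.
  val docs: List[Item] = List(
    Item("banana", Map("Fresh Mart" -> 1.99, "Price Max" -> 0.79, "Banana Montana" -> 4.22)),
    Item("apple",  Map("Fresh Mart" -> 1.59, "Price Max" -> 5.99, "Apples Express" -> 0.79)),
    Item("orange", Map("Fresh Mart" -> 1.99, "Price Max" -> 3.19, "Citrus Circus" -> 1.09))
  )

  // The view server calls the function once per document; flatMap imitates that.
  val rows: List[(String, Double)] = docs.flatMap(mapfn1)
}
```

With three documents and three prices each, `MapFnSketch.rows` holds nine rows, three per item.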




Now the way the protocol works is that once the view functions are stored in the view server, CouchDB starts sending the documents one by one, and every function gets invoked on every document. So once we create a design document and attach views with the above map functions, the view server starts processing the documents using the line-based protocol with the main server. And if we invoke the views using the scouchdb API as ..

couch(test view(
  Views builder("power/power_lunch") build))


and

couch(test view(
  Views builder("power/mega_lunch") build))


we get back the results based on the queries defined in the map functions. Have a look at the project home page for a complete description of the sample session that works with Scala view functions.
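The map-phase exchange described above can be sketched roughly as follows. This is not the scouchdb implementation: commands are modelled as already-parsed values instead of JSON lines, map functions are passed in as compiled Scala functions rather than source strings, and the names `ViewServerSketch`, `Command` and `handle` are made up for this illustration. The command names in the comments (`reset`, `add_fun`, `map_doc`) and the reply shapes follow the query server protocol as documented in the CouchDB wiki at the time.

```scala
object ViewServerSketch {
  type Doc = Map[String, String]
  // A map function emits zero or more (key, value) pairs per document.
  type MapFn = Doc => List[(String, String)]

  // Already-parsed stand-ins for the JSON lines CouchDB writes to the server's stdin.
  sealed trait Command
  case object Reset extends Command                  // ["reset"]
  final case class AddFun(fn: MapFn) extends Command // ["add_fun", "<function source>"]
  final case class MapDoc(doc: Doc) extends Command  // ["map_doc", {...document...}]

  // Functions registered since the last reset, in registration order.
  private var funs: Vector[MapFn] = Vector.empty

  // Naive quoting with no escaping; good enough for this sketch only.
  private def quote(s: String): String = "\"" + s + "\""

  // Returns the single line the server would write back on stdout.
  def handle(cmd: Command): String = cmd match {
    case Reset =>
      funs = Vector.empty
      "true"
    case AddFun(fn) =>
      funs = funs :+ fn
      "true"
    case MapDoc(doc) =>
      // One result list per registered function, each holding [key, value] rows.
      funs.map { fn =>
        fn(doc)
          .map { case (k, v) => "[" + quote(k) + "," + quote(v) + "]" }
          .mkString("[", ",", "]")
      }.mkString("[", ",", "]")
  }
}
```

For example, after `handle(Reset)` and `handle(AddFun(doc => List((doc("_id"), doc("item")))))`, calling `handle(MapDoc(Map("_id" -> "1", "item" -> "banana")))` returns `[[["1","banana"]]]`: the outer list has one entry per registered function, and each entry lists the rows that function emitted for the document.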

Setting up the View Server

The view server is an external program that communicates with the CouchDB server. To set up the scouchdb query server, here are the steps:

The usual place for custom CouchDB settings is local.ini, which can usually be found under the /usr/local/etc/couchdb folder. There have been some changes in the configuration files since CouchDB 0.9 - check out the wiki for them. On my system, I set the view server path as follows in local.ini ..

[query_servers]
scala=$SCALA_HOME/bin/scala -classpath couch.db.VS "/tmp/vs.txt"

  • scala is the language of the query server that needs to be registered with CouchDB. Once you start Futon after registering scala as the language, you should be able to see "scala" listed as a view query language for writing map functions.

  • The -classpath argument should point to the jar where you deploy scouchdb.

  • couch.db.VS is the main program that interacts with the CouchDB server. Currently it takes one argument, the name of a file to which it writes all statements that it exchanges with the CouchDB server. If the file name is not supplied, all interactions are routed to stderr.

  • Another change that I needed to make was to the os_process_timeout value. The default is 5000 (5 seconds). I made the following change in local.ini ..


[couchdb]
os_process_timeout=20000

Another thing that needs to be set up is an environment variable named CDB_VIEW_CLASSPATH. This should point to the classpath that is passed to the Scala interpreter for executing the map/reduce functions.

You've been warned!

All the above stuff is very much development in progress and has been tested only to the limits of some unit test suites that are also in the codebase. Use at your own risk, and please, please send feedback, patches, bug reports etc. to the project tracker.

Happy hacking!

P.S. Over the weekend I got a patch from Martin Kleppmann that adds the ability to store the type name of an object in the JSON blob when it is serialized (either as fully-qualified class name or as base name without the package component), and to automatically create a bean of the right type when that JSON blob is loaded from the database (without advance knowledge of what that type is going to be). Thanks Martin - I will have a look and integrate it in the trunk.

I have undertaken this as a side project and only get to work on it over the weekends. It is great to have patches contributed by the community, which only go on to enrich the framework. I need to work on the reduce part of the query server and then will launch into a major refactoring to incorporate the 0.3 release of Nathan's dbDispatch. Nathan has made some fruitful changes to exception and response-code handling. I am itching to incorporate the goodness in scouchdb.

5 comments:

Madhav said...

Hi
I am trying out scouchdb but when I am running the test cases I am getting the following error wherever a test case tries to fetch rows from couchdb.
INF [20091001-10:23:40.545] dispatch: GET http://127.0.0.1:5984/test/_design/power/_view/power_lunch?group=true
- should fetch 9 rows from view power_lunch *** FAILED ***
dispatch.StatusCode: Exceptional resoponse code: 500
{"error":"case_clause","reason":"{{exit_status,2},\n {gen_server,call,[couch_query_servers,{get_proc,<<\"scala\">>}]}}"}

Any help is appreciated .
Thanks

Debasish said...

Looks like your view server setting is not correct. It's not able to recognize Scala as a valid view server language. Please follow the instructions in the section "Setting up the View Server" in the wiki page http://wiki.github.com/debasishg/scouchdb/scala-view-server ..

Cheers.

J.F. Zarama said...

thanks for scouchdb and related blog entries; it might be elementary but I am unable to understand this construct:

val x = Symbol("item") ? dispatch.json.Js.str;

I understand the Symbol("item") part but what is the ? operator doing here; please help; thanks;

? dispatch.json.Js.str;

Debasish said...

@Zarama : Thanks for using scouchdb. Have a look at dispatch-json from Nathan Hamblen. ? is one of the extractors to extract stuff out of a JSON structure. It's there in JsonExtractor.scala .. http://github.com/jberkel/dispatch/blob/master/json/src/main/scala/dispatch/JsonExtractor.scala

J.F. Zarama said...

your response pointed me to the right place and I had the opportunity to read about Dispatch and the example, Twine; much to read and learn from my part; thanks again for the response.

Just a comment re scouchdb; it will be preferable to have the binary distribution in the form of a jar-file.

I pointed a colleague to couchDB and while getting into Scala, CouchDB and scouchdb, he needed to install and understand Maven to obtain scouchdb.jar.

It is a steep learning curve for all of us and even more so to someone just introduced to the language, Scala, couchDB, JavaScript et al and detracts from working on the subject.