Sunday, April 05, 2009

Framework Inertia, CouchDB and the case of the missing R

There is nothing wrong with using frameworks, so long as I can justify the cause. A language like Java leaves a lot for frameworks to implement. Java offers idioms that promote the separation of interface from implementation, yet we still need dependency injection frameworks to get around the problem of wiring concrete implementations into the application. This is an encouraged practice in the Java world, and it leads to a codebase that is more flexible and unit testable. Nothing wrong with it .. Similarly, there are lots of other frameworks that are seen as essential components of your toolbox so long as you continue your adventures within the Java world.

However, the real problem with using frameworks is the inertia that your favorite framework brings upon you. You almost wish you had it in every programming language you use. You like the conciseness of the framework, you like the style it offers for composing abstractions and you love the way it lets you develop applications. In short, you get virtually married to it, so much so that you become blind to the fact that the new programming paradigm actually makes your pet framework irrelevant.

A classic example is the inertia that dependency injection frameworks bring upon us. While DI is an integral part of your development ecosystem in Java, it becomes virtually useless in languages like Ruby, which offer open classes that can be changed dynamically at runtime, and where class names are just labels attached to objects. Jim Weirich gave a great presentation at OSCON 2005 that addresses this question of whether Ruby needs DI in greater detail. Jamis Buck gave an interesting presentation at RubyConf 2008, where he described his initial passion for dependency injection and how it took time and experience to move away from patterns and habits formed in Java and embrace Ruby's idioms to rediscover the joy of programming. Both presentations are worth watching to get a feel for the inertia that frameworks can bring upon us.
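A minimal sketch of why open classes make a DI container unnecessary in Ruby. The `Mailer` and `Notifier` classes are hypothetical: `Notifier` refers to `Mailer` directly, with no container and no constructor injection, and a test swaps in different behavior simply by reopening the class at runtime.

```ruby
# Hypothetical collaborator that would talk to a real mail server.
class Mailer
  def deliver(to, body)
    raise "would contact a real SMTP server"
  end
end

# Notifier refers to Mailer directly: no container, no constructor injection.
class Notifier
  def notify(user)
    Mailer.new.deliver(user, "Hello #{user}")
  end
end

# In a test, reopen Mailer and redefine #deliver at runtime. Every caller,
# including Notifier, now gets the stubbed behavior.
class Mailer
  def deliver(to, body)
    "sent to #{to}: #{body}"
  end
end

puts Notifier.new.notify("alice")  # prints "sent to alice: Hello alice"
```

The trade-off is that reopening a class changes it for the whole process, which is why Ruby test libraries typically scope such stubs and restore the original method afterwards.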

But I digress .. The real intention of this post is to add my 2 cents to the ongoing debate on the tendency to use ActiveRecord-like frameworks to abstract CouchDB-based applications, which I think is yet another instance of framework inertia.

Object Relational Mapping is a layer of abstraction that is supposed to hide the impedance mismatch between your object-oriented domain model and the relational persistence model. This post is not about whether ORMs actually offer the benefits they claim and allow truly parallel evolution of the domain and persistence models. It is more about how we developers tend to carry the inertia of our favorite ORMs into every paradigm we work in, irrespective of its applicability.

ActiveRecord is one of our favorite ORM frameworks and has become a household name in every Rails shop. It is terse and intuitive, and it makes extensive use of Ruby meta-programming, resulting in magically concise domain models. How we love it when we can write the following three lines of ActiveRecord code and find the entire Product model, along with its business validations, woven in the magic of Rails goodness ..

class Product < ActiveRecord::Base
  validates_presence_of :title, :description, :image_url
end

The more we use ActiveRecord, the more infatuated we become with its goodness and convenience. And finally we lose the ability to judge whether the solution offered by the framework correctly maps to the problem at hand. Steve Vinoski once wrote about Convenience over Correctness in one of his Internet Computing columns. He was referring to our urge to force-fit the fundamentally flawed RPC-oriented paradigm simply because those abstractions are readily available in popular general-purpose programming languages. He writes in his blog .. "Making a function or method call to a remote or distributed function, object, or service appear just like any other function or method call allows such developers to stay within the comfortable confines of their language. Those who choose this approach essentially decide that developer convenience and comfort is more important than dealing with hard distribution issues like latency, concurrency, reliability, scalability, and partial failure".

Recently there has been a debate on the applicability of ActiveRecord-like frameworks as front ends to CouchDB. Alexander Lang gave a great presentation at Scotland On Rails on using Ruby frameworks with CouchDB. The debate started with his views: he recommended against frameworks that think in terms of records and associations, and encouraged thinking in terms of documents and views, which is absolutely in line with the philosophy that CouchDB espouses. We developers have been strongly indoctrinated in the teachings of relational databases: third normal form, joins, constraints and ACID-based principles. It is extremely difficult to break out of that mindset, particularly when we are dealing with disruptive technologies like CouchDB or MongoDB. But unless we adopt the train of thought these systems encourage, we will never fully derive the benefits they offer. In line with the discussion above, we cannot afford to choose convenience over correctness or surrender to the temptation of staying within the confines of our known paradigms.

There is really no *R* in the ORM layer that can possibly sit between CouchDB and your application. CouchDB stores data as schemaless JSON documents, hence the mapping layer should really be a thin wrapper that converts between JSON documents and your language's objects. Want to include helper methods for writing validation logic within your framework? Cool stuff, use Ruby's meta-programming abilities and allow users to write validates_presence_of. This one absolutely belongs to the domain and has no strings attached to the relational paradigm whatsoever. But want to handle lazy loading of associations, or do caching like ActiveRecord within the framework? CouchDB offers a REST interface that comes with all the caching abilities that HTTP offers (ETags, for example), so handling it within the framework would be playing against the rules of the game. Playing by the rules of REST, CouchDB documents and views represent resources, and all queries map to CouchDB views, which are built lazily and updated incrementally using the map/reduce paradigm. Instead of restricting queries to a set of ActiveRecord-style finder methods, give users the power to write custom views in the map/reduce style that the CouchDB way encourages ..
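A minimal sketch of such a thin wrapper, assuming nothing beyond Ruby's standard `json` library. `Document` is a hypothetical base class, not the API of CouchRest or any other library: it keeps the JSON document as a Hash and uses a class-level macro for `validates_presence_of`, with no notion of records, tables or associations.

```ruby
require 'json'

# Hypothetical thin wrapper: a Hash-backed object that round-trips to JSON.
# No records, tables or associations, just the document and its validations.
class Document
  # Class-level macro in the spirit of ActiveRecord's validates_presence_of;
  # it records the required field names on the calling class.
  def self.validates_presence_of(*fields)
    required_fields.concat(fields.map(&:to_s))
  end

  # Each subclass keeps its own list in a class-level instance variable.
  def self.required_fields
    @required_fields ||= []
  end

  # Build an object straight from a JSON document (e.g. a CouchDB GET body).
  def self.from_json(json)
    new(JSON.parse(json))
  end

  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes.transform_keys(&:to_s)
  end

  def [](key)
    @attributes[key.to_s]
  end

  # One message per required field that is missing or blank.
  def errors
    self.class.required_fields
        .select { |field| @attributes[field].to_s.strip.empty? }
        .map { |field| "#{field} can't be blank" }
  end

  def valid?
    errors.empty?
  end

  # Serialize back to the JSON document that would be stored in CouchDB.
  def to_json(*args)
    @attributes.to_json(*args)
  end
end

class Product < Document
  validates_presence_of :title, :description, :image_url
end

product = Product.from_json('{"title":"Book","description":"A book"}')
product.valid?  # => false
product.errors  # => ["image_url can't be blank"]
```

The only "mapping" here is `JSON.parse` on the way in and `to_json` on the way out; the validation logic is pure domain code.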

Here is an example custom view from the CouchRest distribution ..

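# map emits [word, title] => 1 for every word; reduce sums the counts
word_count = {
  :map => 'function(doc){
    var words = doc.text.split(/\W/);
    words.forEach(function(word){
      if (word.length > 0) emit([word,doc.title],1);
    });
  }',
  :reduce => 'function(key,combine){
    return sum(combine);
  }'
}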
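To make the semantics of that view concrete, here is a pure-Ruby simulation of what it computes. This is not CouchDB or CouchRest code; the helper names `word_count_map` and `word_count_reduce` and the sample documents are hypothetical.

```ruby
# Pure-Ruby simulation of the word_count view (not a CouchDB API):
# map emits [[word, title], 1] for every non-empty word ...
def word_count_map(doc)
  doc["text"].split(/\W/)
             .reject(&:empty?)
             .map { |word| [[word, doc["title"]], 1] }
end

# ... and reduce sums the emitted values, like the view's sum(combine).
def word_count_reduce(values)
  values.sum
end

docs = [
  { "title" => "a", "text" => "to be or not to be" },
  { "title" => "b", "text" => "be quick" }
]

# Querying with group=true reduces once per distinct key.
counts = docs.flat_map { |doc| word_count_map(doc) }
             .group_by { |key, _value| key }
             .transform_values { |rows| word_count_reduce(rows.map { |_key, value| value }) }

counts[["to", "a"]]  # => 2
counts[["be", "b"]]  # => 1
```

Without `group`, CouchDB would reduce all rows into a single value, which for these two documents would be the total word count of 8.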

For a 1..n association there is only one relational way of doing things, while CouchDB allows programmers to model data closer to the domain, much like the way the data will be used in the actual use case. I discuss this next in the context of managing associations. The main point here is that if data organization is best left to the user, fetch strategies should not be bound to canned finder methods either. This is where the programmer needs the flexibility to organize fetch logic around custom maps and reduces. It is simply a matter of empowering users to harness the full power of the CouchDB engine and its philosophy.
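In contrast to canned finders, fetching through custom views is just a matter of composing a query against a view resource. A minimal sketch, assuming the `/db/_design/<design>/_view/<view>` URL layout; the helper `view_url`, the database name `blog` and the design document name `word_count` are hypothetical. CouchDB expects `key`, `startkey` and `endkey` to be JSON values, which is why those parameters are passed through `to_json`.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: build a CouchDB view query URL. Key parameters are
# JSON-encoded; everything else (group, limit, ...) is sent as plain text.
JSON_PARAMS = %w[key startkey endkey].freeze

def view_url(base, db, design, view, params = {})
  query = params.map do |name, value|
    name = name.to_s
    [name, JSON_PARAMS.include?(name) ? value.to_json : value.to_s]
  end
  "#{base}/#{db}/_design/#{design}/_view/#{view}?#{URI.encode_www_form(query)}"
end

# Counts of the word "couch" per title, using the word_count view.
url = view_url("http://localhost:5984", "blog", "word_count", "words",
               startkey: ["couch"], endkey: ["couch", {}], group: true)
puts url
```

Since objects collate after strings in CouchDB, the range from `["couch"]` to `["couch", {}]` covers every `[word, title]` key whose word is "couch", and `group=true` returns one reduced count per title.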

And the Associations ..

Some of the ActiveRecord-like frameworks for CouchDB allow developers to model associations as follows ..

class BlogPost < CouchXXX::Base
  has_many :comments
end

This is the only way of modeling a 1..n association in ActiveRecord, and it has a definite rationale in a relational database system, where it is modeled optimally as two tables with a foreign key relationship. With CouchDB, however, we have several options for modeling this association:

  1. Inline comments, where the comments are stored in the same document as the post

  2. Separate documents for comments, one document per comment, each having a backlink to the post it belongs to

  3. Use the power of view collation where the blog post and the associated comments are grouped together in a view by using complex keys with arbitrary JSON values. This is best explained in this post.
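The three options can be sketched with plain Ruby hashes standing in for JSON documents. The document ids and the `type`/`post_id` fields are hypothetical, and the option 3 map function is simulated in Ruby, whereas in CouchDB it would be JavaScript in a design document. Ruby's string ordering only approximates CouchDB's collation, which is fine for these ASCII ids.

```ruby
# Option 1: comments inlined in the post document itself.
inline_post = {
  "_id" => "post-1", "type" => "post", "title" => "Framework inertia",
  "comments" => [{ "author" => "anonymous", "text" => "excellent post" }]
}

# Option 2: one document per comment, each carrying a backlink to its post.
docs = [
  { "_id" => "comment-3", "type" => "comment", "post_id" => "post-2" },
  { "_id" => "post-2",    "type" => "post" },
  { "_id" => "comment-1", "type" => "comment", "post_id" => "post-1" },
  { "_id" => "post-1",    "type" => "post" },
  { "_id" => "comment-2", "type" => "comment", "post_id" => "post-1" }
]

# Option 3: a map function emitting complex keys, [post_id, 0] for a post
# and [post_id, 1] for each of its comments.
def post_comments_map(doc)
  case doc["type"]
  when "post"    then [[[doc["_id"], 0], doc]]
  when "comment" then [[[doc["post_id"], 1], doc]]
  else []
  end
end

# Sorting by key (CouchDB breaks ties between equal keys by document id)
# places each post immediately before its comments.
rows = docs.flat_map { |doc| post_comments_map(doc) }
           .sort_by { |key, doc| [key, doc["_id"]] }

rows.map { |_key, doc| doc["_id"] }
# => ["post-1", "comment-1", "comment-2", "post-2", "comment-3"]
```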

Which of these options the user adopts depends on how the data is used in the domain. Unlike the relational paradigm, we have multiple storage options here, and the best approach is to leave the choice of strategy and implementation to the user. Hence has_many makes sense in the ActiveRecord world, but not in the CouchDB paradigm. In the CouchDB world, associations are best modeled outside your framework using native CouchDB practices.
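As an example of doing this in application code rather than inside a framework, a hypothetical helper `posts_with_comments` can turn collated view rows into post/comment groups. The rows are hard-coded here, shaped as `[[post_id, 0], post]` followed by `[[post_id, 1], comment]` pairs, as a collation view of the kind described in option 3 would return them. The sketch assumes every group begins with its post row.

```ruby
# Hypothetical application-level helper, not part of any framework:
# group collated rows into [post, [comments...]] pairs. Assumes every
# group starts with its post row (key suffix 0).
def posts_with_comments(rows)
  rows.slice_before { |(_post_id, kind), _doc| kind == 0 }
      .map { |group| [group.first.last, group.drop(1).map(&:last)] }
end

rows = [
  [["post-1", 0], { "_id" => "post-1", "title" => "Framework inertia" }],
  [["post-1", 1], { "_id" => "comment-1", "text" => "excellent post" }],
  [["post-1", 1], { "_id" => "comment-2", "text" => "great post" }],
  [["post-2", 0], { "_id" => "post-2", "title" => "Another post" }]
]

posts_with_comments(rows).each do |post, comments|
  puts "#{post["title"]}: #{comments.size} comment(s)"
end
# prints "Framework inertia: 2 comment(s)" and "Another post: 0 comment(s)"
```

Because the grouping lives in the application, switching to inlined comments or backlinked comment documents only changes this helper, not a framework-level `has_many`.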

I am all for liberating programmers with more power, if that helps, instead of restricting them within the confines of convenience. Unfortunately, frameworks often prove too limiting for technologies that attempt to do things differently ..


Anonymous said...

Excellent post. I hope this will clarify things a bit for people trapped in their developer convenience.

Brandon Zylstra said...

I totally agree with your excellent points. However, I see ActiveRecord-style wrappers for CouchDB as a way for RDBMS-indoctrinated developers to more easily transition their apps and their thinking to CouchDB. But the risk, as you point out, is that they won't fully transition, and will end up failing to really benefit from Couch.

Manish Jhawar said...

Great post! This clearly outlines the need to refactor the ActiveRecord-style wrapper to match the flexibility that CouchDB supports .. something that makes it easier to choose one of the three join-style alternatives. This would then be a more feasible transition path for developers accustomed to the convenience of ActiveRecord. Hopefully we will see many refactorings of the ActiveRecord model appear in an attempt to provide the right balance of power & flexibility to the developer.