Ruminations of a Programmer: orm

Stephan Schmidt has blogged on the ORMs being a thing of the past. While he emphasizes on ORMs' performance concerns and dismisses them as leaky abstractions that throw LazyInitializationException, he does not present any concrete alternative. In his concluding section on alternatives he mentions ..

"What about less boiler plate code due to ORMs? Good DAOs with standard CRUD implementations help there. Just use Spring JDBC for databases. Or use Scala with closures instead of templates. A generic base dao will provide create, read, update and delete operations. With much less magic than the ORM does."

Unfortunately, all these things work on small projects with a few number of tables. Throw in a large project with a complex domain model, requirements for relational persistence and the usual stacks of requirements that today's enterprise applications offer, you will soon discover that your home made less boilerplated stuff goes for a toss. In most cases you will end up either rolling out your own ORM or start building a concoction of domain models invaded with indelible concerns of persistence. In the former case, obviously your ORM will not be as performant or efficient as the likes of Hibernate. And in the latter case, either you will end up building an ActiveRecord model with the domain object mirroring your relational table or you may be more unfortunate with a bigger unmanageable bloat.

It's very true that none of the ORMs in the market today are without their pains. You need to know their internals in order to make them generate efficient queries, you need to understand all the nuances to make use of their caching behaviors and above all you need to manage all the reams of jars that they come with.

Yet, in the Java stack, Hibernate and JPA are still the best of options when we talk about big persistent domain models. Here are my points in support of this claim ..

If you are not designing an ActiveRecord based model, it's of paramount importance that you keep your domain model decoupled from the persistent model. And ORMs offer the most pragmatic way towards this approach. I know people will say that it's indeed difficult to achieve this in a real life world and in typical situations compromises need to be made. Yet, I think if you need to make compromise for performance or whatever reasons, it's only an exception. Ultimately you will find that the mjority of your domain model is decoupled enough for a clean evolution.

ORMs save you from writing tons of SQL code. This is one of the compelling advantages that I have found with an ORM that my Java code is not littered with SQL that's impossible to refactor when my schema changes. Again, there will be situations when your ORM may not churn out the best of optimized SQLs and you will have to do that manually. But, as I said before, it's an exception and decisions cannot be made based on exceptions only.

ORMs help you virtualize your data layer. And this can have huge gains in your scalability aspect. Have a look at how grids like Terracotta can use distributed caches like EhCache to scale out your data layer seamlessly. Without the virtualization of the ORM, you may still achieve scalability using vendor specific data grids. But this comes at the price of lots of $$ and the vendor lock-ins.

Stephan also feels that the future of ORMs will be jeopardized because of the advent of polyglot persistence and nosql data stores. The fact is that the use cases that nosql datastores address are very much orthogonal to those served by the relational databases. Key/value lookups with semi-structured data, eventual consistency, efficient processing of web scale networked data backed with the power of map/reduce paradigms are not something that your online transactional enterprise application with strict requirements of ACID will comply with. So long we have been trying to shoehorn every form of data processing with a single hammer of relational databases. It's indeed very refreshing to see the onset of nosql paradigm and it being already in use in production systems. But ORMs will still have their roles to play in the complementary set of use cases.

There is nothing wrong in using frameworks, so long I can justify the cause. A language like Java leaves a lot for frameworks to implement. While Java offers idioms that promote separation of interfaces from implementation, yet we need dependency injection frameworks to get around the problems of wiring concrete implementations within the application. This is an encouraged practice in the Java world that leads to codebase which is more flexible and unit testable. Nothing wrong with it .. Similarly there are lots of other frameworks which are seen as essential components of your toolbox so long you continue your adventures within the Java world.

However, the real problem with using frameworks is the inertia which your particularly favorite framework brings on to you. You would almost wish that you had it in every possible programming language that you use. You like the conciseness of the framework, you like the style that it offers in composing abstractions and you love the way it allows you to develop applications. In short, you get virtually married to it, so much so, that you become myopic to the fact that the new programming paradigm actually makes your pet framework irrelevant.

A classic example is the inertia that dependency injection frameworks bring upon us. While DI is an integral part of your development ecosystem in Java, it becomes virtually useless in languages like Ruby that offer the power of open classes, dynamically changeable at runtime with class names being just labels posted on objects. Jim Weirich did a great presentation in OSCON 2005 that answers this question in a greater detail. Jamis Buck had an interesting presentation in RubyConf 2008, where he described his initial passion towards dependency injection and how it took time and experience in moving away from patterns and habits formed in Java and embrace Ruby's idioms to rediscover the joy of programming. Both presentations worth watching to get a feel of the inertia that frameworks can bring upon us.

But I digress .. The real intention of this post is to add my 2 cents to the ongoing debate on the propensity of ActiveRecord like frameworks being used to abstract CouchDB based applications, which I think is yet another instance of framework inertia.

Object Relational Mapping is a layer of abstraction that is supposed to abstract away the impedance mismatch between your object oriented domain model and the relational persistence model. This post is not about whether they actually offer the benefits that they claim and allow truly parallel evolution of the domain and the persistence model. This is more about how we developers tend to carry the inertia of our favorite ORMs into every paradigm that we implement, irrespective of its applicability.

ActiveRecord is one of our favorite ORM frameworks that has become a household name in every Rails shop. It is terse, it is intuitive, makes explosive use of Ruby meta-programming resulting in magically concise domain models. How much we love when we can write the following three lines of ActiveRecord code and find the entire Product model, along with its business validations, being weaved in the magic of Rails goodness ..

class Product < ActiveRecord::Base
  validates_presence_of :title, :description, :image_url
end

More we use ActiveRecord, more infatuated we tend to be in its goodness and convenience. And finally we lose the ability to judge whether the solution offered by the framework correctly maps to the problem at hand. Steve Vinoski once mentioned about Convenience over Correctness in one his columns on Internet Computing. He was referring to our urge of trying to force fit the fundamentally flawed RPC oriented paradigms simply because of easy availability of those abstractions in popular general purpose programming languages. He mentions in his blog .. "Making a function or method call to a remote or distributed function, object, or service appear just like any other function or method call allows such developers to stay within the comfortable confines of their language. Those who choose this approach essentially decide that developer convenience and comfort is more important than dealing with hard distribution issues like latency, concurrency, reliability, scalability, and partial failure".

Recently there has been a debate on the applicability of ActiveRecord like frameworks as frontending CouchDB. Alexander Lang had a great presentation at Scotland On Rails that dealt with the applicability of Ruby frameworks with CouchDB. The debate started with his views where he recommended against frameworks that think in terms of records and associations. And encouraged thinking in terms of documents and views, which is absolutely in line with the philosophy that CouchDB espouses. We developers have been too strongly indoctrinated in the teachings of relational databases, third normal forms, joins, constraints and ACID based principles. It is extremely difficult to come out of such mindset, particularly when we are dealing with disruptive technologies like CouchDB or MongoDB. But unless we are able to get into the train of thoughts that these systems encourage, we will never be able to derive fully the benefits that they offer. Along the lines of the above discussion, we cannot afford to choose convenience over correctness or surrender to the temptations of adhering to the confines of our known paradigms.

There is really no *R* in the ORM layer that can possibly sit between CouchDB and your application. CouchDB stores data as schemaless JSON documents - hence the mapping layer should really be a thin wrapper that converts between JSON documents and your languages' objects. Want to include helper methods for writing validation logic within your framework ? Cool stuff, use Ruby's meta-programming abilities and allow users to write validates_presence_of. This one absolutely belongs to the domain, and has no strings attached to the relational paradigm whatsoever. But want to handle lazy loading of associations or want to do caching like ActiveRecord within the framework ? CouchDB offers REST interface that comes with all the caching abilities that http offers - handling it within the framework will be playing against the rules of the game. Playing along the rules of REST, CouchDB documents and views represent resources - all queries are mapped to CouchDB views, which can be generated lazily and updated incrementally using the map/reduce paradigms. Instead of restricting queries within a group of ActiveRecord style finder methods, allow users the power to write custom views in the same map/reduce style that the CouchDB way encourages ..

Here is an example custom view from the CouchRest distribution ..

word_count = {
  :map => 'function(doc){
    var words = doc.text.split(/\W/);
    words.forEach(function(word){
      if (word.length > 0) emit([word,doc.title],1);
    });
  }',
  :reduce => 'function(key,combine){
    return sum(combine);
  }'
}

For a 1..n association there is only one relational way of doing things, while CouchDB allows programmers to model data closer to the domain, much like the way the data will be used in the actual use case. This I discuss next in the context of managing associations. The main point, here, is that, if data organization is best left to the user, fetch strategies should also not be bounded within canned finder methods. And this is where the programmer needs flexibility to organize his fetch logic around custom maps and reduces. Its only the practice of empowering the users more to harness the full power of CouchDB engine and philosophy.

And the Associations ..

Some of the ActiveRecord like frameworks for CouchDB allows developers to model associations as follows ..

class BlogPost < CouchXXX::Base  
  has_many :comments  
end

While this is the only way of modeling a 1..n association in ActiveRecord, it has a definite rationale in a relational database system. This will be modeled optimally in an RDBMS as a combination of 2 tables with a foreign key relationship. In case of CouchDB modeling, we can have various options to model this association :

Inline comments, where the comments are stored in the same document as the post

Separate documents for comments, one document per comment, each having a backlink to the post it belongs to

Use the power of view collation where the blog post and the associated comments are grouped together in a view by using complex keys with arbitrary JSON values. This is best explained in this post.

The option of modeling that the user would like to adopt depends upon the usage in the domain. Unlike the relational paradigm, we have multiple options of storage here and the best way to handle it is to leave it upon the user to decide the strategy and implementation. Hence has_many makes sense in the ActiveRecord world, not in CouchDB paradigm. In the CouchDB world, associations are best modeled outside your framework using native CouchDB practices.

I am all for liberating the programmers with more power, if that helps, instead of restricting them within the confines of convenience. Unfortunately frameworks often prove too limiting for technologies that attempt to do things differently ..

Ruminations of a Programmer

Sunday, October 18, 2009

Are ORMs really a thing of the past ?

Sunday, April 05, 2009

Framework Inertia, CouchDB and the case of the missing R