Thursday, January 21, 2010

A new way to think of Data Storage for your Enterprise Application

A couple of posts earlier I had blogged about a real-life case study from one of our projects where we are using a SQL store (Oracle) and a NoSQL store (MongoDB) in combination over a message-based backbone. MongoDB was used to cater to a very specific subset of the application functionality, where we felt it was a better fit than a traditional RDBMS. This hybrid architecture of data organization is turning out to be an increasingly attractive option today, as more and more specialized persistent storage engines are being developed.

In many applications we need to process graph data structures. Neo4J can be a viable option for this. You can have your mainstream data storage still in an RDBMS and use Neo4J only for the subset of functionalities for which you need to use graph data structures. If you need to sync back to your main storage, use messaging as the transport to talk back to your relational database.
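As a rough sketch of this routing idea, consider the following. Everything here is illustrative and assumed, not taken from our project: the event types, the in-memory `Store` class standing in for Oracle and Neo4J, and the `Backbone` playing the role of the messaging transport.

```scala
sealed trait DomainEvent
case class OrderPlaced(id: Int, amount: Double) extends DomainEvent
case class RuleLinked(from: String, to: String) extends DomainEvent

// Trivial in-memory stand-ins for the relational and graph stores.
class Store(val name: String) {
  private var events = Vector.empty[DomainEvent]
  def write(e: DomainEvent): Unit = synchronized { events :+= e }
  def size: Int = synchronized { events.size }
}

// The "backbone": every event reaches the system of record, while
// graph-shaped events are also routed to the graph store.
class Backbone(rdbms: Store, graph: Store) {
  def publish(e: DomainEvent): Unit = {
    rdbms.write(e)
    e match {
      case _: RuleLinked => graph.write(e) // the graph-specific subset
      case _             => ()
    }
  }
}
```

In a real deployment the `publish` call would go over a message broker rather than direct method calls, but the routing decision — system of record sees everything, the specialized store sees only its subset — stays the same.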

Using multiple data stores along with asynchronous messaging is one of the options that looks very potent today. Drizzle bases its entire replication on a RabbitMQ-based transport. And using AMQP messaging, Drizzle replicates data to a host of key/value stores like Voldemort, MemcacheDB and Cassandra.

If we agree that messaging is going to be one of the most dominant paradigms in shaping application architectures, why not go one level up and look at some higher-level abstractions for message-based programming? Erlang programmers have been using the actor model for many years now and have demonstrated all the good qualities that the model embodies. Inspired by Erlang, Scala also offers a similar model on the JVM. In an earlier post I discussed how we can use the actor model in Scala to scale out messaging applications using RabbitMQ as the transport.
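Stripped of library specifics, an actor is essentially a mailbox drained sequentially by its owner. The hand-rolled sketch below is only meant to convey that idea — it is not Scala's or Akka's actual actor API, and omits supervision, lifecycle and all the things a real library gives you.

```scala
import java.util.concurrent.LinkedBlockingQueue

// A bare-bones actor: a typed mailbox drained by one dedicated
// thread, so the handler never races with itself.
class SimpleActor[M](handler: M => Unit) {
  private val mailbox = new LinkedBlockingQueue[M]()
  private val worker =
    new Thread(() => while (true) handler(mailbox.take()))
  worker.setDaemon(true)
  worker.start()

  def !(msg: M): Unit = mailbox.put(msg) // fire-and-forget send
}
```

The point of the model is that senders never block on the receiver's work, which is exactly the decoupling we want between a front-end store and the service syncing it back to the system of record.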

Now with the developing ecosystem of polyglot storage, we can use the same model of actor-based communication as the backbone for integrating the multiple data storage options that you may plug in to your application. Have specific clients front-end the storage they need to work with, and use messaging to sync that up with the main data storage backend to maintain a consistent system of record. Have a look at the following diagram, which may not look all that unreal today. You have a host of options that bring your data closer to the way you process it in your domain model, be it document oriented, graph based, key/value based or simple POJO based across a data grid like Terracotta.


[Diagram: document, graph and key/value stores, plus a Terracotta data grid, synced with the main RDBMS over a messaging backbone]

When a bunch of architectural components is loosely connected through a messaging infrastructure, you have a world of options for managing the interactions between them. In fact, your options open up even more when you get to interact with data shaped the way you would like it to be. You can now think in terms of a data model aligned with the model of your domain. You know that once your rule base gets updated in Neo4J, it will somehow be synced up with the backend storage through some other service that makes the system eventually consistent.
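That "somehow synced up" can be pictured as a queue sitting between the two stores. In the sketch below, plain in-memory maps stand in for the Neo4J front store and the relational backend, and a daemon thread plays the sync service — all assumed for illustration, not our actual setup.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.concurrent.TrieMap

// Eventual consistency in miniature: writes hit the specialized
// front store immediately; a sync service drains a queue and
// applies the same update to the system of record asynchronously.
class EventuallyConsistentPair {
  val front   = TrieMap.empty[String, String] // e.g. the graph store
  val backend = TrieMap.empty[String, String] // e.g. the RDBMS

  private val queue = new LinkedBlockingQueue[(String, String)]()
  private val sync = new Thread(() => while (true) {
    val (k, v) = queue.take()
    backend.put(k, v) // backend catches up, eventually
  })
  sync.setDaemon(true)
  sync.start()

  def write(k: String, v: String): Unit = {
    front.put(k, v)   // immediately visible in the front store
    queue.put((k, v)) // replicated asynchronously
  }
}
```

A reader of the front store always sees the latest rule; a reader of the backend may briefly lag, which is exactly the trade-off you sign up for with this style.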

In a future post I will explore some of the options that a higher-order middleware service like Akka can add to your stack. With Akka providing abstractions like transactors, pluggable persistence and out-of-the-box integration modules for AMQP, there are a number of ways you can think of modularizing your application's domain model and storage. You can use a peer-to-peer distributed actor-based communication model to set up synchronization with your databases, or you can use an AMQP-based transport to do the same, much like what Drizzle does for replication. But that's some food for thought for yet another future post.

6 comments:

Kiran said...

Hello Debasish,
Interesting post, and I like the way you have used the storage solution based on context.
I was wondering if an XMPP based mechanism could be used instead of an AMQP/MQ based MOM? Would like to hear your thoughts on this.
Thanks,
Kiran

Mats Henricson said...

Thanks for pointing out what should be obvious, but wasn't to me - polyglot storage. Looking back at the past systems I've worked with, it seems one mistake was insisting that all data had to go into the same RDBMS.

Unknown said...

@kiran :-

regarding AMQP vs. XMPP, I follow one specific rule: if the end points are controlled by me, I tend to use POJOs and AMQP; if the end points are not under my control, I use XML and XMPP. However, in the context of the post, that's not the main point. The figure just illustrates a couple of end points. You can have many others with an appropriate selection of the protocol.

Thanks.

Unknown said...

@Mats :-

You must thank the NoSQL movement, which has made all of us aware of this mistake that we have been making all these years.

Thanks.

Erik van Oosten said...

Interesting to see that databases start to use messaging/event based setups as well.

At my company we invest in the CQRS architecture (e.g. http://blog.jteam.nl/2009/12/21/rethinking-architecture-with-cqrs/) where every update is an event, making these kinds of tools very useful.

Domixway said...

Why not use OrientDB, which is both document and graph oriented, while still keeping SQL capabilities and bringing object and key/value options?