Sunday, October 18, 2009

Are ORMs really a thing of the past?

Stephan Schmidt has blogged about ORMs being a thing of the past. While he emphasizes ORMs' performance concerns and dismisses them as leaky abstractions that throw LazyInitializationException, he does not present any concrete alternative. In his concluding section on alternatives he mentions ..

"What about less boiler plate code due to ORMs? Good DAOs with standard CRUD implementations help there. Just use Spring JDBC for databases. Or use Scala with closures instead of templates. A generic base dao will provide create, read, update and delete operations. With much less magic than the ORM does."

Unfortunately, all these things work only on small projects with a small number of tables. Throw in a large project with a complex domain model, requirements for relational persistence and the usual stack of requirements that today's enterprise applications carry, and you will soon discover that your home made, less boilerplated stuff goes for a toss. In most cases you will end up either rolling your own ORM or building a concoction of domain models invaded with indelible concerns of persistence. In the former case your ORM will obviously not be as performant or efficient as the likes of Hibernate. And in the latter case you will either end up building an ActiveRecord model, with the domain object mirroring your relational table, or be even more unfortunate with a bigger, unmanageable bloat.

It's very true that none of the ORMs in the market today is without its pains. You need to know their internals in order to make them generate efficient queries, you need to understand all the nuances to make good use of their caching behaviors, and above all you need to manage all the reams of jars that they come with.
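
As a quick illustration of the "know your ORM's internals" point - a hedged sketch of my own, assuming a standard JPA setup with a lazily loaded Account-to-transactions association (both names hypothetical) - the naive query triggers one extra SELECT per account as soon as the lazy collection is touched (the classic N+1 problem), while the fetch join pulls everything in a single query:

import javax.persistence.EntityManager

// one SELECT for the accounts, plus one more per account when a.transactions
// is touched lazily - the N+1 problem
def accountsNaive(em: EntityManager) =
  em.createQuery("select a from Account a").getResultList

// a single SELECT with a join; you need to know your ORM well enough to ask for it
def accountsOptimized(em: EntityManager) =
  em.createQuery("select distinct a from Account a join fetch a.transactions").getResultList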

Yet, in the Java stack, Hibernate and JPA are still the best options when we talk about big persistent domain models. Here are my points in support of this claim ..

  • If you are not designing an ActiveRecord based model, it's of paramount importance that you keep your domain model decoupled from the persistence model. And ORMs offer the most pragmatic route to that decoupling. I know people will say that it's difficult to achieve in the real world and that in typical situations compromises need to be made. Yet I think if you need to compromise for performance or other reasons, that's an exception. Ultimately you will find that the majority of your domain model remains decoupled enough for a clean evolution.

  • ORMs save you from writing tons of SQL code. This is one of the compelling advantages I have found with an ORM: my Java code is not littered with SQL that becomes impossible to refactor when my schema changes (see the sketch after this list). Again, there will be situations where your ORM may not churn out the best optimized SQL and you will have to write it manually. But, as I said before, that's an exception, and decisions cannot be made based on exceptions alone.

  • ORMs help you virtualize your data layer. And this can bring huge gains in scalability. Have a look at how grids like Terracotta can use distributed caches like EhCache to scale out your data layer seamlessly. Without the virtualization that the ORM provides, you may still achieve scalability using vendor specific data grids, but that comes at the price of lots of $$ and vendor lock-in.
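
Here's the sketch referred to in the second point above - a minimal illustration of my own, assuming the standard JPA API and a hypothetical Account entity. The mapping is declared on the entity, and the credit operation never touches SQL; the ORM generates the SELECT and the subsequent UPDATE when the transaction commits.

import javax.persistence.{Entity, Id, EntityManager}

// hypothetical entity, mapped declaratively - no hand-written SQL in the domain code
@Entity
class Account {
  @Id var accountNo: String = _
  var balance: java.math.BigDecimal = java.math.BigDecimal.ZERO
}

// a schema change means revisiting the mapping, not grepping for SQL strings
// scattered across the codebase
def credit(em: EntityManager, accountNo: String, amount: java.math.BigDecimal) {
  val account = em.find(classOf[Account], accountNo)
  account.balance = account.balance.add(amount)
}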


Stephan also feels that the future of ORMs is jeopardized by the advent of polyglot persistence and nosql data stores. The fact is that the use cases that nosql datastores address are largely orthogonal to those served by relational databases. Key/value lookups over semi-structured data, eventual consistency, and efficient processing of web scale networked data backed by the power of map/reduce paradigms are not something that your online transactional enterprise application with strict ACID requirements will comply with. For too long we have been trying to shoehorn every form of data processing into the single hammer of relational databases. It's indeed very refreshing to see the onset of the nosql paradigm and to see it already in use in production systems. But ORMs will still have their role to play in the complementary set of use cases.

Tuesday, October 06, 2009

DSLs in Action: Sharing the detailed Table of Contents (WIP)

Just wanted to share the detailed Table of Contents of the chapters that have been written so far. Please send in your feedback either as comments on this post or in the Author Online Forum. The brief ToC is part of the book home page. Let me know of any other topic that you would like to see as part of this book.

Chapter 1. Learning to speak the Language of the Domain


1.1. The Problem Domain and the Solution Domain
1.1.1. Abstractions as the Core

1.2. Domain Modeling - Establishing a Common Vocabulary

1.3. Role of Abstractions in Domain Modeling

1.3.1. Minimalism publishes only what YOU promise
1.3.2. Distillation Keeps only what You need
1.3.3. Extensibility Helps Piecemeal Growth
1.3.3.1. Mixins - A Design Pattern for Extensibility
1.3.3.2. Mixins for extending MAP
1.3.3.3. Functional Extensibility
1.3.3.4. Extensibility can be Monkey Business too
1.3.4. Composability comes from Purity
1.3.4.1. Design Patterns for Composability
1.3.4.2. Back to Languages
1.3.4.3. Side-effects and Composability
1.3.4.4. Composability and Concurrency

1.4. Domain Specific Language (DSL) - It's all about Expressivity
1.4.1. Clarity of Intent
1.4.2. Expressivity's all about Well-Designed Abstractions

1.5. When do we need a DSL
1.5.1. The Advantages
1.5.2. The Disadvantages

1.6. DSL - What's in it for Non-Programmers?

1.7. Summary
1.8. Reference


Chapter 2. Domain Specific Languages in the Wild


2.1. A Motivating Example
2.1.1. Setting up the Common Vocabulary
2.1.2. The First Java Implementation
2.1.3. Externalize the Domain with XML
2.1.4. Groovy - a more Expressive Implementation Language
2.1.4.1. Executing the Groovy DSL

2.2. Classification of DSLs
2.2.1. Internal DSL Patterns - Commonality and Variability
2.2.1.1. Smart APIs, Fluent Interfaces
2.2.1.2. Code Generation through Runtime Meta-programming
2.2.1.3. Code Generation through Compile time Meta-programming
2.2.1.4. Explicit Abstract Syntax Tree manipulation
2.2.1.5. Pure Embedding of Typed Abstractions
2.2.2. External DSL Patterns - Commonality and Variability
2.2.2.1. Context driven String Manipulation
2.2.2.2. Transforming XML to Consumable Resource
2.2.2.3. Non-textual Representations
2.2.2.4. Mixing DSL with Embedded Foreign Code
2.2.2.5. Parser Combinator based DSL Design

2.3. Choosing DSL Implementations - Internal OR External

2.4. The Meta in the DSL
2.4.1. Runtime Meta-Programming in DSL Implementation
2.4.2. Compile time Meta-Programming in DSL Implementation

2.5. Lisp as the DSL

2.6. Summary
2.7. Reference


Chapter 3. DSL Driven Application Development


3.1. Exploring DSL Integration

3.2. Homogeneous Integration
3.2.1. Java 6 Scripting Engine
3.2.2. A DSL Wrapper
3.2.3. Language Specific Integration Features
3.2.4. Spring based Integration

3.3. Heterogeneous Integration with External DSLs
3.4. Handling Exceptions
3.5. Managing Performance

3.6. Summary
3.7. Reference


Chapter 4. Internal DSL Implementation Patterns


4.1 Building up your DSL Toolbox

4.2 Embedded DSL - Patterns in Meta-programming
4.2.1 Implicit Context and Smart APIs
4.2.2 Dynamic Decorators using Mixins
4.2.3 Hierarchical Structures using Builders
4.2.4 New Additions to your Toolbox

4.3 Embedded DSL - Patterns with Typed Abstractions
4.3.1 Higher Order Functions as Generic Abstractions
4.3.2 Explicit Type Constraints to model Domain logic
4.3.3 New Additions to your Toolbox

4.4 Generative DSL - Boilerplates for Runtime Generation
4.5 Generative DSL - one more tryst with Macros

4.6 Summary
4.7 References

Monday, October 05, 2009

DSLs in Action now in MEAP

My book DSLs in Action (see sidebar) is now available in MEAP (Manning Early Access Program). I have planned it to be a book totally for real world DSL implementers. It starts with a slow paced introduction to abstraction design, discusses principles of well-designed abstractions and then takes a deep dive into the world of DSL based development.

The first part of the book focuses on usage of DSLs in the real world and how you would go about setting up your DSL based development environment. The second part is focused entirely on implementation techniques, patterns and idioms and how they map to the various features offered by today's programming languages. The book is heavily biased towards the JVM. The three most discussed languages are Scala, Ruby and Groovy, with some snippets of Clojure as well.

The book is still very much a WIP. Please send all of your feedback in the Author's Forum. It can only make the quality better.

Enjoy!

Sunday, October 04, 2009

Pluggable Persistent Transactors with Akka

NoSql is here. Yes, just as we use multiple programming languages, we are now thinking of applying the same polyglot paradigm to storage too. And why not? If we can use an alternate language to be more expressive for a specific problem, why not use an alternate form of storage that is a better fit for your requirement?

More and more projects are using alternate forms of storage for persisting the various forms of data that the application needs to handle. Of course relational databases have their very own place in this stack - the difference is that people today are not being pedantic about their use, and are no longer using the RDBMS as the universal hammer for every nail that they see in the application.

Consider an application that needs durability for transactional data structures. I want to model a transactional banking system with basic debit and credit operations, using a message based model. But the operations have to be persistent. The balance needs to be durable and all transactions need to be persisted on disk. It doesn't matter what structures you store underneath - all I need is some key/value interface that allows me to store the transactions and the balances keyed by the transaction id. I don't even need to bother about what form of storage is used at the backend. It can be any database, any key/value store, Terracotta or anything. Will you give me the flexibility to make the storage pluggable? Well, that's a bonus!

Enter Akka .. and its pluggable persistence layer that you can nicely marry to its message passing actor based interface. Consider the following messages for processing debit/credit operations ..


case class Balance(accountNo: String)
case class Debit(accountNo: String, amount: BigInt, failer: Actor)
case class MultiDebit(accountNo: String, amounts: List[BigInt], failer: Actor)
case class Credit(accountNo: String, amount: BigInt)
case object LogSize



In the above messages, the failer actor is used to report failed operations in case the debit fails. Also, we want all of the above operations to be transactional, which we can specify declaratively in Akka. Here's the basic actor definition ..

class BankAccountActor extends Actor {
  makeTransactionRequired
  private val accountState = 
    TransactionalState.newPersistentMap(MongoStorageConfig())
  private val txnLog = 
    TransactionalState.newPersistentVector(MongoStorageConfig())
  //..
}



  • makeTransactionRequired makes the actor transactional

  • accountState is a persistent Map that plugs in to a MongoDB based storage, as is evident from the config parameter. In a real life application this will be further abstracted into a configuration file. Earlier I had blogged about the implementation of the MongoDB layer for Akka persistence. accountState offers the key/value interface that the actor will use to maintain the durable snapshot of all balances.

  • txnLog is a persistent vector, once again backed by MongoDB storage, which stores a log of all the transactions that occur in the system


Let us now look at the actor implementation that receives these messages and processes the debit/credit operations ..

class BankAccountActor extends Actor {
  makeTransactionRequired
  private val accountState = 
    TransactionalState.newPersistentMap(MongoStorageConfig())
  private val txnLog = 
    TransactionalState.newPersistentVector(MongoStorageConfig())

  def receive: PartialFunction[Any, Unit] = {
    // check balance
    case Balance(accountNo) =>
      txnLog.add("Balance:" + accountNo)
      reply(accountState.get(accountNo).get)

    // debit amount: can fail
    case Debit(accountNo, amount, failer) =>
      txnLog.add("Debit:" + accountNo + " " + amount)
      val m: BigInt =
      accountState.get(accountNo) match {
        case None => 0
        case Some(v) => {
          val JsNumber(n) = v.asInstanceOf[JsValue]
          BigInt(n.toString)
        }
      }
      accountState.put(accountNo, (m - amount))
      if (amount > m)
        failer !! "Failure"
      reply(m - amount)

    //..
  }
}


Here we have the implementation of two messages -

  • Balance reports the current balance and

  • Debit does a debit operation on the balance


Note that the interfaces that these implementations use are in no way dependent on the MongoDB specific APIs. Akka offers a uniform key/value API across all supported persistent storages. And each of the above pattern matched message processing fragments offers transactional semantics. This is pluggability!
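
As a quick sketch of that pluggability (my addition, and assuming Akka's persistence module exposes a Cassandra backed config, CassandraStorageConfig, alongside the MongoDB one), switching the backing store is a local change to the actor's state declarations; the receive block shown above stays untouched.

class BankAccountActor extends Actor {
  makeTransactionRequired
  // same key/value and vector interfaces as before, now backed by Cassandra
  private val accountState =
    TransactionalState.newPersistentMap(CassandraStorageConfig())
  private val txnLog =
    TransactionalState.newPersistentVector(CassandraStorageConfig())

  //.. receive is identical to the MongoDB backed version
}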

Credit looks very similar to Debit. However, a more interesting use case is the MultiDebit operation, which offers a transactional interface over multiple debits. Just like your relational database's ACID semantics, the transactional semantics of Akka offer atomicity over this message. Either the whole MultiDebit will pass or it will be rolled back.


class BankAccountActor extends Actor {

  //..
  def receive: PartialFunction[Any, Unit] = {

    // many debits: can fail
    // demonstrates true rollback even if multiple puts have been done
    case MultiDebit(accountNo, amounts, failer) =>
      txnLog.add("MultiDebit:" + accountNo + " " + amounts.map(_.intValue).foldLeft(0)(+ _))
      val m: BigInt =
      accountState.get(accountNo) match {
        case None => 0
        case Some(v) => BigInt(v.asInstanceOf[String])
      }
      var bal: BigInt = 0
      amounts.foreach {amount =>
        bal = bal + amount
        accountState.put(accountNo, (m - bal))
      }
      if (bal > m) failer !! "Failure"
      reply(m - bal)
    
    //..
  }
}



Now that we have the implementation in place, let's look at the test cases that exercise them ..

First, a successful debit test case. Note how we have a separate failer actor that reports failure of operations to the caller.

@Test
def testSuccessfulDebit = {
  val bactor = new BankAccountActor
  bactor.start
  val failer = new PersistentFailerActor
  failer.start
  bactor !! Credit("a-123", 5000)
  bactor !! Debit("a-123", 3000, failer)
  val b = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n) = b
  assertEquals(BigInt(2000), BigInt(n.toString))

  bactor !! Credit("a-123", 7000)
  val b1 = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n1) = b1
  assertEquals(BigInt(9000), BigInt(n1.toString))

  bactor !! Debit("a-123", 8000, failer)
  val b2 = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n2) = b2
  assertEquals(BigInt(1000), BigInt(n2.toString))
  assertEquals(7, (bactor !! LogSize).get)
}


And now the interesting MultiDebit that illustrates the transaction rollback semantics ..

@Test
def testUnsuccessfulMultiDebit = {
  val bactor = new BankAccountActor
  bactor.start
  bactor !! Credit("a-123", 5000)
  val b = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n) = b
  assertEquals(BigInt(5000), BigInt(n.toString))

  val failer = new PersistentFailerActor
  failer.start
  try {
    bactor !! MultiDebit("a-123", List(500, 2000, 1000, 3000), failer)
    fail("should throw exception")
  } catch { case e: RuntimeException => {}}

  val b1 = (bactor !! Balance("a-123")).get.asInstanceOf[JsValue]
  val JsNumber(n1) = b1
  assertEquals(BigInt(5000), BigInt(n1.toString))

  // should not count the failed one
  assertEquals(3, (bactor !! LogSize).get)
}


In the snippet above, the balance remains at 5000 because the debit fails while processing the final amount of the list passed to the MultiDebit message.

Relational databases will always remain for the use cases they serve best - persistence of data that needs a true relational model. NoSQL is gradually making its place in the application stack for the complementary set of use cases that need a much more loosely coupled model, a key/value database or a document oriented database. Apart from easier manageability, another big advantage of using these databases is that they do not need big ceremonious ORM layers between the application model and the data model. This is because what you see is what you store (WYSIWYS); there is no paradigm mismatch that needs to be bridged.
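
Here's a tiny sketch of WYSIWYS (my addition), reusing the same Akka persistent map from the examples above and eliding the JSON serialization of the transaction: the document that goes into the store is the domain data itself, with no mapping layer in between.

class TxnStore extends Actor {
  makeTransactionRequired
  private val txns =
    TransactionalState.newPersistentMap(MongoStorageConfig())

  def receive: PartialFunction[Any, Unit] = {
    // store the transaction document keyed by its id - what you see is what you store
    case ("store", txnId: String, txnAsJson: String) =>
      txns.put(txnId, txnAsJson)
      reply(txnId)
  }
}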