Monday, January 31, 2011

CQRS with Akka actors and functional domain models

Fighting impedance mismatch has been quite a losing battle so far in the development of software systems. We fight mismatch to handle stateful interactions with a stateless protocol. We fight mismatch of paradigms between the user interface layers, domain layers and data layers. Nothing concrete has emerged to date, though there have been quite a number of efforts to keep the organization of persistent data as close as possible to the way the domain layer uses it.

Command Query separation is nothing new. It's yet another attempt to manage the impedance mismatch between how your application uses data and how the underlying store manages data, so that transactional updates can be served with the same agility as read-only queries. Bertrand Meyer made this distinction long ago when he mentioned that ..

"The features that characterize a class are divided into commands and queries. A command serves to modify objects, a query to return information about objects."

Processing a command involves manipulating state, so the underlying data model needs to be organized in a way that makes updates easy. A query needs to return data in the format in which the user wants to view it. Hence it makes sense to organize your storage accordingly, so that we don't need expensive joins in order to process queries. This leads to a dichotomy in the way the application, as a whole, processes data. Command Query Responsibility Segregation (CQRS) endorses this separation. Commands update state and hence produce side-effects. Queries are like pure functions and should be designed using applicative, completely side-effect free approaches. So, the command-query separation principle, as Bertrand Meyer said, is ..

"Functions should not produce abstract side-effects."

Greg Young has delivered some great sessions on DDD and CQRS. In 2008 he said "A single model cannot be appropriate for reporting, searching and transactional behavior". We have at least two models - one that processes commands and feeds changes to another model which serves user queries and reports. The transactional behavior of the application gets executed through the rich domain model of aggregates and repositories, while the queries are directly served from a de-normalized data model.
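
To make the two-model idea concrete, here is a minimal sketch in plain Scala. It is not taken from any implementation discussed in this post; `CommandModel`, `TradeBooked` and `AccountTotals` are hypothetical names, and the "query model" is just an in-memory map of totals per account.

```scala
object TwoModelsSketch {
  // events produced by the command side (hypothetical, for illustration only)
  sealed trait Event
  final case class TradeBooked(refNo: String, account: String, amount: BigDecimal) extends Event

  // command model: validates a command and records the resulting state change
  final class CommandModel {
    private var log = Vector.empty[Event]

    def book(refNo: String, account: String, amount: BigDecimal): Event = {
      require(amount > BigDecimal(0), "amount must be positive")
      val e = TradeBooked(refNo, account, amount)
      log = log :+ e
      e
    }

    def events: Vector[Event] = log
  }

  // query model: a de-normalized view that answers "total per account" without joins
  final case class AccountTotals(totals: Map[String, BigDecimal] = Map.empty) {
    def applyEvent(e: Event): AccountTotals = e match {
      case TradeBooked(_, account, amount) =>
        copy(totals = totals.updated(account, totalFor(account) + amount))
    }

    def totalFor(account: String): BigDecimal =
      totals.getOrElse(account, BigDecimal(0))
  }
}
```

The command side knows nothing about how the totals are laid out; the query side only consumes events and never validates anything.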

CQRS and Event Sourcing

One other concept that goes alongside CQRS is that of Event Sourcing. I blogged about some of the benefits that it has quite some time back and implemented event sourcing using Scala actors. The point where event sourcing meets CQRS is how we model the transactions of the domain (resulting from commands) as a sequence of events in the underlying persistent store. Modeling persistence of transactions as an event stream helps record updates as append only event snapshots that can be replayed as and when required. All updates in the domain model are now being translated into inserts in the persistence model. And this gives us an explicit view of all state changes in the domain model.
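
As a rough illustration of "updates become inserts", the sketch below keeps an append-only log and recovers the current state by replaying it. The `Account` type and the two event classes are assumptions made up for this example and are unrelated to the trade model used later in the post.

```scala
object EventLogSketch {
  final case class Account(id: String, balance: BigDecimal)

  sealed trait AccountEvent
  final case class Deposited(amount: BigDecimal) extends AccountEvent
  final case class Withdrawn(amount: BigDecimal) extends AccountEvent

  // applying one event to a state is a pure function
  def applyEvent(a: Account, e: AccountEvent): Account = e match {
    case Deposited(x) => a.copy(balance = a.balance + x)
    case Withdrawn(x) => a.copy(balance = a.balance - x)
  }

  // the store only ever appends; nothing is updated in place
  def append(log: Vector[AccountEvent], e: AccountEvent): Vector[AccountEvent] =
    log :+ e

  // current state = left fold of the log over the initial state
  def replay(initial: Account, log: Vector[AccountEvent]): Account =
    log.foldLeft(initial)((a, e) => applyEvent(a, e))
}
```

Because the log is never rewritten, replaying any prefix of it yields the state as it was at that point in time.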

Over the last few days I have been playing around implementing CQRS and Event Sourcing within a domain model using the principles of functional programming and actor based asynchronous messaging. One of the big challenges is to model updates in a functional way and store them as sequences of event streams. In this post, I will share some of the experiences and implementation snippets that I came up with over the last few days. The complete implementation, so far, can be found in my github repository. It's very much a work in progress, which I hope to enrich more and more as I get some time.

A simple domain model

First the domain model and the aggregate root that will be used to publish events .. It's a ridiculously simple model for a security trade, with lots and lots of stuff elided for simplicity ..

// the main domain class
import java.util.{Calendar, Date}

case class Trade(account: Account, instrument: Instrument, refNo: String, 
  market: Market, unitPrice: BigDecimal, quantity: BigDecimal, 
  tradeDate: Date = Calendar.getInstance.getTime, valueDate: Option[Date] = None, 
  taxFees: Option[List[(TaxFeeId, BigDecimal)]] = None, 
  netAmount: Option[BigDecimal] = None) {

  // trades are equal when their reference numbers match; a non-Trade is never equal
  override def equals(that: Any) = that match {
    case t: Trade => refNo == t.refNo
    case _        => false
  }
  override def hashCode = refNo.hashCode
}


For simplicity, we make reference number a unique identifier for a trade. So all comparisons and equalities will be based on reference numbers only.

In a typical application, the entry point for users is the service layer that exposes facade methods that render use cases for the business. In a trading application, two of the most common services that need to be performed on a security trade are its value date computation and its enrichment. So when the trade passes through its processing pipeline it gets its value date updated, then gets enriched with the applicable taxes and fees, and finally gets its net cash value.

If you are a client using these services (again, overly elided for simplicity) you may have the following service methods ..

class TradingClient {
  // create a trade : wraps the model method
  def newTrade(account: Account, instrument: Instrument, refNo: String, market: Market,
    unitPrice: BigDecimal, quantity: BigDecimal, tradeDate: Date = Calendar.getInstance.getTime) =
      //..

  // enrich trade
  def doEnrichTrade(trade: Trade) = //..

  // add value date
  def doAddValueDate(trade: Trade) = //..

  // a sample query
  def getAllTrades = //..
}


In a typical implementation these methods will invoke the domain artifacts of repositories that will either query aggregate roots or do updates on them before being persisted in the underlying store. In a CQRS implementation, the domain model will be updated but the persistent store will record these updates as event streams.

So now we have the first problem - how do we represent updates in the functional world so that we can compose them later when we need to snapshot the persistent aggregate root?

Lenses FTW

I used type lenses for representing updates functionally. Lenses solve the problem of representing updates so that they can be composed. A lens between a set of source structures S and a set of target structures T is a pair of functions:
- get from S to T
- putback from T x S to S
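
To make the pair-of-functions idea concrete, here is a minimal sketch of a lens in plain Scala. It is not the scalaz implementation; `mod`, `andThen`, `Person` and `Address` are names assumed purely for illustration. The point is that two lenses compose into a lens that reaches deeper into the structure.

```scala
object LensSketch {
  // a lens is a getter plus a setter that returns an updated copy
  final case class Lens[A, B](get: A => B, set: (A, B) => A) {
    // modify the focused part with a function
    def mod(a: A, f: B => B): A = set(a, f(get(a)))

    // compose with a lens that focuses further inside B
    def andThen[C](that: Lens[B, C]): Lens[A, C] =
      Lens[A, C](
        a => that.get(get(a)),
        (a, c) => set(a, that.set(get(a), c))
      )
  }

  // hypothetical nested structure to focus on
  final case class Address(city: String)
  final case class Person(name: String, address: Address)

  val addressLens: Lens[Person, Address] =
    Lens[Person, Address](_.address, (p, a) => p.copy(address = a))

  val cityLens: Lens[Address, String] =
    Lens[Address, String](_.city, (a, c) => a.copy(city = c))

  // composed lens: Person -> city, without writing nested copy calls by hand
  val personCity: Lens[Person, String] = addressLens andThen cityLens
}
```

Setting through `personCity` returns a new `Person` and leaves the original value untouched, which is what lets such updates be stored and replayed later.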

For more on lenses, have a look at this presentation by Benjamin Pierce. scalaz contains lenses as part of its distribution and models a lens as a case class containing a pair of get and set functions ..

case class Lens[A,B](get: A => B, set: (A,B) => A) extends Immutable { //..


Here are some examples from my domain model for updating a trade with its value date or enriching it with tax/fee values and net cash value ..

// add tax/fees
val taxFeeLens: Lens[Trade, Option[List[(TaxFeeId, BigDecimal)]]] = 
  Lens((t: Trade) => t.taxFees, 
       (t: Trade, tfs: Option[List[(TaxFeeId, BigDecimal)]]) => t.copy(taxFees = tfs))

// add net amount
val netAmountLens: Lens[Trade, Option[BigDecimal]] = 
  Lens((t: Trade) => t.netAmount, 
       (t: Trade, n: Option[BigDecimal]) => t.copy(netAmount = n))

// add value date
val valueDateLens: Lens[Trade, Option[Date]] = 
  Lens((t: Trade) => t.valueDate, 
       (t: Trade, d: Option[Date]) => t.copy(valueDate = d))


We will use the above lenses for updating our aggregate root and also wrap them into closures that are subsequently fed into the event stream for persistent storage. In this example I have implemented in-memory persistence for both the command and the query store. Persistence into an on-disk database will be available very soon at a github repository near you :)

Combinators that abstract state processing

Let's now define a couple of combinators that encapsulate our transactional service method implementations within the domain model. Note how the lenses have also been abstracted away from the client API as implementation artifacts. For details of these implementations please visit the github repo that contains a working model along with test cases.

// closure that enriches a trade with tax/fee information and net cash value
val enrichTrade: Trade => Trade = {trade =>
  val taxes = for {
    taxFeeIds      <- forTrade // get the tax/fee ids for a trade
    taxFeeValues   <- taxFeeCalculate // calculate tax fee values
  }
  yield(taxFeeIds map taxFeeValues)
  val t = taxFeeLens.set(trade, taxes(trade))
  netAmountLens.set(t, t.taxFees.map(_.foldl(principal(t))((a, b) => a + b._2)))
}

// closure for adding a value date
val addValueDate: Trade => Trade = {trade =>
  val c = Calendar.getInstance
  c.setTime(trade.tradeDate)
  c.add(Calendar.DAY_OF_MONTH, 3)
  valueDateLens.set(trade, Some(c.getTime))
}


We will now use these combinators to implement our transactional services which the TradingClient will invoke. Each of these service methods will do 2 things :-

1. effect the closure on the domain model and
2. as a side-effect stream the event into the command store

Sounds like a kestrel .. doesn't it ? Well here's a kestrel combinator and the above service methods realized in my CQRS implementation ..

// refer To Mock a Mockingbird
private[service] def kestrel[T](trade: T, proc: T => T)(effect: => Unit) = {
  val t = proc(trade)
  effect
  t
}

// enrich trade
def doEnrichTrade(trade: Trade) = 
  kestrel(trade, enrichTrade) { 
    ts ! TradeEnriched(trade, enrichTrade)
  }

// add value date
def doAddValueDate(trade: Trade) = 
  kestrel(trade, addValueDate) { 
    ts ! ValueDateAdded(trade, addValueDate)
  }
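
For readers who want to try the kestrel shape without Akka, here is a small stand-alone sketch. The actor send is replaced by appending to an in-memory buffer, which is purely an assumption for illustration; `sent` and `increment` are hypothetical names.

```scala
object KestrelSketch {
  import scala.collection.mutable.ListBuffer

  // apply proc, run the side effect, and hand back the result of proc
  def kestrel[T](value: T, proc: T => T)(effect: => Unit): T = {
    val result = proc(value)
    effect
    result
  }

  // stand-in for the actor mailbox: a plain in-memory buffer (assumption)
  val sent = ListBuffer.empty[String]

  def increment(n: Int): Int =
    kestrel(n, (x: Int) => x + 1) {
      sent += s"incremented $n"
      ()
    }
}
```

The caller only ever sees the transformed value; the effect happens on the side, which is why the shape suits fire-and-forget messaging.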


Back to Akka!

It was only to be expected that I would use Akka for transporting the event down to the command store. And this transport is implemented as an asynchronous side-effect of the service methods - just what the doctor ordered for an actor use case :)

With Event Sourcing and CQRS, one of the things that you would require is the ability to snapshot your persistent versions of the aggregate root. The current implementation is simple and does zero-based snapshotting, i.e. every time you ask for a snapshot, it replays the whole stream for that trade and gives you the current state. In typical real world systems, you do interval snapshotting and start replaying from the latest available snapshot when you want to get the current state.
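
The difference between the two strategies can be sketched in a few lines. This is only an illustration under simplifying assumptions: the state is an `Int`, an event is a plain function `Int => Int`, and `Snapshot`, `replayAll` and `replayFrom` are hypothetical names that do not appear in the implementation.

```scala
object SnapshotSketch {
  // an event is modeled as a state transition (assumption for this sketch)
  type Event = Int => Int

  // a snapshot remembers a state and how many events it already includes
  final case class Snapshot(state: Int, version: Int)

  // zero-based: replay every event from the initial state
  def replayAll(initial: Int, log: Vector[Event]): Int =
    log.foldLeft(initial)((s, e) => e(s))

  // interval-based: start from the snapshot and replay only the tail of the log
  def replayFrom(snap: Snapshot, log: Vector[Event]): Int =
    log.drop(snap.version).foldLeft(snap.state)((s, e) => e(s))

  // take a snapshot that covers the whole log seen so far
  def snapshot(initial: Int, log: Vector[Event]): Snapshot =
    Snapshot(replayAll(initial, log), log.size)
}
```

Both strategies must agree on the final state; the interval version simply does less work when the log is long.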

Here's our command store modeled as an Akka actor that processes the various events that it receives from an upstream server ..

// CommandStore modeled as an actor
class CommandStore(qryStore: ActorRef) extends Actor {
  private var events = Map.empty[Trade, List[TradeEvent]]

  def receive = {
    case m@TradeEnriched(trade, closure) => 
      events += ((trade, events.getOrElse(trade, List.empty[TradeEvent]) :+ closure))
      qryStore forward m
    case m@ValueDateAdded(trade, closure) => 
      events += ((trade, events.getOrElse(trade, List.empty[TradeEvent]) :+ closure))
      qryStore forward m
    case Snapshot => 
      self.reply(events.keys.map {trade =>
        events(trade).foldLeft(trade)((t, e) => e(t))
      })
  }
}


Note how the Snapshot message is processed as a fold over all the accumulated closures starting with the base trade. Also the command store adds the event to its repository (which is currently an in-memory collection) and forwards the event to the query store. There we can have the trade modeled as per the requirements of the query / reporting client. For simplicity the current example assumes that model is the same as our domain model presented above.

Here's the query store, also an actor, that persists the trades on receiving relevant events from the command store. In effect the command store responds to messages that it receives from an upstream TradingServer and asynchronously updates the query store with the latest state of the trade.

// QueryStore modeled as an actor
class QueryStore extends Actor {
  private var trades = new collection.immutable.TreeSet[Trade]()(Ordering.by(_.refNo))

  def receive = {
    case TradeEnriched(trade, closure) => 
      trades += trades.find(_ == trade).map(closure(_)).getOrElse(closure(trade))
    case ValueDateAdded(trade, closure) => 
      trades += trades.find(_ == trade).map(closure(_)).getOrElse(closure(trade))
    case QueryAllTrades =>
      self.reply(trades.toList)
  }
}


And here's a sample sequence diagram that illustrates the interactions that take place for a sample service call by the client in the CQRS implementation ..


The full implementation also contains the complete wiring of the above abstractions along with Akka's fault tolerant supervision capabilities. A complete test case is also included along with the distribution.

Have fun!

Monday, January 10, 2011

Iteration in Scala - effectful yet functional

One of the papers that influenced me a lot in 2010 was The Essence of the Iterator Pattern by Jeremy Gibbons and Bruno C. d. S. Oliveira. It builds upon where McBride and Paterson left off in their treatise on Applicative Functors. Gibbons' paper discusses the various aspects of building traversal structures in the presence of effects.

In this post I look at functional implementations of some of these traversal patterns using scalaz. In the paper on applicative functors, McBride and Paterson define traverse as an applicative mapping operation ..

traverse :: Applicative f => (a -> f b) -> [a] -> f [b]


Gibbons et al. use this abstraction to study various traversal structures in the presence of effects. The paper starts with a C# code snippet that uses the syntax sugar of foreach to traverse over a collection of elements ..

public static int loop<MyObj> (IEnumerable<MyObj> coll){
  int n = 0;
  foreach (MyObj obj in coll){
    n = n+1;
    obj.touch();
  }
  return n;
}


In the above loop method, we do two things simultaneously :-

  1. mapping - doing some operation touch() on the elements of coll with the expectation that we get the modified collection at the end of the loop
  2. accumulating - counting the elements, which is a stateful operation for each iteration and which is independent of the operation which we do on the elements

And in the presence of mutation, the two concerns are quite conflated. Gibbons et al. use McBride and Paterson’s applicative functors, and the traverse operator which they discuss in the same paper, to come up with some special cases of effectful traversals where the mapping aspect is independent of accumulation and vice versa.

Over the last weekend I was exploring how much of these effectful functional traversals can be done using scalaz, the closest to Haskell you can get with Scala. Section 4.2 of the original paper talks about two definite patterns of effectful traversal. Both of these patterns combine mapping and accumulation (like the C# code above) but separate the concerns skillfully using functional techniques. Let's see how much of that we can manage with scalaz functors.

Pattern #1

The first pattern of traversal accumulates elements effectfully, but modifies the elements of the collection purely and independently of this accumulation. Here's the scalaz implementation of collect (see the original paper for the Haskell implementation) ..

def collect[T[_]:Traverse, A, B, S](f: A => B, t: T[A], g: S => S) =
  t.traverse[({type λ[x] = State[S,x]})#λ, B](a => state((s: S) => (g(s), f(a))))


To the uninitiated, the type annotation in traverse looks ugly - it's there because scalac cannot infer partial application of type constructors, a problem which will be rectified once Adriaan fixes issue 2712 on the Scala Trac.

Traverse is one of the typeclasses in scalaz similar to the model of Data.Traversable in Haskell.

trait Traverse[T[_]] extends Functor[T] {
  def traverse[F[_] : Applicative, A, B](f: A => F[B], t: T[A]): F[T[B]]

  import Scalaz._

  override def fmap[A, B](k: T[A], f: A => B) = traverse[Identity, A, B](f(_), k)
}


and scalaz defines implementations of the Traverse typeclass for a host of classes on which you can invoke traverse.

The above implementation uses the State monad to handle effectful computations. For an introduction to the State monad in scalaz, have a look at this post from Tony Morris.

Note, f is the pure function that maps on the elements of the collection, g is the function that does the effectful accumulation through the State monad. Using collect, here's a version of the C# loop method that we did at the beginning ..

val loop = collect((a: Int) => 2 * a, List(10, 20, 30, 40), (i: Int) => i + 1)
loop(0) should equal((4, List(20, 40, 60, 80)))


Now we have the effectful iteration without using any mutable variables.
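
If you don't have scalaz at hand, the same pattern can be tried with a hand-rolled State and a traverse specialized to List. This is only a sketch under those assumptions: the `State` class below is a stripped-down stand-in for the scalaz one, and `traverse` here works on `List` only rather than on any `Traverse` instance.

```scala
object CollectSketch {
  // minimal State: a function from a state to (new state, result)
  final case class State[S, A](run: S => (S, A)) {
    def map[B](f: A => B): State[S, B] =
      State((s: S) => { val (s1, a) = run(s); (s1, f(a)) })

    def flatMap[B](f: A => State[S, B]): State[S, B] =
      State((s: S) => { val (s1, a) = run(s); f(a).run(s1) })
  }

  // traverse specialized to List and State: effects run left to right
  def traverse[S, A, B](as: List[A])(f: A => State[S, B]): State[S, List[B]] =
    as.foldRight(State[S, List[B]](s => (s, Nil))) { (a, acc) =>
      for { b <- f(a); bs <- acc } yield b :: bs
    }

  // collect: f maps elements purely, g evolves the state independently of them
  def collect[A, B, S](f: A => B, as: List[A], g: S => S): State[S, List[B]] =
    traverse[S, A, B](as)(a => State((s: S) => (g(s), f(a))))
}
```

Running `collect((a: Int) => 2 * a, List(10, 20, 30, 40), (i: Int) => i + 1).run(0)` threads the counter through the traversal and returns it alongside the mapped list.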

Pattern #2

The second pattern of traversal modifies elements purely but dependent on some state that evolves independently of the elements. Gibbons et al. call this abstraction disperse, whose scalaz implementation can be as follows ..

def disperse[T[_]: Traverse, A, S, B](t: T[A], s: A => State[S, B]) =
  t.traverse[({type λ[x] = State[S,x]})#λ, B](s)


Note how the elements of the collection are being modified through the State monad. Using disperse, we can write a labeling function that labels every element with its position in order of traversal ..

def label[T[_]: Traverse, A](t: T[A]) = 
  disperse(t, ((a: A) => state((i: Int) => (i+1, i)))) ! 0

label(List(10, 20, 30, 40)) should equal(List(0, 1, 2, 3)) 


disperse can also be used to implement the wordCount example that ships with the scalaz distribution. Actually it counts the number of characters and lines in a stream.

def charLineCount[T[_]:Traverse](t: T[Char]) =
  disperse(t, ((a: Char) => state((counts: (Int, Int)) =>
    ((counts._1 + 1, counts._2 + (if (a == '\n') 1 else 0)), (counts._1, counts._2))))) ! (1,1)

charLineCount("the cat in the hat\n sat on the mat\n".toList).last should equal((35, 2))
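
For intuition, on a plain List the disperse shape boils down to a fold that threads a state from left to right and emits one output per element. The sketch below assumes nothing beyond the standard library; `mapAccum` is a hypothetical helper name, and the versions of `label` and `charLineCount` here are List-only analogues of the scalaz ones above, not replacements for them.

```scala
object DisperseSketch {
  // thread a state through a list left to right, producing one output per element
  def mapAccum[S, A, B](as: List[A], init: S)(step: (S, A) => (S, B)): (S, List[B]) = {
    val (finalState, reversed) =
      as.foldLeft((init, List.empty[B])) { case ((s, acc), a) =>
        val (s1, b) = step(s, a)
        (s1, b :: acc)
      }
    (finalState, reversed.reverse)
  }

  // label each element with its position in traversal order
  def label[A](as: List[A]): List[Int] = {
    val (_, labels) = mapAccum(as, 0)((i, _) => (i + 1, i))
    labels
  }

  // running (characters, lines) counts, both starting at 1 as in charLineCount above
  def charLineCount(cs: List[Char]): List[(Int, Int)] = {
    val (_, counts) = mapAccum(cs, (1, 1)) { case ((chars, lines), c) =>
      ((chars + 1, lines + (if (c == '\n') 1 else 0)), (chars, lines))
    }
    counts
  }
}
```

The output for each element depends on the state seen so far, while the state itself evolves by a rule that the mapping part never needs to know about.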