Ruminations of a Programmer: January 2012

Tuesday, January 24, 2012

List Algebras and the fixpoint combinator Mu

In my last post on recursive types and the fixed point combinator, we saw how type equations of the form a = F(a), where F is a type constructor, have solutions of the form Mu a . F, where Mu is the fixed point combinator. Substituting the solution in the original equation, we get ..

Mu a . F = F {Mu a . F / a}

where the rhs indicates substitution of all free a's in F by Mu a . F.

Using this we also got the type equation for ListInt as ..

ListInt = Mu a . Unit + Int x a

In this post we view the same problem from a category theory point of view. This post assumes familiarity with quite a few category theory concepts. If you are unfamiliar with any of them, you can refer to a basic text on the subject.

We start with the definition of ListInt as in the earlier post ..

// nil takes no arguments and returns a List data type

nil : 1 -> ListInt



// cons takes 2 arguments and returns a List data type

cons : (Int x ListInt) -> ListInt

Combining the two functions above, we get a single function as ..

in = [nil, cons] : 1 + (Int x ListInt) -> ListInt

We can say that this forms an algebra of the functor F(X) = 1 + (Int x X). Let's represent this algebra by (Mu F, in) or (Mu F, [nil, cons]), where Mu F is ListInt in the above combined function.
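To make the functor and its algebra a little more concrete, here is a minimal Scala sketch. The names ListF, NilF, ConsF and Algebra are my own assumptions for illustration, and ListInt is modelled by the standard List[Int] rather than by anything defined in this post.

```scala
object ListAlgebraSketch {
  // Pattern functor F(X) = 1 + (Int x X): either the unit case, or an Int paired with an X
  sealed trait ListF[+X]
  case object NilF extends ListF[Nothing]
  final case class ConsF[X](head: Int, tail: X) extends ListF[X]

  // An F-algebra with carrier C is an arrow F(C) -> C
  type Algebra[C] = ListF[C] => C

  // in = [nil, cons] : 1 + (Int x ListInt) -> ListInt, with ListInt taken to be List[Int]
  val in: Algebra[List[Int]] = {
    case NilF        => Nil     // the nil component
    case ConsF(h, t) => h :: t  // the cons component
  }
}
```

The case analysis on NilF and ConsF plays the role of the join [nil, cons]: one arrow out of the sum type, built from one arrow per summand.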

As a next step we show that the algebra (Mu F, [nil, cons]) forms an initial algebra representing the data type of Lists over a given set of integers. Here we are dealing with lists of integers though the same result can be shown for lists of any type A.

In order to show that (Mu F, [nil, cons]) forms an initial F-algebra we consider an arbitrary F-algebra (C, phi), where phi is an arrow out of the sum type given by :

c : 1 -> C

h : (Int x C) -> C

and the join given by [c, h] : 1 + (Int x C) -> C

By definition, if (Mu F, [nil, cons]) is to be an initial F-algebra, then for any arbitrary F-algebra (C, phi) in that category there must be a function f: Mu F -> C which is a homomorphism, and it must be unique. So for the algebra [c, h] the homomorphism square must commute, i.e. f o [nil, cons] = [c, h] o F(f) ..

which means we must have a unique solution to the following 2 equations ..

f o nil = c

f o cons = h o (id x f)

From the universal property of initial F-algebras it's easy to see that this system of equations has a unique solution which is fold(c, h). It's the catamorphism represented by ..

f = {[c, h]} : ListInt -> C
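The two equations read almost directly as a recursive definition. Here is a hedged Scala sketch, again modelling ListInt as List[Int]; the names fold, sum and length are assumptions chosen for illustration.

```scala
object FoldSketch {
  // fold(c, h) is the f that satisfies f o nil = c and f o cons = h o (id x f)
  def fold[C](c: C, h: (Int, C) => C)(xs: List[Int]): C = xs match {
    case Nil    => c                    // f(nil) = c
    case x :: t => h(x, fold(c, h)(t))  // f(cons(x, t)) = h(x, f(t))
  }

  // Two different target algebras (C, [c, h]), both with carrier Int
  def sum(xs: List[Int]): Int    = fold[Int](0, _ + _)(xs)
  def length(xs: List[Int]): Int = fold[Int](0, (_, n) => n + 1)(xs)
}
```

Each choice of c and h picks out a different F-algebra, and fold(c, h) is the homomorphism from the list algebra into it.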

This proves that (Mu F, [nil, cons]) is an initial F-algebra over the endofunctor F(X) = 1 + (Int x X). It can also be shown that the initial algebra in: F (Mu F) -> Mu F is an isomorphism, so the carrier of the initial algebra is (up to isomorphism) a fixed point of the functor. Well, that may sound like a bit of a mouthful, but a well established lemma due to Lambek proves it. I can't do it in this blog post, since it needs some more prerequisites to be established beforehand, which would make this post a bit bloated. But it's really a fascinating proof and I promise to take it up in one of my upcoming posts. We will also see many properties of initial algebras and how they can be combined to define many of the properties of recursive data types in a purely algebraic way.

As I promised in my last post, here we have seen the other side of Mu - we started with the list definition, showed that it forms an initial algebra over the endofunctor F(X) = 1 + (Int x X) and arrived at the same conclusion that Mu F is a fixed point. Or Mu is the fixed point combinator.

Sunday, January 15, 2012

Event Sourcing, Akka FSMs and functional domain models

I blogged on Event Sourcing and functional domain models earlier. In this post I would like to share more of my thoughts on the same subject and how with a higher level of abstraction you can make your domain aggregate boundary more resilient and decoupled from external references.

When we talk about a domain model, the Aggregate takes center stage. An aggregate is a core abstraction that represents the time invariant part of the domain. It's an embodiment of all states that the aggregate can be in throughout its lifecycle in the system. So, it's extremely important that we take every pain to distil the domain model and protect the aggregate from all unwanted external references. Maybe an example will make it clearer.

Keeping the Aggregate pure

Consider a Trade model as the aggregate. By Trade, I mean a security trade that takes place in the stock exchange where counterparties exchange securities and currencies for settlement. If you're a regular reader of my blog, you must be aware of this, since this is almost exclusively the domain that I talk of in my blog posts.

A trade can be in various states like newly entered, value date added, enriched with tax and fee information, net trade value computed etc. In a trading application, as a trade passes through the processing pipeline, it moves from one state to another. The final state represents the complete Trade object which is ready to be settled between the counterparties.

In the traditional model of processing we have the final snapshot of the aggregate - what we don't have is the audit log of the actual state transitions that happened in response to the events. With event sourcing we record the state transitions as a pipeline of events which can be replayed any time to roll back or roll forward to any state of our choice. Event sourcing is coming up as one of the potent ways to model a system, and lots of blog posts are being written to discuss the various architectural strategies for implementing an event sourced application.
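Replay itself is just a left fold over the event log. The sketch below uses a deliberately tiny, hypothetical Trade and event model (the fields and event names are assumptions, not the model used in my application).

```scala
object ReplaySketch {
  // Hypothetical, heavily simplified aggregate
  final case class Trade(
    refNo: String,
    valueDate: Option[String] = None,
    taxFees: BigDecimal = BigDecimal(0))

  // Hypothetical events recorded in the log
  sealed trait TradeEvent
  final case class ValueDateSet(date: String) extends TradeEvent
  final case class TaxFeesComputed(amount: BigDecimal) extends TradeEvent

  // Pure transition: current state plus one event gives the next state
  def applyEvent(t: Trade, e: TradeEvent): Trade = e match {
    case ValueDateSet(d)    => t.copy(valueDate = Some(d))
    case TaxFeesComputed(a) => t.copy(taxFees = a)
  }

  // Replaying a prefix of the log rebuilds the state as of that point in time
  def replay(initial: Trade, log: Seq[TradeEvent]): Trade =
    log.foldLeft(initial)(applyEvent)
}
```

Rolling back to an earlier state then amounts to replaying a shorter prefix of the log, e.g. replay(initial, log.take(n)).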

That's ok. But whose responsibility is it to manage these state transitions and record the timeline of changes ? It's definitely not the responsibility of the aggregate. The aggregate is supposed to be a pure abstraction. We must design it as an immutable object that can respond to events and transform itself into the new state. In fact the aggregate implementation should not be aware of whether it's serving an event sourced architecture or not.

There are various ways you can model the states of an aggregate. One option that's frequently used involves algebraic data types. Model the various states as a sum type of products. In Scala we do this as case classes ..

sealed abstract class Trade {
  def account: Account
  def instrument: Instrument
  //..
}

case class NewTrade(..) extends Trade {
  //..
}

case class EnrichedTrade(..) extends Trade {
  //..
}

Another option may be to have one data type to model the Trade and model states as immutable enumerations with changes being effected on the aggregate as functional updates. No in place mutation, but use functional data structures like zippers or type lenses to create the transformed object in the new state. Here's an example where we create an enriched trade out of a newly created one ..

// closure that enriches a trade
val enrichTrade: Trade => Trade = {trade =>
  val taxes = for {
    taxFeeIds      <- forTrade // get the tax/fee ids for a trade
    taxFeeValues   <- taxFeeCalculate // calculate tax fee values
  }
  yield(taxFeeIds ∘ taxFeeValues)
  val t = taxFeeLens.set(trade, taxes(trade))
  netAmountLens.set(t, t.taxFees.map(_.foldl(principal(t))((a, b) => a + b._2)))
}
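For readers who haven't used lenses, here is a self-contained sketch of the same idea without any library. The Lens class, the simplified Trade and the enrich function are assumptions made for illustration; they are not the scalaz lenses or the domain model that the snippet above relies on.

```scala
object LensSketch {
  // A bare-bones lens: a getter plus a setter that returns an updated copy
  final case class Lens[A, B](get: A => B, set: (A, B) => A) {
    def modify(a: A)(f: B => B): A = set(a, f(get(a)))
  }

  // Hypothetical, simplified trade
  final case class Trade(
    principal: BigDecimal,
    taxFees: List[(String, BigDecimal)] = Nil,
    netAmount: Option[BigDecimal] = None)

  val taxFeeLens =
    Lens[Trade, List[(String, BigDecimal)]](_.taxFees, (t, fs) => t.copy(taxFees = fs))
  val netAmountLens =
    Lens[Trade, Option[BigDecimal]](_.netAmount, (t, n) => t.copy(netAmount = n))

  // Set the tax/fee values, then derive the net amount from principal plus all fees
  def enrich(trade: Trade, fees: List[(String, BigDecimal)]): Trade = {
    val t = taxFeeLens.set(trade, fees)
    netAmountLens.set(t, Some(t.taxFees.foldLeft(t.principal)((acc, fee) => acc + fee._2)))
  }
}
```

Every set returns a new Trade, so the trade passed in is never mutated and can still be used as the "before" state.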

But then we come back to the same question - if the aggregate is distilled to model the core domain, who handles the events ? Someone needs to model the event changes, effect the state transitions and take the aggregate from one state to the next.

Enter Finite State Machines

In one of my projects I used the domain service layer to do this. The domain logic for effecting the changes lies with the aggregate, but they are invoked from the domain service in response to events when the aggregate reaches specific states. In other words I model the domain service as a finite state machine that manages the lifecycle of the aggregate.

In our example a Trading Service can be modeled as an FSM that controls the lifecycle of a Trade, something like the following ..

import TradeModel._

class TradeLifecycle(trade: Trade, timeout: Duration, log: Option[EventLog]) 
  extends Actor with FSM[TradeState, Trade] {
  import FSM._

  startWith(Created, trade)

  when(Created) {
    case Event(e@AddValueDate, data) =>
      log.map(_.appendAsync(data.refNo, Created, Some(data), e))
      val trd = addValueDate(data)
      notifyListeners(trd) 
      goto(ValueDateAdded) using trd forMax(timeout)
  }

  when(ValueDateAdded) {
    case Event(StateTimeout, _) =>
      stay

    case Event(e@EnrichTrade, data) =>
      log.map(_.appendAsync(data.refNo, ValueDateAdded, None,  e))
      val trd = enrichTrade(data)
      notifyListeners(trd)
      goto(Enriched) using trd forMax(timeout)
  }

  when(Enriched) {
    case Event(StateTimeout, _) =>
      stay

    case Event(e@SendOutContractNote, data) =>
      log.map(_.appendAsync(data.refNo, Enriched, None,  e))
      sender ! data
      stop
  }

  initialize
}

The snippet above contains a lot of other details which I did not have time to prune. It's actually part of the implementation of an event sourced trading application that uses asynchronous messaging (actors) as the backbone for event logging and reaching out to multiple consumers based on the CQRS paradigm.

Note that the FSM model above makes very explicit the states that the Trade model can reach and the events that it handles while in each of these states. We can also use this FSM technique to log events (for event sourcing) and notify listeners about the events (CQRS) in a very declarative manner, as implemented above.
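Stripped of the actor machinery, the transitions that the FSM declares can be sketched as one pure function. This is only an illustration: the Done state is a hypothetical stand-in for stop, and the events are modelled as plain case objects rather than the messages of the actual application.

```scala
object TradeFsmSketch {
  sealed trait TradeState
  case object Created extends TradeState
  case object ValueDateAdded extends TradeState
  case object Enriched extends TradeState
  case object Done extends TradeState // hypothetical terminal state standing in for `stop`

  sealed trait TradeEvent
  case object AddValueDate extends TradeEvent
  case object EnrichTrade extends TradeEvent
  case object SendOutContractNote extends TradeEvent

  // Some(next state) for an event the current state handles, None otherwise
  def next(state: TradeState, event: TradeEvent): Option[TradeState] =
    (state, event) match {
      case (Created, AddValueDate)         => Some(ValueDateAdded)
      case (ValueDateAdded, EnrichTrade)   => Some(Enriched)
      case (Enriched, SendOutContractNote) => Some(Done)
      case _                               => None
    }
}
```

Keeping the table in one place like this makes it easy to see which events are simply not accepted in a given state.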

Let me know in the comments what your views are on this FSM approach towards handling state transitions in domain models. I think it helps keep aggregates pure and helps design domain services that focus on serving specific aggregate roots.

I will be talking about similar stuff, Akka actor based event sourcing implementations and functional domain models at PhillyETE 2012. Please drop by if this interests you.

Sunday, January 08, 2012

Learning the type level fixpoint combinator Mu

I blogged on Mu, the type level fixpoint combinator, some time back. I discussed how Mu can be implemented in Scala and how you can use it to derive a generic model for catamorphism and some cool type level data structures. Recently I have been reading TAPL by Benjamin Pierce, which gives a very thorough treatment of the theories and implementation semantics of types in a programming language.

And Mu we meet again. Pierce does a very nice job of explaining how Mu does for types what Y does for values. In this post, I will discuss my understanding of Mu from a type theory point of view, much as TAPL explains it.

As we know, the collection of types in a programming language forms a category and any equation recursive in types can be converted to obtain an endofunctor on the same category. In an upcoming post I will discuss how the fixed point that we get from Mu translates to an isomorphism in the diagram of categories.

Let's have a look at the Mu constructor - the fixed point for type constructors. What does it mean ?

Here's the ordinary fixed point combinator for functions (from values to values) ..

Y f = f (Y f)

and here's Mu

Mu f = f (Mu f)

Quite similar in structure to Y, the difference being that Mu operates on type constructors. Here f is a type constructor (one that takes a type as input and generates another type). List is the most commonly used type constructor. You give it a type Int and you get a concrete type ListInt.

So, Mu takes a type constructor f and gives you a type T. This T is the fixed point of f, i.e. f T = T.

Consider the following recursive definition of a List ..

// nil takes no arguments and returns a List data type

nil : 1 -> ListInt



// cons takes 2 arguments and returns a List data type

cons : (Int x ListInt) -> ListInt

Taken together we would like to solve the following equation :

a = Unit + Int x a     // ..... (1)

Now this is recursive and can be unfolded infinitely as

a = Unit + Int x (Unit + Int x a)

  = Unit + Int x (Unit + Int x (Unit + Int x a))

  = ...

TAPL shows that this equation can be represented in the form of an infinite labeled tree and calls this infinite type regular. So, generally speaking, we have an equation of the form a = τ where

1. if a does not occur in τ, then we have a finite solution which, in fact is τ
2. if a occurs in τ, then we have an infinite solution represented by an infinite regular tree

So the above equation is of the form a = ... a ... or we can say a = F(a) where F is the type constructor. This highlights the recursion of types (not of values). Hence any solution to this equation will give us an object which will be the fixed point of the equation. We call this solution Mu a . F.

Since Mu a . F is a solution to a = F(a), we have the following:

Mu a . F = F {Mu a . F / a}, where the rhs indicates substitution of all free a's in F by Mu a . F.

Here Mu is the fixed point combinator which takes the type constructor F and gives us a type, which is the fixed point of F. Using this idea, the above equation (1) has the solution ListInt, which is the fixed point type ..

ListInt = Mu a . Unit + Int x a
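In Scala the same construction can be sketched with a higher-kinded case class. The definitions below (Mu, ListF, nil, cons, toList) are my own minimal illustration under those names, not code taken from TAPL.

```scala
object MuSketch {
  // Mu f = f (Mu f): a value of Mu[F] wraps one layer of F applied to Mu[F] itself
  final case class Mu[F[_]](out: F[Mu[F]])

  // The type constructor F(a) = Unit + Int x a
  sealed trait ListF[+A]
  case object NilF extends ListF[Nothing]
  final case class ConsF[A](head: Int, tail: A) extends ListF[A]

  // ListInt = Mu a . Unit + Int x a
  type ListInt = Mu[ListF]

  val nil: ListInt = Mu[ListF](NilF)
  def cons(h: Int, t: ListInt): ListInt = Mu[ListF](ConsF(h, t))

  // Unroll one layer at a time to recover an ordinary List[Int]
  def toList(l: ListInt): List[Int] = l.out match {
    case NilF        => Nil
    case ConsF(h, t) => h :: toList(t)
  }
}
```

Wrapping with the Mu constructor and unwrapping with out are the two directions of Mu f = f (Mu f).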

In summary, we express recursive types using the fixed point type constructor Mu and show that Mu generates the fixed point for the type constructor just like Y generates the same for functions on values.

Sunday, January 01, 2012

2011 - The year that was

The very first thing that strikes me as I start writing a personal account of 2011 is how much it has changed my regular programming chores. It has been different, and I am starting to enjoy a renewed vigor in areas like Type Systems, Machine Learning, Algebra etc. Throughout the year I used mostly one single language, Scala, for programming, with some occasional stints in Haskell, and in Octave for the Stanford Machine Learning course. But I have no regrets about not being more of a polyglot, because I could find more time to dig deep into some of the more fundamental areas like algebra, category theory and type systems.

Favorite books read / started reading

Types and Programming Languages by Benjamin Pierce : definitely a Knuth statured book in the theory of type systems in programming languages. It's written with a very pragmatic outlook and contains all necessary implementation details to complement the accompanying theory. I have not yet finished reading the book. I am into Chapter 20 and 21 doing recursive types, that look to be one of the most exhaustive treatments of the subject I have ever seen. If and when I manage to finish reading this book, my next plan for theory of programming languages is Design Concepts in Programming Languages.
Conceptual Mathematics by F. William Lawvere and Stephen H. Schanuel - I started reading this book on the recommendation of Paul Snively as a precursor to Benjamin Pierce's Basic Category Theory for Computer Scientists. This is an excellent introduction to Category Theory and contains a detailed treatment of the unifying ideas of mathematics, set theory and category theory.
Learn You a Haskell for Great Good by Miran Lipovaca - possibly the most recommended and updated Haskell reading in print form. The chapters on Applicative Functors, Monads and Zippers are real treats.
Language Proof and Logic by Jon Barwise and John Etchemendy - Starts with a great review of logic and goes on to discuss proofs of soundness and
A Tribute to a Mathemagician by Cipra, Demaine, Demaine and Rodgers - Another book in a series written by those illustrious mathematicians and puzzlers who were inspired by Martin Gardner. It's a fascinating collection of essays on mathematical puzzles - get it if you have that bent of mind.

Exploring new ideas

Category Theory - Often debated on its usefulness in the practical world, category theory gives you the basic understanding of programming language design, semantics and domain theory. I did lots of readings on Category Theory this year and this has led to a more concrete understanding of type systems as well. Hope to continue more in 2012.
Algebra - What's the algebra behind the term Algebraic Data Types ? I took some notes as I started understanding the algebra of recursive data types. Have a look at my notes on github.
Machine Learning - I took the Stanford online course on machine learning. It's been a revelation for me to find the pervasiveness of the subject in today's application context. Also the course encouraged me to look more into mathematics that govern all the theories that ML implements.

Some great papers read

Theorems for Free! by Philip Wadler
Monad Transformers and Modular Interpreters by Sheng Liang, Paul Hudak and Mark Jones
Lazy Functional State Threads by John Launchbury and Simon Peyton Jones
A Domain-Specific Language for manipulation of binary data in Dylan by Hannes Mehnert and Andreas Bogk
RRB-Trees: Efficient Immutable Vectors by Phil Bagwell and Tiark Rompf
Categorical Programming with Inductive and Coinductive Types by Varmo Vene
Functional Programming with Overloading and Higher-Order Polymorphism by Mark P Jones

Programming and Open Source

Once again a year passed by where I did 95% of my programming in Scala. Scala has somehow hit the sweet spot of my liking - OO, FP, JVM, succinctness, I get them all in Scala. However, having said that, I have every honest intention to renew all my friendships with Haskell and Clojure in 2012. I did quite a bit of Haskell in 2010 and am still reaping the benefits of being a better Scala programmer piggybacking on my Haskell thoughts. I know Haskell is purer, and a piece of Haskell code can be poetry. But the pragmatics of being on the JVM make Scala more appealing to my professional life.

Two of my open source projects sjson and scala-redis are still quite active. I get pull requests on a regular basis and of course quite a few feature requests and bugs reported on Github. I plan to make some major upgrades to sjson particularly when reflection becomes more accessible in Scala 2.10. Also in line are some enhancements planned towards functor based JSON composition in sjson, which I plan to take up pretty soon. I tried to upgrade scala-redis to keep it in sync with the various releases of redis. Thanks to all of you for trying out sjson and scala-redis. Open source programming is fun and I consider myself blessed to have the opportunities to give something back to the community, which has given me so much over the years.

Any mention of my programming activities in 2011 would be incomplete without mentioning scalaz. I now use it in almost every project. It's really a great creation by Tony, Runar, Jason, Paul and the other members of the team. Using scalaz, I have learnt a lot about functional programming and functional thinking.

Another library that I have been using regularly since its inception is Akka. Asynchronous messaging is the gateway towards writing scalable applications and Akka provides the right set of batteries towards that. You get messaging, data flows, agents, STMs and all through a nice set of APIs both in Java and Scala. I think Akka is nicely poised to be the killer application to push Scala into the mainstream.

Some Publications

In 2011 I got the following two papers published, one of them as part of the esteemed team of Justin, Kresten and Steve. Thanks guys ..

Debasish Ghosh, Justin Sheehy, Kresten Krab Thorup and Steve Vinoski, "Programming Language Impact on the Development of Distributed Systems," FOME'11: Future of Middleware at Middleware'2011.
Debasish Ghosh, "DSL for the Uninitiated," Communications of the ACM, vol. 54, no. 7, pp. 44-50, July 2011

Some nice experiences

I attended 2 international conferences in 2011 - QCon London and PhillyETE. I also talked at PhillyETE on Domain Specific Languages. Both the conferences were amazing and I got to know in person many of the faces that I see and talk to regularly on Twitter and Google+. Incidentally I will also be talking at PhillyETE 2012 slated to be held in April.

My book DSLs In Action came out in late Dec 2010. 2011 was the year where I got the first royalty check from Manning. The writing of the book has been an amazing experience and to get to hear good words from people using the book gives another level of satisfaction. Thank you Manning for giving me the opportunity.

Looking forward to 2012

I am not one for resolutions, but here's a wish list towards more geekery in 2012 ..

Program more in Haskell and Clojure
Blog more (It was pathetic in 2011)
Do more math
Attend more online classes (currently registered for Natural Language Processing, Algorithms and Probabilistic Graphical Modeling at Stanford)
Try to do more conferences (currently registered for PhillyETE and Scala Days)
Learn more algebra, type theory and category theory
Get started with TAOCP Vol 4A
Learn Factor

Wish you a very happy new year. See you all in 2012!