Sunday, February 07, 2010

Scala Self-Type Annotations for Constrained Orthogonality

I talked about orthogonality in design in one of my earlier posts. We had a class Address in Scala and we saw how we can combine it with other orthogonal concerns without polluting the core abstraction. We could do this because Scala offers a host of capabilities to compose smaller abstractions and build larger wholes out of them. A language is orthogonal when it allows such capabilities of composition without overlaps in functionalities between the composing featuresets.

The design of Scala offers many orthogonal features - I showed some of them in my earlier post. The power of mixins for adding orthogonal features ..

val a = new Address(..) with LabelMaker {
  override def toLabel = {
    //..
  }
}

and the power of Scala views with implicits ..

object Address {
  implicit def AddressToLabelMaker(addr: Address) = new LabelMaker {
    def toLabel =
    //..
  }
}

Here Address and LabelMaker are completely unrelated and offers truly orthogonal capabilities when mixed in. However there can be some cases where the mixins themselves are not completely orthogonal to the core abstractions, but really optional extensions to them. In fact the mixins implement some functionalities that may depend on the core abstraction as well. Let's see yet another feature of the Scala type system that makes this modeling wholesome.

Consider the following abstraction for a security trade that takes place in a stock exchange ..

// details ellided for clarity
case class Trade(refNo: String, account: String, 
                 instrument: String, quantity: Int,
                 unitPrice: Int) {
  // principal value of the trade
  def principal = quantity * unitPrice
}


For every trade executed on the exchange we need to have a set of tax and fees associated with it. The exact set of tax and fees depend on a number of factors like type of trade, instruments traded, the exchange where it takes place etc. Let's have a couple of tax/fee traits that model this behavior ..

trait Tax { 
  def calculateTax = //..
}
  
trait Commission { 
  def calculateCommission = //..
}


In the above definitions both methods calculateTax and calculateCommission depends on the trade being executed. One option is to keep them abstract in the above trait and provide their implementations after mixing in with Trade ..

val t = new Trade(..) with Tax with Commission {
  // implementations
  def calculateTax = principal * 0.2
  def calculateCommission = principal * 0.15
}

I did it at the instance level. You can very well use this idiom at the class level and define ..

class RichTrade extends Trade with Tax with Commission {
  //..
}


However the above composition does not clearly bring out the fact that the domain rules mandate that the abstractions Tax and Commission should be constrained to be used with the Trade abstraction only.

Scala offers one way of making this knowledge explicit at the type level .. using self-type annotations ..

trait Tax { this: Trade =>
  // refers to principal of trade
  def calculateTax = principal * 0.2
}
  
trait Commission { this: Trade =>
  // refers to principal of trade
  def calculateCommission = principal * 0.15
}


The traits are still decoupled. But using Scala's self type annotations you make it explicit that Tax and Commission are meant to be used *only* by mixing them with Trade.

val t = new Trade(..) with Tax with Commission
t.calculateTax
t.calculateCommission


Can I call this constraining the orthogonality of abstractions ? Tax and Commission provide orthogonal attributes to Trade optionally and publish their constraints explicitly in their definitions. It's not much of a difference from the earlier implementations. But I prefer to use this style to make abstractions closer to what the domain speaks.

Thursday, January 21, 2010

A new way to think of Data Storage for your Enterprise Application

A couple of posts earlier I had blogged about a real life case study of one of our projects where we are using a SQL store (Oracle) and a NoSQL store (MongoDB) in combination over a message based backbone. MongoDB was used to cater to a very specific subset of the application functionality, where we felt it made a better fit than a traditional RDBMS. This hybrid architecture of data organization is turning out to be an increasingly attractive option today with more and more specialized persistent storage structures being developed.

In many applications we need to process graph data structures. Neo4J can be a viable option for this. You can have your mainstream data storage still in an RDBMS and use Neo4J only for the subset of functionalities for which you need to use graph data structures. If you need to sync back to your main storage, use messaging as the transport to talk back to your relational database.

Multiple data storage use along with asynchronous messaging is one of the options that will looks very potent today. Drizzle has its entire replication based on a RabbitMQ based transport. And using AMQP messaging, Drizzle replicates data to a host of key/value stores like Voldemort, memcachedDB and Cassandra.

If we agree that messaging is going to be one of the most dominant paradigms in shaping application architectures, why not try to go one level up and look at some higher level abstractions for message based programming? Erlang programmers have been using the actor model for many years now and have demonstrated all the good qualities that the model imbibes. Inspired by Erlang, Scala also offers a similar model on the JVM. In an earlier post I had discussed how we can use the actor model in Scala to scale out messaging applications using a RabbitMQ storage.

Now with the developing ecosystem of polyglot storage, we can use the same model of actor based communication as the backbone for integrating multiple data storage options that you may plug in to your application. Have specific clients front end the storage that they need to work with and use messaging to sync that up with the main data storage backend to have a consistent system of record. Have a look at the following diagram that may not look that unreal today. You have a host of options that bring your data closer to the way you process them in your domain model, be it document oriented, graph based, key/value based or simple POJO based across a data grid like Terracotta.



When we have a bunch of architectural components loosely connected through messaging infrastructure, you can have a world of options managing interactions between them. In fact your options open up more when you get to interact with data shaped the way you would like to be. You now can think in terms of having a data model aligned with the model of your domain. You know once your rule base gets updated in Neo4J, it will somehow be synced up with the backend storage through some other service that will make it eventually consistent.

In a future post I will explore some of the options that a higher order middleware service like Akka can add to your stack. With Akka providing abstractions like transactors, pluggable persistence and out of the box integration modules for AMQP, there's a number of ways you can think of modularizing your application's domain model and storage. You can use peer to peer distributed actor based communication model that sets up synchronization options with your databases or you can use AMQP based transport to do the same much like what Drizzle does for replication. But that's some food for thought for yet another future post.

Sunday, January 10, 2010

A Case for Orthogonality in Design

In Learning, using and designing command paradigms, John M Carroll introduces the notion of lexical congruence. When you design a language, one of the things that you do is lexicalization of the domain. If we extrapolate the concept to software designs in general, we go through the same process with our domain model. We identify artifacts or lexemes and choose to name them appropriately so that the names are congruent with the semantics of the domain. This notion of lexical congruence is the essence of having a good mnemonics for your domain language. I found the reference to Carroll's work in Testing the principle of orthogonality in language design, which discusses the same issue of organizing your language around an optimal set of orthogonal semantic concepts. This blog post tries to relate the same concepts of orthogonality in designing domain models using the power that the newer languages of today offers.

The complexity of a modeling language depends on the number of lexemes, their congruence with the domain concepts being modeled and the number of ways you can combine them to form higher order lexemes. The more decoupled each of these lower level lexemes are, the easier they are to compose. When you have overlapping concepts being modeled as part of lexemes, the mixing is not easy. You need to squeeze in some special boundary conditions as part of composition logic. Making your concepts independent yet composable makes your design orthogonal.

The first time I came across the concept of orthogonality in design and consciously appreciated the power of unification that it brings on to your model, is through Andrei Alexandrescu's idea of policy based design that he evangelized in his book Modern C++ Design and in the making of the Loki library. You have orthogonal policies that are themselves reusable as independent abstractions. And at the same time you can use the language infrastructure to combine them when composing your higher order model. Consider this C++ example from Andrei's book ..

template
<
  class T,
  template <class> class CheckingPolicy,
  template <class> class ThreadingModel
>
class SmartPtr;


CheckingPolicy enforces constraints that need to be satisfied by the pointee object. The ThreadingModel abstraction defines the concurrency semantics. These two concerns are not related in any way between themselves. But you can use the power of C++ templates to plug in appropriate behaviors of these concerns when composing your own custom type of SmartPtr ..

template SmartPtr<Widget, NoChecking, SingleThreaded>
  WidgetPtr;


This is orthogonal design where you have a minimal set of lexemes to model otherwise unrelated concerns. And use the power of C++ templates to evolve a larger abstraction by composing them together. The policies themselves are independent and can be applied to construct arbitrary families of abstraction.

The crux of the idea is that you have m concepts that you can use with n types. There's no static relationship between the concepts and the types - that's what makes orthogonality an extensible concept. Consider Haskell typeclasses ..

class Eq a where 
  (==) :: a -> a -> Bool


The above typeclass defines the concept of equality. It's parameterized on the type and defines the constraint that the type as to define an equality operator in order to qualify itself as an instance of the Eq typeclass. The actual type is left open which gives the typeclass an unbounded extensibility.

For integers, we can do

instance Eq Integer where 
  x == y =  x `integerEq` y


For floats we can have ..

instance Eq Float where 
  x == y =  x `floatEq` y


We can define Eq even for any custom data type, even recursive types like Tree ..

instance (Eq a) => Eq (Tree a) where 
  Leaf a         == Leaf b          =  a == b
  (Branch l1 r1) == (Branch l2 r2)  =  (l1==l2) && (r1==r2)
  _              == _               =  False


Haskell typeclasses, like C++ templates help implement orthogonality in abstractions through a form of parametric polymorphism. Programming languages offer facilities to promote orthogonal modeling of abstractions. Of course the power varies depending on the power of abstraction that the language itself offers.

Let's consider a real world scenario. We have an abstraction named Address and modeled as a case class in Scala ..

case class Address(no: Int, street: String, 
                   city: String, state: String, zip: String)


There can be many contexts in which you would like to use the Address abstraction. Consider printing of labels for shipping that needs your address to be printed in some specific label format, as per the following trait ..

trait LabelMaker {
  def toLabel: String
}


Note that printing addresses in the form of labels is not one of the primary concerns of your Address abstraction. Hence it makes no sense to model it as one of the methods of the class. It's only required that in some situations we may need to use the Address to print itself in the form of a label as per the specification mandated by LabelMaker.

One other concern is sorting. You may need to have your addresses sorted based on zip code before submitting them to your Printer module for shpping. Sorting may be required in combination with label printing or as well as on its own - these two are orthogonal concerns that should never have any dependence amongst themselves within your abstraction.

Depending on your use case, you can decide to compose your Address abstraction as

case class Address(houseNo: Int, street: String, 
  city: String, state: String, zip: String)
  extends Ordered[Address] with LabelMaker {
  //..
}


which makes your Address abstraction statically coupled with the other two.

Or you may like to make the composition based on individual objects which would keep the base abstraction independent of any static coupling.

val a = new Address(..) with LabelMaker {
  override def toLabel = {
    //..
  }
}


As an alternative you can also choose to implement implicit conversions from Address using Scala views ..

object Address {
  implicit def AddressToLabelMaker(addr: Address) = new LabelMaker {
    def toLabel =
      "%d-%s, %s, %s-%s".format(
        addr.houseNo, addr.street, addr.city, addr.state, addr.zip)
  }
}


Whatever be the implementation, take note of the fact that we are not polluting the basic Address abstraction with the concerns that are orthogonal to it. Our model, which is the design language treats orthogonal concerns as separate lexemes and encourages ways to compose them non invasively by the user.

Sunday, January 03, 2010

Pragmatics of Impurity

James Hague, a long time Erlanger, drives home a point or two regarding purity of paradigms in a couple of his latest blog posts. Here's his take on being effective with pure functional languages ..

"My real position is this: 100% pure functional programing doesn't work. Even 98% pure functional programming doesn't work. But if the slider between functional purity and 1980s BASIC-style imperative messiness is kicked down a few notches--say to 85%--then it really does work. You get all the advantages of functional programming, but without the extreme mental effort and unmaintainability that increases as you get closer and closer to perfectly pure."

Purity is not necessarily pragmatic. In my last blog post I also tangentially touched upon the notion of purity while discussing how a *hybrid* model of SQL-NoSQL database stack can be effective for large application deployments. Be it with programming languages or with databases or any other paradigms of computation, we need to have the right balance of purity and pragmatism.

Clojure introduced transients. Rich Hickey says in the rationale .. "If a pure function mutates some local data in order to produce an immutable return value, is that ok?". Transients in Clojure allow localized mutation in initializing or transforming a large persistent data structure. This mutation will only be seen by the code that does the transformation - the client gets back a version for immutable use that can be shared. In no way does this invalidate the benefits that immutability brings in reasoning of Clojure programs. It's good to see Rich Hickey being flexible and pragmatic at the expense of injecting that little impurity into his creation.

Just like the little compromise (and big pragmatism) with the purity of persistent data structures, Clojure also made a similar compromise with laziness by introducing chunked sequences that optimize the overhead associated with lazy sequences. These are design decisions that have been taken consciously by the creator of the language that values pragmatism over purity.

Enough has already been said about the virtues of purity in functional languages. Believe me, 99% of the programming world does not even care for purity. They do what works best for them and hybrid languages are mostly the ones that find the sweetest spots. Clojure is as impure as Scala is, considering the fact that both allow side-effecting with mutable references and uncontrolled IO. Even Erlang has uncontrolled IO and a mutable process dictionary, though its use is often frowned upon within the community. The important point is that all of them have proved to be useful to programmers at large.

Why do creators infuse impurity into their languages ? Why aren't every language created as pure as Haskell is ? Well, it's mostly related to a larger thought that the language often targets to. Lisp started as an incarnation of the lambda calculus under the tutelage of John McCarthy and became the first significant language promoting the purely applicative model of programming without side-effects. Later on it added the impurities of mutation constructs based on the von Neumann architecture of the machines where Lisp was implemented. The obvious reason was to get an improved performance over purely functional constructs. Scala and Clojure both decided to go for the JVM as the primary runtime platform - hence both languages are susceptible to the pitfalls of impurity that JVM offers. Both of them decided to inherit all the impurities that Java has.

Consider the module system of Scala. You can compose modules using traits with deferred concrete definitions of types and objects. You can even compose mutually recursive modules using lazy vals, somewhat similar to what Newspeak and some dialects of ML offer. But because you have decided to bite the Java pill, you can also wreak havoc through shared mutable state at the top level object that you compose. In his post titled A Ban on Imports Gilad Bracha discusses all evil effects that an accessible global namespace can bring to the modularity aspects of your code. Newspeak is being designed as pure in this respect, with all dependencies being abstract and need to be plugged together explicitly as part of configuring the module. Scala is impure in this respect, allows imports to bring in the world on to your module definitions, but at the same time opens up all possibilities of sharing the huge ecosystem that the Java community has built over the years. You can rightfully choose to be pure in Scala, but that's not enforced by the language.

When we talk about impurity in languages, it's mostly related to how it handles side-effects and mutable state. And Haskell has a completely different take on this aspect than what we discussed with Lisp, Scala or Clojure. You have to use monads in Haskell towards any side-effecting operation. And people with a taste for finer things in life are absolutely fine with that. You cannot just stick in a printf to your program for debugging. You need to return the whole stuff within an IO monad and then do a print. The Haskell philosophy looks at a program as a model of mathematical functions where side-effects are also implemented in a functional way. This makes reasoning and optimization by the compiler much easier - you can make your pure Haskell code run as fast as C code. But you need to think differently. Pragmatic ? What do you think ?

Gilad Bracha is planning to implement pure subsets of Newspeak. It will be really exciting to get to see languages which are pure, functional (note: not purely functional) and object-oriented at the same time. He observes in his post that (t)he world is slowly digesting the idea that object-oriented and functional programming are not contradictory concepts. They are orthogonal, and can be arranged to be rather complementary. This is an interesting trend where we can see families of languages built around the same philosophy but differing in aspects of purity. You need to be pragmatic to choose and even mix them depending on your requirements.

Thursday, December 24, 2009

A case for hybrid SQL - NoSQL stack

Alex Popescu talks about Drizzle replication in his MyNoSql column. He makes a very interesting observation in his post regarding Drizzle's replication capabilities into a host of NoSQL storage backends ..

"Leaving aside the technical details — which are definitely interesting .., the solution using the Erlang AMQP .. implementation RabbitMQ .. — I think this replication layer could represent a good basis for SQL-NoSQL hybrid solutions".

We are going to see more and more of such hybrid solutions in the days to come. Drizzle does it at the product level. Using RabbitMQ as the transport, Drizzle can replicate data as serialized Java objects to Voldemort, as JSON marshalled objects to Memcached or as a hashmap to column family based Cassandra.

Custom implementation projects have also started using the hybrid stack of persistent stores. When you are dealing with real high volumes and patterns of access where you cannot use joins, anyway you need to denormalize. ORMs cease to be a part of your solution toolset. Data access patterns vary widely across the profile of clients using your system. If you are running an ecommerce suite and your product is launched you may have an explosive use of the shopping cart module. It makes every sense to move your shopping cart from the single relational data store where it was lying around and have it served through a more appropriate data store that lives up to the scalability requirements. It's not that you need to throw away your relational database that has served you so long. Like Alex mentioned, you can always go along with a hybrid model.

In one of our recent projects, we were using Oracle as the main relational database for a securities trading back office solution implementation. The database load was computed based on all calculations that were done initially. In a very late stage of the project a new requirement came up that needed heavy processing and storage of semi-structured data and meta-data from an external feed. Both the data and the meta-data were extensible which meant that it was difficult to model them with a fixed schema.

We could not afford frequent schema changes since it would entail long downtime of the production database. But there also was the requirement that after processing of these semi-structured data lots of them will have to be made available in the production database. We could have modeled it following the key/value paradigm in Oracle itself, which we were using anyway as the primary database. But that's again going down the age old saying of the hammer and nail story.

We decided to supplement the stack with another data store that fits the bill for this specific use case. We used MongoDB, that gave us phenomenal performance for our requirements. We were getting the feed from external data sources and loaded our MongoDB database with all the semi-structured data and meta-data. All necessary processing was done in MongoDB on those data and relevant information from MongoDB were pushed to JMS based queues for consumption by appropriate services that copied data asynchrnously to our Oracle servers.

What did we achieve with the above architecture ?

  1. Kept Oracle free to do what it does the best.

  2. Took away unnecessary load from production database servers.

  3. Introduced a document database for serving a requirement tailor made for its use - semi structured data, mainly reads, no constraints, no overhead of ORM ceremony. MongoDB supports a very clean programming model, a very decent query interface, simple to use and easy to convince your client.

  4. Used message based mapping to sync up data ASYNCHRONOUSLY between the nosql MongoDB and sql based Oracle. Each of the data stores were doing what they do the best, keeping us away from the blames of the hammer-nail paradigm.


With more and more of the nosql stores coming up, message based replication is going to play a very important role. Even within the nosql datastore, we are seeing choices of sql based storage backends being offered. Voldemort offers MySql as one of the storage backends - so the hybrid model starts right up there. It's always advisable to use multiple storage that fits your use case than trying to force-fit everything into a single paradigm.

Monday, December 14, 2009

No Comet, Hacking with WebSocket and Akka

Over the weekend I upgraded my Chrome to the developer channel and played around a bit with Web Sockets. It's really really exciting and for sure will make us think of Web applications in a different way. Web Sockets have been described as the "TCP for the Web" and will standardize bidirectional communication technology for Web applications.

The way we think today of push technology is mostly through Comet bsed implementations that use "long polling". The browser sends a request to the server and then waits for an event to happen and then sends a response. The client, on getting the response consumes it, closes the socket and does a new long polling connection to the server. The communication is not symmetric and connections can drop off and on between the client and the server. Also, the fact that Comet implementations are not portable across Web containers make it an additional headache to portability of applications.

Atmosphere offers a portable framework to implementing Ajax/Comet based applications for the masses. But still it does not make the underlying technology painless.

Web Sockets will make applications much more symmetric - instead of having long polling it's all symmetric push/pull. Once you get a Web Socket connection, you can exchange data between the browser and the server through send() method and an onmessage event handler.

I used this snippet to hack up a bidirectional communication channel with an Akka actor based service stack that I was already using in an application. I needed to hack up a very very experimental Web Socket server, which did not take much of an effort. The actors could do message exchange with the Web Socket and communicate further downstream to a CouchDB database. I already had all the goodness of transaction processing using Akka STM. But now I could get rid of much of the client code that looked like magic and replace it with something that's more symmetric from the application point of view. It just feels like a natural extension to socket based programming on the client. Also the fact that Javascript is ideally suited for an event driven programming model makes the client code much cleaner with the Web Socket API.

Talking about event based programming, we all saw the awesomeness that Ryan Dahl demonstrated a few weeks back with evented I/O in Node.js. Now people have already started developing experimental implementations of the Web Socket protocol for Node.js. And I noticed just now that Jetty also has a WebSocket server.

Tuesday, November 03, 2009

DSLs In Action - Updates on the Book Progress

DSLs In Action has been out for a month now in MEAP with the first 3 chapters being published. DSL is an emerging topic and I am getting quite some feedback from the readers who have already purchased the MEAP edition. Thanks for all the feedback.

Writing a book also has lots of similarities with coding. Particularly the refactoring part. The second attempt to articulate a piece of thought is almost always better than the first attempt, much like coding. You cannot imagine how many times I have discarded my first attempt and rewrote the stuff only to get a better feel of what I try to express for my readers.

Anyway, this post is not about book writing. Since the MEAP is out, I thought I should post a brief update on the progress since then .. here they are more as a list of bullet points :-

I delivered Chapter 4 and already got quite a few feedbacks on it. Chapter 4 starts the DSL Implementation section of the book. From here onwards expect lots of code snippets, zillions of implementation techniques that you may try out on your IDE. Chapter 4 deals with Implementation Patterns in Internal DSL. I discuss some of the common idioms that you will be using while implementing internal DSLs. As per the premise of the book, the snippets and idioms are in Ruby, Groovy, Clojure and Scala. It's more of a focus on the strengths of each of these languages that help you implement well designed DSLs. This chapter prepares the launching pad for some more extensive DSL implementations in Chapter 5. It's not there in MEAP yet, but here's the detailed ToC for Chapter 4.

Chapter 4 : Internal DSL Implementation Patterns

4.1. Building up your DSL Toolbox
4.2. Embedded DSL - Patterns in Meta-programming
4.2.1. Implicit Context and Smart APIs
4.2.2. Dynamic Decorators using Mixins
4.2.3. Hierarchical Structures using Builders
4.2.4. New Additions to your Toolbox
4.3. Embedded DSL - Patterns with Typed Abstractions
4.3.1. Higher Order Functions as Generic Abstractions
4.3.2. Explicit Type Constraints to model Domain logic
4.3.3. New Additions to your Toolbox
4.4. Generative DSL - Boilerplates for Runtime Generation
4.5. Generative DSL - One more tryst with Macros
4.6. Summary
4.7. Reference

Chapter 5 is also in the labs right now. But almost complete. I hope to post another update soon ..

Meanwhile .. Enjoy!