Sunday, August 30, 2009

JSON Serialization for Scala Objects

SJSON : JSON Serialization for Scala Objects has just been published on GitHub. It uses Nathan Hamblen's awesome dispatch-json as the base JSON processor.

Here's the idea ..

I have a Scala object as ..

val addr = Address("Market Street", "San Francisco", "956871")

which I would like to store as JSON and retrieve as a plain old Scala object. Here's the simple assertion that I would like to hold as an invariant ..

addr should equal(
  serializer.in[Address](serializer.out(addr)))
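
(In the snippets that follow I assume the serializer handle is obtained along these lines - the exact import may vary with the version of SJSON you pick up ..)

import sjson.json.Serializer

// assumed entry point - SJSON's default serializer instance
val serializer = Serializer.SJSON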


There are situations, particularly when writing generic libraries, when I don't know what class to serialize into. I can do that as well ..

serializer.in[AnyRef](serializer.out(addr))

or just as ..

serializer.in(serializer.out(addr))

What you get back is a JsValue, an abstraction of the JSON object model. You can use extractors to get back individual attributes ..

val a = serializer.in[AnyRef](serializer.out(addr))

// use extractors
val c = 'city ? str
val c(_city) = a
_city should equal("San Francisco")

val s = 'street ? str
val s(_street) = a
_street should equal("Market Street")

val z = 'zip ? str
val z(_zip) = a
_zip should equal("956871")


Serialization of Embedded Objects

Suppose you have the following Scala classes, where Contact has an embedded Map of Address objects ..

@BeanInfo
case class Contact(name: String, 
                   @JSONTypeHint(classOf[Address])
                   addresses: Map[String, Address]) {
  
  private[json] def this() = this(null, null)
  
  override def toString = "name = " + name + " addresses = " + 
    addresses.map(a => a._1 + ":" + a._2.toString).mkString(",")
}

@BeanInfo
case class Address(street: String, city: String, zip: String) {
  private[json] def this() = this(null, null, null)
  
  override def toString = "address = " + street + "/" + city + "/" + zip
}


With SJSON, I can do the following ..

val a1 = Address("Market Street", "San Francisco", "956871")
val a2 = Address("Monroe Street", "Denver", "80231")
val a3 = Address("North Street", "Atlanta", "987671")

val c = Contact("Bob", Map("residence" -> a1, "office" -> a2, "club" -> a3))
val co = serializer.out(c)

// with class specified
c should equal(serializer.in[Contact](co))

// no class specified
val a = serializer.in[AnyRef](co)

// extract name
val n = 'name ? str
val n(_name) = a
"Bob" should equal(_name)

// extract addresses
val addrs = 'addresses ? obj
val addrs(_addresses) = a

// extract residence from addresses
val res = 'residence ? obj
val res(_raddr) = _addresses

// make an Address bean out of _raddr
val address = JsBean.fromJSON(_raddr, Some(classOf[Address]))
a1 should equal(address)

object r { def >[T](f: JsF[T]) = f(a.asInstanceOf[JsValue]) }

// still better: chain 'em up
"Market Street" should equal(
  (r > { ('addresses ? obj) andThen ('residence ? obj) andThen ('street ? str) }))


Feel free to fork, contribute and enjoy!

Sunday, August 16, 2009

5 Reasons why you should learn a new language NOW!

There have been quite a few murmurs around the Web today regarding the ways Java programming paradigms have changed since the language's inception in the mid-90s. A clear mandate and recommendation towards immutable abstractions, DSL-like interfaces and actor-based concurrency models indicates a positive movement towards a trend that aligns nicely with all the language research that has been going on in the community for quite some time. Language platforms are also improving by the day; efforts are on to make the platforms better hosts for multi-paradigm languages. The time is now for you to learn a new language - here are some of my thoughts on why you should invest in learning a new language of your choice .. NOW!

#1


Language barriers are going down - polyglot programming is on the way up. Two of the big enablers towards this movement are:

  • Middleware inter-operability using document formats like JSON. You can implement persistent actors in Scala or Java that use MongoDB or CouchDB as the storage of JSON documents, which interoperate nicely with your payment gateway system hosted on MochiWeb, developed on an Erlang stack.

  • Easier language inter-operability using DSLs. While you are on a specific platform like the Java Virtual Machine you can design better APIs in an alternative language that interoperates with the core language of your application. Here's how I got hooked on to Scala in an attempt to make my Java objects smarter and publish better APIs to my clients. Even Google, known for their selective set of languages to use in production applications, have been using s-expressions as an intermediate language expressed as a set of Scheme macros for their Android platform.



#2


Learning a different language helps you look at a problem in a different way. Maybe the new way models your domain more expressively and succinctly, and you will need to write and maintain less code in the new language. Once you're familiar with the paradigms of the new language, idiomatic code will look more expressive to you, and you will no longer complain about such snippets in defence of the average programmer. What you flaunt today as design patterns will come out as natural idiomatic expressions in your new language - you will be programming at a higher level of abstraction.

#3


Playing on the strengths that the new language offers. A while back I blogged on Erlang becoming mainstream as a middleware language. You do not have to use Erlang for the chores of application development that you do in your day job. Nor will you have to be an Erlang expert to use Erlang-based solutions like RabbitMQ or CouchDB. But look at the spurt of development that has been going on using the strengths of Erlang's concurrency, distribution and fault tolerance capabilities. As of today, Erlang is unmatched in this regard, and it has the momentum both as a language and as a platform that delivers robust middleware. Learning Erlang will give you more insight into the platform's capabilities, and the edge to make a rational decision when your client asks you to select Webmachine as the REST-based platform for your next Web application talking to the Riak datastore.

#4


The Java Virtual Machine is now the cynosure of performance optimization and language research. Initially touted as the platform for hosting statically typed languages, the JVM is now adding capabilities to make itself a better host for dynamically typed languages as well. Anything that runs on the JVM is now a candidate for being integrated into your enterprise application architecture tomorrow. Learning a new JVM language will give you a head start, and it will safeguard your long-acquired Java expertise too. JRuby is a classic example. From a really humble beginning, JRuby today offers you the best of dynamic language capabilities by virtue of being a 100% compatible Ruby interpreter and a solid player on the JVM. JRuby looks to be the future of Ruby in the enterprise application space. Groovy has acquired the mindshare of lots of Java professionals by virtue of its solid integration with the Java platform. Clojure is bringing in the revival of Lisp on the JVM. And the list continues .. Amongst the statically typed ones, Scala is fast emerging as the next mainstream language for the JVM (after Java) and can match the performance of Java as of today. And the best part is that your erstwhile investment in Java will only continue to grow - you will be able to freely interoperate any of these languages with your Java application.

#5


This is my favorite. Learn a language for the fun of it. Learn something that is radically different from what you do in your day job. Maybe Factor, maybe some other concatenative language like Forth or Joy. Or Lua, which is fast coming up as a scripting language to extend your database or application. A couple of days ago I discovered JKat, a dynamically typed, stack-based (concatenative) language similar to Forth but implemented as an interpreter on top of the JVM. You can write neat DSLs and embed the JKat interpreter, very much like Lua, with your application. Indulge in the sinful pleasure that programming in such languages offers - you will never regret it.

Monday, August 10, 2009

Static Typing gives you a head start, Tests help you finish

In one of my earlier posts (almost a year back) I indicated how type-driven modeling leads to succinct domain structures that inherit the following goodness:

  • Less code to write, since the static types encapsulate lots of business constraints

  • Fewer tests to write, since the compiler does those checks implicitly for you


In a recent thread on Twitter, I mentioned a comment that Manuel Chakravarty made on one of Michael Feathers' blog posts ..

"Of course, strong type checking cannot replace a rigorous testing discipline, but it makes you more confident to take bigger steps."

The statement resonated with my own feelings on static typing, which I have been practising for quite some time now using Scala. As the Twitter thread grew louder, Patrick Logan made an interesting comment on my blog on this very subject ..

This is interesting... it is a long way toward the kind of explanation I have been looking for re: "type-driven programming" with rich type systems as opposed to "test-driven programming" with dynamic languages.

I am still a big fan of the latter and do not fully comprehend the former.

I'd be interested in your "type development" process - without "tests" of some kind, the type system may validate the "type soundness" of your types, but how do you know they are the types you actually *want* to have proven sound?


and the conversation became somewhat longer, with both of us trying to look into the practices and subtleties that domain modeling with type constraints implies for the programmer. One of the points that Patrick raised was regarding the kind of tests that you would typically provide for code like this.

Let me look into some of the real-life code on which I have been using this practice. Say I have a snippet like this ..

/**
 * A trade needs to have a Trading Account
 */
trait Trade {
  type T
  val account: T
  def valueOf: Unit
}

/**
 * An equity trade needs to have a Stock as the instrument
 */
trait EquityTrade extends Trade {
  override def valueOf {
    //.. calculate value
  }
}

/**
 * A fixed income trade needs to have a FixedIncome type of instrument
 */
trait FixedIncomeTrade extends Trade {
  override def valueOf {
    //.. calculate value
  }
}
//..
//..

/**
 * Accrued Interest is computed only for fixed income trades
 */
trait AccruedInterestCalculatorComponent {
  type T

  val acc: AccruedInterestCalculator
  trait AccruedInterestCalculator {
    def calculate(trade: T)
  }
}


I need to do validations and write up unit and functional tests to check ..

  • EquityTrade needs to work only on equity class of instruments

  • FixedIncomeTrade needs to work on fixed incomes only and not on any other instruments

  • For every method in the domain model that takes an instrument or a trade, I need to check whether the passed-in instrument or trade is of the proper type, and write unit tests that check the same. AccruedInterestCalculator takes a trade as an argument, which needs to be of type FixedIncomeTrade, since accrued interest is meaningful only for bond trades. The method AccruedInterestCalculator#calculate() needs to do an explicit check of the trade type, which makes me write unit tests for the valid as well as the invalid use cases (see the sketch after this list).
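
Here's a minimal sketch of what that explicit runtime check looks like, assuming (hypothetically) that calculate() returns the accrued interest as a BigDecimal and delegates to a helper computeAccruedInterest ..

trait AccruedInterestCalculator {
  def calculate(trade: Trade): BigDecimal = trade match {
    case fi: FixedIncomeTrade => computeAccruedInterest(fi)  // the valid case
    case other =>
      // the invalid case: a runtime failure that also needs a unit test
      throw new IllegalArgumentException(
        "accrued interest is meaningful only for fixed income trades")
  }

  // hypothetical helper that does the actual computation
  private def computeAccruedInterest(trade: FixedIncomeTrade): BigDecimal =
    BigDecimal(0) //.. elided
}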


Now let us introduce the type constraints that a statically typed language with a powerful type system offers.

trait Trade {
  type T <: Trading
  val account: T

  //..as above
}

trait EquityTrade extends Trade {
  type S <: Stock
  val equity: S

  //.. as above
}

trait FixedIncomeTrade extends Trade {
  type FI <: FixedIncome
  val fi: FI

  //.. as above
}
//..


The moment we add these type constraints, our domain model becomes more expressive and implicitly constrained by a lot of business rules .. for example ..

  1. A Trade takes place on a Trading account only

  2. An EquityTrade only deals with Stocks, while a FixedIncomeTrade deals exclusively with FixedIncome type of instruments


Consider this more expressive example that slaps the domain constraints right in front of you, without them being buried within procedural code logic in the form of runtime checks. Note that in the following example, all the types and vals that were left abstract earlier are being instantiated while defining the concrete component. And you can only instantiate them honoring the domain rules that you defined earlier. How useful is that as a succinct way to write concise domain logic without having to write any unit tests?

object FixedIncomeTradeComponentRegistry extends TradingServiceComponentImpl
  with AccruedInterestCalculatorComponentImpl
  with TaxRuleComponentImpl {

  type T = FixedIncomeTrade
  val tax = new TaxRuleServiceImpl
  val trd = new TradingServiceImpl
  val acc = new AccruedInterestCalculatorImpl
}


Every piece of wiring that you do above is statically checked for consistency - hence the FixedIncome component that you build will honor all the domain rules that you have stitched into it through explicit type constraints.

The good part is that these business rules will be enforced by the compiler itself, without me having to write any additional explicit check in the code base. And the compiler is also the testing tool - you will not be able to instantiate a FixedIncomeTrade with an instrument that is not a subtype of FixedIncome.
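
As an illustration, here's a deliberately wrong wiring (with TradingAccount as a hypothetical Trading subtype) that the compiler rejects outright ..

// this does not compile: Stock does not satisfy the bound
// FI <: FixedIncome declared in FixedIncomeTrade
//
// val bad = new FixedIncomeTrade {
//   type T = TradingAccount
//   val account = TradingAccount("a-123")
//   type FI = Stock    // error: type FI does not conform to its bounds
//   val fi = Stock("GOOG")
// }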

Then how do we test such type-constrained domain abstractions?

Rule #1: Type constraints are tested by the compiler. You cannot instantiate an inconsistent component that violates the constraints that you have incorporated in your domain abstractions.

Rule #2: You need to write tests only for the business logic that forms the procedural part of your abstractions. Obviously! Types cannot be of much help there. But if you are using a statically typed language, get the maximum out of the abstractions that the type system offers. There are situations when you will discover repetitive procedural business logic, with minor variations, sprinkled across the code base. If you are working with a statically typed language, model it up into a type family. Your tests for that logic will be localized *only* within the type itself. This is true for dynamically typed languages as well; where static typing gets the advantage is that all usages will be statically checked by the compiler. In a statically typed language, you think and model in "types". In a dynamically typed language, you think in terms of the messages that the abstraction needs to handle.

Rule #3: But you need to create instances of your abstractions within the tests. How do you do that? Very soon you will notice that the bulk of your tests is being polluted by complicated instantiations using concrete val or type injection. What I usually do is use the generators that ScalaCheck offers. ScalaCheck offers a special generator, org.scalacheck.Arbitrary.arbitrary, which generates arbitrary values of any supported type. And once you have the generators in place, you can use them to write properties that do the necessary testing of the rest of your domain logic.
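
Here's a minimal sketch of such a generator, assuming hypothetical concrete types TradingAccount (a Trading) and Bond (a FixedIncome) to instantiate the abstract members with ..

import org.scalacheck.{Arbitrary, Gen, Prop}

// hypothetical concrete domain types, just for the sketch
case class TradingAccount(no: String) extends Trading
case class Bond(isin: String, principal: Int) extends FixedIncome

// a generator that concretizes the abstract types and vals of FixedIncomeTrade
val genFixedIncomeTrade: Gen[FixedIncomeTrade] = for {
  no        <- Gen.alphaStr suchThat (_.nonEmpty)
  isin      <- Gen.identifier
  principal <- Gen.choose(1000, 1000000)
} yield new FixedIncomeTrade {
  type T  = TradingAccount
  type FI = Bond
  val account = TradingAccount(no)
  val fi      = Bond(isin, principal)
}

implicit val arbFixedIncomeTrade: Arbitrary[FixedIncomeTrade] =
  Arbitrary(genFixedIncomeTrade)

// a property exercising only the procedural logic - here assuming the
// calculate() variant from the earlier sketch that returns a BigDecimal
val accruedInterestIsNonNegative = Prop.forAll { (t: FixedIncomeTrade) =>
  acc.calculate(t) >= 0
}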

Sunday, August 02, 2009

MongoDB for Akka Persistence

Actors and message passing have been demonstrated to be great allies in implementing some of the specific use cases of concurrent applications. Message passing concurrency promotes loosely coupled application components, and hence has the natural side-effect of almost infinite scalability. But as Jonas Boner discusses in his JavaOne 2009 presentation, there are many examples in the real world today that have to deal with shared states, transactions and atomicity of operations. Software Transactional Memory provides a viable option towards these use cases, as has been implemented in Clojure and Haskell.

Akka, designed by Jonas Boner, offers Transactors, which combine the benefits of actors and STM, along with a pluggable storage model. It provides a unified set of data structures managed by the STM and backed by a variety of storage engines. It currently supports Cassandra as the out of the box storage model.

Over the weekend I was trying out MongoDB as yet another out of the box persistence option for Akka transactors. MongoDB is a high performance, schema-free, document-oriented database that stores documents in the form of BSON, an enhanced version of JSON. The main storage abstraction is a Collection, which can loosely be equated to a table in a relational database. Besides support for replication, fault tolerance and sharding, the aspect that makes MongoDB much easier to use is its rich querying facility. It supports lots of built-in query capabilities, with conditional operators, regular expressions and powerful variants of SQL where clauses, on the document model .. Here are some examples of query filters ..


db.myCollection.find( { $where: "this.a > 3" });
db.myCollection.find( { "field" : { $gt: value1, $lt: value2 } } );  // value1 < field < value2



and useful convenience functions ..


db.students.find().limit(10).forEach( ... )  // limit the fetch count
db.students.find().skip(..) // skip some records



In Akka we can have a collection in MongoDB that can be used to store all transacted data keyed on a transaction id. The set of data can be stored in a HashMap as key-value pairs. Have a look at the following diagram for the scheme of data storage using MongoDB Collections ..
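
Concretely, each transaction maps to a single document in the collection. Here's a sketch of the shape of such a document, using the KEY/VALUE field names from the MongoStorage object shown later and a hypothetical transaction id ..

{
  "key" : "tx-id-4711",
  "val" : {
    "entry-1" : <serialized value>,
    "entry-2" : <serialized value>
  }
}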



Akka TransactionalState offers APIs to instantiate the appropriate storage engine depending on the configuration ..


class TransactionalState {
  def newPersistentMap(
    config: PersistentStorageConfig): TransactionalMap[String, AnyRef] =
    config match {
      case CassandraStorageConfig() => new CassandraPersistentTransactionalMap
      case MongoStorageConfig() => new MongoPersistentTransactionalMap
    }

  def newPersistentVector(
    config: PersistentStorageConfig): TransactionalVector[AnyRef] =
    config match {
      //..
    }

  def newPersistentRef(
    config: PersistentStorageConfig): TransactionalRef[AnyRef] =
    config match {
      //..
    }
  //..
}



and each transactional data structure defines the transaction semantics for the underlying structure that it encapsulates. For example, for a PersistentTransactionalMap we have the following APIs ..


abstract class PersistentTransactionalMap[K, V] extends TransactionalMap[K, V] {

  protected[kernel] val changeSet = new HashMap[K, V]

  def getRange(start: Int, count: Int)

  // ---- For Transactional ----
  override def begin = {}
  override def rollback = changeSet.clear

  //.. additional map semantics .. get, put etc.
}



A concrete implementation defines the rest of the semantics used to handle transactional data. It is parameterized with the actual storage engine that can be plugged in for specific implementations.


trait ConcretePersistentTransactionalMap extends PersistentTransactionalMap[String, AnyRef] {
  val storage: Storage
  
  override def getRange(start: Int, count: Int) = {
    verifyTransaction
    try {
      storage.getMapStorageRangeFor(uuid, start, count)
    } catch {
      case e: Exception => Nil
    }
  }

  // ---- For Transactional ----
  override def commit = {
    storage.insertMapStorageEntriesFor(uuid, changeSet.toList)
    changeSet.clear
  }

  override def contains(key: String): Boolean = {
    try {
      verifyTransaction
      storage.getMapStorageEntryFor(uuid, key).isDefined
    } catch {
      case e: Exception => false
    }
  }

  //.. others 
}



Note the use of the abstract val storage in the above implementation, which gets concretized when we define the Mongo map ..


class MongoPersistentTransactionalMap 
  extends ConcretePersistentTransactionalMap {
  val storage = MongoStorage
}



For the Storage part, we have another trait that abstracts the storage-specific APIs ..


trait Storage extends Logging {
  def insertMapStorageEntriesFor(name: String, entries: List[Tuple2[String, AnyRef]])
  def removeMapStorageFor(name: String)
  def getMapStorageEntryFor(name: String, key: String): Option[AnyRef]
  def getMapStorageSizeFor(name: String): Int
  def getMapStorageFor(name: String): List[Tuple2[String, AnyRef]]
  def getMapStorageRangeFor(name: String, start: Int, 
    count: Int): List[Tuple2[String, AnyRef]]
}



I am in the process of writing a concrete storage implementation using MongoDB, which will look like the following ..


object MongoStorage extends Storage {
  val KEY = "key"
  val VALUE = "val"
  val db = new Mongo(..);  // needs to come from configuration
  val COLLECTION = "akka_coll"
  val coll = db.getCollection(COLLECTION)
  
  private[this] val serializer: Serializer = ScalaJSON
  
  override def insertMapStorageEntriesFor(name: String, entries: List[Tuple2[String, AnyRef]]) {
    import java.util.{Map, HashMap}
    val m: Map[String, AnyRef] = new HashMap
    for ((k, v) <- entries) {
      m.put(k, serializer.out(v))
    }
    coll.insert(new BasicDBObject().append(KEY, name).append(VALUE, m))
  }
  
  override def removeMapStorageFor(name: String) = {
    val q = new BasicDBObject
    q.put(KEY, name)
    coll.remove(q)
  }
  //.. others
}



As the diagram above illustrates, every transaction will have its own DBObject in the Mongo Collection, which will store a HashMap that contains the transacted data set. Using MongoDB's powerful query APIs we can always get to a specific key/value pair for a particular transaction as ..


// form the query object with the transaction id
val q = new BasicDBObject
q.put(KEY, name)

// 1. use the query object to get the DBObject (findOne)
// 2. extract the VALUE which has the HashMap of transacted data set
// 3. query on the HashMap on the passed in key to get the value
// 4. use the scala-json serializer to get back the Scala object
serializer.in(
  coll.findOne(q)
      .get(VALUE).asInstanceOf[JMap[String, AnyRef]]
      .get(key).asInstanceOf[Array[Byte]], None)
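
Wrapped into the Storage contract, the same four steps give a sketch of the getMapStorageEntryFor implementation (assuming the old-style Java driver, where findOne returns null on a miss, and the JMap alias for java.util.Map used above) ..

override def getMapStorageEntryFor(name: String, key: String): Option[AnyRef] = {
  val q = new BasicDBObject
  q.put(KEY, name)
  coll.findOne(q) match {
    case null => None                      // no document for this transaction id
    case dbo =>
      dbo.get(VALUE).asInstanceOf[JMap[String, AnyRef]].get(key) match {
        case null => None                  // no entry under this key
        case bytes =>                      // deserialize back to the Scala object
          Some(serializer.in(bytes.asInstanceOf[Array[Byte]], None))
      }
  }
}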



MongoDB looks like a cool storage engine and has already been used in production as a performant key/value store. It looks promising as the backing storage engine for persistent transactional actors as well. Akka transactors look poised to evolve into a platform that can deliver the goods for stateful STM-based as well as stateless message-passing-based concurrent applications. I plan to complete the implementation in the near future and, if Jonas agrees, will be more than willing to contribute it to the Akka master.

Open source is as much about contributing as it is about using ..