Monday, February 14, 2011

Applicatives for composable JSON serialization in Scala

It's been quite some time since I last played around with sjson, and I have now decided to take it up once again. For the convenience of those who are not familiar with sjson, it's a tiny JSON serialization library that can serialize and de-serialize Scala objects. sjson offers two ways in which you can serialize your Scala objects :-
  1. typeclass based serialization, where you define your own protocol (typeclass instances) for your own objects. The standard ones, of course, come out of the box.
  2. reflection based serialization, where you provide a bunch of annotations and sjson looks them up reflectively to serialize and de-serialize your objects.
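To make the typeclass approach concrete, here is a minimal, self-contained sketch of how such a protocol can be encoded with implicits. The names `Writes` and `tojson` echo sjson's vocabulary, but these definitions are illustrative assumptions, not sjson's actual code, and JSON is rendered as a plain String (without escaping) to keep it short.

```scala
// Hypothetical typeclass-based serialization protocol (not sjson's code).
trait Writes[T] { def writes(t: T): String }

object Writes {
  // "standard" instances that come out of the box
  implicit val intWrites: Writes[Int] = new Writes[Int] { def writes(i: Int) = i.toString }
  implicit val stringWrites: Writes[String] = new Writes[String] {
    def writes(s: String) = "\"" + s + "\"" // no escaping: sketch only
  }
}

case class Address(no: Int, street: String, city: String, zip: String)

object Address {
  // user-defined instance for our own type, built from the standard ones
  implicit val addressWrites: Writes[Address] = new Writes[Address] {
    def writes(a: Address) = {
      def kv[T](k: String, v: T)(implicit w: Writes[T]) = "\"" + k + "\":" + w.writes(v)
      Seq(kv("no", a.no), kv("street", a.street), kv("city", a.city), kv("zip", a.zip))
        .mkString("{", ",", "}")
    }
  }
}

object Serializer {
  // the compiler picks the Writes[T] instance for whatever T we pass
  def tojson[T](t: T)(implicit w: Writes[T]): String = w.writes(t)
}
```

Adding support for a new type means adding a new instance; no reflection is involved and a missing instance is a compile-time error.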
One of the things that bothered me in both implementations is the way errors are handled. Currently I use exceptions to report errors in serializing / de-serializing. Exceptions, as you know, are side-effects and don't compose. Hence, even if your input JSON value has many keys that don't match the names in your Scala class, the errors are reported one at a time: the first mismatch throws and aborts the whole de-serialization.

scalaz is a Haskell-like library for Scala that offers a myriad of options for pure functional programming. I have been playing around with scalaz recently, particularly the typeclasses for Applicatives. I have also blogged on some of the compositional features that scalaz offers that help make your code much more declarative, concise and composable.

The meat of scalaz is based on the two most potent forces that Scala offers towards data type generic programming :-
  1. typeclass encoding using implicits and
  2. ability to abstract over higher kinded types (type constructor polymorphism)
Using these features scalaz has made lots of operations available to a large family of data structures, operations that the Scala standard library offers only for a smaller subset. Another contribution of scalaz has been to make many of the useful abstractions first-class in Scala, e.g. Applicative, Monad, Traversable etc. All of these are available in Haskell as typeclass hierarchies - so now you can use the goodness of these abstractions in Scala as well.
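Here is a tiny sketch of those two ingredients working together, assuming nothing beyond the Scala standard library: an `Applicative` typeclass abstracted over a type constructor `F[_]`, with implicit instances for `Option` and `List`, and a generic `pair` function written once for any such `F`. These definitions are hypothetical and far smaller than scalaz's own hierarchy.

```scala
// Typeclass over a higher-kinded type F[_] (illustrative, not scalaz's definition)
trait Applicative[F[_]] {
  def pure[A](a: A): F[A]
  def map2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C]
}

object Applicative {
  // instances live in the companion, so implicit search finds them automatically
  implicit val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def map2[A, B, C](fa: Option[A], fb: Option[B])(f: (A, B) => C): Option[C] =
      for { a <- fa; b <- fb } yield f(a, b)
  }

  implicit val listApplicative: Applicative[List] = new Applicative[List] {
    def pure[A](a: A): List[A] = List(a)
    def map2[A, B, C](fa: List[A], fb: List[B])(f: (A, B) => C): List[C] =
      for { a <- fa; b <- fb } yield f(a, b)
  }
}

object Generic {
  // written once, usable for every F that has an Applicative instance
  def pair[F[_], A, B](fa: F[A], fb: F[B])(implicit F: Applicative[F]): F[(A, B)] =
    F.map2(fa, fb)((a, b) => (a, b))
}
```

The same `pair` call works for both `Option` and `List`; the compiler infers `F` and selects the matching instance.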

One of the areas which I focused on in sjson using scalaz is to make error reporting composable. Have a look at the following snippet ..

import scalaz._, Scalaz._  // Validation, NonEmptyList, |@|, fail, liftFailNel
import dispatch.json._     // JsValue, JsObject
// DefaultProtocol, Format and field come from sjson

// an immutable value object in Scala
case class Address(no: Int, street: String, city: String, zip: String)
 
// typeclass instance for sjson serialization protocol for Address
object AddressProtocol extends DefaultProtocol {
 
 implicit object AddressFormat extends Format[Address] {
   // every field lookup is evaluated; all failures end up in the NonEmptyList
   def reads(json: JsValue): ValidationNEL[String, Address] = json match {
     case m@JsObject(_) => 
       (field[Int]("no", m)        |@| 
        field[String]("street", m) |@| 
        field[String]("city", m)   |@| 
        field[String]("zip", m)) { Address }
 
     case _ => "JsObject expected".fail.liftFailNel
   }
   //.. writes omitted
 }
}


In the released version of sjson, reads returns an Address. In the new version it returns an applicative, ValidationNEL[String, Address], which is a synonym for Validation[NonEmptyList[String], Address]. Validation is isomorphic to scala.Either in the sense that it has two separate types for error and success. But it has a much cleaner API and does not leave to convention the choice of which side holds the error. In our case, since we will be accumulating errors, we use a NonEmptyList for the error part. As a general implementation strategy, when Validation is used as an Applicative, the error type is required to be a Semigroup, which offers an append operation; that is how the errors of independent computations get combined. Have a look at scalaz for details of how you can use Validation as an applicative for cumulative error reporting.
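The accumulation strategy can be sketched in a few lines of plain Scala. This is a hypothetical stand-in, not scalaz's Validation: it uses a `List[String]` instead of a NonEmptyList for the errors, and a hand-rolled `Semigroup` and `map2` in place of scalaz's typeclasses and `|@|`.

```scala
// Minimal error-accumulating Validation (illustrative, NOT the scalaz API)
object ValidationSketch {
  // any error type with an associative append operation
  trait Semigroup[E] { def append(a: E, b: E): E }

  // lists (like scalaz's NonEmptyList) append by concatenation
  implicit def listSemigroup[A]: Semigroup[List[A]] = new Semigroup[List[A]] {
    def append(a: List[A], b: List[A]): List[A] = a ++ b
  }

  sealed trait Validation[+E, +A]
  final case class Success[A](a: A) extends Validation[Nothing, A]
  final case class Failure[E](e: E) extends Validation[E, Nothing]

  // Applicative combination: both sides are always inspected, and two
  // failures are merged with the Semigroup instead of keeping only the first
  def map2[E, A, B, C](va: Validation[E, A], vb: Validation[E, B])(f: (A, B) => C)(
      implicit S: Semigroup[E]): Validation[E, C] =
    (va, vb) match {
      case (Success(a), Success(b))   => Success(f(a, b))
      case (Failure(e1), Failure(e2)) => Failure(S.append(e1, e2))
      case (Failure(e1), _)           => Failure(e1)
      case (_, Failure(e2))           => Failure(e2)
    }
}
```

Chaining `map2` over more values extends the same accumulation to any number of fields, which is essentially what an applicative builder does for you.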

Let's see what happens in the above snippet ..

1. field extracts the value of the relevant field (passed as the first argument) from the JsObject. Incidentally, JsObject is from Nathan Hamblen's dispatch-json, which sjson uses under the covers. More on dispatch-json's awesomeness later :). Here's how I define field .. Note that if the name is not available, it gives us a Failure on the Validation.

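// look up `name` in a JsObject and de-serialize its value with the Reads[T] in scope
def field[T](name: String, js: JsValue)(implicit fjs: Reads[T]): ValidationNEL[String, T] = {
 val JsObject(m) = js  // assumes js is a JsObject (MatchError otherwise)
 m.get(JsString(name))
  .map(fromjson[T](_)(fjs))
  .getOrElse(("field " + name + " not found").fail.liftFailNel) // missing key: Failure, not exception
}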


2. field invocations are composed using scalaz's |@| combinator, which gives us an ApplicativeBuilder that lets us work with the elements it composes. In the above snippet we simply pass these components to the Address constructor to build an instance of the Address class.

Since Validation is an Applicative, all errors that come up while composing the field invocations get accumulated in the NonEmptyList that forms its error type, instead of stopping at the first one.
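To see why applicative composition matters, here is a self-contained sketch contrasting it with monadic, fail-fast composition. It assumes a `Map[String, String]` as a stand-in for a parsed JSON object and uses plain `Either`; `lookup`, `failFast` and `accumulate` are hypothetical helpers, not part of sjson.

```scala
// Fail-fast (monadic) vs accumulating (applicative) composition, sketched with Either
object FailFastVsAccumulate {
  case class Address(street: String, city: String, zip: String)

  // like `field`: a missing key becomes an error value, not an exception
  def lookup(name: String, js: Map[String, String]): Either[List[String], String] =
    js.get(name).toRight(List("field " + name + " not found"))

  // Monadic: each step runs only if the previous one succeeded,
  // so the for-comprehension stops at the first Left
  def failFast(js: Map[String, String]): Either[List[String], Address] =
    for {
      s <- lookup("street", js)
      c <- lookup("city", js)
      z <- lookup("zip", js)
    } yield Address(s, c, z)

  // Applicative: the lookups are independent, so all of them run
  // and every error is collected before giving up
  def accumulate(js: Map[String, String]): Either[List[String], Address] =
    (lookup("street", js), lookup("city", js), lookup("zip", js)) match {
      case (Right(s), Right(c), Right(z)) => Right(Address(s, c, z))
      case (s, c, z) => Left(List(s, c, z).collect { case Left(es) => es }.flatten)
    }
}
```

With only `zip` present, `failFast` reports just the missing `street`, while `accumulate` reports both `street` and `city`; that difference is what moving from exceptions to an applicative Validation buys.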

Let's first look at the normal use case, where things are happy and we get an instance of Address constructed from the parsed JSON. No surprises here ..

// test case
it ("should serialize an Address") {
 import Protocols._
 import AddressProtocol._ // typeclass instances
 val a = Address(12, "Tamarac Square", "Denver", "80231")
 fromjson[Address](tojson(a)) should equal(a.success)
}


But what happens if there are some errors in the typeclass instance that you created ? Things start to get interesting from here ..

implicit object AddressFormat extends Format[Address] {
 def reads(json: JsValue): ValidationNEL[String, Address] = json match {
   case m@JsObject(_) => 
     (field[Int]("number", m) |@| 
      field[String]("stret", m) |@| 
      field[String]("City", m) |@| 
      field[String]("zip", m)) { Address }
 
   case _ => "JsObject expected".fail.liftFailNel
 }
 //..
}


Note that the keys passed to the field API (number, stret, City) do not match the keys in the serialized JSON (no, street, city), which come from the field names of the Address class. Deserialization fails, and we get a nice list of all the errors reported as part of the Failure type ..

it ("address serialization should fail") {
  import Protocols._
  import IncorrectAddressProtocol._ // holds the misconfigured AddressFormat above
  val a = Address(12, "Tamarac Square", "Denver", "80231")
  fromjson[Address](tojson(a)).fail.toOption.get.list should equal (
    List("field number not found", "field stret not found", "field City not found"))
}


Composability .. Again!

Putting a monadic layer on top of your API makes it composable with all the other code written against the same abstractions. With sjson de-serialization returning a Validation, we get better composability when writing complex serialization code like the following. Consider this JSON string, from which we need to pick up fields selectively and build a Scala object ..

val jsonString = 
  """{
       "lastName" : "ghosh", 
       "firstName" : "debasish", 
       "age" : 40, 
       "address" : { "no" : 12, "street" : "Tamarac Square", "city" : "Denver", "zip" : "80231" }, 
       "phone" : { "no" : "3032144567", "ext" : 212 },
       "office" :
        {
          "name" : "anshinsoft",
          "address" : { "no" : 23, "street" : "Hampden Avenue", "city" : "Denver", "zip" : "80245" } 
        }
     }"""


We would like to cherry-pick a few of the fields from here and create an instance of the Contact class ..

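case class Contact(lastName: String, firstName: String, 
  address: Address, officeCity: String, officeAddress: Address)

// the Contact we expect to build from jsonString
val c = Contact("ghosh", "debasish",
  Address(12, "Tamarac Square", "Denver", "80231"), "Denver",
  Address(23, "Hampden Avenue", "Denver", "80245"))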


Try this with the usual approach shown above and you will find some boilerplate repetition within your implementation ..

import dispatch.json._
import Js._

val js = Js(jsonString) // js is a JsValue

(field[String]("lastName", js)    |@| 
 field[String]("firstName", js)   |@| 
 field[Address]("address", js)    |@| 
 field[String]("city", (('office ! obj) andThen ('address ? obj))(js)) |@|
 field[Address]((('office ! obj) andThen ('address ! obj)), js)) { Contact } should equal(c.success)


Look at how we need to repeatedly pass around js, even though we never modify it. Since our field API is monadic, we can compose all the invocations of field with a Reader monad, which threads the same read-only value through every step. This is a very useful technique of API composition, which I discussed in an earlier blog post. (Here's some trivia: how can we compose similar stuff when the passed-around state gets modified? Hint: the answer is within the question itself :D)
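For readers new to the Reader monad, here is a minimal sketch of the idea under simplifying assumptions: a hand-rolled `Reader` case class (not scalaz's), a `Map[String, String]` standing in for the parsed JSON, and a hypothetical curried lookup `fieldR`. Because `flatMap` passes the same environment to every step, the composition never mentions the JSON; it is supplied once, through `run`.

```scala
// Minimal Reader (illustrative, not scalaz's): a computation needing a read-only R
final case class Reader[R, A](run: R => A) {
  def map[B](f: A => B): Reader[R, B] = Reader(r => f(run(r)))
  // the same r is handed to the next step: this is the "threading"
  def flatMap[B](f: A => Reader[R, B]): Reader[R, B] =
    Reader(r => f(run(r)).run(r))
}

object ReaderDemo {
  type Json = Map[String, String] // stand-in for a parsed JsValue

  // hypothetical curried lookup: the name now, the JSON later
  def fieldR(name: String): Reader[Json, Option[String]] = Reader(js => js.get(name))

  // no JSON value appears anywhere in the composition
  val names: Reader[Json, (Option[String], Option[String])] =
    for {
      last  <- fieldR("lastName")
      first <- fieldR("firstName")
    } yield (last, first)
}
```

Calling `ReaderDemo.names.run(js)` feeds the same `js` to both lookups.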

But for that we need to make a small change in our field API. We need to make it curried .. Here are 2 variants of the curried field API ..

// curried version: for lookup of a String name
// returns a function JsValue => ValidationNEL[String, T], i.e. a reader over the JSON
def field_c[T](name: String)(implicit fjs: Reads[T]) = { js: JsValue =>
  val JsObject(m) = js
  m.get(JsString(name))
    .map(fromjson[T](_)(fjs))
    .getOrElse(("field " + name + " not found").fail.liftFailNel)
}

// curried version: we need to get a complete JSON object out
// f navigates to a nested value; any exception it throws is turned into a Failure
def field_c[T](f: (JsValue => JsValue))(implicit fjs: Reads[T]) = { js: JsValue =>
  try {
    fromjson[T](f(js))(fjs)
  } catch {
    case e: Exception => e.getMessage.fail.liftFailNel
  }
}


Note how, in the second variant of field_c, we use the extractors of dispatch-json to pull nested objects out of a JsValue structure. We use it below to get the office address from within the parsed JSON.
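The mechanics of that second variant can be sketched without dispatch-json: nested extractors are ordinary functions composed with `andThen`, and anything the composed function throws is caught and turned into an error value. In this sketch `Obj`, `obj` and `extract` are hypothetical stand-ins, a nested `Map[String, Any]` plays the role of the JsValue tree, and `scala.util.Try` (Scala 2.10 and later) replaces the explicit try/catch.

```scala
import scala.util.{Failure, Success, Try}

// Composed nested extraction with exceptions converted to error values
// (hypothetical stand-ins, not dispatch-json's extractors)
object NestedExtraction {
  type Obj = Map[String, Any]

  // throws if the key is missing or its value is not an object
  def obj(key: String): Obj => Obj = js => js(key) match {
    case m: Map[_, _] => m.asInstanceOf[Obj]
    case other        => throw new IllegalArgumentException(key + " is not an object: " + other)
  }

  // run the extractor; a thrown exception becomes a Left with its message
  def extract[T](f: Obj => T)(js: Obj): Either[List[String], T] =
    Try(f(js)) match {
      case Success(v) => Right(v)
      case Failure(e) => Left(List(e.getMessage))
    }

  // office -> address, built by plain function composition
  val officeAddress: Obj => Obj = obj("office") andThen obj("address")
}
```

The extractors stay simple, partial functions, while the caller still gets a value it can compose with other results.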

And here's how we compose all lookups monadically and finally come up with the Contact instance ..

// reader monad
val contact =
  for {
    last    <- field_c[String]("lastName")
    first   <- field_c[String]("firstName")
    address <- field_c[Address]("address")
    office  <- field_c[Address]((('office ! obj) andThen ('address ! obj)))
  }
  yield(last |@| first |@| address |@| office)

// city needs to be parsed separately since we are working on part of js
val city = field_c[String]("city")

// compose everything and build a Contact
(contact(js) |@| city((('office ! obj) andThen ('address ? obj))(js))) { 
  (last, first, address, office, city) => 
    Contact(last, first, address, city, office) } should equal(c.success)


I am still toying around with some of the monadic implementations of the sjson APIs. They are offered as a separate package and will make a nice addition to the API families that sjson offers. You can have a look at my github repo for more details. I plan to finalize them soon, before sjson gets to 1.0.

Monday, February 07, 2011

Why I made DSLs In Action polyglotic


Since I started writing DSLs In Action (buy here)*, a lot of my friends and readers have asked me about my decision to make the book polyglotic. Indeed, right from the start, I had decided to treat the topic of DSL based design without a significant bias towards any specific language. Even today, after the book has been published, many readers come up to me and ask the same question. I thought I would clarify my position on this subject in this blog post.

A DSL is a vehicle to speak the language of the domain on top of an implementation of a clean domain model. Whatever the implementation language of the DSL, you need to make it speak the ubiquitous language of the domain. And by language I mean the syntax and the semantics that the experts of the particular domain are accustomed to.

A DSL is a facade of linguistic abstraction on top of your domain's semantic model. As a DSL designer it's your responsibility to make it as expressive to your users as possible. It starts with the mapping of the problem domain to the solution domain artifacts, converging on a common set of vocabulary for all the stakeholders of the implementation and finally getting into the nuts and bolts of how to implement the model and the language.

It's a known fact that there's NO programming language that can express ALL forms of abstraction in the most expressive way. So as a language designer you always have the flexibility to choose the implementation language based on your solution domain model. You make this choice as a compromise among all the forces that come up in any software development project. You have the timeline, the resident expertise within your team and other social factors to consider before you converge on the final set of languages. In short, choosing the language(s) is always part of the job, just as in any other software development effort.

Being Idiomatic

The same problem domain can be modeled in the solution domain using radically different forms of abstraction, depending on the language that you use and the power of abstraction it offers. The same set of domain rules may be implemented using the type system of a statically typed language, whereas in a dynamically typed language you need the power of meta-programming to implement the same concepts idiomatically. Even within dynamic languages, idioms vary a lot. Clojure, like the rest of the Lisp family, offers compile time meta-programming in the form of macros, which let you give your users the custom syntactic structures they need for the domain, while Ruby and Groovy do the same with runtime meta-programming.

Here's a code snippet from my book of an internal DSL that registers a security trade and computes its net cash value using the principal amount and the associated tax/fee components. It's implemented in Ruby using idiomatic Ruby constructs.

str = <<END_OF_STRING
new_trade 'T-12435' for account 'acc-123' to buy 100 shares of 'IBM',
                    at UnitPrice = 100
END_OF_STRING

TradeDSL.trade str do |t|
  CashValueCalculator.new(t).with TaxFee, BrokerCommission do |cv|
    t.cash_value = cv.value 
    t.principal = cv.p
    t.tax = cv.t
    t.commission = cv.c
  end
end


The above DSL has a different geometry from what you would get with the same domain concepts implemented in Clojure .. Have a look at the following snippet of a similar use case implemented in Clojure and executed from the Clojure REPL:

user> (def request {:ref-no "r-123", :account "a-123", :instrument "i-123", 
         :unit-price 20, :quantity 100})
#'user/request

user> (trade request)
{:ref-no "r-123", :account "a-123", :instrument "i-123", :principal 2000, :tax-fees {}}

user> (with-tax-fee trade
        (with-values :tax 12)
        (with-values :commission 23))
#'user/trade

user> (trade request)
{:ref-no "r-123", :account "a-123", :instrument "i-123", :principal 2000, 
  :tax-fees {:commission 460, :tax 240}}

user> (with-tax-fee trade
        (with-values :vat 12))
#'user/trade

user> (trade request)
{:ref-no "r-123", :account "a-123", :instrument "i-123", :principal 2000, 
  :tax-fees {:vat 240, :commission 460, :tax 240}}

user> (net-value (trade request))
2940


The above DSL is implemented using syntactic macros of Clojure for custom syntax building and standard functional programming idioms that the language supports.
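For comparison, here is a hypothetical Scala sketch of the same use case; it is not taken from the book. It assumes, as the REPL output above suggests, that each with-values number is a percentage of the principal: with a principal of 20 * 100 = 2000, tax at 12% is 240, commission at 23% is 460 and VAT at 12% is 240, giving a net value of 2000 + 240 + 460 + 240 = 2940. In a statically typed language the domain vocabulary ends up in types and ordinary functions rather than in macros.

```scala
// Hypothetical Scala rendering of the trade / tax-fee use case
object TradeSketch {
  case class Request(refNo: String, account: String, instrument: String,
                     unitPrice: BigDecimal, quantity: Int)

  case class Trade(request: Request, taxFees: Map[String, BigDecimal]) {
    val principal: BigDecimal = request.unitPrice * request.quantity
    // net value = principal plus every tax/fee component
    def netValue: BigDecimal = principal + taxFees.values.sum
  }

  // each tax/fee is a named percentage applied to the principal
  def withTaxFees(rates: (String, BigDecimal)*)(r: Request): Trade = {
    val principal = r.unitPrice * r.quantity
    Trade(r, rates.map { case (k, pct) => k -> principal * pct / 100 }.toMap)
  }
}
```

Usage mirroring the REPL session: `withTaxFees("tax" -> BigDecimal(12), "commission" -> BigDecimal(23), "vat" -> BigDecimal(12))(request).netValue` yields 2940 for the request above.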

In summary, we need to learn multiple languages in order to implement domain models idiomatically in each of them. DSLs In Action discusses all these ideas and contains lots and lots of implementations of real world use cases using Java, Scala, Clojure, Ruby and Groovy.

I hope this clarifies my intent behind the polyglotic treatment of the subject in the book.


* Affiliate Link