Monday, February 14, 2011

Applicatives for composable JSON serialization in Scala

After quite some time, I have decided to play around with sjson once again. For the convenience of those who are not familiar with sjson, it's a tiny JSON serialization library that can serialize and de-serialize Scala objects. sjson offers two ways in which you can serialize your Scala objects :-
  1. typeclass based serialization, where you define your own protocol (typeclass instances) for your own objects. The standard ones, of course, come out of the box.
  2. reflection based serialization, where you provide a bunch of annotations and sjson looks up reflectively and tries to get your objects serialized and de-serialized.
One of the things which bothers me in both implementations is the way errors are handled. Currently I use exceptions to report errors in serializing / de-serializing. Exceptions, as you know, are side-effects and don't compose. Hence even when your input JSON value has many keys that don't match the names in your Scala class, errors are reported one by one.
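
To make the "one by one" problem concrete, here is a small stand-alone sketch. It does not use sjson at all: the Map stands in for a parsed JSON object, and fieldOrThrow is a hypothetical helper invented for this illustration.

```scala
import scala.util.Try

object ExceptionSketch {
  // Hypothetical stand-in for a parsed JSON object: a plain Map.
  val json = Map("no" -> "12", "street" -> "Tamarac Square")

  // Exception-style lookup (not sjson's API): throws on a missing key.
  def fieldOrThrow(name: String): String =
    json.getOrElse(name, throw new NoSuchElementException("field " + name + " not found"))

  // Both "stret" and "city" are missing, but the first throw aborts
  // the evaluation, so the second lookup never runs.
  val result: Try[(String, String)] =
    Try((fieldOrThrow("stret"), fieldOrThrow("city")))

  // Only the first problem is visible to the caller.
  val message: String = result.failed.map(_.getMessage).getOrElse("no error")
}
```

Here message only mentions stret. The missing city key would surface only after the first problem is fixed and the code is run again.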

scalaz is a Haskell-like library for Scala that offers a myriad of options towards pure functional programming. I have been playing around with scalaz recently, particularly the typeclasses for Applicatives. I have also blogged on some of the compositional features that scalaz offers that help make your code much more declarative, concise and composable.

The meat of scalaz is based on the two most potent forces that Scala offers towards data type generic programming :-
  1. typeclass encoding using implicits and
  2. ability to abstract over higher kinded types (type constructor polymorphism)
Using these features scalaz has made lots of operations available to a large family of data structures, which were otherwise available only for a smaller subset in the Scala standard library. Another contribution of scalaz has been to make many of the useful abstractions first class in Scala e.g. Applicative, Monad, Traversable etc. All of these are available in Haskell as typeclass hierarchies - so now you can use the goodness of these abstractions in Scala as well.

One of the areas which I focused on in sjson using scalaz is to make error reporting composable. Have a look at the following snippet ..

// an immutable value object in Scala
case class Address(no: Int, street: String, city: String, zip: String)
 
// typeclass instance for sjson serialization protocol for Address
object AddressProtocol extends DefaultProtocol {
 
 implicit object AddressFormat extends Format[Address] {
   def reads(json: JsValue): ValidationNEL[String, Address] = json match {
     case m@JsObject(_) => 
       (field[Int]("no", m)        |@| 
        field[String]("street", m) |@| 
        field[String]("city", m)   |@| 
        field[String]("zip", m)) { Address }
 
     case _ => "JsObject expected".fail.liftFailNel
   }
   //..
 }
}


In the current version of sjson, reads returns an Address. Now it returns an applicative, ValidationNEL[String, Address], which is a synonym for Validation[NonEmptyList[String], Address]. Validation is isomorphic to scala.Either in the sense that it has two separate types for error and success. But it has a much cleaner API and does not leave to convention the choice of which side carries the error. In our case, since we will be accumulating errors, we use a NonEmptyList for the error part. As a general implementation strategy, when Validation is used as an Applicative, the error type is modeled as a Semigroup, which offers an append operation. Have a look at scalaz for details of how you can use Validation as an applicative for cumulative error reporting.

Let's see what happens in the above snippet ..

1. field extracts the value of the relevant field (passed as the first argument) from the JsObject. Incidentally JsObject is from Nathan Hamblen's dispatch-json, which sjson uses under the covers. More on dispatch-json's awesomeness later :). Here's how I define field .. Note that if the name is not available, it gives us a Failure type on the Validation.

def field[T](name: String, js: JsValue)(implicit fjs: Reads[T]): ValidationNEL[String, T] = {
 val JsObject(m) = js
 m.get(JsString(name))
  .map(fromjson[T](_)(fjs))
  .getOrElse(("field " + name + " not found").fail.liftFailNel)
}


2. field invocations are composed using the |@| combinator of scalaz, which gives us an ApplicativeBuilder that allows me to play around with the elements that it composes. In the above snippet we simply pass these components to build up an instance of the Address class.

Since Validation is an Applicative, all errors that come up during composition of field invocations get accumulated in the final list that occurs as the error type of it.
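
The two steps above can be sketched without scalaz. What follows is only an illustration of the idea, not scalaz's implementation: it assumes Either[List[String], A] as a stand-in for ValidationNEL, a Map as a stand-in for JsObject, and the names V, map2, Builder2, Builder3 and BuilderOps are all made up for this sketch.

```scala
object BuilderSketch {
  // Assumption: Left carries the accumulated error messages.
  type V[A] = Either[List[String], A]

  // Combine two results with a pure function. When both sides fail,
  // the error lists are appended (the list's semigroup operation).
  def map2[A, B, C](va: V[A], vb: V[B])(f: (A, B) => C): V[C] =
    (va, vb) match {
      case (Right(a), Right(b)) => Right(f(a, b))
      case (Left(e1), Left(e2)) => Left(e1 ++ e2)
      case (Left(e), _)         => Left(e)
      case (_, Left(e))         => Left(e)
    }

  // A builder only holds the pending results until a function is applied.
  final class Builder2[A, B](va: V[A], vb: V[B]) {
    def apply[C](f: (A, B) => C): V[C] = map2(va, vb)(f)
    def |@|[C](vc: V[C]): Builder3[A, B, C] = new Builder3(va, vb, vc)
  }

  final class Builder3[A, B, C](va: V[A], vb: V[B], vc: V[C]) {
    def apply[D](f: (A, B, C) => D): V[D] =
      map2(map2(va, vb)((a, b) => (a, b)), vc)((ab, c) => f(ab._1, ab._2, c))
  }

  // Adds |@| to any V[A] value.
  implicit class BuilderOps[A](va: V[A]) {
    def |@|[B](vb: V[B]): Builder2[A, B] = new Builder2(va, vb)
  }

  // Hypothetical stand-in for a JsObject and for the field lookup.
  val json = Map("no" -> "12", "street" -> "Tamarac Square", "city" -> "Denver")

  def field(name: String): V[String] =
    json.get(name).toRight(List("field " + name + " not found"))

  // All keys present: the function is applied to the three values.
  val ok: V[(String, String, String)] =
    (field("no") |@| field("street") |@| field("city"))((n, s, c) => (n, s, c))

  // Two wrong keys: both messages end up in the Left.
  val bad: V[(String, String, String)] =
    (field("number") |@| field("stret") |@| field("city"))((n, s, c) => (n, s, c))
}
```

In this sketch every lookup is evaluated before the function is applied, which is why bad can report both wrong keys at once.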

Let's first look at the normal usecase where things are happy and we get an instance of Address constructed from the parsed json. No surprises here ..

// test case
it ("should serialize an Address") {
 import Protocols._
 import AddressProtocol._ // typeclass instances
 val a = Address(12, "Tamarac Square", "Denver", "80231")
 fromjson[Address](tojson(a)) should equal(a.success)
}


But what happens if there are some errors in the typeclass instance that you created ? Things start to get interesting from here ..

implicit object AddressFormat extends Format[Address] {
 def reads(json: JsValue): ValidationNEL[String, Address] = json match {
   case m@JsObject(_) => 
     (field[Int]("number", m) |@| 
      field[String]("stret", m) |@| 
      field[String]("City", m) |@| 
      field[String]("zip", m)) { Address }
 
   case _ => "JsObject expected".fail.liftFailNel
 }
 //..
}


Note that three of the keys passed to the field API (number, stret and City) do not match the field names in the Address class. Deserialization fails and we get a nice list of all the errors reported as part of the Failure type ..

it ("address serialization should fail") {
  import Protocols._
  import IncorrectPersonProtocol._
  val a = Address(12, "Tamarac Square", "Denver", "80231")
  fromjson[Address](tojson(a)).fail.toOption.get.list should equal (
    List("field number not found", "field stret not found", "field City not found"))
}
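
The accumulated list in this test depends on the error type having an append operation. A minimal sketch of that dependency follows, under loud assumptions: Validation, Failure, Success, Semigroup and map2 below are simplified stand-ins written for this illustration, not the scalaz types of the same names.

```scala
object SemigroupSketch {
  // Simplified stand-in for scalaz's Validation.
  sealed trait Validation[+E, +A]
  final case class Failure[E](errors: E) extends Validation[E, Nothing]
  final case class Success[A](value: A) extends Validation[Nothing, A]

  // The error side only needs an associative append.
  trait Semigroup[E] { def append(x: E, y: E): E }

  implicit def listSemigroup[T]: Semigroup[List[T]] =
    new Semigroup[List[T]] { def append(x: List[T], y: List[T]): List[T] = x ++ y }

  implicit val stringSemigroup: Semigroup[String] =
    new Semigroup[String] { def append(x: String, y: String): String = x + "; " + y }

  // Hypothetical helper: combine two validations with a pure function,
  // appending the errors when both sides have failed.
  def map2[E, A, B, C](va: Validation[E, A], vb: Validation[E, B])(f: (A, B) => C)(
      implicit sg: Semigroup[E]): Validation[E, C] =
    (va, vb) match {
      case (Success(a), Success(b))   => Success(f(a, b))
      case (Failure(e1), Failure(e2)) => Failure(sg.append(e1, e2))
      case (Failure(e), _)            => Failure(e)
      case (_, Failure(e))            => Failure(e)
    }

  // Errors as a list: failures are concatenated.
  val a: Validation[List[String], Int]    = Failure(List("field number not found"))
  val b: Validation[List[String], String] = Failure(List("field stret not found"))
  val asList = map2(a, b)((n, s) => (n, s))

  // Errors as a plain String: the same map2, a different append.
  val c: Validation[String, Int]    = Failure("bad number")
  val d: Validation[String, String] = Failure("bad street")
  val asString = map2(c, d)((n, s) => (n, s))
}
```

The same map2 works for both error types, because the only thing it asks of E is a Semigroup instance.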


Composability .. Again!

A layer of monads on top of your API makes your API composable with any other monad in the world. With sjson de-serialization returning a Validation, we can get better composability when writing complex serialization code like the following. Consider this JSON string from where we need to pick up fields selectively and make a Scala object ..

val jsonString = 
  """{
       "lastName" : "ghosh", 
       "firstName" : "debasish", 
       "age" : 40, 
       "address" : { "no" : 12, "street" : "Tamarac Square", "city" : "Denver", "zip" : "80231" }, 
       "phone" : { "no" : "3032144567", "ext" : 212 },
       "office" :
        {
          "name" : "anshinsoft",
          "address" : { "no" : 23, "street" : "Hampden Avenue", "city" : "Denver", "zip" : "80245" } 
        }
     }"""


We would like to cherry pick a few of the fields from here and create an instance of Contact class ..

case class Contact(lastName: String, firstName: String, 
  address: Address, officeCity: String, officeAddress: Address)


Try this with the usual approach as shown above and you will find some of the boilerplate repetitions within your implementation ..

import dispatch.json._
import Js._

val js = Js(jsonString) // js is a JsValue

(field[String]("lastName", js)    |@| 
 field[String]("firstName", js)   |@| 
 field[Address]("address", js)    |@| 
 field[String]("city", (('office ! obj) andThen ('address ? obj))(js)) |@|
 field[Address]((('office ! obj) andThen ('address ! obj)), js)) { Contact } should equal(c.success)


Notice how we need to repeatedly pass js around, though we never modify it. Since our field API is monadic, we can compose all invocations of field together with a Reader monad. This is a very useful technique of API composition which I discussed in an earlier blog post. (Here is some trivia: how can we compose similar stuff when there's modification involved in the passed around state? Hint: the answer is within the question itself :D)

But for that we need to make a small change in our field API. We need to make it curried .. Here are 2 variants of the curried field API ..

// curried version: for lookup of a String name
def field_c[T](name: String)(implicit fjs: Reads[T]) = { js: JsValue =>
  val JsObject(m) = js
  m.get(JsString(name)).map(fromjson[T](_)(fjs)).getOrElse(("field " + name + " not found").fail.liftFailNel)
}

// curried version: we need to get a complete JSON object out
def field_c[T](f: (JsValue => JsValue))(implicit fjs: Reads[T]) = { js: JsValue =>
  try {
    fromjson[T](f(js))(fjs)
  } catch {
    case e: Exception => e.getMessage.fail.liftFailNel
  }
}


Note how in the second variant of field_c, we use the extractors of dispatch-json to take out nested objects from a JsValue structure. We use it below to get the office address from within the parsed JSON.

And here's how we compose all lookups monadically and finally come up with the Contact instance ..

// reader monad
val contact =
  for {
    last    <- field_c[String]("lastName")
    first   <- field_c[String]("firstName")
    address <- field_c[Address]("address")
    office  <- field_c[Address]((('office ! obj) andThen ('address ! obj)))
  }
  yield(last |@| first |@| address |@| office)

// city needs to be parsed separately since we are working on part of js
val city = field_c[String]("city")

// compose everything and build a Contact
(contact(js) |@| city((('office ! obj) andThen ('address ? obj))(js))) { 
  (last, first, address, office, city) => 
    Contact(last, first, address, city, office) } should equal(c.success)
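
The for-comprehension above works because functions from a shared input form a monad. A rough stand-alone sketch of that idea follows. It assumes a hand-written Reader wrapper rather than the scalaz instance for functions, a Map in place of JsValue, and a hypothetical fieldC in place of field_c.

```scala
object ReaderSketch {
  // A Reader wraps a function from a shared environment E to a result A.
  final case class Reader[E, A](run: E => A) {
    def map[B](f: A => B): Reader[E, B] =
      Reader((e: E) => f(run(e)))
    // The same environment is handed to both steps; it is never modified.
    def flatMap[B](f: A => Reader[E, B]): Reader[E, B] =
      Reader((e: E) => f(run(e)).run(e))
  }

  // Hypothetical stand-in for JsValue.
  type Json = Map[String, String]

  // Curried-style lookup: takes only the name and returns a Reader
  // that is still waiting for the JSON value.
  def fieldC(name: String): Reader[Json, Either[String, String]] =
    Reader((js: Json) => js.get(name).toRight("field " + name + " not found"))

  // The JSON value is not mentioned anywhere in the composition.
  val names: Reader[Json, (Either[String, String], Either[String, String])] =
    for {
      last  <- fieldC("lastName")
      first <- fieldC("firstName")
    } yield (last, first)

  // The environment is supplied once, at the very end.
  val result = names.run(Map("lastName" -> "ghosh"))
}
```

Each lookup still returns its own success or failure. The Reader layer only threads the input through, so combining the individual results is left to the applicative step.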


I am still toying around with some of the monadic implementations of sjson APIs. They are offered as a separate package and will make a nice addition to the API families that sjson offers. You can have a look at my github repo for more details. I plan to finalize this soon, before I get to 1.0.

8 comments:

Heiko Seeberger said...

Very descriptive example showing the power and usefulness of FP and scalaz.

You're saying "Since Validation is an Applicative, all errors that come up during composition of field invocations get accumulated ...". I don't think that's correct, because not every applicative will accumulate errors. In fact it is the special applicative that scalaz offers for Validations (which requires the failure type to have a semigroup) that does this special treatment.

Unknown said...

Hi Heiko -
Indeed applicatives are the most common abstraction that accumulates effects. This is because <*> keeps the structure of the computation fixed and just sequences the effects irrespective of the value returned by any of the computations. This is unlike monads, where the computation sequence is broken as soon as one of them fails. In case of a monad m, (>>=) :: m a -> (a -> m b) -> m b allows the value returned by one computation to influence the choice of another, quite unlike <*> of applicative. Have a look at section 5 of Conor McBride and Ross Paterson paper which introduces Applicatives.

In Haskell also we have the same use of applicatives for accumulating effects. The applicative version of the Parsec library uses applicatives to accumulate parse results. Just like we can do with scalaz.

Hence I think it's a common pattern in general to accumulate errors using applicatives.

Heiko Seeberger said...

I fully agree that collecting errors is a common use case for applicatives, but it is not a necessary consequence. The reason why I am that pedantic is that I was once misled by that assumption, but that's probably just me.

Ittay Dror said...

Isn't Applicative about mapping regular functions inside a context? Here the function 'field' adds a context. It is not a String => T function, but String => Validation[String, T]. This sounds to me like a Monad, not an applicative

Unknown said...

Hi Ittay -

What happens here is that (M[A] |@| M[B]) or a chaining thereof of |@| returns an ApplicativeBuilder on which we apply a pure function, the constructor for Contact. Have a look at https://github.com/scalaz/scalaz/blob/master/core/src/main/scala/scalaz/MA.scala#L40 ..

andry said...

very good tutorial and can hopefully help me in building json in the application that I created for this lecture. thank you

anriz said...

I can't find the sjson-scalaz project in github. Do you have a link to it ? Thanks.

Unknown said...

@anriz - I have removed it for the time being. Will bring it back when I finish some of the changes on it. Also need to upgrade to the latest version of scalaz.