Sunday, July 12, 2009

DSL Composition techniques in Scala

One of the benefits of being on Twitter is the real time access to the collective thought streams of many great minds of our industry. Some time back, Paul Snively pointed to this paper on Polymorphic Embedding of DSLs in Scala. It discusses many advanced Scala idioms that you can implement while designing embedded DSLs. I picked up a couple of cool techniques on DSL composition using the power of Scala type system, which I could use in one of my implementations.

A big challenge with DSLs is composability. DSLs are mostly used in silos these days to solve specific problems in one particular domain. But within a single domain there are situations when you need to compose multiple DSLs to design modular systems. Languages like Scala and Haskell offer powerful type systems to achieve modular construction of abstractions. Using this power, you can embed domain specific types within the rich type systems offered by these languages. This post describes a cool example of DSL composition using Scala's type system. The example is a very much stripped down version of a real life scenario that computes the payroll of employees. It's not the richness of DSL construction that's the focus of this post. If you want to get a feel of the power of Scala to design internal and external DSLs, have a look at my earlier blog posts on the subject. Here the main focus is composition and reusability - how features like dependent method types and abstract types help compose your language implementations in Scala.

Consider this simple language interface for salary processing of employees ..

trait SalaryProcessing {
  // abstract type
  type Salary

  // declared type synonym
  type Tax = (Int, Int)

  // abstract domain operations
  def basic: BigDecimal
  def allowances: BigDecimal
  def tax: Tax
  def net(s: String): Salary
}

Salary is an abstract type, while Tax is desfined as a synonym of a Tuple2 for the tax components applicable for an employee. In real life, the APIs will be more detailed and will possibly take employee ids or employee objects to get the actual data out of the repository. But, once again, let's not creep about the DSL itself right now.

Here's a sample implementation of the above interface ..

trait SalaryComputation extends SalaryProcessing {
  type Salary = BigDecimal

  def basic = //..
  def allowances = //..
  def tax = //..

  private def factor(s: String) = {
    //.. some implementation logic
    //.. depending upon the employee id
  }

  def net(s: String) = {
    val (t1, t2) = tax

    // some logic to compute the net pay for employee
    basic + allowances - (t1 + t2 * factor(s))
  }
}

object salary extends SalaryComputation

Here's an implementation from the point of view of computation of the salary of an employee. The abstract type Salary has been concretized to BigDecimal which indicates the absolute amount that an employee makes as his net pay. Cool .. we can have multiple such implementations for various types of employees and contractors in the organization.

Irrespective of the number of implementations that we may have, the accounting process needs to record all of them in their books, where they would like to have all separate components of the salary separately from one single API. For this, we need to define a separate implementation for the accounting department with a different concrete type definition for Salary that separates the net pay and the tax part. Scala's abstract types allow this type definition overriding much like values. But the trick is to design the Accounting abstraction in such a way that it can be composed with all definitions of Salary that individual implementations of SalaryProcessing define. This means that any reference to Salary in the implementation of Accounting needs to refer to the same definition that the composed language uses.

Here's the definition of the Accounting trait that embeds the semantics of the other language that it composes with ..

trait Accounting extends SalaryProcessing {
  // abstract value
  val semantics: SalaryProcessing

  // define type to use the same semantics as the composed DSL
  type Salary = (semantics.Salary, semantics.Tax)

  def basic = semantics.basic
  def allowances = semantics.allowances
  def tax = semantics.tax

  // the accounting department needs both net and tax info
  def net(s: String) = {
    (semantics.net(s), tax)
  }
}

and here's how Accounting composes with SalaryComputation ..

object accounting extends Accounting {
  val semantics = salary
}

Now let's define the main program that processes the payroll for all the employees ..

def pay(semantics: SalaryProcessing,
  employees: List[String]): List[semantics.Salary] = {
  import semantics._
  employees map(net _)
}

The pay method accepts the semantics to be used for processing and returns a dependent type, which depends on the semantics passed. This is an experimental feature in Scala and needs to be used with the -Xexperimental flag of the compiler. This is an example where we publish just the right amount of constraints that's required for the return type. Also note the semantics of the import statement in Scala that's being used here. Firstly it's scoped within the method body. And also it imports only the members of an object that enbales us to use DSLish syntax for the methods on semantics, without explicit qualification.

Here's how we use the composed DSLs with the pay method ..

val employees = List(...)

// only SalaryComputation
println(pay(salary, employees))

// SalaryComputation composed with Accounting
println(pay(accounting, employees))

Sunday, July 05, 2009

Patterns in Internal DSL implementations

I have been thinking recently that classifying DSLs as Internal and External is too broadbased considering the multitude of architectural patterns that we come across various implementations. I guess the more interesting implementations are within the internal DSL genre, starting from plain old fluent interfaces mostly popularized by Martin Fowler down to the very sophisticated polymorphic embedding that has recently been demonstrated in Scala.

I like to use the term embedded more than internal, since it makes explicit the fact that the DSL piggybacks the infrastructure of an existing language (aka the host language of the DSL). This is the commonality part of all embedded DSLs. But DSLs are nothing more than well-designed abstractions expressive enough for the specific domain of use. On top of this commonality, internal DSL implementations also exhibit systematic variations in form, feature and architecture. The purpose of this post is to identify some of the explicit and interesting patterns that we find amongst the embedded DSL implementations of today.

Plain Old Smart APIs, Fluent Interfaces

Enough has been documented on this dominant idiom mostly used in the Java and C# community. Here's one of my recent favorites ..

ConcurrentMap<Key, Graph> graphs = new MapMaker()
  .concurrencyLevel(32)
  .softKeys()
  .weakValues()
  .expiration(30, TimeUnit.MINUTES)
  .makeComputingMap(
     new Function<Key, Graph>() {
       public Graph apply(Key key) {
         return createExpensiveGraph(key);
       }
     });


My good friend Sergio Bossa has recently implemented a cute DSL based on smart builders for messaging in Actorom ..

on(topology).send(EXPECTED_MESSAGE)
  .withTimeout(1, TimeUnit.SECONDS)
  .to(address);


Actorom is a full Java based actor implementation. Looks very promising - go check it out ..

Carefully implemented fluent interfaces using the builder pattern can be semantically sound and order preserving as well. You cannot invoke the chain elements out of sequence and come up with an inconsistent construction for your object.

Code generation using runtime meta-programming

We are seeing a great surge in mindshare in runtime meta-programming with the increased popularity of languages like Groovy and Ruby. Both these languages implement meta-object protocols that allow developers to manipulate meta-objects at runtime through techniques of method synthesis, method interception and runtime evals of code strings.

Code generation using compile time meta-programming

I am not going to talk about C pre-processor macros here. They are considered abominations compared to what Common Lisp macros have been offering since the 1960s. C++ offers techniques like Expression Templates that have been used successfully to generate code during compilation phase. Libraries like Blitz++ have been developed using these techniques through creation of parse trees of array expressions that are used to generate customized kernels for numerical computations.

But Lisp is the real granddaddy of compile time meta-programming. Uniform representation of code and data, expressions yielding values, syntactic macros with quasiquoting have made extension of Lisp language possible through user defined meta objects. Unlike C, C++ and Java, what Lisp does is to make the parser of the language available to the macros. So when you write macros in Common Lisp or Clojure, you have the full power of the extensible language at your disposal. And since Lisp programs are nothing but list structures, the parser is also simple enough.

The bottom line is that you can have a small surface syntax for your DSL and rely on the language infrastructure for generating the appropriate code during the pre-compilation phase. That way the runtime does not contain any of the meta-objects to be manipulated, which gives you an edge over performance compared to the Ruby / Groovy option.

Explicit AST manipulation using the Interpreter Pattern

This is yet another option that we find being used for DSL implementation. The design follows the Interpreter pattern of GOF and uses the host language infrastructure for creating and manipulating the abstract syntax tree (AST). Groovy and Ruby have now developed this infrastructure and support code generation through AST manipulation. Come to think of it, this is really the Greenspunning of Lisp, where you can program in the AST itself and use the host language parser to manipulate it. While in other languages, the AST is far away from the CST (concrete syntax tree) and you need the heavy-lifting of scanners and parsers to get the AST out of the CST.

Purely Embedded typed DSLs

Unlike pre-processor based code generation, pure embedding of DSLs are implemented in the form of libraries. Paul Hudak demonstrated this with Haskell way back in 1998, when he used the techniques of monadic interpreters, partial evaluation and staged programming to implement purely embedded DSLs that can be evolved incrementally over time. Of course when we talk about typed abstractions, the flexibility depends on how advanced type system you have. Haskell has one and offers functional abstractions based on its type system as the basis of implementation. Amongst today's languages, Scala offers an advanced type system and unlike Haskell has the goodness of a solid OO implementation to go along with its functional power. This has helped implementing Polymorphically Embeddable DSLs, a significant improvement over the capabilities that Hudak demonstrated with Haskell. Using features like Scala traits, virtual types, higher order generics and family polymorphism, it is possible to have multiple implementations of a DSL on top of a single surface syntax. This looks very promising and can open up ideas for implementing domain specific optimizations and interesting variations to coexist on the same syntax of the DSL.

Are there any interesting patterns of internal DSL implementations that are being used today ?

Sunday, June 21, 2009

scouchdb Views now interoperable with Scala Objects

In one of the mail exchanges that I had with Dick Wall before the scouchdb demonstration at JavaOne ScriptBowl, Dick asked me the following ..

"Can I return an actual car object instead of a string description? It would be killer if I can actually show some real car sale item objects coming back from the database instead of the string description."

Yes, Dick, you can, now. scouchdb now offers APIs for returning Scala objects directly from couchdb views. Here's an example with Dick's CarSaleItem object model ..

// CarSaleItem class
@BeanInfo
case class CarSaleItem(make : String, model : String, 
  price : BigDecimal, condition : String, color : String) {

  def this(make : String, model : String, 
    price : Int, condition : String, color : String) =
    this(make, model, BigDecimal.int2bigDecimal(price), condition, color)

  private [db] def this() = this(null, null, 0, null, null)

  override def toString = "A " + condition + " " + color + " " + 
    make + " " + model + " for $" + price
}


The following map function returns the car make as the key and the car price as the value ..

// map function
val redCarsPrice =
  """(doc: dispatch.json.JsValue) => {
        val (id, rev, car) = couch.json.JsBean.toBean(doc, 
          classOf[couch.db.CarSaleItem]);
        if (car.color.contains("Red")) List(List(car.make, car.price)) else Nil
  }"""


This is exciting. The following map function returns the car make as the key and the car object as the value ..

// map function
val redCars =
  """(doc: dispatch.json.JsValue) => {
        val (id, rev, car) = couch.json.JsBean.toBean(doc, 
          classOf[couch.db.CarSaleItem]);
        if (car.color.contains("Red")) List(List(car.make, car)) else Nil
  }"""


And now some regular view setup code that registers the views in the CouchDB design document.

// view definitions
val redCarsView = new View(redCars, null)
val redCarsPriceView = new View(redCarsPrice, null)

// handling design document stuff
val cv = DesignDocument("car_views", null, Map[String, View]())
cv.language = "scala"

val rcv = 
  DesignDocument(cv._id, null, 
    Map("red_cars" -> redCarsView, "red_cars_price" -> redCarsPriceView))
rcv.language = "scala"
couch(Doc(carDb, rcv._id) add rcv)


The following query returns JSON corresponding to the car objects being returned from the view ..

val ls1 = couch(carDb view(
  Views builder("car_views/red_cars") build))


On the client side, we can do a simple map over the collection that converts the returned collection into a collection of the specific class objects .. Here we have a collection of CarSaleItem objects ..

import dispatch.json.Js._;
val objs =
  ls1.map { car =>
    val x = Symbol("value") ? obj
    val x(x_) = car
    JsBean.toBean(x_, classOf[CarSaleItem])._3
  }
objs.size should equal(3)
objs.map(_.make).sort((e1, e2) => (e1 compareTo e2) < 0) 
  should equal(List("BMW", "Geo", "Honda"))


But it gets better than this .. we can now have direct Scala objects being fetched from the view query directly through scouchdb API ..

// ls1 is now a list of CarSaleItem objects
val ls1 = couch(carDb view(
  Views builder("car_views/red_cars") build, classOf[CarSaleItem]))
ls1.map(_.make).sort((e1, e2) => (e1 compareTo e2) < 0) 
  should equal(List("BMW", "Geo", "Honda"))


Note the class being passed as an additional parameter in the view API. Similar stuff is also being supported for views having reduce functions. This makes scouchdb more seamless for interoperability between JSON storage layer and object based application layer.

Have a look at the project home page and the associated test case for details ..

Thursday, June 18, 2009

Scala/Lift article available as a podcast

Myself and Steve Vinoski's article in IEEE Internet Computing (May/June issue) titled "Scala and Lift - Functional Recipes for the Web" is now available as a podcast. Here it goes ..

Thanks Steve, for the effort ..

Sunday, June 14, 2009

Code Reading for fun and profit

I still remember those days when APIs were not so well documented, and we didn't have the goodness that Javadocs bring us today. I was struggling to understand the APIs of the C++ Standard Library by going through the source code. Before that my only exposure to code reading was a big struggle to pile through reams of Cobol code that we were trying to migrate to the RDBMS based platforms. Code reading was not so enjoyable (at least to me) those days. Still I found it a more worthwhile exercise than trying to navigate through inconsistent pieces of crappy paperwork and half-assed diagrams that project managers passed on in the name of design documentation.

Exploratory Code Reading ..

C++ Standard library and Boost changed it all. C++ was considered to be macho enough those days, particularly if you can boast of your understandability of the template meta-programming that Andrei Alexandrescu first brought to the mainstream through his columns in C++ Report and his seemingly innocuously titled Modern C++ Design. Code reading became a pleasure to me, code understanding was more satisfying, particularly if you could reuse some of those code snippets in your own creations. It was the first taste of how dense C++ code could be, it was as if every sentence had some hidden idioms that you're trying to unravel. That was exploratory code reading - as if I was trying to explore the horizons of the language and its idioms as the experts documented with great care. I subscribed to the view that Code is the Design.

Collaborating with xUnit ..

Then came unit testing and the emergence of xUnit frameworks that proved to be the most complete determinants of the virtues of code reading. Code reading changed from being a passive learning vehicle to an active reification of thoughts. Just fire up your editor, load the unit testing framework and validate your understanding through testXXX() methods. It was then that I realized the wonders of code reading through collaboration with unit testing frameworks. It was as if you are doing pair programming with xUnit - together you and your xUnit framework are trying to understand the library that you're exploring. TDD was destined to be the next step, the only change being that instead of code understanding you're now into real world code writing.

Code Reading on the GO ..

Sometimes I enjoy reading code when I'm traveling or in a long commute. It's not painstaking, you do not have any specific agenda or you're not working against a strict timeline for the project. I found this habit very productive and in fact learnt quite a few tricks of the trade in some of these sessions. I still remember how I discovered the first instance of how to implement the Strategy pattern through Java enums browsing through Guice code in one of the flights to Portland.

Code Reading towards Polyglotism ..

When you're learning a new language, it helps a lot looking at existing programs in languages that you've been programming for long. And think how you could model it in the new language that you're learning. It's not a transliteration, often it results in knowing new idioms and lots of aha! moments as you explore through your learning process. This is one of the most invaluable side-effects of code reading - reading programs in language X makes you a better programmer in language Y. Stuart Halloway in his book on Clojure programming gives a couple of excellent examples of how thinking functionally while reading Java code makes you learn lots of idioms of the new paradigm.

Reading bad code ..

This is important too, since it makes you aware you of the anti-patterns of a language. It's a common misconception that using recursion in functional programs makes them more idiomatic. Recursion has its own problems, and explicit recursions are best hidden within the language offered combinators and libraries. Whenever you see explicit recursion in non trivial code snippets that can potentially get a large data set, think twice. You may be better off refactoring it some other way, particular when you have an underlying runtime that does not support tail call optimization. Code that do not read well, are not communicative to users. Code reading makes you aware of the importance of expressiveness, you realize that you'd not write code that you cannot read well.

Well, that was a drunken rant .. that I wrote as a side-effect in the midst of reading the Scala source for 2.8 Collections ..

Sunday, June 07, 2009

scouchdb Scala View Server gets "reduce"

scouchdb View Server gets reduce. After a fairly long hiatus, I finally got some time to do some hacking on scouchdb over the weekend. And this is what came out of a brief stint on Saturday evening ..

map was already supported in version 0.3. You could define map functions in Scala as ..

val mapfn = """(doc: dispatch.json.JsValue) => {
  val it = couch.json.JsBean.toBean(doc, classOf[couch.json.TestBeans.Item_1])._3;
  for (st <- it.prices)
    yield(List(it.item, st._2))
}"""


Now you can do reduce too ..

val redfn = """(key: List[(String, String)], values: List[dispatch.json.JsNumber], rereduce: Boolean) => {
  values.foldLeft(BigDecimal(0.00))
    ((s, f) => s + (match { case dispatch.json.JsNumber(n) => n }))
}"""


attach the map and reduce functions to a view ..

val view = new View(mapfn, redfn)


and finally fetch using the view query ..

val ls1 =
  couch(test view(
    Views.builder("big/big_lunch")
         .build))
ls1.size should equal(1)


reduce, by default returns only one row through a computation on the result set returned by map. The above query does not use grouping and returns 1 row as the result. You can also use view results grouping and return rows grouped by keys ..

val ls1 =
  couch(test view(
    Views.builder("big/big_lunch")
         .options(optionBuilder group(true) build) // with grouping
         .build))
ls1.size should equal(3)


For a more detailed discussion and examples have a look at the project home page documentation or browse through the test script ScalaViewServerSpec.

The current trunk is 0.3.1. The previous version has been tagged as 0.3 and available in tags folder.

Next up ..

  • JPA like collections of objects directly from scouchdb views

  • more capable reduce options (rereduce, collations etc.)

  • replication

  • advanced exception management with new dbDispatch


.. and lots of other features ..

Stay tuned!

Wednesday, June 03, 2009

scouchdb @ JavaOne

JavaOne script bowl was organized as a panel session to show off different scripting languages on the JVM. Tha languages considered were Jython, Groovy, Scala, JRuby and Clojure. As part of the Scala show, Dick Wall demonstrated scouchdb, the Scala driver for CouchDB. Cool .. and thanks Dick for choosing scouchdb ..

Alex Miller has more details here ..