Sunday, July 05, 2009

Patterns in Internal DSL implementations

I have been thinking recently that classifying DSLs as Internal and External is too broad, considering the multitude of architectural patterns that we come across in various implementations. I guess the more interesting implementations are within the internal DSL genre, starting from plain old fluent interfaces, mostly popularized by Martin Fowler, down to the very sophisticated polymorphic embedding that has recently been demonstrated in Scala.

I like to use the term embedded more than internal, since it makes explicit the fact that the DSL piggybacks on the infrastructure of an existing language (aka the host language of the DSL). This is the commonality shared by all embedded DSLs. But DSLs are nothing more than well-designed abstractions expressive enough for the specific domain of use. On top of this commonality, internal DSL implementations also exhibit systematic variations in form, feature and architecture. The purpose of this post is to identify some of the explicit and interesting patterns that we find amongst the embedded DSL implementations of today.

Plain Old Smart APIs, Fluent Interfaces

Enough has been documented on this dominant idiom, mostly used in the Java and C# communities. Here's one of my recent favorites ..

ConcurrentMap<Key, Graph> graphs = new MapMaker()
  .concurrencyLevel(32)
  .softKeys()
  .weakValues()
  .expiration(30, TimeUnit.MINUTES)
  .makeComputingMap(
     new Function<Key, Graph>() {
       public Graph apply(Key key) {
         return createExpensiveGraph(key);
       }
     });


My good friend Sergio Bossa has recently implemented a cute DSL based on smart builders for messaging in Actorom ..

on(topology).send(EXPECTED_MESSAGE)
  .withTimeout(1, TimeUnit.SECONDS)
  .to(address);


Actorom is a fully Java-based actor implementation. Looks very promising - go check it out ..

Carefully implemented fluent interfaces using the builder pattern can be semantically sound and order preserving as well. You cannot invoke the chain elements out of sequence and come up with an inconsistent construction for your object.
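The order-preserving property can be sketched with a staged builder, where each step returns an interface that exposes only the next legal call. This is a minimal illustration loosely modelled on the messaging snippet above, not Actorom's actual API: the names SendStage, TimeoutStage, ToStage and Delivery are hypothetical, and topology and address are plain strings to keep the example self-contained.

```java
import java.util.concurrent.TimeUnit;

public class StagedBuilderSketch {
    // Each stage exposes only the call that is legal next in the chain.
    interface SendStage { TimeoutStage send(String message); }
    interface TimeoutStage { ToStage withTimeout(long amount, TimeUnit unit); }
    interface ToStage { Delivery to(String address); }

    // Hypothetical result object; it only exists once the whole chain has run.
    static final class Delivery {
        final String topology;
        final String message;
        final long timeoutMillis;
        final String address;

        Delivery(String topology, String message, long timeoutMillis, String address) {
            this.topology = topology;
            this.message = message;
            this.timeoutMillis = timeoutMillis;
            this.address = address;
        }

        String describe() {
            return message + " -> " + address + " via " + topology
                + " within " + timeoutMillis + " ms";
        }
    }

    // One private class implements every stage; callers only see the stage types.
    private static final class Builder implements SendStage, TimeoutStage, ToStage {
        private final String topology;
        private String message;
        private long timeoutMillis;

        Builder(String topology) { this.topology = topology; }

        @Override public TimeoutStage send(String message) {
            this.message = message;
            return this;
        }

        @Override public ToStage withTimeout(long amount, TimeUnit unit) {
            this.timeoutMillis = unit.toMillis(amount);
            return this;
        }

        @Override public Delivery to(String address) {
            return new Delivery(topology, message, timeoutMillis, address);
        }
    }

    static SendStage on(String topology) { return new Builder(topology); }

    public static void main(String[] args) {
        Delivery d = on("local").send("ping").withTimeout(1, TimeUnit.SECONDS).to("actor-1");
        System.out.println(d.describe());
        // on("local").to("actor-1") would not compile: SendStage has no to() method.
    }
}
```

The compiler, rather than a runtime check, rejects a chain that skips or reorders a step, which is what makes the construction consistent by design.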

Code generation using runtime meta-programming

We are seeing a great surge in mindshare in runtime meta-programming with the increased popularity of languages like Groovy and Ruby. Both these languages implement meta-object protocols that allow developers to manipulate meta-objects at runtime through techniques of method synthesis, method interception and runtime evals of code strings.
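Java has no meta-object protocol of this kind, but method interception can be approximated with a dynamic proxy, which gives a rough feel for what Ruby's method_missing or Groovy's methodMissing enable. The sketch below is only an analogy under that assumption: PersonFinder, finderOver and the sample data are hypothetical names made up for illustration, and the handler derives the lookup key from the method name at runtime.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DynamicFinderSketch {
    // Hypothetical DSL surface: methods named findBy<Key> have no hand-written body.
    interface PersonFinder {
        List<String> findByCity(String value);
        List<String> findByRole(String value);
    }

    // Builds a finder whose behaviour is decided at call time from the method name.
    static PersonFinder finderOver(List<Map<String, String>> rows) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (!name.startsWith("findBy")) {
                throw new UnsupportedOperationException(name);
            }
            // "findByCity" -> "city"
            String key = name.substring("findBy".length()).toLowerCase();
            List<String> names = new ArrayList<>();
            for (Map<String, String> row : rows) {
                if (args[0].equals(row.get(key))) {
                    names.add(row.get("name"));
                }
            }
            return names;
        };
        return (PersonFinder) Proxy.newProxyInstance(
            PersonFinder.class.getClassLoader(),
            new Class<?>[] { PersonFinder.class },
            handler);
    }

    private static Map<String, String> row(String name, String city, String role) {
        Map<String, String> r = new HashMap<>();
        r.put("name", name);
        r.put("city", city);
        r.put("role", role);
        return r;
    }

    // Made-up sample data for the demonstration.
    static List<Map<String, String>> sample() {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("Asha", "Kolkata", "dev"));
        rows.add(row("Ravi", "Kolkata", "ops"));
        rows.add(row("Mei", "Osaka", "dev"));
        return rows;
    }

    public static void main(String[] args) {
        PersonFinder finder = finderOver(sample());
        System.out.println(finder.findByCity("Kolkata"));
        System.out.println(finder.findByRole("dev"));
    }
}
```

Unlike Ruby or Groovy, the method names still have to be declared on an interface up front, so this captures the interception half of the story but not true method synthesis.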

Code generation using compile time meta-programming

I am not going to talk about C pre-processor macros here. They are considered abominations compared to what Lisp macros have been offering since the 1960s. C++ offers techniques like Expression Templates that have been used successfully to generate code during the compilation phase. Libraries like Blitz++ have been developed using these techniques: they build parse trees of array expressions, which are then used to generate customized kernels for numerical computations.

But Lisp is the real granddaddy of compile time meta-programming. Uniform representation of code and data, expressions yielding values, and syntactic macros with quasiquoting have made extension of the Lisp language possible through user defined meta objects. Unlike C, C++ and Java, Lisp makes the parser of the language available to the macros. So when you write macros in Common Lisp or Clojure, you have the full power of the extensible language at your disposal. And since Lisp programs are nothing but list structures, the parser is also simple enough.

The bottom line is that you can have a small surface syntax for your DSL and rely on the language infrastructure for generating the appropriate code during the pre-compilation phase. That way the runtime does not contain any of the meta-objects to be manipulated, which gives you a performance edge over the Ruby / Groovy option.

Explicit AST manipulation using the Interpreter Pattern

This is yet another option that we find being used for DSL implementation. The design follows the Interpreter pattern of the GoF and uses the host language infrastructure for creating and manipulating the abstract syntax tree (AST). Groovy and Ruby have now developed this infrastructure and support code generation through AST manipulation. Come to think of it, this is really the Greenspunning of Lisp, where you can program in the AST itself and use the host language parser to manipulate it. In other languages, the AST is far away from the CST (concrete syntax tree), and you need the heavy lifting of scanners and parsers to get the AST out of the CST.
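For readers who have not seen the Interpreter pattern applied to a DSL, here is a minimal sketch in Java, assuming a toy arithmetic language. The node types Num, Var, Add and Mul and the factory methods are hypothetical; the point is only that the host language objects are the AST, and evaluation is a recursive walk over them.

```java
import java.util.HashMap;
import java.util.Map;

public class InterpreterSketch {
    // Every AST node knows how to evaluate itself against an environment.
    interface Expr { int eval(Map<String, Integer> env); }

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
        @Override public int eval(Map<String, Integer> env) { return value; }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        @Override public int eval(Map<String, Integer> env) {
            Integer v = env.get(name);
            if (v == null) {
                throw new IllegalArgumentException("unbound variable: " + name);
            }
            return v;
        }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public int eval(Map<String, Integer> env) {
            return left.eval(env) + right.eval(env);
        }
    }

    static final class Mul implements Expr {
        final Expr left, right;
        Mul(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public int eval(Map<String, Integer> env) {
            return left.eval(env) * right.eval(env);
        }
    }

    // Small factory methods so that building the tree reads like the DSL itself.
    static Expr num(int n) { return new Num(n); }
    static Expr var(String name) { return new Var(name); }
    static Expr add(Expr l, Expr r) { return new Add(l, r); }
    static Expr mul(Expr l, Expr r) { return new Mul(l, r); }

    public static void main(String[] args) {
        // AST for: price * qty + 5
        Expr total = add(mul(var("price"), var("qty")), num(5));

        Map<String, Integer> env = new HashMap<>();
        env.put("price", 10);
        env.put("qty", 3);
        System.out.println(total.eval(env)); // 10 * 3 + 5 = 35
    }
}
```

Because the tree is an ordinary data structure, it can also be rewritten or optimized before evaluation, which is where the AST manipulation mentioned above comes in.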

Purely Embedded typed DSLs

Unlike pre-processor based code generation, purely embedded DSLs are implemented in the form of libraries. Paul Hudak demonstrated this with Haskell way back in 1998, when he used the techniques of monadic interpreters, partial evaluation and staged programming to implement purely embedded DSLs that can be evolved incrementally over time. Of course, when we talk about typed abstractions, the flexibility depends on how advanced a type system you have. Haskell has one and offers functional abstractions based on its type system as the basis of implementation. Amongst today's languages, Scala offers an advanced type system and, unlike Haskell, has the goodness of a solid OO implementation to go along with its functional power. This has helped in implementing Polymorphically Embeddable DSLs, a significant improvement over the capabilities that Hudak demonstrated with Haskell. Using features like Scala traits, virtual types, higher order generics and family polymorphism, it is possible to have multiple implementations of a DSL on top of a single surface syntax. This looks very promising and can allow domain specific optimizations and interesting variations to coexist on the same syntax of the DSL.
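The core idea of one surface syntax with several interpretations can be hinted at even in Java, though without the traits and virtual types that the Scala work relies on. The sketch below is a simplified stand-in under that assumption: the interface ExprLang and its two implementations, Evaluate and Show, are hypothetical names, and a generic type parameter plays the role that an abstract type member would play in Scala.

```java
public class PolymorphicEmbeddingSketch {
    // The surface syntax of the DSL, abstracted over the representation type R.
    interface ExprLang<R> {
        R num(int n);
        R add(R left, R right);
        R mul(R left, R right);
    }

    // Interpretation 1: evaluate directly to an integer.
    static final class Evaluate implements ExprLang<Integer> {
        @Override public Integer num(int n) { return n; }
        @Override public Integer add(Integer l, Integer r) { return l + r; }
        @Override public Integer mul(Integer l, Integer r) { return l * r; }
    }

    // Interpretation 2: render the same program as text.
    static final class Show implements ExprLang<String> {
        @Override public String num(int n) { return Integer.toString(n); }
        @Override public String add(String l, String r) { return "(" + l + " + " + r + ")"; }
        @Override public String mul(String l, String r) { return "(" + l + " * " + r + ")"; }
    }

    // The DSL program is written once, against the surface syntax only.
    static <R> R program(ExprLang<R> lang) {
        return lang.add(lang.num(2), lang.mul(lang.num(3), lang.num(4)));
    }

    public static void main(String[] args) {
        System.out.println(program(new Evaluate())); // 14
        System.out.println(program(new Show()));     // (2 + (3 * 4))
    }
}
```

A third implementation, say one that performs a domain specific optimization before evaluating, could be added without touching program at all.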

Are there any other interesting patterns of internal DSL implementations that are being used today?

2 comments:

Anonymous said...

lisp does parsing for you in the sense that it gives you a list of symbols, but *well formedness* (like a DTD) of that list is entirely left up to the programmer. so lisp only implements half of the parsing stage, leaving the rest to the programmer. making DSLs real easy has never really been attempted in lisp, just as in other languages.

Patrick Mahoney said...

Anonymous,
lisp provides you with the reading stage. And yes, manipulation of the list of symbols which is the code passed into the macro is left entirely up to the programmer. But because macros operate before runtime, at macro-expansion time, this allows for lots of checking to occur, e.g. checking the arity of the code passed into the macro. Thus, you can ensure that arguments to a macro are well-formed in some respects, in addition to the runtime type checking that lisp normally provides. Add its ultra-uniform syntax, and I would say that this is exactly what makes lisp the most internally consistent language for expressing internal DSLs. The resulting DSL looks like lisp, granted, but this is a strength when the DSL is embedded, no? What sort of features would you be interested in seeing within Lisp to make DSLs easier? I would be interested in hearing your requirements. Thanks!