Monday, April 23, 2007

Executable XML (aka Lisp)

In a project that I have been working on for quite some time, the back office system receives XML messages from the front and middle office systems for processing. It is a securities trading and settlement system for one of the big financial houses of the world - typical messages are trades, settlements, positions etc., which reach the back office after the trade is made. Like any sane architects, we designed the system on the Java EE stack (nowadays you never get fired for choosing Java ..), centered around a message oriented middleware transporting XML messages with gay abandon. The system has gone live at many installations and has been delivering satisfactory throughput all along.

No complaints whatsoever - on the architecture, on the Java EE backbone, or on the multitude of XML machinery behind the engineering harness of the millions of messages generated every day. If I were to architect the same system today (the existing one was architected 3 years back), I would possibly have gone for a similar stack, just for the sheer stability and robustness of XML based technology and the plethora of tools that XML offers today.

Why am I writing this blog then?

Possibly I have been having extra caffeine of late, which has been taking away most of my sleep at night. It is 1 AM and I am still glued to two of my newest possessions on my bookshelf.





I had read parts of SICP long back - rereading it now is like a rediscovery of many of the Aha! moments that I had last time, and of course, lots of ruminations and discoveries this time as well. Based on the new-found light of Lispy (and s-expy) knowledge, I hope that some day I will be able to infuse today's dreams and ramblings into a real life architecture. I am not sure if we will ever reach the stage of human evolution when Lisp and Scheme will be considered the bricks and mortar of enterprise architecture. Till then they will exist as the sexy counterparts of Java and C++, and will continue to allure all developers who have once committed the sin of being there and done that.

The XML Message - Today's Brick and Mortar

Here is how a sample trade message (simplified for clarity) looks in our system :


<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE trd01 SYSTEM "trd01.dtd">
<trd01>
  <id>10234</id>
  <trade_type>equity</trade_type>
  <trade_date>2005-02-21T18:57:39</trade_date>
  <instrument>IBM</instrument>
  <value>10000</value>
  <trade_ccy>usd</trade_ccy>
  <settlement_info>
    <settle_date>2005-02-24T18:57:39</settle_date>
    <settle_ccy>usd</settle_ccy>
  </settlement_info>
</trd01>



We use XML parsers to parse this message, and all sorts of XPath expressions, XQuery and XSLT transformations to do the processing, querying and tearing apart of the hierarchical structures that embody an XML message. The above XML message is *data*, and we have tonnes of Java code processing the XML data, transforming it into business logic and persisting it in the database. So we have the *code* (in Java) strictly separated from the *data* (in XML), using a small toolbox comprising a bundle of XML parsers, XPath expressions and XSLT transformations, all packaged in a couple of dozen third party jars (aka frameworks). The entire exercise is meant to make this *data* executable.
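A minimal sketch of this kind of plumbing, using only the JDK's built-in JAXP APIs (the class name and the trimmed-down message here are illustrative, not our production code):

```java
import java.io.ByteArrayInputStream;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;

public class TradeMessageReader {

  // pull a single field out of an XML trade message via an XPath expression
  public static String field(String xml, String expr) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes("utf-8")));
    return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
  }

  public static void main(String[] args) throws Exception {
    String msg =
        "<trd01><id>10234</id><instrument>IBM</instrument>"
      + "<settlement_info><settle_ccy>usd</settle_ccy></settlement_info></trd01>";
    System.out.println(field(msg, "/trd01/instrument"));                 // IBM
    System.out.println(field(msg, "/trd01/settlement_info/settle_ccy")); // usd
  }
}
```

Multiply this by dozens of message types and transformations, and you get a sense of the machinery that sits between the *data* and the *code*.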

Executable Data - aka Lisp

Why not use a power that allows you to execute your data directly, instead of creating unnecessary abstraction barriers in the name of OOP? Steve Yegge summarises it with élan:
The whole nasty "configuration" problem becomes incredibly more convenient in the Lisp world. No more stanza files, apache-config, .properties files, XML configuration files, Makefiles — all those lame, crappy, half-language creatures that you wish were executable, or at least loaded directly into your program without specialized processing. I know, I know — everyone raves about the power of separating your code and your data. That's because they're using languages that simply can't do a good job of representing data as code. But it's what you really want, or all the creepy half-languages wouldn't all evolve towards being Turing-complete, would they?

In Lisp, I can make the above data much more readable, clearer in intent, easier on the eyes and at the same time make it executable ..


(trd01
  (id 10234)
  (trade_type "equity")
  (trade_date "2005-02-21T18:57:39")
  (instrument "IBM")
  (value 10000)
  (trade_ccy "usd")
  (settle_info
    (settle_date "2005-02-24T18:57:39")
    (settle_ccy "usd")))



Lisp is a language which was intended to be small, with *no* syntax, where you have the power of macros to create your own syntax and roll it into the language. I made each of the above tags a separate Scheme function (Oh! I was using Scheme btw), each of which is capable of transforming itself into the desired functionality. As a result, the above data is also my code and executes directly. Another of those Aha! moments.

But my entire system is based on Java! Surely you are not telling me to change all of the guts to Scheme - are you? My job will be at stake and I will never be able to convince my pointy haired boss that the trading back office system is running on Lisp. In fact, many people who dare to use Lisp in daytime projects are often careful to keep it a secret in the industry. And unless you have PG on your company board or are blessed enough to get the favor of Y, this is a very useful tip.

Enter SISC

SISC is a lightweight, platform independent Scheme system targeting the Java Virtual Machine. It comes as a lightweight distribution (the core jar is 233 KB) and offers Scheme as a scripting language for Java. In SISC, bridging is accomplished by a Java API for executing Scheme code and evaluating Scheme expressions, and a module that provides Scheme-level access to Java objects and implementation of Java interfaces in Scheme.

I can write Scheme modules and load them through the Java API from my Java code, once I have bootstrapped the SISC runtime. Here is how I can initialize SISC from my Java application to enable it to use the Scheme functions:


// bootstrapping the SISC runtime
SeekableInputStream heap = new MemoryRandomAccessInputStream(
    getClass().getResourceAsStream("/sisc.shp"));
AppContext ctx = new AppContext();
ctx.addHeap(heap);
Interpreter interpreter = Context.enter(ctx);



and then I can use on the fly evaluation of Scheme functions as follows :


interpreter.eval("(load \"trd01.scm\")");
String s = interpreter.eval("(trd01 (id 10234) (trade_type \"equity\") ...)").toString();



There are quite a few variants of eval() that SISC offers, along with multiple modes of execution of Scheme code from the Java environment. For details, have a look at their documentation. SISC also supports calling Java code from Scheme, accessing Java classes through SISC's extensible type system.

I do not dream about using SISC in any production code in the foreseeable future. I just wanted to share my rants with all of you. In today's world, it is really raining programming languages - scripting languages like Ruby, Python, JRuby, Groovy etc. are making inroads as the preferred glue languages of today's enterprise architecture. But Lisp stands out as a solid, robust language, with the exceptionally powerful code-as-data paradigm - I always felt Lisp was way ahead of its time. Possibly this is the time for Lisp to be reincarnated, with the incompatibilities rubbed off its numerous versions and dialects. Lisp code is executable data - it makes perfect sense to replace all the frameworks that execute reams of code to process hierarchical structures as data with a single language.

Monday, April 16, 2007

Competition is healthy! Prototype Spring Bean Creation: Faster than ever before ..

In my last post I had mentioned some performance benchmarks of Guice and Spring. In one of the applications which I had ported from Spring to Guice, I had an instance of lookup-method injection, where a singleton service bean contained a prototype bean that needed to be looked up from the context. Here is the sample configuration XML:


<bean id="trade"
  class="org.dg.misc.Trade"
  scope="prototype">
</bean>

<bean id="abstractTradingService"
  class="org.dg.misc.AbstractTradingService"
  lazy-init="true">
  <lookup-method name="getTrade" bean="trade"/>
</bean>



I ran a performance benchmark suite to exercise 10,000 gets of the prototype bean :


BeanFactory factory = new XmlBeanFactory(
    new ClassPathResource("trade_context.xml"));

ITradingService ts = (ITradingService) factory.getBean(
    "abstractTradingService");
StopWatch stopWatch = new StopWatch();
stopWatch.start("lookupDemo");

for (int x = 0; x < 10000; x++) {
  ITrade trade = ts.getTrade();
  trade.calculateValue(null, null);
}
stopWatch.stop();

System.out.println("10000 gets took " +
    stopWatch.getTotalTimeMillis() + " ms");



Spring 2.0.2 reported a timing of 359 milliseconds for the 10,000 gets. I performed the same exercise in Guice with a similar configuration :


public class TradeModule extends AbstractModule {
  @Override
  protected void configure() {
    bind(ITrade.class).to(Trade.class);
    bind(ITradingService.class).to(TradingService.class).in(Scopes.SINGLETON);
  }
}



and the corresponding test harness :


Injector injector = Guice.createInjector(new TradeModule());

long start = System.currentTimeMillis();
for(int i=0; i < 10000; ++i) {
  injector.getInstance(ITradingService.class)
      .doTrade(injector.getInstance(ITrade.class));
}
long stop = System.currentTimeMillis();
System.out.println("10000 gets took " + (stop - start) + " ms");



For this exercise of 10,000 gets, Guice reported a staggering 31 milliseconds.

Then, a couple of days back, Juergen Hoeller posted in the release news for Spring 2.0.4 that repeated creation of prototype bean instances is now up to 12 times faster in this release. I decided to run the benchmark once again after a drop-in replacement of the 2.0.2 jars with the 2.0.4 ones. And voila! Indeed there is a significant improvement in the figures. The same test harness now takes 109 milliseconds on the 2.0.4 jars. Looking at the changelog, you will notice several line items addressed as part of improving bean instantiation timings.

This is what competition does even for the best .. Keep it up Spring guys ! Spring rocks !

Updated: Have a look at the comments by Bob and the followups for some more staggering benchmark results.

Monday, April 09, 2007

Guiced! Experience Porting a Spring Application to Guice

Yes, I got one of my Spring-Hibernate-JPA applications ported to use Guice for dependency injection. The application is a medium sized one and did not contain many of the corner features which have been discussed aggressively amongst the blogebrities. But now that I have the satisfaction of porting one complete application to Guice, I must say there are truly *lots of* goodies in this crazybob creation. Some of them I had mentioned in my earlier rants on Guice; many of them were hiccups, which came up primarily because I had only been a newbie with Guice - many of my questions and confusions were clarified by the experts, in my blog comments as well as on the Guice developers' mailing list. Guice is definitely an offering to look out for in the space of IoC containers. I liked what I saw in it .. in this post I will share some of my experiences.

Disclaimer: I will only focus on issues that I faced and solved in the course of porting the application. The application did not have many blocker features for Guice - hence I do not claim that *all* applications can be ported completely using Guice 1.0.

Really Guicy!

Before I go into the details, here are some of the guicy attributes of Guice as a Java framework ..

The Java 5 usage - I always believe that backward compatibility is not the be-all and end-all of a framework's evolution. At some stage you need to educate the users as well, to migrate to newer versions and use the advanced features of your framework, which will make their applications more performant and maintainable. This has been one of my complaints against Java as well. It is really heartening to see the Guice designers base their engine on Java 5 and use advanced features like metadata and generics to the fullest. This has definitely made Guice more concise, precise and DRY.

Type-safety - This is possibly the loudest slogan of Guice as a DI container. It's Java generics all the way, and although you can subvert the type system (more on this later) and hide some of your bindings from Guice, that is more of an exception. All APIs in Guice are strongly typed, hence your application remains blessed with the safety of the typed injections that Guice encourages.

Concise, minimal, well-designed api set with extremely verbose and explanatory error messages.

Let's Guice it up ..

Here it is. The application has been running happily on a Spring-Hibernate-JPA architecture. I took up the porting exercise purely out of academic interest, to get a first hand feel of validating Guice against a real life, non trivial application. I was somewhat aware of the nuances that I needed to figure out beforehand, and I classified my injection points into the following three groups:

  1. points that I had complete control of and where I knew I would be able to inject my annotations

  2. services with multiple implementations being used in the same application - luckily I did not have many such instances

  3. third party POJOs that I could not invade into


The first ones were pretty cool and I happily added @Inject with appropriate bindings in the module. The injection points became very explicit and the class became more readable as far as external dependencies were concerned.

I did not have many occurrences of multiple implementations of the same service being used in the same application deployment. As far as the application is concerned, we needed different implementations for different deployments, and hence I had different modules in place for them. In one of the cases, I needed to address the problem within the same instance of the application, which I did the usual way, using annotations like the following:


bind(IBarService.class)
  .to(BarService.class)
  .in(Scopes.SINGLETON);

bind(IBarService.class)
  .annotatedWith(Gold.class)
  .to(SpecialBarService.class)
  .in(Scopes.SINGLETON);



Injecting into third party POJOs is one of the issues that has been debated over a lot in the various blogs and forums. Here are some of the cases and how I addressed them in my application :

Case 1: Use Provider<> along with constructor injection: I used this pattern for POJOs which we were using as part of another component and which used constructor injection, e.g.


public class ThirdPartyBeanProvider implements Provider<ThirdPartyBean> {

  final private IFooService fooService;
  final private IBarService barService;

  @Inject
  public ThirdPartyBeanProvider(final IFooService fooService, final IBarService barService) {
    this.fooService = fooService;
    this.barService = barService;
  }

  public ThirdPartyBean get() {
    return new ThirdPartyBean(fooService, barService);
  }
}



Case 2: These beans used setter injection, and I had some cases where multiple implementations of a service were being used in the same deployment of the application. Here I used Provider<> along with annotations to differentiate the multiple implementations of an interface. Luckily I did not have many of these cases; otherwise it would have been a bit troublesome with annotation explosion. But, at the same time, I think there may not be many use cases which need this in a typical application's single deployment. e.g.


public class AnotherThirdPartyBeanProvider implements Provider<AnotherThirdPartyBean> {

  final private IFooService fooService;
  final private IBarService barService;

  @Inject
  public AnotherThirdPartyBeanProvider(final IFooService fooService,
      @Gold final IBarService barService) {
    this.fooService = fooService;
    this.barService = barService;
  }

  public AnotherThirdPartyBean get() {
    AnotherThirdPartyBean atb = new AnotherThirdPartyBean();
    atb.setFooService(fooService);
    atb.setBarService(barService);
    return atb;
  }
}



Case 3: Here I had lots of POJOs using setter injection that needed to be handled the same way. I would have had to write lots of providers, but for this dynamic gem from Kevin, which I dug up in a thread on the developers' mailing list. Here the type system is a bit subverted, and Guice does not have full information on all bindings. But, hey .. for porting applications, AutowiringProvider<> gave me a great way to solve this issue. Here's straight out of the class javadoc:
A provider which injects the instances it provides using an "auto-wiring" approach, rather than requiring {@link Inject @Inject} annotations. This provider requires a Class to be specified, which is the concrete type of the objects to be provided. It must be hand-instantiated by your {@link com.google.inject.Module}, or subclassed with an injectable constructor (often simply the default constructor).

And I used it like a charm to set up the bindings of my POJOs.

Finally, here is a snapshot of a representative Module class, with actual class names changed for demonstration purposes :


public class MyModule extends AbstractModule {

  @Override
  protected void configure() {
    bind(IFooService.class)
      .to(FooService.class)
      .in(Scopes.SINGLETON);

    bind(IBarService.class)
      .to(BarService.class)
      .in(Scopes.SINGLETON);

    bind(IBarService.class)
      .annotatedWith(Gold.class)
      .to(SpecialBarService.class)
      .in(Scopes.SINGLETON);

    bind(ThirdPartyBean.class)
      .toProvider(ThirdPartyBeanProvider.class);

    bind(YetAnotherThirdPartyBean.class)
      .toProvider(new AutowiringProvider<YetAnotherThirdPartyBean>(YetAnotherThirdPartyBean.class));

    bind(AnotherThirdPartyBean.class)
      .toProvider(AnotherThirdPartyBeanProvider.class);
  }
}



Injecting the EntityManager

In an implementation of the Repository pattern (a la Domain Driven Design), I was using JPA with Hibernate. I had an implementation of a JpaRepository, where I was injecting an EntityManager through the annotation @PersistenceContext. This was working with normal Java EE application servers where the container injects the appropriate instance of the EntityManager.


public class JpaRepository extends RepositoryImpl {

  @PersistenceContext
  private EntityManager em;
  // ..
  // ..
}



Spring also supports this annotation both at field and method level if a PersistenceAnnotationBeanPostProcessor is enabled. Using Guice I had to write a Provider<> to have this same functionality implemented in my Java SE application.


@Singleton
public class EntityManagerProvider implements Provider<EntityManager> {

  private static final EntityManagerFactory emf =
    Persistence.createEntityManagerFactory("GuiceJpaGettingStarted");

  public EntityManager get() {
    return emf.createEntityManager();
  }
}



The Guice Way

Guice is opinionated .. yes, it really is. And through this porting exercise I have learnt it. It encourages some practices and adds syntactic vinegar when you try to subvert its recommendations. e.g. for injection, you either annotate with @Inject or write Providers. Provider<> is a wonderful tiny abstraction, and it's amazing how powerful it can get in real life applications. Use Providers to implement custom instantiation policies, multiple injections per dependency, and even custom scopes for injected providers. For porting applications, you can use AutowiringProvider<>, but that's not really something Guice encourages a lot.


Guicy Performance

In the Spring based version, I had been using quite a few lookup-method-injections to design singleton services that have prototype beans injected. While porting, I didn't have to do anything special in Guice, apart from specifying the appropriate scopes during binding in modules. And these prototype beans were heavily instantiated within the application. I did some benchmarking and found that Guice proved to be much more performant than Spring for these use cases. In some cases I got 10 times better performance in injection and repeated instantiation of prototype beans within singleton services. I admit that Spring has much richer support for lifecycle methods and I would not venture into the hairy territory of trying to compare Spring with Guice. But if I would have to select an IoC container just for dependency injection, Guice will definitely be up there as a strong contender.

Friday, March 30, 2007

Making Classes Unit-Testable

I had been working on the code review of one of our Java projects, when the following snippet struck me as a definite smell in one of the POJOs :


class TradeValueCalculator {
  // ..
  // ..

  public BigDecimal calculateTradeValue(final Trade trade, ..) {
    // ..
    BigDecimal tax = TradeUtils.calculateTax(trade);
    BigDecimal commission = TradeUtils.calculateCommission(trade);
    // .. other business logic to compute net value
  }
  // ..
  // ..
}



What is the problem with the above two innocuous-looking lines of Java code? The answer is very simple - unit testability of the POJO class TradeValueCalculator! Yes, this post is about unit testability and some tips that we can follow to design classes that can be easily unit tested. I encountered many of these problems while doing code review of a live Java project in recent times.

Avoid Statics

When it comes to testability, statics are definitely not your friends. In the above code snippet, the class TradeValueCalculator depends on the implementation of static methods like TradeUtils.calculateTax(..) and TradeUtils.calculateCommission(..). Any change in these static methods can lead to failures in the unit tests of TradeValueCalculator. Hence statics introduce unwanted coupling between classes, thereby violating the principle of easy unit testability of POJOs. Avoid them if you can, and use standard design idioms like composition-with-dependency-injection instead. And while using composition with service components, make sure they are powered by interfaces. Interfaces provide the right level of abstraction for multiple implementations and are much easier to mock while testing. Let us refactor the above snippet to compose using service components for calculating tax and commission:


class TradeValueCalculator {

  // .. to be dependency injected
  private ITaxCalculator taxCalculator;
  private ICommissionCalculator commissionCalculator;

  // ..
  // ..

  public BigDecimal calculateTradeValue(final Trade trade, ..) {
    // ..
    BigDecimal tax = taxCalculator.calculateTax(trade);
    BigDecimal commission = commissionCalculator.calculateCommission(trade);
    // .. other business logic to compute net value
  }
  // ..
  // ..
}

interface ITaxCalculator {
  BigDecimal calculateTax(..);
}

interface ICommissionCalculator {
  BigDecimal calculateCommission(..);
}



We can then have concrete instances of these service contracts and inject them into the POJO TradeValueCalculator :


class DefaultTaxCalculator implements ITaxCalculator {
  // ..
}

class DefaultCommissionCalculator implements ICommissionCalculator {
  // ..
}



Using standard IoC containers like Guice or Spring, we can inject the concrete implementations into our POJO non-invasively through configuration code. In Guice, we can define Modules that bind interfaces to concrete implementations, and use Java 5 annotations to inject those bindings in the appropriate places.



// define module to configure bindings
class TradeModule extends AbstractModule {

  @Override
  protected void configure() {
    bind(ITaxCalculator.class)
      .to(DefaultTaxCalculator.class)
      .in(Scopes.SINGLETON);

    bind(ICommissionCalculator.class)
      .to(DefaultCommissionCalculator.class)
      .in(Scopes.SINGLETON);
  }
}



and then inject ..


class TradeValueCalculator {

  // ..
  @Inject private ITaxCalculator taxCalculator;
  @Inject private ICommissionCalculator commissionCalculator;

  // ..
  // ..
}



How does this improve testability of our class TradeValueCalculator ?

Just replace the defined Module by another one for unit testing :


// define module to configure bindings
class TestTradeModule extends AbstractModule {

  @Override
  protected void configure() {
    bind(ITaxCalculator.class)
      .to(MockTaxCalculator.class)
      .in(Scopes.SINGLETON);

    bind(ICommissionCalculator.class)
      .to(MockCommissionCalculator.class)
      .in(Scopes.SINGLETON);
  }
}



What we have done just now is mock out the service interfaces for tax and commission calculation. And that too without a single line of code being changed in the actual class! TradeValueCalculator can now be unit-tested without any dependency on other classes.

Extreme Encapsulation

I have come across many abuses of FluentInterfaces, where developers use chained method invocations involving multiple classes. Take this example from the Mock Objects paper, which discusses this same problem:

dog.getBody().getTail().wag();

The problem here is that the main class Dog is indirectly coupled with multiple classes, thereby violating the Law of Demeter and making it totally unsuitable for unit testing. The situation is typically called "The Train Wreck" and has been discussed extensively in the said paper. The takeaway from this situation is to minimize coupling with neighbouring classes - couple only with the class directly associated with you. Think in terms of abstracting the behavior *only* with respect to the class with which you collaborate directly - leave implementation of the rest of the behavior to the latter.
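A minimal sketch of that delegation, with hypothetical Dog/Body/Tail classes in the spirit of the paper's example - each class exposes the behavior itself and talks only to its immediate collaborator:

```java
class Tail {
  private boolean wagging;
  void wag()          { wagging = true; }
  boolean isWagging() { return wagging; }
}

class Body {
  private final Tail tail = new Tail();
  void wagTail()        { tail.wag(); }            // Body talks only to its Tail
  boolean tailWagging() { return tail.isWagging(); }
}

public class Dog {
  private final Body body = new Body();

  // clients say dog.expressHappiness(), never dog.getBody().getTail().wag()
  public void expressHappiness() { body.wagTail(); }
  public boolean isHappy()       { return body.tailWagging(); }
}
```

Now Dog can be unit-tested (or mocked) on its own behavior, without the test reaching through Body and Tail.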

Privates also need to be Unit-Tested

There is a school of thought which espouses the policy that *only* public APIs need to be unit-tested. This is not true - I firmly believe that all your methods and behaviors need unit testing. Strive to achieve the maximum unit test coverage in your classes. Roy Osherove thinks that we may have to bend some of the rules of pure OOD to make our design implementations more testable, e.g. by exposing or replacing private instances of objects using interfaces, injection patterns, public setters etc. Or by discouraging default sealing of classes, allowing overriding in unit tests. Or by allowing singletons to be replaced in tests to break dependencies. I think I agree with many of these policies.

Fortunately Java provides a useful access specifier that comes in handy here - the package private scope of access. Instead of making your implementation members *private*, make them *package private* and implement the unit test classes in the same package. Doing this, you do not expose the private parts to the public, while allowing access to all unit test classes. Crazy Bob has more details on this. Another useful trick may be the usage of AOP. As part of unit test classes, you can introduce additional getters through AOP to access the implementation artifacts of your class. This can be done through inter-type declarations, and the test classes can access all private data with gay abandon.
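A sketch of the package private trick (the class and helper method here are hypothetical): the helper is declared at package scope, so a test class placed in the *same* package under the test source tree can call it directly, while clients in other packages cannot see it at all.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// imagine this living in org.dg.trade, with the unit test class sitting
// in the same package under the test source tree
public class TradeValueCalculator {

  // package-private, not private: invisible to clients in other packages,
  // but directly callable from unit tests in the same package
  BigDecimal roundToCents(BigDecimal value) {
    return value.setScale(2, RoundingMode.HALF_UP);
  }
}
```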

Look out for Instantiations

There are many cases where the class that is being unit tested needs to create / instantiate objects of the collaborating class. e.g.


class TradeController {

  // ..
  // ..

  public void doTrade(TradeDTO dto, ..) {
    Trade trade = new Trade(dto);
    // .. logic for trade
  }
  // ..
}



Increase the testability of the class TradeController by separating out all creation into appropriate factory methods. These methods can then be overridden in test cases to inject the creation of mock objects.


class TradeController {
  TradeDTO dto;

  // ..
  // ..

  public void doTrade() {
    Trade trade = createTrade(dto);
    // .. logic for trade
  }
  // ..

  protected Trade createTrade(TradeDTO dto) {
    return new Trade(dto);
  }
}



and create MockTrade in test cases ..


class TradeControllerTest extends TestCase {

  // ..

  public void testTradeController(..) {
    TradeController tc = new TradeController() {
      protected Trade createTrade(TradeDTO dto) {
        return new MockTrade(dto);
      }
    };
    tc.doTrade();
  }
}



The Factory Method pattern proves quite helpful in such circumstances. However, there are some design patterns, like the Abstract Factory, which can potentially introduce unwanted coupling between classes, thereby making them difficult to unit-test. Most of the design patterns in GOF are built on composition - try implementing them using interfaces in Java, so that they can be easily mocked out. Another difficult pattern is the Singleton - I usually employ the IoC container to manage and unit-test classes that collaborate with singletons. Apart from static methods, which I have already mentioned above, static members are also problematic for unit testing. In many applications they are used for caching (e.g. in ORMs) - hence an obvious problem child for unit testing.
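One way to break a singleton dependency in tests, sketched with a hypothetical RateCache singleton: give the singleton a package private hook that tests (in the same package) can use to swap in a stub, so the class collaborating with it can be unit-tested in isolation.

```java
class RateCache {
  private static RateCache instance = new RateCache();

  static RateCache getInstance() { return instance; }

  // package-private hook: only tests in the same package swap the instance
  static void setInstance(RateCache stub) { instance = stub; }

  double rateFor(String ccy) { return "usd".equals(ccy) ? 1.0 : 0.75; }
}

public class PositionValuer {
  // collaborates with the singleton; testable because tests can replace it
  public double valueInUsd(double amount, String ccy) {
    return amount * RateCache.getInstance().rateFor(ccy);
  }
}
```

A test can then do RateCache.setInstance(..) with a stubbed subclass before exercising PositionValuer, and restore the original instance afterwards.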

Wednesday, March 21, 2007

Using Guice as the DI Framework - some hiccups

Finally got the time to lay my hands on Guice, the new IoC container from Google. I ported one of my small applications based on Spring to Guice - nothing too complicated, but just enough to explore the features of Guice and make some user level comparison with Spring. I have been a long time Spring user (and an admirer too), hence the comparison is just an automatic and involuntary side effect of what I do with any of the other IoC containers on earth.

My first impression of Guice has been somewhat mixed. It is really refreshing to work with the extremely well-designed APIs, packed with the power of generics and annotations from Java 5. Fluent interfaces like binder.bind(Service.class).annotatedWith(Blue.class).to(BlueService.class) make ideal DSLs in Java and give you the feeling that you are programming to Guice rather than to Java. This is a similar feeling to what you get when programming to Rails (as opposed to programming in Ruby). However, I came across some stumbling blocks which have been major irritants for the problem that I was trying to solve. Just to point out, I have only fiddled around with Guice for a couple of days, and may have missed lots of details which offer better solutions to the problems. Any advice or suggestions from the experts will be of great help. Here are some of the rants from my exercise of porting an existing application to Guice:

A Different way to look at Configuration

Many people have blogged about the onerous XML hell of Spring and how Guice gets rid of these annoyances. I think the main difference between Guice and Spring lies in the philosophy of how they look at dependencies and configuration. Spring preaches the non-invasive approach (my favorite) and takes a completely externalized view of object dependencies. In Spring, you can wire up dependencies using XML, Spring JavaConfig, the Groovy-Spring DSL, or some other option like Spring annotations. But irrespective of the technique you use, dependencies are always externalized:


@Configuration
public class MyConfig {
  @Bean
  public Person rod() {
    return new Person("Rod Johnson");
  }

  @Bean(scope = Scope.PROTOTYPE)
  public Book book() {
    Book book = new Book("Expert One-on-One J2EE Design and Development");
    book.setAuthor(rod()); // rod() method is actually a bean reference !
    return book;
  }
}



The above is an example from Rod Johnson's blog post - the class MyConfig is an externalized rendition of bean configurations. It uses Java 5 annotations to define beans and their scopes, but, at the end of the day, all it does is equivalent to spitting out the following XML :


<bean id="rod" class="Person" scope="singleton">
  <constructor-arg value="Rod Johnson"/>
</bean>

<bean id="book" class="Book" scope="prototype">
  <constructor-arg value="Expert One-on-One J2EE Design and Development"/>
  <property name="author" ref="rod"/>
</bean>



Guice, on the other hand, treats configuration as a first class citizen of your application model and allows it right into your domain model code. Guice modules indicate what to inject, while annotations indicate where to inject. You annotate the class itself with the injection points (through the @Inject annotation). The drawback (if you consider it to be one) is that you have to import com.google.inject.* within your domain model. But it ensures locality of intention and explicit semantics of in-situ injection through metadata programming.


// what to inject : a sample Module
public class TradeModule extends AbstractModule {
  protected void configure() {
    bind(Trade.class).to(TradeImpl.class);
    bind(Balance.class).to(BalanceImpl.class);
    bindConstant().annotatedWith(Bond.class).to("fixed income");
    bindConstant().annotatedWith(I.class).to(5);
  }
}

// where to inject : a sample domain class
public class TradingSystem {
  @Inject Trade trade;
  @Inject Balance balance;

  @Inject @Bond String tradeType;

  int settlementDays;

  @Inject
  void setSettlementDays(@I int settlementDays) {
    this.settlementDays = settlementDays;
  }
}



Personally, I would like to have configuration separated from my domain code - Guice looked quite intrusive to me in this respect. Using Spring with XML based configuration allows a clean separation of configuration from your application codebase. If you do not like XML based configuration, use Spring JavaConfig, which restricts annotations to configuration classes only. Cool stuff.

Annotations! Annotations!

Guice is based on Java 5 annotations. As I mentioned above, where-to-inject is specified using annotations only. The plus of this approach is that the intention is explicit and locally specified, which leads to better maintainability of code. However, in some cases, people may overdose on annotations. Custom annotations should be kept to a minimum and used *only* as a last resort. Guice provides the Provider<T> abstraction to deal with fine grained instantiation control. Provider<T> is an exceptionally simple abstraction, but it can be used very meaningfully to implement lazy variants of many design patterns like Factory and Strategy. In my application I used Provider<T> successfully to implement a Strategy which I had initially implemented using custom annotations. Lots of custom annotations are a design smell - try refactoring your design using abstractions like Provider<T> to minimize them.
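To illustrate the idea, here is a minimal sketch of Provider<T> as a lazy Strategy factory in plain Java, with no Guice dependency (Guice's com.google.inject.Provider exposes the same single get() method; the strategy names below are hypothetical):

```java
// A minimal sketch of the Provider<T> idea; no Guice on the classpath.
interface Provider<T> {
  T get();
}

interface CalculationStrategy {
  double calculate(double notional);
}

class FlatFeeStrategy implements CalculationStrategy {
  public double calculate(double notional) {
    return notional * 0.01; // a flat 1% fee
  }
}

public class ProviderSketch {
  public static void main(String[] args) {
    // the provider defers instantiation until get() is called,
    // giving a lazy Factory/Strategy without any custom annotation
    Provider<CalculationStrategy> lazy = FlatFeeStrategy::new;
    CalculationStrategy strategy = lazy.get();
    System.out.println(strategy.calculate(1000.0)); // prints 10.0
  }
}
```

In Guice itself, injecting a Provider<CalculationStrategy> instead of a CalculationStrategy gives the consuming class the same deferral for free.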

Problems with Provider<T>

However, I hit upon a roadblock while implementing some complex strategies using Provider<T>. In many cases, my Strategy needed access to contextual information in order to decide upon the exact concrete strategy to be instantiated. In the following example, I need different strategy instances of CalculationStrategy depending on the trade type.


interface CalculationStrategy {
  void calculate();
}

public class TradeValueCalculation {

  private CalculationStrategy strategy;
  private Trade trade;

  // need different instances of strategy depending on trade type
  public TradeValueCalculation(Trade trade, CalculationStrategy strategy) {
    this.trade = trade;
    this.strategy = strategy;
  }

  public void calculate() {
    strategy.calculate();
  }
}



I cannot use any custom annotation on the constructor argument strategy, since I need polymorphic behavior across different instances of the same class. I tried the Provider<T> approach :


public class TradeValueCalculation {

  private Provider<CalculationStrategy> strategy;
  private Trade trade;

  @Inject
  public TradeValueCalculation(Trade trade, Provider<CalculationStrategy> strategy) {
    this.trade = trade;
    this.strategy = strategy;
  }

  public void calculate() {
    strategy.get().calculate();
  }
}



Still the Provider does not have the context information .. :-( and my problem is how to pass this information to the Provider. Any help will be appreciated ..

On the contrary, solving this in Spring is rather simple by declaring multiple bean configurations for the same class. And this works like a charm in the XML as well as the Java variant of configuring Spring beans.
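A sketch of what that might look like in Spring XML (the bean ids and the concrete strategy classes are hypothetical; only TradeValueCalculation and CalculationStrategy come from the example above) - two bean definitions of the same class, each wired with a different strategy:

```xml
<!-- two concrete strategies, both implementing CalculationStrategy -->
<bean id="equityStrategy" class="EquityCalculationStrategy"/>
<bean id="bondStrategy" class="BondCalculationStrategy"/>

<!-- two configurations of the same TradeValueCalculation class -->
<bean id="equityTradeCalculation" class="TradeValueCalculation">
  <constructor-arg ref="equityTrade"/>
  <constructor-arg ref="equityStrategy"/>
</bean>

<bean id="bondTradeCalculation" class="TradeValueCalculation">
  <constructor-arg ref="bondTrade"/>
  <constructor-arg ref="bondStrategy"/>
</bean>
```

Each client then gets injected with the variant it needs by bean id, without any annotation on the class itself.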

Some other pain points ..


  • Guice's user guide recommends using Provider<T> to inject into third party classes. This looked quite obtuse to me, since it goes against the philosophy of less code that Guice preaches. Spring provides much more elegant solutions to this problem because of its non-invasiveness. Guice, by virtue of being an intrusive framework, has to provide this extra level of indirection to inject into classes for which I do not have the source code.


  • One specific irritant in Guice is literal injection, which forced me to use a custom annotation every time I wanted to inject a String literal (though the built-in @Named binding annotation, bound through Names.named(..), can avoid defining one custom annotation per literal).


  • Another feature which would have been very useful to me is the ability to override bindings through Module hierarchies. One of my big applications has multiple components, and I thought I could organize my Guice Modules in a matching hierarchy, with specific bindings being overridden in specific modules. This is definitely the DRY approach towards binding implementations to interfaces. Guice did not allow me to do this - later I found the topic discussed in the development mailing list, where a patch is available for overriding bindings. I am yet to try it out though ..



It will be extremely helpful if some of the Guice experts address these issues and suggest workarounds. I like the terseness of the framework; the APIs are very intuitive and so are the error messages. The Javadocs are extremely informative, though we need more exhaustive documentation on Guice best practices. Guice is really lightweight and is reported to be very fast (I am yet to test those benchmarks against Spring though). I hope the Google guys look into some of the pain points that early adopters have been facing with Guice ..

Wednesday, March 07, 2007

Programming Passion and the FizzBuzz Madness

Of late there has been a lot of buzz around FizzBuzz in the programming blogs. It all started with Imran on Tech posing the following FizzBuzz question to interview developers who grok coding ..
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
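(At the risk of joining the madness the rest of this post describes, the task itself is a three-branch conditional; a plain Java rendition, for the record:)

```java
// FizzBuzz, exactly as specified above
public class FizzBuzz {
  static String say(int n) {
    if (n % 15 == 0) return "FizzBuzz"; // multiple of both three and five
    if (n % 3 == 0) return "Fizz";
    if (n % 5 == 0) return "Buzz";
    return Integer.toString(n);
  }

  public static void main(String[] args) {
    for (int i = 1; i <= 100; i++) {
      System.out.println(say(i));
    }
  }
}
```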

The programming community took it up from there and started a deluge of self proclamation trying to establish the most efficient way of FizzBuzzing. Have a look at the comments section of all these blogs - passionate programmers have used all *official* languages to get their version of FizzBuzz going.

And the madness continues .. someone positions FizzBuzz as the programming task of choice for discriminating hackers. And guess what ? One hacker responded with Ruby code interpreting Prolog interpreting Lisp interpreting Lisp to solve FizzBuzz.

The intent of the initial post was definitely to bring into light some of the concerns and issues that interviewers face hiring good programmers. The community responded otherwise, as Giles Bowkett aptly mentions ..
if you were absurdly cynical, you might expect programmers to start coding FizzBuzz in every language from Haskell to SQL, and even to start holding FizzBuzz coding contests.

Are programmers by nature too passionate ? I guess yes, and that is why we find such madness in the community when someone throws in a problem with a programming smell. But is this passion justified ? Now, this is cynical, since you can never justify passion. But I think we would be more pragmatic to focus on the main problem that Imran on Tech had raised - the issue of hiring good programmers through a pragmatic interview process. Is asking candidates to solve screening programming assignments the right way to judge a programmer in an interview ? We follow these practices rampantly, we eliminate candidates through FizzBuzz problems, yet many software projects fail because of bad programming and inefficient programmers. Isn't it time we stepped back and introspected ? Have a look at this heartfelt critique of the FizzBuzz screening question .. (reference from RaganWorld)

Monday, February 26, 2007

Rails Applications and Law of Demeter

Jay Fields talks about Law of Demeter violations in Rails programs and suggests a fix using the Ruby module Forwardable, which allows you to implement delegation without too much littering in your codebase. Along the same line, Jeffrey Allan Hardy introduces the class method delegate in Ruby on Rails to automate delegation tasks. Both of these techniques allow syntax extensibility to keep your code clean of manual delegate methods.

My question is .. does this really fix the violation of the Law of Demeter in Rails applications ?

The basic philosophy behind the concepts of Adaptive Programming and the Law of Demeter is to provide a better separation between behavior and object structure in OO programs. A typical Rails application is tied strongly to the underlying ActiveRecord model. The domain model for a Rails application follows the same structure as the database model - hence navigation of the object graph is a replica of the underlying relational table structure. To put it bluntly, there is no domain object model that abstracts the behavior of the system. Constructs like Forwardable or the class method delegate offer syntactic sugar that may help in code refactoring, but they do not add to the reusability of the model. In fact, the principle of least knowledge that LoD preaches is violated the moment you make your behavioral model knowledgeable about the underlying persistence structure.
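The underlying idea is language neutral; here is a sketch in Java terms (class names hypothetical) of the delegation that Forwardable and delegate automate in Ruby - a syntactic fix that, as argued above, still leaves the delegating class knowing the shape of the object graph:

```java
// Class names are hypothetical; the point is the delegating method.
class Address {
  private final String city;
  Address(String city) { this.city = city; }
  String getCity() { return city; }
}

class Customer {
  private final Address address;
  Customer(Address address) { this.address = address; }
  Address getAddress() { return address; }
}

class Order {
  private final Customer customer;
  Order(Customer customer) { this.customer = customer; }

  // Without delegation, a client writes the LoD-violating chain
  // order.getCustomer().getAddress().getCity(). The delegating method
  // keeps the client talking only to its immediate collaborator,
  // but Order itself still traverses the structure:
  String customerCity() {
    return customer.getAddress().getCity();
  }
}
```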

I am not an expert in Ruby or Rails - I would like to know what others feel about this ..

Tuesday, February 20, 2007

Domain Driven Design : Use ORM backed Repository for Transparent Data Access

In my last post, I discussed having Repositories as a higher level of abstraction in domain driven design than vanilla DAOs. Many people have questioned how the getOutstationEmployees() method in EmployeeRepositoryImpl could have been more performant using plain old SQL instead of being abstracted behind layers of DAOs and Repositories. Actually, the main idea behind the post was to establish that Repositories are a more natural way to interface domain Aggregates with the underlying database than DAOs. Let us see why ..

  • The Data Access Object pattern evolved as part of the core J2EE design patterns as a means of handling bean managed persistence, where every business object was mapped to a DAO. And for relational databases, every DAO had a natural mapping to a database table. Hence the DAO pattern enforces a strong coupling with the underlying database structure. This strategy encourages the Transaction Script pattern of modeling a domain, which is definitely not what DDD preaches.


  • Repository provides a more domain centric view of data access, where the client uses the Ubiquitous Language to access the data source. The DAOs, OTOH, provide a more database centric view, which is closer to the implementation than the domain.


  • Repositories provide controlled access to the underlying data in the sense that it exposes only the Aggregate roots of the model, which the clients should be using. When we model an Order domain entity, it makes sense to expose LineItems only in the context of the Order, and not as separate abstractions. Repositories are mapped to the Aggregate root level and ensure that the client gets a coherent view of the Order entity.


Using ORM solutions with DDD

DDD advocates a strong domain model and ORM encourages transparent persistence of domain objects. In the case of an RDBMS backed application, both paradigms aim at decoupling the relational data layer from the object oriented domain layer. Hence it is natural that they complement each other when we think of scalable application architectures with a strong domain model. And when we consider the combination of DDD and ORM, the DAO paradigm looks deprecated, because the domain layer is no longer concerned with bean level persistence - it deals with Aggregate level persistence. We talk about persisting an Order as a whole, not individual LineItems. Hence we talk about data access and persistence in terms of Repositories, which are a higher level of abstraction than DAOs. And when we talk about transaction control, synchronization of a unit of work and transparent persistence of domain entities, we naturally think of Hibernate Sessions or JPA EntityManagers. The concept of a Repository fits in here like a glove - suddenly you find yourself programming at the domain level, and relational tables become something for the DBA to manage.

Towards a Generic Repository Implementation

How would you like to design a Repository, which can participate in multiple implementations across various ORMs, exposing domain contracts in the Ubiquitous Language ? Clearly the design needs to be extensible on both sides -

  • On the abstraction side, we need to have extensibility for the domain. All domain repositories will be part of this hierarchy.

  • On the implementation side, we need to have extensibility for multiple implementations, e.g. JPA, Hibernate etc.


Think Bridge design pattern, which allows us to decouple an abstraction from its implementation so that the two can vary independently.

On the abstraction side we have


public interface IRepository<T> {
  List<T> read(String query, Object[] params);
}



and a base class, which delegates to the implementation ..



public class Repository<T> implements IRepository<T> {

  private RepositoryImpl repositoryImpl;

  public List<T> read(String query, Object[] params) {
    return repositoryImpl.read(query, params);
  }

  public void setRepositoryImpl(RepositoryImpl repositoryImpl) {
    this.repositoryImpl = repositoryImpl;
  }
}



On the implementation side of the Bridge, we have the following base class


public abstract class RepositoryImpl {
  public abstract <T> List<T> read(String query, Object[] params);
}



and an implementation based on JPA ..



public class JpaRepository extends RepositoryImpl {

  // to be injected through DI in Spring
  private EntityManagerFactory factory;

  @Override
  public <T> List<T> read(String query, Object[] params) {
    JpaTemplate jpa = new JpaTemplate(factory);

    if (params == null) {
      params = ArrayUtils.EMPTY_OBJECT_ARRAY;
    }

    try {
      @SuppressWarnings("unchecked")
      List<T> res = jpa.executeFind(new GenericJpaCallback(query, params));
      return res;
    } catch (org.springframework.dao.DataAccessException e) {
      throw new DataAccessException(e);
    }
  }
}



Similarly we can have a Hibernate based implementation ..


public class HibernateRepository extends RepositoryImpl {
  @Override
  public <T> List<T> read(String query, Object[] params) {
    // .. hibernate based implementation
  }
}



But the client can work based on the contract side of the Bridge, with the implementation being injected through Spring ..
Here's a sample client repository contract based on the domain model from Chris Richardson's POJOs in Action ..


public interface IRestaurantRepository {
  List<Restaurant> restaurantsByName(final String name);
  List<Restaurant> restaurantsByStreetName(final String streetName);
  List<Restaurant> restaurantsByEntreeName(final String entreeName);
  List<Restaurant> restaurantsServingVegEntreesOnly();
}



for the aggregate root Restaurant having the following model :


@Entity
public class Restaurant {
  /**
   * The id.
   */
  @Id
  @GeneratedValue(strategy = GenerationType.AUTO)
  private long id;

  /**
   * The name.
   */
  private String name;

  /**
   * The {@link Address}.
   */
  @OneToOne(cascade = CascadeType.ALL)
  private Address address;

  /**
   * Set of {@link Entree}.
   */
  @ManyToMany
  @JoinTable(inverseJoinColumns = @JoinColumn(name = "ENTREE_ID"))
  private Set<Entree> entrees;

  // all getters and setters removed for clarity
}



It uses JPA annotations for the object relational mapping. Note the one-to-one relationship with the Address table and the many-to-many relationship with Entree. Clearly, Restaurant is the Aggregate root here, with Entree and Address being part of the Aggregate. Hence the repository is designed around the Aggregate root, and exposes collections of Restaurants based on various criteria.

Provide an implementation of IRestaurantRepository using the abstraction side of the Bridge ..


public class RestaurantRepository extends Repository<Restaurant>
  implements IRestaurantRepository {
  public List<Restaurant> restaurantsByEntreeName(String entreeName) {
    Object[] params = new Object[1];
    params[0] = entreeName;
    return read(
      "select distinct r from Restaurant r join r.entrees e where e.name like ?1",
      params);
  }
  // .. other methods implemented
}



Finally, the Spring bean configuration that injects the implementation ..


<bean id="repoImpl"
  class="org.dg.inf.persistence.impl.jpa.JpaRepository">
  <property name="factory" ref="entityManagerFactory"/>
</bean>

<bean id="restaurantRepository"
  class="org.dg.repository.impl.jpa.RestaurantRepository"
  lazy-init="true">
  <property name="repositoryImpl" ref="repoImpl"/>
</bean>



Here is the complete implementation model of the Repository pattern :



Note how the above design of RestaurantRepository exposes collections of the Aggregate root Restaurant without exposing the other entities like Entree and Address. Clearly the Repository pattern, if implemented properly, can actually hide the underlying database structure from the user. Here RestaurantRepository actually deals with 3 tables, but the client is blissfully unaware of any of them. And the underlying ORM makes it even more transparent, with all the machinery of automatic synchronization and session management. This would not have been possible with the programming model of DAOs, which map naturally 1:1 to the underlying tables. This is what I mean when I say that Repositories allow us to program the domain at a higher level of abstraction.

Monday, February 12, 2007

Domain Driven Design - Inject Repositories, not DAOs in Domain Entities

There have been some discussions in the Spring forum, of late, regarding injection of repositories into domain objects. And in the context of the data access layer, there appears to be some confusion regarding the difference between DAOs and Repositories. A data access object (DAO) is the contract between the relational database and the OO application. A DAO belongs to the data layer of the application, which encapsulates the internals of CRUD operations from the Java application being developed by the user using OO paradigms. In the absence of an ORM framework, the DAO handles the impedance mismatch that a relational database has with OO techniques.

A Look at the Domain Model

In the domain model, say, I have an entity named Organization, which needs to access the database to determine statistics regarding its employees ..


class Organization {

  private String name;
  private Address corporateOffice;
  // .. other members

  public long getEmployeeCount() {
    // .. implementation
  }

  public List<Employee> getOutstationEmployees() {
    // .. access Employee table and Address table in database
    // .. and find out employees who are in a diff city than corporate office
  }
}



and I need access to the employee details deep down in my database to get the above details mandated by the Organization entity. A typical EmployeeDao will look like the following :


public interface EmployeeDao {
  List<Employee> getAllEmployees();
  Employee getEmployeeById(long id);
  List<Employee> getEmployeesByAddress(Address address);
  List<Employee> getEmployeesByName(String name);
  // .. other methods
}



Using this DAO definition and an appropriate implementation, we can have the method getOutstationEmployees() as follows :


@Configurable("organization")
class Organization {

  // implementation needs to be injected
  private EmployeeDAO employeeDao;

  // ..
  // ..

  public List<Employee> getOutstationEmployees() {
    List<Employee> emps = employeeDao.getEmployeesByAddress(corporateOffice);
    List<Employee> allEmps = employeeDao.getAllEmployees();
    return CollectionUtils.minus(allEmps, emps);
  }
  // ..
}



Note the usage of the Spring annotation @Configurable to ensure dependency injection of the employeeDao into the domain object during instantiation. But how clean is this model ?
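CollectionUtils.minus above is not a standard library call; a minimal sketch of what such a set-difference helper might look like (assumed, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Assumed sketch of the set-difference helper used above:
// returns the elements of 'all' that are not in 'toRemove'.
public class CollectionUtils {
  public static <T> List<T> minus(List<T> all, List<T> toRemove) {
    List<T> result = new ArrayList<T>(all);
    result.removeAll(toRemove); // preserves the order of the original list
    return result;
  }
}
```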

Distilling the Domain Model

The main problem with the above model is that we have lots of unrelated concerns polluting the domain. In his book on Domain Driven Design, Eric Evans says in the context of managing the sanctity of the domain model :
An object should be distilled until nothing remains that does not relate to its meaning or support its role in interactions

In the above design, the code snippet that prepares the list of outstation employees contains a lot of logic dealing with list manipulation and data fetching from the database, which does not belong naturally to the domain abstraction modeling an Organization. This detailed logic should be part of some other abstraction, closer to the data layer.

This is the ideal candidate for being part of the EmployeeRepository, which is a separate abstraction that interacts with the data accessors (here, the DAOs) and provides "business interfaces" to the domain model. Here we will have one repository for the entire Employee aggregate. An Employee class may collaborate with other classes like Address, Office etc., forming the entire Aggregate, as suggested by Eric in DDD. And it is the responsibility of this single Repository to work with all necessary DAOs and provide all data access services to the domain model in the language which the domain understands. So the domain model remains decoupled from the details of preparing collections from the data layer.


public interface EmployeeRepository {
  List<Employee> getOutstationEmployees(Address address);
  // .. other business contracts
}

public class EmployeeRepositoryImpl implements EmployeeRepository {

  private EmployeeDAO employeeDao;

  public List<Employee> getOutstationEmployees(Address address) {
    List<Employee> emps = employeeDao.getEmployeesByAddress(address);
    List<Employee> allEmps = employeeDao.getAllEmployees();
    return CollectionUtils.minus(allEmps, emps);
  }
}



The main differences between the Repository and the DAO are that :

  • The DAO is at a lower level of abstraction than the Repository and can contain plumbing code to pull data out of the database. We have one DAO per database table, but one Repository per domain type or Aggregate.

  • The contracts provided by the Repository are purely "domain centric" and speak the same domain language.



Repositories are Domain Artifacts

Repositories speak the Ubiquitous Language of the domain. Hence the contracts which the repositories publish must belong to the domain model. OTOH, the implementation of a Repository will contain a lot of plumbing code for accessing DAOs and their table specific methods. Hence it is recommended that the pure domain model depend *only* on the Repository interfaces. Martin Fowler recommends the Separated Interface pattern for this.

Injecting the Repository instead of the DAO results in a much cleaner domain model :


@Configurable("organization")
class Organization {
  private String name;
  private Address corporateOffice;
  // .. other members

  // implementation needs to be injected
  private EmployeeRepository employeeRepo;

  // ..
  // ..

  public List<Employee> getOutstationEmployees() {
    return employeeRepo.getOutstationEmployees(corporateOffice);
  }
  // ..
}



In the next part, I will have a look at the Repository implementations while using an ORM framework like Hibernate. We do not need DAOs there, since the object graph will have transparent persistence functionality backed up by the ORM engine. But that is the subject for another discussion, another day ..

Monday, January 29, 2007

Thinking Differently with Design Patterns, Java and Accidental Complexity

Bob Lee has posted a very useful tip on performant singletons. Have a look if you are writing concurrent Java applications for the enterprise. From plain old synchronization to DCL to IODH - I guess this pattern implementation has come a full circle in Java. If you are into Java, follow this idiom .. possibly this is the best you can get for a fast, thread-safe, lazily loaded singleton with JLS guarantee.

This post is not about singletons, although we start with the code for implementing the same pattern based on post Java 5 specifications :


public class Singleton {
  private Singleton() {}   // prevent external instantiation

  static class SingletonHolder {
    static final Singleton instance = new Singleton();
  }

  public static Singleton getInstance() {
    return SingletonHolder.instance;
  }
}



Every time you need a singleton in your application, you make use of the above idiom. Unless you are coding a trivial application, very soon you will feel the spiralling cost of the growing number of classes. This is a basic problem with many Java idioms and design patterns, where we try to force functional programming paradigms through nested or anonymous inner classes. This is what many refer to as accidental complexity in modeling, which very often tends to overshadow the domain complexity, resulting in lots of glue code.

I am not in the league to snub Java. I am myself a Java programmer and have been doing OO with Java and C++ for the last 10 years. The GOF design patterns book has been my bible, and all my thoughts have, so far, been soaked in the practices and principles that the book professes. With Ruby and Lisp, I have started to think about programming a bit differently. And as Alan Perlis put it in one of his epigrams, "A language that doesn't affect the way you think about programming, is not worth knowing".

Singleton Pattern Elsewhere


require 'singleton'
class Foo
  include Singleton
end



That's it ! Ruby's powerful mixin functionality automatically makes our class a singleton - the new method is rendered private and we get the class method instance for obtaining the sole object.

Another new generation language, Scala, offers the object keyword for implementing singletons.


object SensorReader extends SubjectObserver {
  // ..
}



In this example, the declaration of SensorReader creates a singleton - the language itself guarantees a single instance.

Design Patterns in Java and Accidental Complexity

As a language, Java does not offer the powerful functional abstractions that Ruby, Scala or Lisp provide. Hence many design patterns which are invisible in these languages stand out as elaborate design constructs in Java. These add to the *accidental complexity* of a Java application codebase, which often turns out to be more difficult to manage than the *actual complexity* - the complexity of the domain that you are trying to model. Technologies like aspects and metadata based annotations are attempts to improve the abstraction level of the Java programming language. Unfortunately, these can never give the programmer the seamlessness of extending the syntax of the core language - the programmer will never be able to program bottom up or carve out a DSL as elegant as Rails in Java.

Norvig has an excellent presentation on how dynamic languages make many of the GOF patterns invisible. The presentation illustrates how macros make the Interpreter pattern easier to implement, how method combinations make Observers seamless, and how multi-methods ease the implementation of the Builder pattern. Mark Dominus has posted a very thought provoking essay concluding that patterns are signs of weakness in programming languages. What he means is that languages in which we have to write repetitive code to solve recurring problems lack abstraction power. The very fact that we have to repeat the code of the Strategy pattern at every point of its application in Java implies that the language lacks the extensibility to absorb the design construct within itself. And in doing so, the implementation accumulates lots of *accidental complexity* as part of the codebase. OTOH, in a typical functional implementation, the strategy is a simple variable whose value is a function, and with first class functions the pattern is invisible.
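To make the last point concrete, here is a sketch using java.util.function, which arrived only with Java 8, long after this post - with first class functions the Strategy "pattern" collapses into a variable holding a function (the fee strategies are hypothetical):

```java
import java.util.function.DoubleUnaryOperator;

public class FunctionalStrategy {
  public static void main(String[] args) {
    // each strategy is just a value; no interface hierarchy, no extra classes
    DoubleUnaryOperator flatFee = notional -> notional * 0.01;
    DoubleUnaryOperator tiered  =
        notional -> notional > 1000 ? notional * 0.005 : notional * 0.01;

    System.out.println(flatFee.applyAsDouble(2000.0)); // prints 20.0
    System.out.println(tiered.applyAsDouble(2000.0));  // prints 10.0
  }
}
```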

The singleton pattern implementation in Java, despite providing a performant solution, also contributes to this accidental complexity of the codebase. This is more true for many other pattern implementations in Java or C++.

Thinking Differently

I cannot imagine myself writing about the lack of abstractions in OO languages had I not been exposed to Ruby, Scala or Lisp. I realize the truth in the 19th epigram of Alan Perlis - these languages have really affected my thinking on programming at large. Now I can appreciate why Steve Yegge thinks design patterns [in Java] are mostly pounding star shaped pegs into square holes.

Thursday, January 18, 2007

Syntax Extensibility, Ruby Metaprogramming and Lisp Macros

Over the last few days I have been feeling a bit Lispy. It's not that I have been immersed in Lisp programming; I still do Java for my day job and enjoy the process of staring at reams of standard object oriented API calls and the big gigantic frameworks that provide the glue code for enterprise software. Java is still my favorite programming language, I still enjoy writing Java, and I have recently taken on bigger commitments to write more Java with more Spring and more Hibernate.

The only difference is that I have started reading Paul Graham's On Lisp again !

I am convinced that I will not be programming production level business applications in Lisp in the foreseeable future. But reading Lisp makes me think differently - the moment I start writing event listeners in Java Swing, I start missing true lexical closures and higher-order functions in the language. Boilerplate irritates me much more and makes me imagine how I could have modeled it better using Scheme macros. True, I have been using the best IDEs and leaving their code generation engines to generate all the boilerplate, and I have also put in place a bit of MDA within my development environment that generates much of the code from the model. I am a big fan of AOP and have been using aspects for quite some time to modularize my designs and generate write-behind logic through the magic of bytecode weaving.

The difference, once again, is that, I have been exposed to the best code generator of all times, the one with simple uniform syntax having access to the whole language parser, that gets the piece of source code in a single uniform data structure and knows how to munch out the desired transformation in a fail-safe manner day in and day out - the Lisp macro.

Abstractions - Object Orientation versus Syntax Construction

For someone obsessed with the OO paradigm, thriving on the backbone of objects, virtual functions and polymorphism, I have learnt to model abstractions in terms of objects and classes (the kingdom of nouns). I define classes on top of the Java language infrastructure, add data members as attributes, add behavior through methods that operate on those attributes and, whenever need be, invoke the methods on an instantiated object. This is the way I have, so far, learnt to add abstraction to an application layer. Abstraction, as they say, is an artifact of the solution domain, which should ultimately bring you closer to the problem domain. We have :

Machine Language -> High Level language -> Abstractions in the Solution Domain -> Problem Domain

In the case of object oriented languages like Java, the size of the language is monstrous; add to that at least a couple of gigantic frameworks, and abstractions clearly sit as guests on top of the language layer. Lisp, in its original incarnation, was conceived as a language with very little syntax. It was designed as a programmable programming language, and developing abstractions in Lisp enriches not only the third block above, but a significant part of the second block as well. I now get what Paul Graham has been talking about: programming bottom-up, the extensible language, building the language up toward your program.

Take this example :

I want to implement dolist(), which performs an operation on each member of a list. In Lisp, we can have it as a natural extension of the language through a macro


(dolist (x '(1 2 3)) (print x) (if (evenp x) (return)))


and the moment we define the macro, it blends into the language syntax like a charm. This is abstraction through syntax construction.

And, the Java counterpart will be something like :


// ..
Collection<..> list = ... ;
CollectionUtils.dolist(list,
    new Predicate() {
      public boolean evaluate(Object element) {
        // ..
      }
    });
// ..



which gives an object oriented abstraction of the same functionality. This solution provides the necessary abstraction, but is definitely not as seamless an extension of the language as its Lisp counterpart.
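Incidentally, Ruby's blocks land somewhere in between the two - here is a minimal sketch (dolist below is my own free-standing method, not a library one) of the same iterate-and-bail-on-even behaviour:

```ruby
# A free-standing dolist: still an ordinary method call, but the
# block syntax at the call site reads almost like a native construct.
def dolist(list)
  list.each do |x|
    yield x
    break if x.even?   # mirrors the (return) on even in the Lisp version
  end
end

dolist([1, 2, 3]) { |x| puts x }   # prints 1 and 2, then stops
```

It is not a true syntax extension - the parser has not learnt anything new - but the call site comes closer to the Lisp version than the anonymous-class ceremony above.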

Extending Extensibility with Metaprogramming

Metaprogramming is the art of writing programs which write programs. Languages which offer syntax extensibility provide the normal paths to metaprogramming. And Java is a complete zero in this regard. C offers more trouble to programmers through its whacky macros, while C++'s template metaprogramming facilities are no less hazardous than pure black magic.

Ruby offers excellent metaprogramming facilities through its eval() family of methods, here-docs, open classes, blocks and procs. Ruby is a language with very clean syntax, having the natural elegance of Lisp and extremely powerful metaprogramming facilities. Ruby's metaprogramming capabilities have given a new dimension to API design in applications. Have a look at this example from a sample Rails application :


class Product < ActiveRecord::Base
  validates_presence_of :title, :description, :image_url
  validates_format_of :image_url,
    :with => %r{^http:.+\.(gif|jpg|png)$}i,
    :message => "must be a URL for a GIF, JPG, or PNG image"
end

class LineItem < ActiveRecord::Base
  belongs_to :product
end



It's a really cool DSL, made possible through the syntax extension capabilities offered by Ruby. It's not so much OO that Rails exploits to offer great APIs; rather, it's the ability of Ruby to define new syntactic constructs through first-class symbols that adds to the joy of programming.

How will the above LineItem definition look in Lisp's database bindings ? Let's take this hypothetical model :


(defmodel <line_item> ()
  (belongs_to <product>))



The difference from the above Rails definition is the use of macros in the Lisp version, as opposed to class methods in Rails. In the Rails definition, belongs_to is a class method which, when called, defines a bunch of instance methods in the class LineItem. Note that this is a commonly used idiom in Ruby metaprogramming, where we can define methods in the derived class right from the base class. But the main point is that in the Lisp version, the macros are expanded in the macro expansion phase before the program runs, and hence provide an obvious performance improvement over the Rails counterpart.
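A hypothetical sketch of that Ruby idiom (ModelBase and the simple accessor pair are illustrative, not the actual Rails implementation) makes the runtime nature of it visible:

```ruby
# belongs_to is a class method on the base class; when the subclass
# body calls it, instance methods get defined in the subclass at runtime.
class ModelBase
  def self.belongs_to(name)
    define_method(name) { instance_variable_get("@#{name}") }
    define_method("#{name}=") { |v| instance_variable_set("@#{name}", v) }
  end
end

class LineItem < ModelBase
  belongs_to :product
end

item = LineItem.new
item.product = "a book"
puts item.product   # => a book
```

Every one of these define_method calls executes when the class body is evaluated - there is no earlier expansion phase to push the work into.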

Another great Lispy plus ..

Have a look at the following metaprogramming snippet in Ruby, incarnated using class_eval for generating the accessors in a sample bean class :


def self.property(*properties)
  properties.each do |prop|
    class_eval <<-EOS
      def #{prop} ()
        @#{prop}
      end
      def #{prop}= (val)
        @#{prop} = val
      end
    EOS
  end
end



Here the code which the metaprogram generates is embedded within Ruby here-docs as a string - eval-ing a string is not a recommended best practice in the Ruby world. These stringy snippets are not treated as first-class citizens, in the sense that IDEs do not respect them as code and neither do the debuggers. This has been described in his usual style and detail by Steve Yegge in this phenomenal blog post. Using define_method will make it more IDE friendly, but at the expense of readability and speed. The whacky class_eval runs much faster than the define_method version - a rough benchmark indicated that the class_eval version ran twice as fast on Ruby 1.8.5 as the one using define_method.


def self.property(*properties)
  properties.each do |prop|
    define_method(prop) {
      instance_variable_get("@#{prop}")
    }

    define_method("#{prop}=") do |value|
      instance_variable_set("@#{prop}", value)
    end
  end
end
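For the curious, a rough harness along these lines (class and variable names are mine) can reproduce the comparison; absolute numbers vary widely across Ruby versions, so treat any ratio as indicative only:

```ruby
require 'benchmark'

# Two beans with identical accessors: one pair generated via
# class_eval on a string, the other via define_method.
class EvalBean
  class_eval <<-EOS
    def name; @name; end
    def name=(val); @name = val; end
  EOS
end

class MethodBean
  define_method(:name) { instance_variable_get("@name") }
  define_method(:name=) { |v| instance_variable_set("@name", v) }
end

n = 100_000
Benchmark.bm(14) do |bm|
  bm.report("class_eval")    { b = EvalBean.new;   n.times { b.name = 1; b.name } }
  bm.report("define_method") { b = MethodBean.new; n.times { b.name = 1; b.name } }
end
```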



Anyway, all these are examples of dynamic metaprogramming in Ruby, since everything gets done at runtime. This is a big difference from Lisp, where the code templates are not typeless strings - they are valid Lisp data structures, which the macro processor can manipulate like normal Lisp code, since macros in Lisp operate on the parse tree of the program. Thus code templates in Lisp are IDE friendly, debugger friendly and real first-class code snippets. Many people have expressed their wish to have Lisp macros in Ruby - Ola Bini has some proposals on that as well. From whatever little Lisp I have been through, Lisp macros are really cool and a definite step forward towards providing succinct extensibility to the language through user-defined syntactic control structures.

OO Abstractions or Syntax Extensions ?

Coming from an OO-soaked background, I can only think in terms of OO abstractions. Ruby is possibly the first language that has pointed me to situations where syntax extensions scale better than OO abstractions - Rails is a live killer example of this paradigm. And finally, when I tried to explore the roots, Lisp macros really floored me with their succinctness and power. I do not have the courage to say that the functional abstractions of Lisp and Ruby are more powerful than OO abstractions. Steve Yegge has captured so well the natural inhibition of OO programmers towards extended syntactic constructs :

Lots of programmers, maybe even most of them, are so irrationally afraid of new syntax that they'd rather leaf through hundreds of pages of similar-looking object-oriented calls than accept one new syntactic construct.


My personal take is to exploit all the features the language has to offer. With a language like Ruby or Scala or Lisp, syntax extensibility is the natural model, while with Java, powerful OO abstractions are - look at the natural difference of paradigms in modeling a Ruby on Rails application and a Spring-Hibernate application. This is one of the great eye-openers that the new dynamic languages have brought to the forefront for OO programmers - beautiful abstractions are no longer a monopoly of OO languages. Lisp tried to force this realization long back, but possibly the world was not ready for it.

Monday, January 08, 2007

Why I should learn Lisp

At the beginning of 2006, I had promised myself that I would learn Ruby and the tricks of the trade of functional programming. I do Java for a day job and get paid for consulting on enterprise Java architectures. I like Java, I love the Java community, and I am a big fan of some of the great, cool Java frameworks that are out there. I used to do C++ as well, five years back, and took great pride in designing allocators and smart pointers. All these were part of the application codebase, and despite using productive libraries like Boost, infrastructure code management (aka memory management and memory leaks) took away most of my night's sleep, at the expense of the geek feeling of doing C++. Java was the language that took away from me the pride of writing destructors and allocators. But in the course of this sense of loss, I realized that I was now programming at a higher level of abstraction, with the entire memory management left to the runtime. I was deep into encapsulation and better object orientation, and embraced each successive release of Java with great enthusiasm.

One day, after reading a few pages of the pickaxe book and doing some hunting on the internet for Ruby evangelists, I came up with the following piece of Ruby code as the implementation of the Strategy Design Pattern :


class Filter
  def filter(values)
    new_list = []
    values.each { |v| filter_strategy(v, new_list) }
    new_list
  end
end

class EvenFilter < Filter
  def even?(i)
    i%2 == 0
  end

  def filter_strategy(value, list)
    if even?(value)
      list << value
    end
  end
end

of = EvenFilter.new
array = [1,2,3,4,5]
puts of.filter(array)



On further introspection, more reading of the pickaxe book and more rummaging through the musings of Java bashers on LtU, the light of lambda dawned on me. It looked like I was going through the enlightenment of the functional programming paradigm - the enhanced expressivity and abstraction that higher-order procedures add to programs. I could appreciate the value of lexical closures, bottom-up programming and functional abstractions. The new class for the Strategy implementation is adapted from Nathan Murray's excellent presentation on Higher Order Procedures in Ruby :


class FilterNew
  def filter(strategy)
    lambda do |list|
      new_list = []
      list.each do |element|
        new_list << element if strategy.call(element)
      end
      new_list
    end
  end
end

of = FilterNew.new
filter_odds = of.filter( lambda{|i| i % 2 != 0} )
array = [1,2,3,4,5]
puts filter_odds.call(array)



The Disappearing Strategy Classes

In the new implementation, where is the strategy class that is supposed to be hooked polymorphically in the context and provide the flexible OO implementation ?

It has disappeared into the powerful abstraction of the language. The method filter() in the second example does not return the newly created list, unlike the first one - it returns a procedure, which can act on other sets of data. The second example is an implementation at a much higher level of abstraction, which adds to the expressivity of the intended functionality.
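To make the point concrete, here is a small sketch (make_filter is my own name for the factory, not from the presentation) showing that the returned procedure is a first-class value:

```ruby
# One filter factory yields many strategies, each a reusable
# procedure that can be applied to any number of data sets.
make_filter = lambda do |strategy|
  lambda { |list| list.select { |e| strategy.call(e) } }
end

filter_odds  = make_filter.call(lambda { |i| i % 2 != 0 })
filter_evens = make_filter.call(lambda { |i| i % 2 == 0 })

puts filter_odds.call([1, 2, 3, 4, 5]).inspect    # => [1, 3, 5]
puts filter_evens.call([6, 7, 8, 9, 10]).inspect  # => [6, 8, 10]
```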

In fact, with functional programming paradigms, many of the design patterns which the GoF carefully listed in the celebrated book on Design Patterns simply go away in a language that allows the user to program at a higher level of abstraction. Have a look at this excellent presentation by Norvig.

As Paul rightly mentions in his post, the paradigms of functional programming hide a lot of accidental complexity, mainly because of the following traits that the language offers :

  1. Higher level of abstraction, which leads to lesser LOC, and hence reduced number of bugs

  2. Side-effect free pure functional code, which liberates the programmer from managing state and sequence of execution

  3. Improved concurrency and scalability because of the stateless and side-effect-free programming model
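Trait 2 in miniature - a hedged Ruby sketch where the result is a composition of expressions rather than a sequence of mutations:

```ruby
# Sum of squares of the even numbers, written as a pipeline of
# side-effect-free transformations -- no accumulator is ever mutated.
total = [1, 2, 3, 4, 5].select { |i| i % 2 == 0 }.
                        map { |i| i * i }.
                        reduce(0) { |sum, i| sum + i }
puts total   # => 20
```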



Ruby or Lisp ?

People look upon Lisp as the language of the Gods; someone has mentioned Ruby as an acceptable Lisp, and many others consider Ruby to lie midway between Java and Lisp. Ruby is an object-oriented language with functional programming capabilities, while Lisp came into being in 1958 with the landmark eval function of John McCarthy. As Paul Graham says :
With macros, closures, and run-time typing, Lisp transcends object-oriented programming.

Lisp and Smalltalk have been the main inspirations for Matz in designing the Ruby language. Maybe Ruby is more pragmatic than Lisp, but the roots of Ruby are definitely ingrained in the concepts of pure macros, lexical closures and the extensibility mechanisms that Lisp provides. Lisp is the true embodiment of the "code-as-data" paradigm. Lispers claim that Lisp (or any of its dialects) is definitely more expressive than Ruby, and that Lisp macros can extend the language more seamlessly than Ruby blocks. I am not qualified enough to comment on this. But my one observation is that behind the nice Lispy DSL that Rails provides, its implementation looks really clumsy, and possibly would have been much cleaner in Lisp.

Not only in Ruby - functional programming constructs are beginning to make their appearance in modern-day OO languages as well. C# and Visual Basic already offer lambdas and comprehensions, and Java will have closures in the next release - the Lisp style is promising to come back.

Still, I do not think Lisp is going to go mainstream; yet I need to learn Lisp to be a better fit in today's world of changing programming paradigms.