Tuesday, August 08, 2006

XML Integration in Java and Scala

During my trip to JavaOne 2006, I missed the session by Mark Reinhold where he discussed the plans for integrating XML into the Java programming language. There have been lots of discussions in various forums about the possibility of this happening in Dolphin - Kirill Grouchnikov has blogged about what he would like to see as part of native XML support in Java. The community, as usual, is divided on this subject - many people feel that integrating XML into the Java language will seriously compromise the simplicity of the language. Look at the comments section of this posting in JavaLobby. This feeling has gained more momentum in view of the upcoming integration of scripting languages such as JavaScript (through the Rhino engine) with Java (JSR 223).

Anyway, I think Java will have a first cut of XML integration in Dolphin. In the JavaOne session, Mark discussed some of the options they plan to offer in java.lang.XML, so as to make XML processing simpler in Java and liberate programmers from the hell of dealing with DOM APIs. Microsoft has already published its implementation of XML integration into C# and VB in the form of XLinq. I tried my hand at it using the June CTP and found it to be quite elegant. In fact the whole thing fits seamlessly with the rest of the LINQ family and Microsoft's plan of fixing the infamous ROX triangle. Java has been lagging behind in this respect and is making a last attempt to catch up - though expect nothing till Dolphin! I appreciate that, considering Java's user base of millions and its commitments to the community as the default choice of enterprise platform (unless you are Bruce Tate, of course!), it is not easy to push through a change in the language. Still, better late than never.


A few days ago, I was browsing through some of Mark's slides from JavaOne, when I thought that it would be a worthwhile exercise to find out how these could be implemented in Scala, which in fact offers the most complete XML integration as part of the language. I have repeatedly expressed my views about Scala in my blog (see here) and how positive I feel about saying Hello Scala. XML integration in Scala is no exception - in fact the nicest part of this integration is that the designers did not have to do much extra to make XML a first class citizen in the Scala world. The elements of Scala that make it a nice host for XML integration are some of the core features of the language itself:

  • Scala, being a functional language, supports higher order functions, which provide a natural medium for handling recursive XML trees

  • Scala supports pattern matching, which can model algebraic data types and be easily specialized for XML data

  • For-comprehensions in Scala act as a convenient front end syntax for queries
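
The three points above can be sketched in a few lines. This is only an illustration, assuming a recent Scala 2 release with the scala-xml module on the classpath (not the 2.1.7 drop I used for the rest of this post); the object name FeatureTour and the sample data are made up.

```scala
import scala.xml._

object FeatureTour {
  // Hypothetical sample data, used only for this illustration
  val doc: Elem =
    <feature-list>
      <feature><name>Zombie references</name><state>rejected</state></feature>
      <feature><name>Free disk space</name><state>approved</state></feature>
    </feature-list>

  // Higher-order function: map over the <feature> children of the tree
  val names: Seq[String] = (doc \ "feature").map(f => (f \ "name").text)

  // Pattern matching: an XML pattern looks like the data it takes apart
  def describe(n: Node): String = n match {
    case <state>{ s }</state> => "state is " + s.text
    case _                    => "something else"
  }

  // For-comprehension as a query: names of the approved features
  val approved: Seq[String] =
    for (f <- doc \ "feature" if (f \ "state").text == "approved")
      yield (f \ "name").text
}
```

None of this needs a special XML API on the side: map, match and for are the same constructs used for any other Scala data.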

Go through this Burak Emir paper for more on how XML integration in Scala offers scalable abstractions for service based architectures.

For brevity, I am not repeating the snippets as Mark presented them. They can be found on the JavaOne site under session TS-3441. I will try my hand at some equivalent Scala versions.

Disclaimer: I am no expert in Scala, hence any improvements / suggestions to make the following more Scala-ish are very much welcome. Also, I tested this code with the recent drop of 2.1.7-patch8283.

Construction : XML Literals

This example adds more literals to an existing XML block. Here's the corresponding snippet in Scala:

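import scala.xml._

val mustang =
  <feature>
    <name>Method to find free disk space</name>
  </feature>

// Return a copy of the feature with a <reviewed> element appended to its children
def addReviewer(feature: Node, user: String, time: String): Node =
  feature match {
    case <feature>{ cs @ _* }</feature> =>
      <feature>{ cs }<reviewed>
        <who>{ user }</who>
        <when>{ time }</when>
      </reviewed></feature>
  }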


The highlights of the above implementation are the brevity of the language, the mixing of code and XML data in the method addReviewer() and the use of regular sequence patterns (the cs @ _* binding), which are useful for non-XML data as well. If you wish, you can embed arbitrary Scala expressions, including calls into Java code, within the XML data by enclosing them in braces.
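
To make the last two points concrete, here is a small sketch of expressions embedded in an XML literal and of the same sequence pattern applied to a plain list. It assumes a recent Scala 2 release with the scala-xml module available; the names LiteralsDemo, stamp and firstAndRest are invented for the example.

```scala
import scala.xml._

object LiteralsDemo {
  // Anything inside braces is an ordinary Scala expression,
  // including calls into Java classes such as java.lang.String
  def stamp(user: String): Elem =
    <reviewed>
      <who>{ user.toUpperCase }</who>
      <count>{ 1 + 2 }</count>
      <tags>{ List("a", "b").map(t => <tag>{ t }</tag>) }</tags>
    </reviewed>

  // The same `_*` sequence pattern works on non-XML collections
  def firstAndRest(xs: List[Int]): String = xs match {
    case List(head, rest @ _*) => head.toString + " then " + rest.length + " more"
    case _                     => "empty"
  }
}
```

A collection of nodes inside braces is spliced in as children, which is what makes the map over the tag names work.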

Queries, Collections, Generics, Paths

This snippet demonstrates the capabilities of XML queries in various forms, including XPath style queries. One major difference that I noticed is that the Scala representation of runtime XML is immutable, while the assumption in Mark's example was that java.lang.XML is mutable. I am not sure what the final Java offering will be, but immutable data structures have their own advantages, and I guess the decision to make the XML runtime representation immutable was a well thought out one by the Scala designers. It does add a little verbosity to the Scala code below compared to its Java counterpart, since an update has to rebuild the element instead of changing it in place.
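
What immutability means in practice can be sketched as follows: an "update" builds a new element and leaves the original alone. This assumes a recent Scala 2 release with the scala-xml module, where Elem offers a copy method; ImmutableUpdate and withState are hypothetical names and the data is made up.

```scala
import scala.xml._

object ImmutableUpdate {
  val feature: Elem =
    <feature><name>Zombie references</name><state>approved</state></feature>

  // Build a new <feature> whose <state> child is replaced; `e` itself is untouched
  def withState(e: Elem, newState: String): Elem =
    e.copy(child = e.child.map {
      case <state>{ _* }</state> => <state>{ newState }</state>
      case other                 => other
    })

  val rejected: Elem = withState(feature, "rejected")
}
```

After this runs, feature still reports approved while rejected carries the new state, so both versions can be shared freely without defensive copying.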

val mustangFeatures =
  <feature-list>
    <feature><name>Method to find free disk space</name></feature>
    <feature><name>Improve painting (fix gray boxes)</name></feature>
    <feature><name>Zombie references</name></feature>
  </feature-list>

// A feature counts as open unless its state is "approved"
def isOpen(ft: Node): Boolean = {
  if ((ft \ "state").text.equals("approved")) false
  else true
}

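// Return a copy of the document in which every open feature is marked rejected
def rejectOpen(doc: Node): Node = {

  def rejectOpenFeatures(features: Iterator[Node]): List[Node] = {
    (for(val ft <- features) yield ft match {

      // rebuild an open feature, since the existing node cannot be changed in place
      case x @ <feature>{ f @ _ * }</feature> if isOpen(x.elements.next) =>
        <feature>
        <id>{(x.elements.next \ "id").text}</id>
        <name>{(x.elements.next \ "name").text}</name>
        <engineer>{(x.elements.next \ "engineer").text}</engineer>
        <state>rejected</state>
        </feature>

      // everything else is passed through unchanged
      case _ => ft
    }).toList
  }

  doc match {
    case <feature-list>{ fts @ _ * }</feature-list> =>
      <feature-list>{ rejectOpenFeatures(fts.elements) }</feature-list>
  }
}

val pp = new PrettyPrinter( 80, 5 );
Console.println(pp.format(rejectOpen(mustangFeatures)))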

The observations on the XML querying support in Scala are :

  • Use of for-comprehensions (in rejectOpenFeatures()) adds to the brevity and clarity of the code

  • XPath style methods (in isOpen() .. remember that in Scala ft \ "state" is just ft.\("state")) allow an XQuery style of programming.

Another example which combines both of the above features and makes it a concise gem, is the following from another Burak Emir presentation:

for (val z <- doc("books.xml")\"bookstore"\"book";
    z \ "price" > 30)
yield z \ "title"
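
Here is a self-contained variation on that query which can be run as is. It is my own sketch, not Burak's code: it assumes a recent Scala 2 release with the scala-xml module, uses an inline document instead of books.xml, and the object name BookQuery and the book data are invented.

```scala
import scala.xml._

object BookQuery {
  // Hypothetical inline data standing in for books.xml
  val bookstore: Elem =
    <bookstore>
      <book><title>Cheap Reads</title><price>12.50</price></book>
      <book><title>Pricey Tome</title><price>45.00</price></book>
    </bookstore>

  // The infix and method-call forms of the projection are the same call
  val sameResult: Boolean = (bookstore \ "book") == bookstore.\("book")

  // \\ searches all descendants, much like // in XPath
  val allTitles: Seq[String] = (bookstore \\ "title").map(_.text)

  // Titles of the books costing more than 30
  val expensive: Seq[String] =
    for (b <- bookstore \ "book" if (b \ "price").text.toDouble > 30)
      yield (b \ "title").text
}
```

The price has to be converted from text explicitly here, which is a little more verbose than the slide version but keeps the comparison numeric.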

Streaming In and Out

Mark showed an example of formatting XML output after summarizing all approved features from the input XML. We can have a similar implementation in Scala as follows :

// Return a copy of the document containing only the approved features
def findApproved(doc: Node): Node = {

  def findApprovedFeatures(features: Iterator[Node]): List[Node] = {
    (for(val ft <- features; (ft \ "state").text.equals("approved"))
      yield ft).toList
  }

  doc match {
    case <feature-list>{ fts @ _ * }</feature-list> =>
      <feature-list>{ findApprovedFeatures(fts.elements) }</feature-list>
  }
}

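// System.in is an assumed input stream here; substitute the actual source of the document
Console.println(new PrettyPrinter(80, 5).format(findApproved(XML.load(System.in))))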

Along with formatted output, the snippet above also demonstrates loading of XML from a stream.
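
The stream handling on its own can be sketched like this, assuming a recent Scala 2 release with the scala-xml module. The in-memory stream is only a stand-in for a file, socket or standard input, and StreamDemo is a made-up name.

```scala
import java.io.ByteArrayInputStream
import scala.xml._

object StreamDemo {
  // Stand-in for a real input stream (file, socket, System.in)
  val in = new ByteArrayInputStream(
    "<feature-list><feature><state>approved</state></feature></feature-list>"
      .getBytes("UTF-8"))

  // XML.load parses the whole stream into an immutable tree
  val doc: Elem = XML.load(in)

  // PrettyPrinter(width, indentation step) renders the tree as formatted text
  val formatted: String = new PrettyPrinter(80, 2).format(doc)
}
```

Printing StreamDemo.formatted gives the indented document; the two constructor arguments play the same role as the 80 and 5 used above.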

On the whole, Scala's support for XML processing is very rich, all the more so because of the support it gets from the underlying features of the language. Scala offers powerful abstractions for transformations (scala.xml.transform), parsing, validation, handling XML expressions, XPath projections, XSLT style transformations and XQuery style querying. The Scala XML library is fairly comprehensive - most importantly it is alive and kicking. Until you have the same support in Java (Dolphin is still at least one year away), enjoy <scala/xml>.
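
As a taste of scala.xml.transform, here is a minimal sketch of an XSLT style rewrite using RewriteRule and RuleTransformer. It assumes a recent Scala 2 release with the scala-xml module; the rule, the <owner> element, the object name TransformDemo and the sample data are all invented for illustration.

```scala
import scala.xml._
import scala.xml.transform.{RewriteRule, RuleTransformer}

object TransformDemo {
  // Rule: rename every <engineer> element to <owner>, keeping its content
  val renameEngineer = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case e: Elem if e.label == "engineer" => e.copy(label = "owner")
      case other                            => other
    }
  }

  // Hypothetical input document
  val input: Elem =
    <feature><name>Zombie references</name><engineer>some.engineer</engineer></feature>

  // The transformer applies the rule throughout the tree and returns a new tree
  val output: Node = new RuleTransformer(renameEngineer).transform(input).head
}
```

The rule only describes what happens to one node; walking the tree and rebuilding it is left to the transformer.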


Mukund said...

you may want to check out XJ (http://www.research.ibm.com/xj). We have literals + XML type checking.

Jeremy Hughes said...

XMLisp may also be worth a look.

Tony Morris said...

An example of tidying up your code. Here is your isOpen method:

def isOpen(ft: Node) = ft \ "state" != approved

Points to note:
* We have type inference in Scala so we needn't type annotate everything :)
* == and != use Object.equals
* No need for {} especially in pure functions such as this one
* no need for if(condition) false else true (in any language), since !condition