Tuesday, March 28, 2006

Non Java Languages on the JVM

In one of my earlier posts, I had a brief discussion on the support of the milieu of dynamic languages on the JVM. Inspite of the fact that both the JVM and the .NET CLR had been conceived as being static-type friendly, we are seeing lots of movements towards supporting dynamic languages on these platforms. In this entry I plan to continue on my earlier thread with some more technical details on some of the features that language implementers expect to be supported by the virtual machines. There are still confusions regarding what should be parts of the language and what should be supported at the VM level - but on the whole, some of them are heavily discussed in all major forums. Both JVM and .NET CLR being turing complete, support for all of the features should be, very well possible - it is just a question of the ease of implementation and use and the speed of execution on the VM platforms.

invokedynamic in JVM

Duck Typing makes its way to the JVM through this new bytecode. invokedynamic will make it easy for dynamically typed languages to be ported and run on the JVM. Dynamically typed languages like Python, Ruby etc. perform method dispatch by name, and not by type - invokedynamic-enabled JVM will ensure
that the verifier won’t insist that the type of the target of the method invocation (the receiver, in Smalltalk speak) be known to support the method being invoked, or that the types of the arguments be known to match the signature of that method. Instead, these checks will be done dynamically.

In case the method lookup fails at runtime, there will be a trap mechanism as a fallback. Gilad Bracha covers the details of this new bytecode in his blog here. Parrot already has support for an "invoke dynamic" instruction, which does runtime lookup along with exception handling.

Hotswapping

The main idea is to allow code changes on the fly, while they are running. The full capability of hotswapping implies any kind of change to be supported, addition/modification/removal of methods and attributes including changes in inheritance hierarchy. For statically typed languages, this is still an area of active research, but Gilad is hopeful of delivering full hotswapping at least for the dynamic languages. As he mentions in his blog
If we can get both invokedynamic and full hotswapping for dynamically typed languages working, implementations of languages like Python or Ruby could use the JVM object model directly, and really take advantage of the great performance of JVMs.

Could not agree more to the scepticism of Gilad. IANA expert on language or VM design, but the thought of implementing hotswapping of static variables makes me black!
Microsoft has come up with extension methods in C# 3.0 for a limited support of adding methods to an existing type without recompilation. But this is purely a C# feature and not a feature of the CLR.

invokedynamic and Hotswapping have been officialized as JSR 292, chartered to extend JVM support for dynamically typed languages.

Tail Calls

Just a small prelude for the new generation of programmers who missed the elegance of Lisp or Scheme and have been writing for-loops since. Functional language programmers use recursion to implement loops - however elegant it may look, recursive calls consume lots of stack space. Hence these languages employ all sorts of optimizations to make efficient loop implementation possible with constant stack space. Tail call optimization is one such technique, which replaces calls in tail position with jump statements. A call is said to be in a tail position if it is the last statement of a function.

Enough of beating around the bush - let's come straight to the point. Implementers want tail call support in the JVM. This is not as simple as it may seem. Recursive tail calls, which calls the function to which it belongs, can be optimized easily by introducing local jumps, and does not require any support from the target language. Hence language implementers targetting the JVM usually compiles these recursive tail calls as a goto bytecode directly, since Java source does not have any support of a goto statement. However, problem arises with non-recursive tail calls, since the JVM does not support the equivalents of C's setjmp/longjmp functions - hence non-local jumps cannot be achieved efficiently within the JVM. Implementors claim that supporting tail call optimization interferes with Java's stack walking security mechanism.

Various techniques have been proposed and used in the last decade or so for generic tail call optimization. But none of them have been found to be suitable for an efficient implementation on the JVM. The following are some of the commonly used techniques :

  1. Place the whole program in a big function and change all non local calls to direct jumps or switch statements inside this function. This method cannot work for the JVM because of the limit of 64 KB on method size, though it may work for the .NET CLR. But, overall, this is not a very scalable technique.

  2. Use trampoline. “A trampoline is an outer function which repeatedly calls an inner function. Each time the inner function wishes to tail call another function, it does not call it directly but simply returns its identity (e.g. as a closure) to the trampoline, which then does the call itself.” Use of trampolines limit indefinite stack growth, but at the expense of performance, since all calls made by the trampoline are statically unknown method dispatches.

  3. Cheney on the M.T.A. This technique by Henry Baker converts the program to CPS style, thereby turning all calls into tail-calls. Implementing this technique in the JVM implies simulating non-local jumps with chains of RETURNs or exceptions and doing CPS conversion on the entire program, which proves to be too expensive to be practical.


In summary, implementing tail-calls is expensive and Sun has no plans to support it yet in the JVM. Some of the JVMs like IBM's support optimization of recursive tail calls. CLR has an explicit tail call instruction which instructs the compiler to discard the caller’s stack frame prior to making the call. However, as Meijer and Miller observes, if the call is from untrusted code to trusted code, the frame cannot be fully discarded for security reasons.Regarding the support of tail-calls in CLR, here is what Gilad Bracha has to say
As far as tail recursion support in .Net, I understand that using it is rather expensive, because it forces a stack crawl to compute the effective protection domain (which otherwise could be computed lazily). So it has much less real value than one might expect.

Continuations

Yet another devilish elegance popularized by the functional paradigm, the Continuations Passing Style (CPS) of calling functions. Modern day Java programmers who have been inducted into programming with stack based function calls have already started smelling Greek! Don't worry, just think of CPS as yet another semantics of function call implementation, where instead of returning value to the calling function, the parent function tells the child function which function to pass the result to. No function call ever returns. The child function does this with an object named Continuation, which it gets from the parent. A continuation is an object which takes a snapshot of the current function's lexicals and control stack - when invoked, the complete state is restored from the calling chain. Once again, Parrot has this cool feature, and Dan has the commentary on this subject. Continuation based programming has got a real boost with the increasing popularity of the concept of continuation server, a radically different paradigm of Web development. Bruce Tate discusses this new paradigm in detail in the Crossing Borders article.
By offering a stateful programming model without giving up the scalability inherent in statelessness, continuation servers make Web application development much easier.


As a result people porting their dynamic languages would love to see CPS implemented in the JVM. Once again, the real challenge is stack management. Since dynamic languages like Scheme and Ruby have full support for continuations, implementers apprehend that running these languages under the JVM through invokedynamic may not get the best performance if they have to do their own stack management for supporting continuations. In the Kawa implementation for compiling dynamic languages to the JVM, a restricted version of Scheme's call-with-current-continuation has been implemented using Java exceptionsand can be used to prematurely exit (throw), but not to implement co-routines (which should use threads anyway).

.NET CLR also does not have explicit support for continuations. Don Box tries his hand in this blog post. But whatever he does is not a CLR support - it is a compiler hack exploiting the iterator feature of C# as has been rightly pointed out in the Comments section of Cedric Otaku's response to Don Box.
C# doesn't support continuations, instead it has the iterator feature, which allows you to use the yield() keyword. This isn't a special VM feature, but instead it's a compiler hack; the method that holds a yield is turned into an object, and all local vars are moved into fields, so that the state of the method is actually recorded in the object.

These things are not continuations, because the yield() only returns to the caller, whereas with continuations you could hand off control to some other functions (you can only call subroutines, which is different). This means, you don't get the full power of continuations but instead a specialized language feature that... well, allows you to write iterators.

I think the biggest challenge of implementing continuations support in the JVM is to follow the principle of "pay for it only if you need it", since not many languages actually need them. Gilad Bracha also predicts the support for continuations in the JVM as a "tall order" - till then we will have to live with the broken hand crafted implementation strategies with serious performance overheads.

2 comments:

Unknown said...

Just after I posted the entry, there is a report (source: ACM TechNews) from The ServerSide Java Symposium, where panelists have agreed that dynamic languages such as Ruby are mounting a threat to Java, but that the language itself can be improved and the ability of Java Virtual Machine could reach to dynamic languages.. Check the full report here

Unknown said...

[comments from Jens Axel Søgaard via mail : On your blog post "Non Java Languages on the JVM"] I enjoyed reading your post on non Java languages on the JVM. You mention the problem of tail calls and security on the JVM. John Clements and Matthias Felleisen have found a solution:
http://www.ccs.neu.edu/scheme/pubs/esop2003-cf.pdf
http://www.ccs.neu.edu/scheme/pubs/cf-toplas04.pdf