Monday, July 31, 2006

Inside the New ConcurrentMap in Mustang

Tiger has offered a large number of killer goodies for Java developers. Some of them have enjoyed major focus in the community, like generics, enhanced for-loop, autoboxing, varargs, type-safe enums etc. But none has had the sweeping impact as the new java.util.concurrent. Thanks to Doug Lea, Java now boasts of the best library support for concurrent programming in the industry. Martin Fowler mentions about an interesting anecdote in his report on OOPSLA 2005, where he mentions
While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing".

Indeed the concurrency model in Tiger has brought to mainstream programming the implementation of non-blocking algorithms and data structures, based on the very important concept of CAS. For a general introduction to CAS and nonblocking algorithms in Java 5, along with examples and implementations, refer to the Look Ma, no locks! article by Brian Goetz.

Lock Free Data Structures

The most common way to synchronize concurrent access to shared objects is usage of mutual exclusion locks. While Java has so long offerred locking at various levels as this synchronization primitive, with Tiger we have non blocking data structures and algorithms based on the Compare and Set primitive, available in all modern processors. CAS is a processor primitive which takes three arguments - the address of a memory location, an expected value and a new value. If the memory location holds the expected value, it is assigned the new value atomically. Unlike lock based approaches, where we may have performance degradation due to delay of the lock-holder thread, lock-free implementations guarantee that of all threads trying to perform some operations on a shared object, at least one will be able to complete within a finite number of steps, irrespective of the other threads' actions.

java.util.concurrent provides ample implementations of lock free algorithms and data structures in Tiger. All of these have been covered extensively in Brian Goetze's excellent book Java Concurrency In Practice, released in JavaOne this year - go get it, if u haven't yet.


I must admit that I am not a big fan of the management of Sun Microsystems, and the confused state of mind that Schwartz and his folks out there portray to the community. Innovation happens elsewhere - this has never been more true of the way the Java community has been moving. And this is what has kept Sun moving - the vibrant Java community have been the real lifeblood behind Java's undisputed leadership in the enterprise software market today (Ruby community - r u listening ?). The entire Java community is still working tirelessly to improve Java as the computing platform. Lots of research are still going on to increase the performance of memory allocation and deallocation in the JVM (see this). Lots of heads are burining out over implementing generational garbage collection, thread-local allocation blocks and escape analysis in Java. Doug Lea is still working on how to make concurrent programming easier for the mere mortals. This, I think is the main strength of the Java community - any other platform that promises more productivity has to walk (or rail) more miles to come up with something similar.

In this post, I will discuss one such innovation that has been bundled into Mustang. I discovered it only recently while grappling with the Mustang source drop and thought that ruminating on this exceptional piece of brilliance has its own worth and deserves a separate column of its own.

The New ConcurrentMap in Mustang

In Tiger we had ConcurrentHashMap as an implementation of the ConcurrentMap. Mustang comes with another variant of the map - the contract for ConcurrentNavigableMap and a brilliant implementation in ConcurrentSkipListMap. Have a look at the source code for this beast - you will never regret that data structures are there to encapsulate the guts and provide easy-to-use interfaces to the application developers.

Concurrent programming has never been easy and lock-free concurrency implementation is definitely not for the lesser mortals. We have been blessed that we have people like Doug Lea to take care of these innards and expose easy-to-use interfaces to us, the user community. Despite the fact that research for lock-free data structures has been going on for more than a decade, the first efficient and correct lock-free list-based set algorithm (CAS based) that is compatible with lock-free memory management methods, came out only in 2002. Lea's implementation of ConcurrentSkipListMap is based on this algorithm, although it uses a slightly different strategy for handling deletion of nodes.

Why SkipList ?

The most common data structure to implement sorted structures is a form of balanced tree. The current implementation of ConcurrentSkipListMap goes away from this route and uses the probabilistic alternative - skip lists. As Bill Pugh says

Skip lists are a data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforced balancing and as a result the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.


The verdict is not as clear as Bill says, but the main reason behind using skip lists in the current implementation is that there are no known efficient lock-free insertion and deletion algorithms for search trees (refer JavaDoc for the class). The class uses a two-dimensionally linked skip list implementation where the base list nodes (holding key and data) form a separate level than the index nodes.

Lock-Free Using CAS Magic

Any non-blocking implementation has a core loop, since the compareAndSet() method relies on the fact that one of the threads trying to access the shared resource will complete. Here is the snippet from Brian Goetz article (look at the increment() method of the counter) :

public class NonblockingCounter {
  private AtomicInteger value;

  public int getValue() {
    return value.get();
  }

  public int increment() {
    int v;
    do {
      v = value.get();
    }
    while (!value.compareAndSet(v, v + 1));
    return v + 1;
  }
}


Similarly, the implementation methods in ConcurrentSkipListMap all have a basic loop in order to ensure a consistent snapshot of the three node structure (node, predecessor and successor). Repeated traversal is required in this case because the 3-node snapshot may have been rendered inconsistent by some other threads either through deletion of the node itself or through removal of any of its adjaccent nodes. This is typical CAS coding and can be found in implementation methods doPut(), doRemove(), findNode() etc.

Handling Deletion

The original designers of these algorithms for list based sets used mark-bits and lazy removal for deletion of nodes. Doug Lea made a clever improvization here to use a marker node (with direct CAS'able next pointers) instead, which will work faster in the GC environment. He however, retains the key technique of marking the next pointer of a deleted node in order to prevent a concurrent insertion. Here's the sequence of actions that take place in a delete :

  1. Locate the node (n)

  2. CAS n's value to null

  3. CAS n's next pointer to point to a marker node

  4. CAS n's predecessor's next pointer over n and the marker

  5. Adjust index nodes and head index level


Any failure can lead to either of the following consequences :

  1. Simple retry when the current thread has lost to a race with another competing thread

  2. Some other thread traversing the list hits upon the null value and helps out the marking / unlinking part


The interesting point is that in either case we have progress, which is the basic claim of the CAS based non-blocking approach. Harris and Maged has all the gory details of this technique documented here and here.

Postscript

The code for the implementation of ConcurrentSkipListMap is indeed very complex.Firstly it deals with a multilevel probabilistic data structure (Skip List) and secondly it makes that piece concurrent using lock-free techniques of CAS. But, on the whole, for anyone who enjoys learning data structure implementations, this will definitely be a very good learning experience. The devil is in the details - it could not have been more true than this exquisite piece from Doug Lea!

No comments: