Which implementation of a data structure you should use depends on your use case. If you are more concerned about the overall time complexity of a sequence of operations on the data structure, choose the one with the best amortized complexity bounds. With an amortized data structure that has a time bound of O(n) for a sequence of n operations, some individual operations may take more than O(1). But you are fine so long as the overall bound of O(n) is maintained across the whole sequence.
However, there are situations where you cannot afford amortized data structures. Many real-time systems need predictability that can only be guaranteed through worst-case time bounds. For such cases, you need a data structure implementation that gives you that worst-case promise.
In my last post I discussed a number of functional data structures that come with the guarantee of persistence: even when you make changes to the data structure, you can still access all the earlier versions at no added performance cost. In that post I briefly touched upon some of the techniques that persistent implementations employ to ensure there's no wholesale copying going on behind the scenes. Clojure uses these techniques to great effect in implementing its family of persistent data structures. Scala 2.8 also has an implementation of Bagwell's Hash Array Mapped Trie, the secret sauce behind Clojure's persistent hash maps and vectors. But this is not a post for discussing Clojure or Scala data structure implementations .. so let's move on ..
Amortized data structures, in a sense, walk the middle path between Dick Lipton's Genius Channels and Dumb Channels. You have an adversary that makes some of the operations expensive, threatening to pull down your promise. But at the same time you have the friendly, cheap operations that act as a counterbalancing force, thereby maintaining the amortized bounds over the whole sequence of operations.
Consider Okasaki's BatchedQueue implementation, which uses a pair of lists to implement a queue. The elements of the queue are split across the two lists, front and rear. front holds a prefix of the elements in queue order, while rear holds the remaining elements in reverse order. Dequeue takes place from front (see head below), while enqueue takes place at rear (see snoc below). Here's the Haskell implementation of BatchedQueue, along the lines of the one in Okasaki's book ..
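This is a minimal sketch following the book's empty / isEmpty / snoc / head / tail interface; the constructor name BQ and the smart constructor checkf, which maintains the invariant that front is empty only when the whole queue is empty, are naming choices here rather than necessarily the book's exact listing.

```haskell
module BatchedQueue where

import Prelude hiding (head, tail)

data BatchedQueue a = BQ [a] [a]

-- smart constructor: front may be empty only if the queue itself is empty,
-- so when front runs out we reverse rear and move it over
checkf :: [a] -> [a] -> BatchedQueue a
checkf [] r = BQ (reverse r) []
checkf f  r = BQ f r

empty :: BatchedQueue a
empty = BQ [] []

isEmpty :: BatchedQueue a -> Bool
isEmpty (BQ f _) = null f

-- enqueue at the rear
snoc :: BatchedQueue a -> a -> BatchedQueue a
snoc (BQ f r) x = checkf f (x : r)

-- peek at the front
head :: BatchedQueue a -> a
head (BQ []      _) = error "empty queue"
head (BQ (x : _) _) = x

-- dequeue from the front; triggers a reverse of rear when front is exhausted
tail :: BatchedQueue a -> BatchedQueue a
tail (BQ []       _) = error "empty queue"
tail (BQ (_ : f') r) = checkf f' r
```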
Clearly head offers worst-case O(1). But tail has a worst case of O(n), since it has to reverse the rear list when front becomes empty. With an amortized analysis, though, using either the credits of the Banker's method or the potential of the Physicist's method, you can prove that all operations run in amortized O(1). Okasaki's book contains the proof of the amortized bounds for BatchedQueue.
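As a rough sketch of the Physicist's method argument (a back-of-the-envelope summary, not the book's proof): take the potential of a queue to be the length of its rear list. The amortized cost of an operation is its actual cost plus the change in potential:

$$
\begin{aligned}
\Phi(\mathtt{BQ}\;f\;r) &= |r| \\
\text{snoc:}\quad & 1 + \Delta\Phi = 1 + 1 = 2 \\
\text{tail, no reverse:}\quad & 1 + \Delta\Phi = 1 + 0 = 1 \\
\text{tail, reversing a rear of length } m\text{:}\quad & (m + 1) + \Delta\Phi = (m + 1) - m = 1
\end{aligned}
$$

So every operation comes out as amortized O(1), provided each version of the queue is used only once.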
The problem with BatchedQueue is that all the bounds break in a persistent setting. Remember we talked about multiple logical futures for a persistent data structure. For one specific instance of BatchedQueue, when you invoke tail persistently, you actually end up doing the reverse multiple times, once for each logical future of the operation that creates the new object. Have a look at the following diagram that illustrates how BatchedQueue cannot offer amortized O(1) in a persistent setting.

In the figure above, we have the original queue containing the numbers from 1 to 10 distributed evenly across the front and rear lists. After 4 invocations of tail, we have the second scenario with only a single element left in the front list. If we then have 2 invocations of tail in a persistent setting, each will invoke reverse separately, despite the fact that the rear list holds only 5 credits that can be spent by one of those operations.
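Here's the same scenario against the sketch above (the names q0, q4, qa and qb are illustrative, and the initial queue is built directly with the BQ constructor to match the even split in the figure):

```haskell
-- the queue from the figure: front = [1..5], rear holds 6..10 in reverse
q0 :: BatchedQueue Int
q0 = BQ [1,2,3,4,5] [10,9,8,7,6]

-- four invocations of tail leave a single element in front
q4 :: BatchedQueue Int
q4 = tail (tail (tail (tail q0)))   -- BQ [5] [10,9,8,7,6]

-- two logical futures of the same version: each tail finds front exhausted
-- and pays for reversing the 5-element rear list all over again
qa, qb :: BatchedQueue Int
qa = tail q4                        -- reverses [10,9,8,7,6]
qb = tail q4                        -- performs the very same reverse again
```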
In an upcoming post, I will discuss BankersQueue, an alternative implementation of the queue data structure that offers O(1) amortized complexity in a persistent setting.