Tuesday, October 02, 2007

Refactoring - Only for Boilerplates ?

There are some bloggers who make you think, even if you subscribe to an orthogonal view of the world. Needless to say, Steve Yegge is one of them. Quite some time back he introduced us to Fowler's Refactoring bible through this extremely thought-provoking essay. Even if you are a diehard Java fan, his post forces you to think about the Caterpillar-Butterfly conundrum. Seven months later, another great blogger, Raganwald, built upon Yegge's post and discussed the various agonies that mutable local variables bring to Extract Method refactoring. Raganwald's post got me thinking for the second time when I first read it. Last weekend, I came back to both links accidentally, through my favorite search engine (as if there are others!), and re-read both of them. This post is an involuntary rant born of that weekend reading.

Yegge says ..
Automated code-refactoring tools work on caterpillar-like code. You have some big set of entities — objects, methods, names, anything patterned. All nearly identical. You have to change them all in a coordinated way, like a caterpillar’s crawl, moving all the legs or lines this way or that.

Raganwald ends his post with the scriptum ..
This is exactly why languages with more powerful abstractions are more important than adding push-button variable renaming to less powerful languages. Where's the button that refactors a method with lots of mutable variables into one with no mutable variables? No button? Okay, until you invent one, don't you want a language that makes it easy to write methods and functions without mutable local variables?

Jokes apart, sentiments on the wayside, both of them target Java as the language of the Blub programmers, a language with less powerful abstractions, a language that generates caterpillar-like code for push-button refactoring. They make you think deeply about why you don't come across the term refactoring as often in other *powerful* languages like Lisp and Ruby, while you, as a Java programmer, consider yourself agile when the Refactor menu drops down in your favorite IDE and renames a public method correctly and with complete elan.

Is Refactoring only for Java ?

Martin Fowler defines refactoring as the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. We all want to do so on the software that we write - correct ? Sure. Then why the heck do the Java guys boast about something that should, in principle, be the normal way of life irrespective of the language you use ? Can it be that with other languages, programmers write the best optimized code the very first time and never need to improve the design and code organization later ? Too optimistic a claim even for Paul Graham.

OO Organization and Modularization

Java is an OO language and offers a wealth of various ways to organize your modules and subsystems. Being a class based language, Java lets you establish various relationships between your abstractions and organize them through polymorphic hierarchies. You can have inheritance of interface as well as implementation, containment and delegation, and a slew of frameworks to work on various instantiation models and factories. You can have flexible packaging at the development level as well as the deployment level. Java compiles into bytecode and runs in the most powerful VM on the planet - apart from pure Java, you can drop into your codebase snippets of other scripting languages that can reside and run within the JVM. The moot point is that all these flexibilities provide you, the Java programmer, with a slew of options at every stage of design and development. As the client's requirements change and your codebase evolves, it is only natural that you optimize your code organization using the best possible modularization strategy. Does this establish refactoring as a necessity for well-designed software and not a mere tool ? More so, when you are using a programming language that offers a rich repertoire of code and module organization. Sure, at times you need to promote some state into an inner class, increasing the level of abstraction, or refactor a slice of code from an existing method into another reusable function.

And Automated Refactoring Tools ?

So, we now agree that refactoring is necessary and that it leads us to the holy grail of well-designed software. Do we need automated refactoring tools ? Maybe not, if we are working on a small codebase that you can cache in your memory all at once. The moment you start having page faults, you feel the necessity of automation. And typical Java enterprise systems are bound to cross the limits of your memory segment and lead to continuous thrashing. Obviously not something that you would want.

But what do you need to have automated refactoring capabilities into your IDE ? Type information, which unfortunately is missing from code written in most dynamic languages. Without type information, it is impossible to have foolproof automated refactoring. Cedric has a nice post which drums this topic to the fullest.

In short, with Java's rich platter of code organization policies, you have the flexibility of merciless refactoring, and with Java's static type system you have the option of plugging automated refactoring tools into your IDE - so nice.

What about Mutable Local Variables ?

It is a known fact that mutable variables, at any level, are an antipattern in functional programming. And a functional program is a mathematician's dream for all sorts of analyses. However, in Java, we are programming real world problems, which have plenty of state to model. And Java is a language which offers the power of assignment and mutability. Assignments are not always bad, and are a natural idiom in imperative languages. Try modeling every real world problem with a stack containing objects with nested lifetimes and a constant value during their entire lifetime. Dijkstra has the following observation while talking about the virtues of assignments and goto statements ..
.. the only way to store a newly formed result is by putting it on top of the stack; we have no way of expressing that an earlier value becomes now obsolete and the latter's life time will be prolonged, although void of interest. Summing up: it is elegant but inadequate. A second objection --which is probably a direct consequence of the first one-- is that such programs become after a certain, quickly attained degree of nesting, terribly hard to read.

It is only natural that mutable local variables make automated refactoring of methods difficult. But, to me, the more important point is the locality of reference for those local variables. It all depends on the way the mutation is used and the closure of code that is affected by the mutation. No process can handle ill-designed code - disciplined usage of mutability on local variables can be handled quite well by the refactoring process. It all boils down to the problem of how well your code is organized and the criteria used in decomposing systems into modules (reference Parnas). After all, we are modeling state changes using mutable local variables - Raganwald suggests a coarse level state management using objects. This is what the State Design Pattern gives us. He mentions in one of his comments that object oriented programming is, at its heart, all about state: if you write objects without state, you are basically using objects as namespaces. So true. But at the same time, when you need to handle state changes through mutability, always apply them at the lowest level of abstraction, so that even if you need to synchronize for concurrent access, you do so by locking at the minimum level of granularity. So, why not mutable local variables, as long as the closure is well understood and justified by the domain to which it is being applied ?
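
To make the point about a well-understood closure concrete, here is a minimal sketch of my own (not taken from Yegge's or Raganwald's posts). The class InvoiceTotals and its method names are hypothetical, invented purely for illustration. The accumulator is mutated only inside the loop, so Extract Method can lift that loop out and the mutable variable never escapes the extracted method.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; all names are illustrative assumptions.
final class InvoiceTotals {

    // Before: one method. `total` is mutated by the loop and again by the discount step.
    static long totalBefore(List<Long> pricesInCents, long thresholdInCents) {
        long total = 0;
        for (long price : pricesInCents) {
            total += price;
        }
        if (total > thresholdInCents) {
            total -= total / 10; // flat 10% discount above the threshold
        }
        return total;
    }

    // After Extract Method: each step is its own function with a single result.
    static long totalAfter(List<Long> pricesInCents, long thresholdInCents) {
        long subtotal = sum(pricesInCents);
        return applyDiscount(subtotal, thresholdInCents);
    }

    private static long sum(List<Long> pricesInCents) {
        long total = 0; // still mutable, but its whole life is confined to this method
        for (long price : pricesInCents) {
            total += price;
        }
        return total;
    }

    private static long applyDiscount(long subtotal, long thresholdInCents) {
        return subtotal > thresholdInCents ? subtotal - subtotal / 10 : subtotal;
    }

    public static void main(String[] args) {
        List<Long> prices = Arrays.asList(1000L, 2500L, 500L);
        System.out.println(totalBefore(prices, 3000L));
        System.out.println(totalAfter(prices, 3000L));
    }
}
```

The extraction is mechanical here because exactly one value flows out of the loop. The mutation is still there; it is just local in the strict sense.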

Mutable local variables, when inappropriately woven into the spaghetti of a method, make refactoring very difficult. At the same time, when the programmer discovers this difficulty, she can try to redesign her method, taking advantage of the numerous levels of abstraction that Java offers. From this point of view, refactoring is also a great teacher on the way to the ultimate objective of well-designed software.
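
As a rough illustration of that kind of redesign, consider a loop that mutates two locals at once. Extract Method has trouble with the loop body because a Java method returns a single value. One possible way out, sketched below under my own assumptions (ReadingStats and Accumulator are hypothetical names, not from any library), is to promote the variables that change together into a small nested class.

```java
// Hypothetical example class; all names are illustrative assumptions.
final class ReadingStats {

    // Tangled version: `count` and `sum` are both mutated inside the loop,
    // so the loop body cannot be extracted into a method returning one value.
    static double averageAboveTangled(double[] readings, double floor) {
        int count = 0;
        double sum = 0.0;
        for (double r : readings) {
            if (r > floor) {
                count++;
                sum += r;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    // Redesign: the state that changes together lives in one small object.
    private static final class Accumulator {
        private int count;
        private double sum;

        void add(double value) {
            count++;
            sum += value;
        }

        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    static double averageAbove(double[] readings, double floor) {
        Accumulator acc = new Accumulator();
        for (double r : readings) {
            if (r > floor) {
                acc.add(r); // the mutation is now behind a named operation
            }
        }
        return acc.average();
    }

    public static void main(String[] args) {
        double[] readings = {1.0, 5.0, 7.0};
        System.out.println(averageAboveTangled(readings, 2.0));
        System.out.println(averageAbove(readings, 2.0));
    }
}
```

The state has not gone away. It has moved to a level of abstraction where it has a name and a narrow interface, which is the kind of lesson the failed extraction teaches.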


Carsten said...

In essence it is like this: if your program contains some repetitive element that could be refactored, you are not using the appropriate abstractions (or you are using the "wrong" programming language)

I cannot relate to this. It would mean that the best possible implementation would be a monster of 20k-20Mloc which contains no inner structure to give you a foothold for analysis.

I prefer having 5 times more code with a bit of redundancy that I can manage mostly with an IDE. This seems more reasonable to me: have a look at the redundancies in natural languages. They are there for a good reason, and this reason has been verified by the only reliable test available: the test of time.

Dan Nugent said...

I think that a lot of programmers are ignoring an important point when people talk about reducing code repetition on large projects.

Part of the idea is that large projects are intrinsically *wrong*. That you should be looking at making a number of smaller projects that are composable, even if you never end up reusing one of those smaller projects elsewhere.

The *only* counterargument to better modularization and smaller projects that I know of is optimization. But come on, when you're going towards optimization as one of your primary project goals you already have to acknowledge that you're entering a whole new world of pain.

Anonymous said...

Java is an OO language and offers a wealth of various ways to organize your modules and subsystems.

Even ignoring that the first half of this statement pretty much precludes the second, Java is not particularly wealthy in this regard even in comparison to other recent OO languages.

But what do you need to have automated refactoring capabilities into your IDE ? Type information

Absolutely! But again, Java's typing is pretty weak for a statically-typed language of its era. On the plus side, the tools for Java are unsurpassed.

Dijkstra: .. the only way to store a newly formed result is by putting it on top of the stack; we have no way of expressing that an earlier value becomes now obsolete and the latter's life time will be prolonged, although void of interest.

As I read it, this is purely a performance concern, right? Thankfully these days we have tail calls and garbage collection.

Dijkstra: A second objection --which is probably a direct consequence of the first one-- is that such programs become after a certain, quickly attained degree of nesting, terribly hard to read.

By which he justifies the introduction of goto (necessarily coupled with assignment - aka mutable variables), then questions whether it really helped. He advocates replacing goto with "more powerful notations", which don't hinge so much on the use of assignment. By now it is almost universally accepted that he was right about goto, but we're only slowly noticing that assignment is still hanging around like a bad smell.

After all, we are modeling state changes using mutable local variables

No! That's precisely it. Local means disappearing at function exit. There is no local change of state, so it is not necessary to mutate local variables.
There is only external change of state which is already tracked externally. Modelling external state-change with local variables is merely caching - a performance hack only.

Because they increase coupling, mutable local variables should only be used when necessary as a performance hack, and they should be marked as such.

Perhaps I phrase some of this too strongly, but I find it so very frustrating that people can read all these well-considered, well-argued and well-written texts and come away without understanding their fundamental points.

Debasish said...

I am not advocating the usage of mutable local variables. What I meant was that so long as the mutation is done with a purpose and the closure is clearly defined within the codebase, the Extract Method refactoring can be done easily. So, the main issue in refactoring is NOT the mutable local variables, but the code organization itself. And very often we find situations that are more readable with assignments.

Unknown said...

You post some great stuff Debasish. Thanks!