Jul 15

After reading Ayende’s post “Obsolete in isolation”, I decided to reply on my blog, as this is a topic that inspires me.

I agree with Ayende (yes, again!), even if I would not use the same wording: I prefer to talk about deprecation and black boxing (which amount to almost the same thing in the end).

Back to the initial question: How to handle legacy code?

In the past, I have heard managers ask: “What are the estimates to get rid of the C++ code?” This is simply absurd: arguing that code is bad because of the language it is written in is nonsense. Moreover, legacy code is usually less buggy than brand-new code, because the new code has had far less testing and real-world use (even if you are doing TDD). Some may argue that there are intrinsic reasons to get rid of C++ because of the language itself, for example interoperability with a JEE stack. I can’t say that’s totally wrong, but JNI works well here. In any case, my point is that managers tend to think that removing or rewriting a piece of code while keeping exactly the same features is simple, cheap and risk-free, and will enable lots of wonderful new features. This is 100% wrong.

First, legacy code is often a plate of spaghetti, full of tangled dependencies. Even if you use Spring or Windsor, untangling them is a question of developer maturity and best practices more than of tooling.

Second, legacy code often encapsulates the business logic. Such code can’t be replaced while keeping exactly the same features. Unless your legacy code was written using TDD (or at least has very high test coverage), which is almost never the case, you will lose some features or end up with different behaviour. The best that can happen is that you discover the discrepancies yourself, before your customers do.

Ayende’s suggestion to embed an isolated version of the legacy code in a new project is interesting. But I think it should not just be isolated, it should also be black boxed: the legacy code should be treated as an external code base (like middleware, for example). The idea is that legacy code is less buggy than new code, so take it as is: do not look inside, use the interface it exposes, and that’s it! There is then no need for new developers to learn the legacy code, only the legacy behaviour. Of course, you will still find some defects in the legacy code, so you need at least one or two developers who know this code base.
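To make “black boxed” a bit more concrete, here is a minimal sketch, assuming a hypothetical pricing feature. All names (`PricingService`, `_legacy_compute_price`, `LegacyPricingAdapter`, `checkout_total`) are made up for illustration, and the legacy function is only a stand-in for whatever the real legacy entry point is (old C++ behind JNI, a library, a service). The point is that new code depends only on a narrow interface, and the adapter forwards calls to the legacy code without touching its internals.

```python
from typing import Dict, Protocol


class PricingService(Protocol):
    """The narrow contract the new code base is allowed to see (hypothetical)."""

    def price_cents(self, product_id: str, quantity: int) -> int: ...


def _legacy_compute_price(pid: str, qty: int, flags: int) -> int:
    # Stand-in for the legacy implementation we deliberately don't open:
    # in real life this could be old C++ reached through JNI, a library, etc.
    base = {"A": 1000, "B": 450}[pid]
    total = base * qty
    return total * 9 // 10 if qty >= 10 else total  # legacy volume discount


class LegacyPricingAdapter:
    """Black box wrapper: forwards calls to the legacy entry point as is."""

    def price_cents(self, product_id: str, quantity: int) -> int:
        return _legacy_compute_price(product_id, quantity, flags=0)


def checkout_total(pricing: PricingService, cart: Dict[str, int]) -> int:
    # New code depends only on the PricingService interface,
    # never on the legacy internals.
    return sum(pricing.price_cents(pid, qty) for pid, qty in cart.items())


print(checkout_total(LegacyPricingAdapter(), {"A": 2, "B": 10}))  # prints 6050
```

New developers only need to know what `price_cents` does (the legacy behaviour), not how `_legacy_compute_price` does it.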

OK, fine, we have a big black box in the middle of our code: will I keep it for years? Obviously not, which is why we have to make this black box smaller whenever possible. The overall strategy is to move features out of the legacy code base, one by one, and reimplement them in the new code base. When I say “reimplemented”, I don’t mean with exactly the same behaviour: the feature has to be better, embrace new requirements, be more appealing, etc. Reproducing exactly the same behaviour would be very hard and would carry over what is not really nice in the feature.
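One way to picture the shrinking black box is a small router in front of it, close to what is often called the strangler fig pattern. This is a minimal sketch under stated assumptions: the `FeatureRouter` class, the feature names and both handlers are hypothetical. Each feature goes to the new code base once it has been reimplemented there; everything else still goes to the legacy black box, and the list of features the legacy code still owns gets shorter over time.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]


class FeatureRouter:
    """Routes a feature to the new code base once it has been reimplemented
    there; every other feature still goes to the legacy black box."""

    def __init__(self, legacy: Handler, legacy_features: List[str]) -> None:
        self._legacy = legacy
        self._remaining = set(legacy_features)  # what the black box still owns
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, feature: str, handler: Handler) -> None:
        # Shrink the black box: this feature now lives in the new code base.
        self._migrated[feature] = handler
        self._remaining.discard(feature)

    def remaining_in_legacy(self) -> List[str]:
        return sorted(self._remaining)

    def handle(self, feature: str, request: dict) -> dict:
        handler = self._migrated.get(feature)
        if handler is not None:
            return handler(request)
        return self._legacy({"feature": feature, **request})


def legacy(request: dict) -> dict:
    # Hypothetical stand-in for the legacy entry point.
    return {"source": "legacy", "feature": request["feature"]}


def new_invoicing(request: dict) -> dict:
    # Hypothetical reimplementation in the new code base.
    return {"source": "new", "feature": "invoicing"}


router = FeatureRouter(legacy, ["invoicing", "reporting", "pricing"])
router.migrate("invoicing", new_invoicing)
print(router.handle("invoicing", {})["source"])  # prints new
print(router.handle("reporting", {})["source"])  # prints legacy
print(router.remaining_in_legacy())              # prints ['pricing', 'reporting']
```

Note that the migrated handler is free to behave differently from the legacy one, which matches the idea that a reimplemented feature should be better rather than identical.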

Should I delete the code of the features that I move out of the legacy code base? No. Even removing some code can create defects, and even when it doesn’t, removing all the dependencies can be painful. So I prefer to deprecate the feature, meaning: keep the code, document that the feature has been replaced by another one, remove it from all the user interfaces, and check that the new code base does not use this feature. This is safer and cheaper than removing the code.
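Deprecation in this sense can be sketched with Python’s standard `warnings` module (in Java, the equivalent would be the `@Deprecated` annotation plus a `@deprecated` Javadoc tag). The `deprecated` decorator, `legacy_create_invoice` and `new_billing.create_invoice` below are hypothetical names. The legacy code stays untouched; the decorator documents the replacement and emits a `DeprecationWarning` whenever the feature is called.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def deprecated(replacement: str) -> Callable[[F], F]:
    """Mark a legacy feature as deprecated without deleting its code."""

    def decorate(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not here
            )
            return func(*args, **kwargs)  # legacy code runs unchanged

        # Document the replacement where developers will read it.
        wrapper.__doc__ = f"DEPRECATED: replaced by {replacement}.\n\n{func.__doc__ or ''}"
        return cast(F, wrapper)

    return decorate


@deprecated("new_billing.create_invoice")
def legacy_create_invoice(customer_id: int) -> str:
    """Legacy invoice generation, kept as is (hypothetical)."""
    return f"INV-{customer_id}"
```

To check that the new code base does not use a deprecated feature, its test suite can turn these warnings into errors, for example with `warnings.simplefilter("error", DeprecationWarning)` or Python’s `-W error::DeprecationWarning` option, so any call fails loudly instead of silently relying on legacy code.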

This approach has some issues; it may not fit your requirements or may simply not be appropriate for you. But I think it provides a reasonable approach to managing legacy code. What is important to keep in mind is that legacy code is very often the code that makes the $$, and the least buggy code.