{"id":12251,"date":"2019-08-27T16:10:48","date_gmt":"2019-08-27T13:10:48","guid":{"rendered":"https:\/\/mozaicworks.com\/?p=10837"},"modified":"2022-02-01T18:42:52","modified_gmt":"2022-02-01T16:42:52","slug":"a-new-method-for-fast-refactoring-of-legacy-code","status":"publish","type":"post","link":"https:\/\/mozaicworks.com\/software-engineering\/a-new-method-for-fast-refactoring-of-legacy-code","title":{"rendered":"A new method for fast refactoring of legacy code"},"content":{"rendered":"\n
In this article, I will present a method for safely and quickly refactoring untested code, which I’ve tried in a few codebases written in compiled languages. First, we will discuss the main problem we are trying to solve and briefly introduce the techniques described by Michael Feathers, then discuss some shortcomings of the existing techniques, and finally describe the proposed technique.<\/p>\n\n\n\n
Briefly, the technique I’ve been experimenting with involves first refactoring towards pure functions using safe, mechanical refactoring steps, then testing the pure functions by quickly writing data-driven tests or property-based tests, or by using the golden master technique, and finally refactoring the pure functions towards the desired design.<\/p>\n\n\n\n\n\n\n\n
Please feel free to skip to the section “A New Method” if you are familiar with the legacy code problem and techniques. <\/p>\n\n\n\n
Many software projects go through a cycle like the following:<\/p>\n\n\n\n
You probably recognized the technical debt \/ legacy code problem. Once you’ve reached the final point, the options are limited. You can either <\/p>\n\n\n\n
But here’s the rub: refactoring involves changing code, and changing code can create even more problems. Also, given that you’ve written code that’s hard to change until now, what makes you think that you can suddenly create code that’s easier to change?<\/p>\n\n\n\n
Fortunately, Michael Feathers did the heavy lifting for us and found ways to safely refactor existing code. If you don’t know how, go and read his book “Working Effectively with Legacy Code”<\/a>. Then practice the techniques at a workshop<\/a> or at a Legacy Coderetreat<\/a>. Only then should you start applying the techniques to your own code.<\/p>\n\n\n\n The basic technique goes as follows:<\/p>\n\n\n\n This solution works very well once you master the techniques. However, it has one problem: it’s quite slow and tedious. Sure, that should be expected: cleaning up a mess is rarely fast or easy. But the business often doesn’t have time to invest in the cleanup.<\/p>\n\n\n\n Maybe there’s another way to do the same thing?<\/p>\n\n\n\n I’ve been pondering this problem for a long time. At the same time, I learned more and more about functional programming. Therefore, I’m proposing a new method that involves pure functions and passing functions as arguments, and that I believe to be faster.<\/p>\n\n\n\n Before we move on, I’d like to make it clear that, while I’ve played with this method in various code samples, I can’t claim that it’s fully studied and perfect. I plan to try it out with more people, and see what I can learn from it. I believe, however, that it’s promising enough to be described.<\/p>\n\n\n\n The method has three steps: refactor towards pure functions, write tests for the pure functions, and refactor the pure functions towards the desired design.<\/p>\n\n\n\n Let’s define a few terms, and move on to describe each step.<\/p>\n\n\n\n First, let’s define pure functions. A pure function is a function that returns the same output values when receiving the same input values, and changes nothing in the program state. 
For example, the following function is pure:<\/p>\n\n\n\n while the following function is not pure:<\/p>\n\n\n\n As you can see, pure functions cannot depend on I\/O or on time, they cannot modify the arguments they receive, and they are very predictable.<\/p>\n\n\n\n So how can we take advantage of pure functions?<\/p>\n\n\n\n I will postulate that any non-trivial program can be written as a set of pure functions combined with a few mutable functions<\/strong>. <\/p>\n\n\n\n For example, if your program is a web application, all the code that reads from or writes to the database, all the code that reads the request and creates the response, and all the code that writes log files can be encapsulated in a few mutable functions. Everything else is easily written with pure functions.<\/p>\n\n\n\n If your program is a game, all the code that interacts with the graphics card, all the code that reads the player actions, and all the code that saves or loads the game is mutable. Everything else can be easily written with pure functions.<\/p>\n\n\n\n A more interesting effect is that this rule applies at different levels<\/strong>. It can apply to a method, to a class, to a set of classes, or to a module. The only time it fails is when we try to apply it to a very simple I\/O method.<\/p>\n\n\n\n That’s very powerful: it means that the pure-function representation of the program can be used no matter where we start or how large the code is<\/strong>. This is the first part of the puzzle.<\/p>\n\n\n\n The second part of the puzzle is: how can we safely refactor any code towards pure functions? <\/p>\n\n\n\n The basic technique is the following:<\/p>\n\n\n\n The end result is a pure function that receives a lot of parameters, some of them data and some of them other functions. Either way, we have refactored the function towards a pure function with all dependencies injected<\/strong>. This makes the function testable. 
<\/p>\n\n\n\n At this point, you can either move to phase 2, or refactor the function to reduce the number of its parameters.<\/p>\n\n\n\n In my experience, each of these steps is mechanical and safe to do with modern IDEs<\/strong> (or with vim).<\/p>\n\n\n\n It’s time to write some tests.<\/p>\n\n\n\n At this point, we have at least one pure function we can test. Since the function doesn’t change its parameters, and returns the same outputs for the same inputs, we can use data-driven tests. In fact, the function is equivalent to a very large data table whose last column represents the output.<\/p>\n\n\n\n Moreover, the input data can be generated. We can use property-based testing and\/or the golden master technique to take advantage of input data generators, thus making the process faster.<\/p>\n\n\n\n As for the functions injected as parameters, in our tests we can use stubbing or, for some of the I\/O code, mocking.<\/p>\n\n\n\n This leads us to phase 3.<\/p>\n\n\n\n Once covered by tests, our pure function can be easily refactored. The functions passed as parameters can be turned into interfaces that are injected. The function can be split, and the resulting functions moved into their own classes. Some of the input parameters can be passed to a constructor, while others remain function parameters, and so on.<\/p>\n\n\n\n Or, we can go all in on functional programming, extracting lambdas, composing functions, and using partial application to remove duplication. That’s up to you.<\/p>\n\n\n\n This takes us to a conclusion.<\/p>\n\n\n\n I have briefly presented in this article a new method to refactor existing, untested code, by refactoring first towards pure functions, then covering them with data-driven or property-based tests, and finally refactoring the pure functions towards the end goal. There are many more techniques that I found while trying this method, but for brevity I decided not to include them.<\/p>\n\n\n\n I have also made a few claims that require more investigation and experiments. 
Can an average programmer, once taught the basic techniques, apply them without changing the behavior of the code? Does this work for any type of code? These and other questions remain open.<\/p>\n\n\n\n I can only hope that you find this method interesting, and decide to try it out or ask questions about it.<\/p>\n\n\n\n For brevity, I have left a detailed example out of this article. If enough people are interested, I will create a few examples to showcase the technique. Until then, an example in C++ is detailed in my latest book “Hands-on Functional Programming with C++”<\/a>.<\/p>\n\n\n\nA New Method<\/h2>\n\n\n\n
What is a pure function?<\/h2>\n\n\n\n
\/\/ Pure: the result depends only on the arguments and nothing else is modified.\nint add(int first, int second) {\n    return first + second;\n}<\/code><\/pre>\n\n\n\n
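\/\/ Not pure: first is taken by reference, so the caller’s variable is modified.\nint add(int& first, int second) {\n    first += second; \/\/ this changes the value of first as seen by the caller\n    return first;\n}<\/code><\/pre>\n\n\n\n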
It’s Pure Functions All the Way Down<\/h2>\n\n\n\n
Phase 1: Refactor towards pure functions<\/h2>\n\n\n\n
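As a minimal sketch of what this phase can produce, consider a small function that depends on the system clock and on standard output. Everything here is hypothetical and made up for illustration (`greetLegacy`, `greeting`, `greet`, `checkGreeting`); it is not taken from a real codebase. The clock is injected as a function parameter and the text is returned instead of printed, leaving a thin impure shell around the extracted function.

```cpp
#include <cassert>
#include <ctime>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical legacy function: it reads the system clock and writes to
// standard output, so it cannot be tested without controlling both.
void greetLegacy(const std::string& name) {
    std::time_t now = std::time(nullptr);
    int hour = std::localtime(&now)->tm_hour;
    if (hour < 12) {
        std::cout << "Good morning, " << name << "\n";
    } else {
        std::cout << "Good evening, " << name << "\n";
    }
}

// After extraction: the clock is injected as a function and the text is
// returned instead of printed. The result depends only on the arguments,
// provided the injected function is itself deterministic.
std::string greeting(const std::string& name,
                     const std::function<int()>& currentHour) {
    return (currentHour() < 12 ? "Good morning, " : "Good evening, ") + name;
}

// The remaining impure shell wires in the real clock and the real output.
void greet(const std::string& name) {
    auto systemHour = [] {
        std::time_t now = std::time(nullptr);
        return std::localtime(&now)->tm_hour;
    };
    std::cout << greeting(name, systemHour) << "\n";
}

// With a stubbed clock, the extracted function is fully predictable.
void checkGreeting() {
    assert(greeting("Ann", [] { return 9; }) == "Good morning, Ann");
    assert(greeting("Ann", [] { return 18; }) == "Good evening, Ann");
}
```

The design choice to note is that `greet` keeps the original behavior, while all the decision logic now lives in `greeting`, which can be called with any stubbed clock.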
Phase 2: Write tests for the pure function<\/h2>\n\n\n\n
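The sketch below shows one possible shape for such tests, written with plain `assert` instead of a test framework to stay self-contained. The function under test, `discountedPrice`, is a hypothetical stand-in for a pure function obtained in phase 1, and the hand-rolled loop with a fixed seed only approximates what a property-based testing library would do.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical pure function under test (a stand-in for one extracted in phase 1).
int discountedPrice(int priceInCents, int discountPercent) {
    return priceInCents - priceInCents * discountPercent / 100;
}

// One row of the data table; the last field is the expected output.
struct Row {
    int priceInCents;
    int discountPercent;
    int expected;
};

// Data-driven test: every row is checked with the same assertion.
void checkDataDrivenTable() {
    const std::vector<Row> table = {
        {1000, 10, 900},
        {1000, 0, 1000},
        {999, 50, 500},  // integer division: 999 * 50 / 100 == 499
        {0, 25, 0},
    };
    for (const Row& row : table) {
        assert(discountedPrice(row.priceInCents, row.discountPercent) == row.expected);
    }
}

// Hand-rolled property check: for generated inputs, the discounted price
// is never negative and never exceeds the original price.
void checkDiscountProperty() {
    std::mt19937 generator(42);  // fixed seed keeps the run reproducible
    std::uniform_int_distribution<int> price(0, 100000);
    std::uniform_int_distribution<int> percent(0, 100);
    for (int i = 0; i < 1000; ++i) {
        const int p = price(generator);
        const int d = percent(generator);
        const int result = discountedPrice(p, d);
        assert(result >= 0 && result <= p);
    }
}
```

Because the function is pure, the rows can be written in any order and the generated inputs need no setup or teardown between iterations.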
Phase 3: Refactor the pure function<\/h2>\n\n\n\n
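Two of the directions mentioned for this phase can be sketched as follows, assuming a hypothetical pure function `greeting` that takes the current hour as an injected function and is already covered by tests. The names `Clock`, `Greeter`, `FixedClock` and `greeterAt` are invented for this illustration. The first direction turns the function parameter into an injected interface passed to a constructor; the second uses partial application with a lambda.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical pure function, assumed to be already covered by tests.
std::string greeting(const std::string& name,
                     const std::function<int()>& currentHour) {
    return (currentHour() < 12 ? "Good morning, " : "Good evening, ") + name;
}

// Object-oriented direction: the function parameter becomes an interface.
class Clock {
public:
    virtual ~Clock() = default;
    virtual int currentHour() const = 0;
};

// The dependency moves to the constructor; the name stays a method parameter.
class Greeter {
public:
    explicit Greeter(const Clock& clock) : clock_(clock) {}
    std::string greet(const std::string& name) const {
        return greeting(name, [this] { return clock_.currentHour(); });
    }
private:
    const Clock& clock_;  // the caller must keep the clock alive
};

// Test double that always reports the same hour.
class FixedClock : public Clock {
public:
    explicit FixedClock(int hour) : hour_(hour) {}
    int currentHour() const override { return hour_; }
private:
    int hour_;
};

// Functional direction: partial application fixes the hour and returns
// a function that only needs the name.
std::function<std::string(const std::string&)> greeterAt(int hour) {
    return [hour](const std::string& name) {
        return greeting(name, [hour] { return hour; });
    };
}

void checkRefactoredDesigns() {
    FixedClock morning(9);
    Greeter greeter(morning);
    assert(greeter.greet("Ann") == "Good morning, Ann");
    assert(greeterAt(18)("Ann") == "Good evening, Ann");
}
```

Both variants delegate to the same tested function, so the existing tests keep protecting the behavior while the design around it changes.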
Conclusion<\/h2>\n\n\n\n
An Example<\/h2>\n\n\n\n