MuScalpel Is an Algorithmic Code Transplantation Tool

A new system offers an automated way of reusing ("transplanting") existing code into new projects.

Computer scientists at University College London have developed a tool that automatically isolates and extracts code from one program and "transplants" it into another. The tool, known as MuScalpel, has the potential to "ultimately change the understanding of the word 'programmer' considerably," as UCL systems engineer Mark Harman told Wired UK recently. It does so by offering a way around perhaps the most tedious aspect of software engineering, which is re-implementing solutions to problems that have already been solved by other programmers in other programs, or "recreating the wheel," in Harman's words.

This may slightly underestimate the talents of programmers currently at work on those wheels. Good code is already modular, and modularity is a central tenet of programming. A good coder doesn't just sit down and bash out tens of thousands of lines of code in one giant torrent. They build their programs up piece by piece, always with an eye toward modularity. If I notice that I'm rewriting something similar to what I've already written or am, heaven forbid, copying and pasting my own code, then it probably means that I need to make that segment of code into its own routine: a discrete package of code that is implemented only one time in one file, but can be invoked any number of times just by using the routine's name.

This works the other way too, and it's why programming libraries exist. These are the vast storehouses of pre-written code, routines, and other features that are called on constantly in the software development process, even at the most basic levels. If I want my C++ program to open a file stored on a hard drive, I don't recreate this entire process. I use a class called fstream, which comes with a pre-written function called open() that does this work for me. Much of programming is knowing how and when to call on this unfathomably large base of pre-existing code and how to put it together in useful ways.
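For a sense of how little work that takes, here is a minimal sketch of the fstream case. The helper name write_then_read and the demo file name are made up for this example; only std::fstream, open(), is_open(), close(), and std::getline come from the C++ standard library.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: write one line of text to `path`, then read it back.
// Returns the line that was read, or an empty string if the file could not
// be opened.
std::string write_then_read(const std::string& path, const std::string& text) {
    assert(!path.empty());  // precondition: a file name must be given

    std::fstream file;

    // open() asks the operating system for the file; we never touch the
    // underlying system calls ourselves.
    file.open(path, std::ios::out | std::ios::trunc);
    if (!file.is_open()) return "";
    file << text << '\n';
    file.close();

    // Reopen the same file for reading and pull the first line back out.
    file.open(path, std::ios::in);
    if (!file.is_open()) return "";
    std::string line;
    std::getline(file, line);
    return line;
}
```

Calling write_then_read("muscalpel_demo.txt", "hello") should hand back "hello", assuming the working directory is writable. All of the actual file handling is someone else's code.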

This is essentially what MuScalpel wants to do for us, a process that Harman and his group liken to organ transplantation. The tool first identifies both a piece of code within a "donor" program and an entry point into that piece of code, plus a "vein," which is the path leading from the beginning of the donor software to the beginning of the code-organ to be transplanted. This is trickier than it may sound.

"A programmer must first identify the entry point of code that implements a feature of interest," Harman and co. explain in a recent paper. "Given an entry point in the donor and a target implantation point in the host program, the goal of automated transplantation is to identify and extract an organ, all code associated with the feature of interest, then transform it to be compatible with the name space and context of its target site in the host."

The concept enabling MuScalpel to actually work is known as genetic programming. This is a machine learning technique inspired by biological evolution: it takes a set of instructions and some evaluation criteria, known as a fitness function, and lets an algorithm "find" the computer program best suited to a particular task. Eventually, just as evolution promotes traits beneficial to survival and reproduction, a genetic programming scheme promotes code that is most beneficial (or is beneficial enough, rather) in terms of the given fitness function. In a sense, it's a mechanism by which programs design themselves according to our demands. It's probably the future, generally.
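To make the idea less abstract, here is a toy sketch of evolutionary search over tiny programs. It is not MuScalpel's algorithm, and it is much cruder than real genetic programming: there is no population and no crossover, just mutation and selection, which makes it closer to hill climbing. The instruction set, the target behavior (2x + 1), and every name in it are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <random>
#include <vector>

// A made-up instruction set; a "program" is a short list of these.
enum class Op { Nop, Add1, Double, Negate };
using Program = std::vector<Op>;

// Run a program on input x by applying each instruction in order.
// (Meant for short programs only; long runs of Double would overflow int.)
int run(const Program& p, int x) {
    for (Op op : p) {
        switch (op) {
            case Op::Nop:    break;
            case Op::Add1:   x += 1; break;
            case Op::Double: x *= 2; break;
            case Op::Negate: x = -x; break;
        }
    }
    return x;
}

// Fitness function: total absolute error against the desired behavior
// 2x + 1 on five test inputs. Zero means every test passes.
int error_of(const Program& p) {
    int total = 0;
    for (int x = 0; x < 5; ++x) total += std::abs(run(p, x) - (2 * x + 1));
    return total;
}

// Mutate one instruction at a time and keep the child whenever it is no
// worse than its parent. Stops early once the test suite passes.
Program evolve(Program best, int generations, unsigned seed) {
    assert(!best.empty());  // there must be at least one instruction to mutate
    std::mt19937 rng(seed);
    for (int g = 0; g < generations && error_of(best) > 0; ++g) {
        Program child = best;
        child[rng() % child.size()] = static_cast<Op>(rng() % 4);
        if (error_of(child) <= error_of(best)) best = child;
    }
    return best;
}
```

Starting from three Nop instructions and calling evolve(start, 500, 42), the search can only hold steady or improve, because a child is kept only when its error is no higher than its parent's. Whether it reaches a perfect program in a given run depends on the random draws.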

The first stage in the process is the identification and removal of the target organ, beginning with an "over-organ," which is all of the code in the donor that implements said organ. Next, the tool creates a system dependence graph, which is a way of breaking a program apart into its constituent pieces and charting out the interdependencies between them. This extracted material is then used to "grow" a new organ in the transplant recipient that both does what it's supposed to and doesn't mess anything up.
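One rough way to picture the dependence graph and the over-organ is sketched below. Two things are assumed rather than taken from the paper: that the graph can be flattened to "each name maps to the names it uses," and that the over-organ is simply everything reachable from the organ's entry point. The function names in the usage note are invented.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for a dependence graph: each function or variable
// name maps to the names it directly depends on.
using DependenceGraph = std::map<std::string, std::vector<std::string>>;

// Collect everything reachable from the entry point. This is deliberately
// conservative: it grabs all code the feature might need, not the minimum.
std::set<std::string> over_organ(const DependenceGraph& graph,
                                 const std::string& entry) {
    assert(!entry.empty());  // an entry point must be named

    std::set<std::string> seen;
    std::vector<std::string> work{entry};
    while (!work.empty()) {
        std::string node = work.back();
        work.pop_back();
        if (!seen.insert(node).second) continue;  // already visited

        auto it = graph.find(node);
        if (it == graph.end()) continue;          // no recorded dependencies
        for (const auto& dep : it->second) work.push_back(dep);
    }
    return seen;
}
```

Given a graph where encode uses quantize and write_frame, quantize uses clamp, and an unrelated show_gui also uses clamp, asking for the over-organ of encode returns encode, quantize, write_frame, and clamp, and leaves show_gui behind in the donor.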

As in human organ transplantation, this is no easy task. The extracted organ comes with its own set of required inputs, or parameters, which may be representative of the donor and not the host. So, the first step in implantation is matching variables (which might be imagined as data blood vessels) with the organ's parameters. Via genetic programming, the over-organ is evolved in situ into a final organ product that's compatible with the host. Once it passes a given test suite, it's more or less good to go.
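The variable-matching step can be sketched as a search, too. The version below is a brute-force stand-in, not the genetic programming MuScalpel uses: it tries every pairing of host variables with a two-parameter organ and keeps the first one that passes a test. The organ, the host variable names, and the single-check "test suite" are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

using HostScope = std::map<std::string, int>;         // host variables in scope
using Organ = std::function<int(int, int)>;           // organ with two parameters
using Binding = std::pair<std::string, std::string>;  // host names for param 1, 2

// Try every ordered pair of distinct host variables as the organ's inputs.
// Returns the first pairing whose result passes the test, or a pair of
// empty strings if none does.
Binding find_binding(const Organ& organ, const HostScope& scope,
                     const std::function<bool(int)>& passes) {
    assert(organ && passes);  // both callables must be set

    for (const auto& first : scope) {
        for (const auto& second : scope) {
            if (first.first == second.first) continue;  // need two different variables
            if (passes(organ(first.second, second.second))) {
                return Binding{first.first, second.first};
            }
        }
    }
    return Binding{"", ""};
}
```

Suppose the organ computes total - 2 * margin, the host has frame_width = 640, border = 20, and fps = 30 in scope, and the test expects 600. The only pairing that passes is (frame_width, border), so that is the binding the search returns. A real host has many more variables and types to juggle, which is presumably why the authors search rather than enumerate.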

Most importantly, the MuScalpel algorithm works.

"While we do not claim automated transplantation is now a solved problem, our results are encouraging," Harman and his group write. "We report that in 12 of 15 experiments, involving 5 donors and 3 hosts (all popular real-world systems), we successfully autotransplanted new functionality and passed all regression tests. Autotransplantation is also already useful: in 26 hours computation time we successfully autotransplanted the H.264 video encoding functionality from the x264 system to the VLC media player; compare this to upgrading x264 within VLC, a task that we estimate, from VLC's version history, took human programmers an average of 20 days of elapsed, as opposed to dedicated, time."

So, does this mean a whole bunch of programmers are out of a job? Harman doesn't think so. It just means they won't be doing boring stuff anymore. They can be free to create and innovate. Etc.

"We want to free programmers from their shackles, not to make them redundant," he told Wired UK. He likens it to the human "computers" of the 1940s that electronic, programmed computers replaced. "The computation was tiresome and repetitive, but it required some of the most skilled and intelligent humans, since it had to be correct. Today, that meaning of the word computer is anachronistic."

Fair enough, but there are a whole lot of programmers who exist only because of the tedious jobs: the everyday coders skilled at banging together JavaScript but perhaps less so at developing innovative algorithms. A better comparison might be assembly line automation, where engineers and technicians may keep their jobs, but the factory floor winds up getting fucked.