New MIT Tool Automatically Rewrites Old Code for New Software

But take heart: It still requires human developers.

Michael Byrne

Writing computer software involves a lot of cutting and pasting. It's just part of the normal workflow: search Google for a problem, copy a kinda-sorta solution from Stack Overflow (or more proper documentation), and weld it into place in the new code. That last step naturally involves some amount of adaptation, rewriting, and debugging. (Sometimes it turns out that just writing your own solution in the first place would have been easier, but that's another story.)

Earlier this month at the Association for Computing Machinery's Symposium on the Foundations of Software Engineering in Paderborn, Germany, a team of computer science researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) unveiled a new system for automatically transplanting code from one program into another. The tool, dubbed CodeCarbonCopy (CCC), works by comparing the execution of both the new "host" software and the "donor" software, and then updating things like variable names and data representations in the donor code to match the host code. So, if the host program calls some variable x, CodeCarbonCopy will find the matching variable in the donor code and rename it to x.

That sounds kind of trivial, but it requires coming to some fundamental understanding of what each program actually does: what a variable means in a given program context and how the program uses it. Something similar is done for the data representations within each program. It's an interesting problem. CCC solves it by feeding each program, the host and the donor, the same input file and watching each one do its thing. The result is a symbolic representation of every value that the two programs compute. Where those representations line up, CCC can conclude that a donor variable and a host variable play the same role.
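To make the matching step a little more concrete, here is a minimal Python sketch. It is not CCC's actual implementation. It assumes each program's run has already been boiled down to a table of variable names and the values they held on the shared input, whereas CCC works with symbolic expressions rather than recorded values. The function names `match_variables` and `rename`, and the sample traces, are hypothetical.

```python
import re

def match_variables(donor_trace, host_trace):
    """Map each donor variable to the host variable that held the same values."""
    # Index host variables by the sequence of values they took on the shared input.
    by_values = {}
    for name, values in host_trace.items():
        by_values.setdefault(tuple(values), []).append(name)

    mapping = {}
    for name, values in donor_trace.items():
        candidates = by_values.get(tuple(values), [])
        if len(candidates) == 1:  # accept only unambiguous matches
            mapping[name] = candidates[0]
    return mapping

def rename(code, mapping):
    """Rewrite whole-word donor identifiers to their host names in one pass."""
    if not mapping:
        return code
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)

# Hypothetical traces: values each variable held while reading the same image file.
donor_trace = {"w": [640], "h": [480], "tmp": [7, 9]}
host_trace = {"width": [640], "height": [480], "depth": [24]}

mapping = match_variables(donor_trace, host_trace)
donor_code = "area = w * h"
print(rename(donor_code, mapping))
```

Here `w` and `h` are matched to `width` and `height` because they held the same values, while `tmp` has no counterpart in the host and is left alone. Comparing recorded values on a single input is a much cruder test than comparing symbolic expressions, so treat this as an illustration of the idea only.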

In addition, CCC is able to identify functionality within the donor code that's used by the donor program but isn't actually useful in the new host.
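One way to picture that pruning is a backward slice, a standard program-analysis technique swapped in here for illustration; the article does not say this is how CCC does it. The sketch below assumes the donor code has been reduced to straight-line assignments, and the statement list and the name `needed_statements` are hypothetical.

```python
def needed_statements(statements, wanted):
    """Keep only the statements that contribute to the wanted variables.

    `statements` is a list of (target, sources) pairs in program order.
    """
    live = set(wanted)
    kept = []
    # Walk backwards so each statement can see whether a later one depends on it.
    for target, sources in reversed(statements):
        if target in live:
            live.discard(target)
            live.update(sources)
            kept.append((target, sources))
    kept.reverse()
    return kept

# Hypothetical donor code: only "resized" is wanted in the host.
donor = [
    ("pixels", ["raw"]),
    ("thumb", ["pixels"]),       # donor-only feature, unused by the host
    ("histogram", ["pixels"]),   # donor-only feature, unused by the host
    ("resized", ["pixels", "scale"]),
]
kept = needed_statements(donor, wanted=["resized"])
print(kept)
```

The statements computing `thumb` and `histogram` drop out because nothing the host wants depends on them.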

The MIT group tested CCC on eight code transfers between six real-world programs, including VLC, mtPaint, and MPlayer. In seven of the transfers, the transplanted functionality survived the move intact. In the eighth, from mtPaint to bmp2tiff, CCC just couldn't do the job thanks to some particularly quirky data structures.

Anecdotally, there's a growing angst among developers about automation, a worry that we will soon be outsourced to code-writing machines. CCC might not be the best news on that front, but for now we can take heart that the system still requires a considerable amount of human input. Crucially, the human programmer has to identify where in the host program the code is to be transplanted, any irrelevant data, and, of course, the actual functionality to be transplanted. CCC also takes a whole lot of time to do its magic, with the longest transplantation seen in the MIT group's experiments topping out at over 12 minutes.