Catalog of low-level refactorings

Decompilation can be carried out as the sequential application of basic program transformations, where each transformation raises the abstraction level of the code while preserving its semantics. Each such transformation can therefore be regarded as a refactoring.

This is a catalog of refactorings for low-level (near-Assembly) code, intended for reverse engineering. The refactorings listed here make low-level code incrementally more intelligible. Applied successively and in combination, they can lift low-level machine code to higher-level code while preserving its semantics.

[Figure: translation_refactoring.png]

Some of the refactorings listed here are identical in every respect to their higher-level language counterparts. Others are specific to the traits of Assembly code. The following sections list all the refactorings in this proposed catalog.

The catalog follows the standard format described in (Fowler, 2000): a short synopsis, an example, the motivation for its use, and the mechanics of its application. The examples are written in almost syntactically correct C, with some exceptions needed to represent near-Assembly code faithfully. In particular, statements may appear at global scope as well as in function bodies.

Function prototyping

As in most imperative languages, functions are the basic reusable unit of Assembly code, and compilation usually generates them from the higher-level source code on a one-to-one basis. However, Assembly code does not properly retain the information about function bodies, arguments, and local variables. The following refactorings incrementally lift the bodies, prototypes, and frames of functions.
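
As a rough illustration of what this group of refactorings aims at, the sketch below contrasts a routine with no visible prototype against its lifted form. It is written in strictly valid C so that it can be compiled, unlike the near-Assembly examples of the catalog. The argument stack, the return register, and the names sub_401000, call_low_level, and add are all hypothetical and chosen only for this example; the stack-based calling convention is an assumption, not a property of any particular compiler.

```c
#include <stdint.h>

/* Hypothetical machine state: a small argument stack and a return register. */
static int32_t arg_stack[8];
static int arg_sp = 8;          /* the stack grows towards lower indices */
static int32_t ret_reg;

/* Before: the routine has no explicit prototype. It reads its two operands
 * from the stack and leaves its result in the return register. */
static void sub_401000(void)
{
    ret_reg = arg_stack[arg_sp] + arg_stack[arg_sp + 1];
}

/* A call site in low-level style: push the arguments, call, clean up. */
int32_t call_low_level(int32_t a, int32_t b)
{
    arg_stack[--arg_sp] = b;    /* push the second argument */
    arg_stack[--arg_sp] = a;    /* push the first argument */
    sub_401000();
    arg_sp += 2;                /* the caller removes the arguments */
    return ret_reg;
}

/* After: arguments and return value are explicit in the prototype. */
int32_t add(int32_t a, int32_t b)
{
    return a + b;
}
```

Both forms compute the same value for the same inputs; only the lifted form states the arguments and the result in its signature.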

Organizing data

During compilation, all data flow is mapped to accesses to and from the processor registers, the stack, and global memory. The following refactorings incrementally re-express that data flow in terms of local and global variables. They operate mostly at the function level.
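
The following sketch gives a simplified picture of this kind of lifting, again in strictly valid C so that it compiles. The register variables reg_a and reg_c, the frame array, and the function names are hypothetical, and the frame layout is an assumption made for the example rather than the output of a real compiler.

```c
#include <stdint.h>

/* Hypothetical machine state: two registers and a small stack. */
static int32_t reg_a, reg_c;
static int32_t frame[16];
static int frame_sp = 16;       /* the stack grows towards lower indices */

/* Before: every value lives in a register or in an anonymous stack slot. */
int32_t scale_low_level(int32_t x)
{
    frame_sp -= 2;                      /* reserve two stack slots */
    frame[frame_sp + 0] = x;            /* slot 0: copy of the argument */
    frame[frame_sp + 1] = 3;            /* slot 1: a constant factor */
    reg_a = frame[frame_sp + 0];
    reg_c = frame[frame_sp + 1];
    reg_a = reg_a * reg_c;
    reg_a = reg_a + frame[frame_sp + 0];
    frame_sp += 2;                      /* release the stack slots */
    return reg_a;
}

/* After: stack slots and registers are replaced by named local variables. */
int32_t scale_lifted(int32_t x)
{
    int32_t factor = 3;
    return x * factor + x;
}
```

The lifted version no longer mentions registers or stack offsets, which is what makes the data flow readable at the function level.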

Structuring control flow

All high-level control structures (such as if, while, and for statements) are translated into jumps and conditional jumps in Assembly language. These refactorings incrementally recover the high-level control structures that match the control-flow graph formed by those jumps.
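
A minimal sketch of the idea, in strictly valid C: the first function expresses a loop only with labels and jumps, roughly as it would appear after compilation, and the second shows the while statement that matches the same control-flow graph. The function and label names are hypothetical and serve only this illustration.

```c
/* Before: the loop is expressed only with labels and (conditional) jumps. */
int sum_below_jumps(int n)
{
    int i = 0;
    int sum = 0;
loop_head:
    if (i >= n) goto loop_exit;     /* conditional jump out of the loop */
    sum += i;
    i++;
    goto loop_head;                 /* unconditional jump back to the test */
loop_exit:
    return sum;
}

/* After: the same control-flow graph written as a while statement. */
int sum_below_structured(int n)
{
    int i = 0;
    int sum = 0;
    while (i < n) {
        sum += i;
        i++;
    }
    return sum;
}
```

The loop condition is the negation of the exit test of the conditional jump, and the backward jump becomes the implicit repetition of the while statement.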