Tutorial

This tutorial shows how to use the interactive decompilation tool to decompile a very simple Assembly program.

The main program of the interactive decompilation tool is the file. After running it, the main window appears. The tool can load either Intel IA32 Assembly files or a previously saved Intermediate Representation (IR) in textual ATerm format.

The subdirectory of the source distribution contains sample Assembly files generated from C sources via the compiler. From the File menu, let's open the Assembly file.

    .file    "factorial.c"
    .text
.globl factorial
    .type    factorial, @function
factorial:
    testl    %eax, %eax
    jne    .L2
    movl    $1, %edx
    jmp    .L4
.L2:
    movl    $1, %edx
.L5:
    imull    %eax, %edx
    decl    %eax
    jne    .L5
.L4:
    movl    %edx, %eax
    ret
    .size    factorial, .-factorial
    .ident    "GCC: (GNU) 4.1.2 20060715 (prerelease) (Debian 4.1.1-9)"
    .section    .note.GNU-stack,"",@progbits

The Assembly file is parsed and translated into the IR, and a pretty-printed view of the IR with syntax highlighting appears in the main window.
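To make the starting point concrete, here is a hand-written C transliteration of the Assembly listing above, with one C variable per register and one goto per jump. It is only a sketch: it does not reproduce the tool's actual IR pretty-printing, the function name `factorial_raw` is made up for this illustration, and the processor flags are folded directly into the branch conditions for brevity.

```c
/* Hand transliteration of the listing above (illustrative, not tool output). */
int factorial_raw(int eax)
{
    int edx;

    if (eax != 0)          /* testl %eax, %eax ; jne .L2 */
        goto L2;
    edx = 1;               /* movl $1, %edx */
    goto L4;               /* jmp .L4 */
L2:
    edx = 1;               /* movl $1, %edx */
L5:
    edx = edx * eax;       /* imull %eax, %edx */
    eax = eax - 1;         /* decl %eax */
    if (eax != 0)          /* jne .L5 */
        goto L5;
L4:
    eax = edx;             /* movl %edx, %eax */
    return eax;            /* ret: the result is left in %eax */
}
```

Calling `factorial_raw(5)` follows the same path as the Assembly code: the loop at `L5` runs five times before falling through to `L4`.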

Either from the Refactor top-level menu or by right-clicking on the code, a context-sensitive menu listing the possible refactorings appears.

Other views of the IR are available from the View menu. At the moment, these are the Control Flow Graph (CFG) view and the internal term view. Both views are linked with the main view, i.e., clicking on a CFG node or a term selects the respective code in all views. It is also possible to right-click on a CFG node and access the Refactor pop-up menu from the CFG view.

The first step in reverse engineering is to extract the function. Many refactorings operate on a function scope, so it is imperative that the function signature be reverse engineered first. This can be accomplished by right-clicking on the label and choosing the Extract Function refactoring. A new function is created, containing the statements between the label and the return statement.

The statement immediately before the return statement is an assignment to the register: the register is being used to pass the function return value to the caller. This is a common calling convention in code compiled for the Intel IA32 architecture. To make this explicit and update the function return type, apply the Set Function Return refactoring, specifying that register as the return symbol. The register will be added to the return statement, and the function return type will change accordingly.

The first statement inside the function reads the value of the register: the register is being used to pass an argument to the function. To make this explicit and update the function prototype, apply the Add Function Argument refactoring, specifying the register as the argument symbol.

Passing arguments in registers is not the most common calling convention in Intel IA32 code; usually function arguments are passed exclusively on the processor stack. However, some compilers for the IA32 architecture (such as the Microsoft, Borland, and Watcom C++ compilers) have a fastcall option that passes the first arguments of a function in registers, as that usually yields faster code. Other compilers allow the calling convention to be completely customized. This was done intentionally for this Assembly file, as the current implementation of the Add Function Argument refactoring in the interactive decompilation tool does not yet support function arguments passed on the stack.
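As an illustration of a customized calling convention, GCC's `regparm` function attribute on 32-bit x86 passes the first integer arguments in registers (EAX, EDX, ECX) instead of on the stack. The sketch below is an assumption about how a register-passed argument can be requested; this tutorial does not state which options were used to build the sample file, and the macro name `REG_ARGS` and the function name are hypothetical. On other targets the macro expands to nothing, so the code still compiles with the default convention.

```c
/* regparm is only meaningful on 32-bit x86 with GCC-compatible compilers. */
#if defined(__GNUC__) && defined(__i386__)
#define REG_ARGS __attribute__((regparm(1)))  /* first int argument in %eax */
#else
#define REG_ARGS                              /* default calling convention */
#endif

/* Hypothetical example function; the body mirrors the tutorial's factorial. */
REG_ARGS int factorial_regparm(int n)
{
    int f = 1;
    while (n)
        f *= n--;
    return f;
}
```

The attribute changes only how the argument reaches the function, not what the function computes, which is why the decompiled signature has to be recovered by inspecting register reads rather than stack accesses.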

At this point the function prototype is complete, and data flow analysis can be safely performed. We can now apply the Dead Code Elimination refactoring to eliminate all the assignments to unused flag registers and temporaries. Dead Code Elimination could not have been applied sooner: applying it before the Set Function Return refactoring would eliminate important code, because the refactoring would assume that the function had no return value, and all assignments leading to the final value would be erroneously eliminated.
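The ordering constraint can be illustrated with a small C sketch. The flag variables `zf` and `sf` and both function names are hypothetical; they do not reproduce the tool's IR and only mimic the kind of flag assignments a translated instruction leaves behind. In the first function `sf` is written but never read, so every assignment to it is dead, whereas `edx` stays live only because it reaches the return statement. If the return value were not yet declared, the assignments to `edx` would look just as dead.

```c
/* Before: instructions also update a flag that nobody reads. */
int fact_before_dce(int eax)
{
    int edx, zf, sf;

    zf = (eax == 0);           /* live: read by the if below */
    sf = (eax < 0);            /* dead: never read */
    edx = 1;
    if (!zf) {
        do {
            edx = edx * eax;   /* live: reaches the return value */
            eax = eax - 1;
            zf = (eax == 0);   /* live: read by the loop condition */
            sf = (eax < 0);    /* dead: never read */
        } while (!zf);
    }
    return edx;
}

/* After: only assignments that can influence the result remain. */
int fact_after_dce(int eax)
{
    int edx, zf;

    zf = (eax == 0);
    edx = 1;
    if (!zf) {
        do {
            edx = edx * eax;
            eax = eax - 1;
            zf = (eax == 0);
        } while (!zf);
    }
    return edx;
}
```

Both functions return the same value for every input, which is the property a dead code elimination pass has to preserve.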

The code is now less dense and easier to follow, but the goto statements are a hindrance to understanding the control flow. The CFG view helps reveal an if-then-else statement at the first decision node (represented in the CFG by a diamond) and a loop after the second decision node.

Right-clicking on the statement at the first decision node presents the choice of structuring an if-then or an if-then-else statement. Based on the previous CFG inspection, we opt for the latter. Right-clicking on the statement at the second decision node presents only the choice of structuring a do-while statement. After structuring these control flow statements, no goto statements remain.

Although the control flow is now evident, the data flow is still unnecessarily complex, with an excessive use of temporary variables. These temporary variables can be eliminated with the application of the Inline Temp refactoring on the respective assignments.
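As a sketch of what Inline Temp achieves, the two hypothetical C functions below compute the same value. In the first, the temporaries `t1` and `t2` are each assigned once and used once, so their defining expressions can be substituted at the point of use. The names are made up for this illustration and are not the tool's output.

```c
/* Before: the product travels through two single-use temporaries. */
int step_with_temps(int f, int n)
{
    int t1 = f * n;
    int t2 = t1;
    f = t2;
    return f;
}

/* After inlining t2 and then t1: one assignment, same result. */
int step_inlined(int f, int n)
{
    f = f * n;
    return f;
}
```

Inlining is safe here because neither operand of the defining expression is modified between the temporary's assignment and its single use.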

The expressions are now more condensed, but some expressions resulting from compiler idiosyncrasies can obviously be simplified further. These simplifications can be performed by applying the Simplify Expression refactoring on the respective expressions.
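The kind of rewrite involved can be sketched in C. The expressions below are assumed examples of compiler-induced redundancy (a value and-ed with itself, as a test instruction does, and a negated equality); they are not taken from the tool's output, and the function names are hypothetical.

```c
/* Verbose form: x & x is just x, and !(a == 0) is a != 0. */
int cond_verbose(int n)
{
    return !((n & n) == 0);
}

/* Simplified form with the same truth value for every n. */
int cond_simple(int n)
{
    return n != 0;
}
```

Each simplification is a local algebraic identity, so it can be applied to one expression at a time without any data flow analysis.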

Now that both the control flow and the data flow are clear, it is easier to understand the role of the variables and to name them. Even if the function name had not hinted at it, it is now clear that the purpose of this function is to compute the factorial of an integer. Using the Rename Symbol refactoring, let's rename the argument and the accumulator variable to more descriptive names.

Compare the final code against the original C source from which the Assembly file was compiled.

Final Reverse Engineered Code:

signed int factorial(signed int n)
{
    if(n != 0)
    {
        f = 1;
        do
        {
            f = f * n;
            n = n - 1;
        }
        while(n != 0);
    }
    else
        f = 1;
    return f;
}

Original Source Code:

int factorial(int n)
{
    register int f;
    f = 1;
    while(n)
        f *= n--;
    return f;
}

Unfortunately, it is not possible to apply the Structure While Statement refactoring because of the statements duplicated inside the if statement. The compiler duplicated this statement in both branches of the if statement; it could be safely factored out, yielding the original source code, but such a refactoring has not yet been devised or implemented.
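The missing step can be checked by hand. The sketch below places the reverse engineered control flow next to a version with the duplicated assignment hoisted out of both branches, and next to the original loop. The function names are hypothetical, `f` is declared locally so that the code compiles on its own, and the comparison is only meaningful for small non-negative `n`, where the product does not overflow.

```c
/* Control flow as recovered above: the assignment appears in both branches. */
int factorial_decompiled(int n)
{
    int f;
    if (n != 0) {
        f = 1;
        do {
            f = f * n;
            n = n - 1;
        } while (n != 0);
    } else
        f = 1;
    return f;
}

/* Duplicated assignment factored out; the else branch becomes empty. */
int factorial_factored(int n)
{
    int f = 1;
    if (n != 0) {
        do {
            f = f * n;
            n = n - 1;
        } while (n != 0);
    }
    return f;
}

/* An if guarding a do-while with the same condition is a while loop. */
int factorial_original(int n)
{
    int f = 1;
    while (n)
        f *= n--;
    return f;
}
```

Once the assignment is hoisted, the remaining if statement and do-while statement test the same condition, which is exactly the pattern a while statement expresses.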

You can see a video of IDC running the above example.