Apple2TC: an Apple II Binary to C Decompiler

Apple2TC: an Apple II Binary to C Decompiler - Part 3

This is part 3 in a series of posts describing Apple2TC.

The start is in Part 1.

In this part we get to actually generating some working C code.

"Simple C" Backend

As was established in the end of Part 2, our jump tracing disassembler is successfully able to statically disassemble large portions of binaries given to it, generating a reasonable looking assembler listing. The problem is that there is no way to truly evaluate the quality of said listing - it looks reasonable, but we can't tell whether it is actually correct or complete, and most importantly whether it contains enough data to start working on decompilation to C.

We could try to assemble it back into a binary and compare to the original, but that would be pointless because even a listing containing a sequence of byte values can be assembled correctly.

DFB $AD, $34, $12

; This correctly assembles to LDA $1234, but is not really useful.

We need some way to execute the disassembled code and ensure that its behavior matches the input binary. This is where the "Simple C" backend comes in.

The "Simple C" backend generates a 1 to 1 representation of the disassembled code in C, where every 6502 instruction is represented by the equivalent C code. In a sense, this is an Apple II emulator, but instead of an interpreter loop, every instruction has been "expanded" into what its case in the interpreter loop would have been. For example, the LDA $1234 above would generate the following C:

s_a = update_nz(peek(0x1234));

Here peek() is a function reading from the RAM array, while update_nz() updates the emulated CPU flags. s_a is the emulated 65002 accumulator register.

A lengthier example:

sty     $1722
dey
sty     $90
lda     #$5f

translates into the following C code:

/* STY */ poke(0x1722, s_y);
/* DEY */ s_y = update_nz(s_y - 1);
/* STY */ poke_zpg(0x90, s_y);
/* LDA */ s_a = update_nz(0x5f);

Note that the purpose of this backend is only to verify the correctness of our disassembly while remaining as simple as possible. The generated code is not meant to be efficient or readable. This is just a stepping stone we are using on the road to actual decompilation into readable C.

"Simple C Handling" of Branches

We hope it is pretty obvious how the above approach works for sequential code. But how can it handle branches, and in particular, indirect branches, like subroutine returns (returning from a subroutine is an indirect branch because the address to return to is not statically encoded, it is obtained dynamically from the stack)?

In other words, when the 6502 assembly executes "branch to address $1234", how do we know which C code corresponds to that address, and how do we implement the branch in C?

In order to solve this, during disassembly we create and maintain a collection of what we call branch targets - it contains the destination addresses of every branch instruction we have encountered. With the help of that collection, we can check whether the address of particular assembler instruction we are going to generate is a branch target. If it is, we can arrange the C code in a way that it is possible to jump directly to it. (Update: in later versions we instead maintain a list of basic blocks, which is essentially the same thing.)

Specifically, we arrange the generated C code in a giant loop with a switch inside. The switch switches on the value of the emulated program counter (the variable s_pc). Cases in the switch represent all known branch targets (basic blocks) and contain uninterrupted linear sequences of instructions. So, the number of switch cases corresponds to the number of branch targets (basic blocks) we know about.

Lets look at a simple example with a loop:

        ORG     $300
        LDA     #$00
        LDX     #$10
B1:     STA     $1000,X
        DEX
        BNE     B1
        RTS

This is the code that the "Simple C" backend generates for it:

for(;;) {
  switch (s_pc) {
  case 0x0300: // [$0300..$0303]    4 bytes
    /* $0300 LDA */ s_a = update_nz(0x00);
    /* $0302 LDX */ s_x = update_nz(0x10);
  case 0x0304: // [$0304..$0309]    6 bytes
    /* $0304 STA */ poke(0x1000 + s_x, s_a);
    /* $0307 DEX */ s_x = update_nz(s_x - 1);
    /* $0308 BNE */ s_pc = !(s_status & STATUS_Z) ? 0x0304 : 0x030a;
    break;
  case 0x030a: // [$030A..$030A]    1 bytes
    /* $030A RTS */ s_pc = pop16() + 1;
    break;
  default:
    fprintf(stderr, "Unknown code address: $%04X\n", s_pc);
    abort();
  }
}

We have exactly three branch targets and three cases in the switch - the entry point, the loop label B1 and the instruction after BNE, in case the branch is not taken.

We also have our default switch case report an error if at runtime we encounter a branch target we did not now about . This is very important since it tells us whether our disassembly is complete and correct. If that default case is never executed, we have complete knowledge of the control flow graph of the disassembled code. On the other hand, if it gets executed, and it often does, we have more work to do.

A Real Example

To whet our appetites, here is a link to the generated code for the Apple II ROM (be patient, the file is very large). It actually works - one can type and execute BASIC programs, enter the Monitor, etc. Here is a very short video of it in action:

There is no CPU emulator in that video. It is all done by C code generated by our Apple2TC tool. In the next part of this series we explain how we actually got it to work.

To Be Continued

In Part 4 we explain in detail the challenges we faced and what Apple II programming tricks we had to deal with.

Tzvetan Mikov's Blog

Search This Blog