Skip to main content

Apple2TC: an Apple II Binary to C Decompiler - Part 3

 This is part 3 in a series of posts describing Apple2TC. 

The start is in Part 1.

In this part we get to actually generating some working C code.

"Simple C" Backend

As was established in the end of Part 2, our jump tracing disassembler is successfully able to statically disassemble large portions of binaries given to it, generating a reasonable looking assembler listing. The problem is that there is no way to truly evaluate the quality of said listing - it looks reasonable, but we can't tell whether it is actually correct or complete, and most importantly whether it contains enough data to start working on decompilation to C.

We could try to assemble it back into a binary and compare to the original, but that would be pointless because even a listing containing a sequence of byte values can be assembled correctly.
DFB $AD, $34, $12
; This correctly assembles to LDA $1234, but is not really useful. 
We need some way to execute the disassembled code and ensure that its behavior matches the input binary. This is where the "Simple C" backend comes in.  

The "Simple C" backend generates a 1 to 1 representation of the disassembled code in C, where every 6502 instruction is represented by the equivalent C code. In a sense, this is an Apple II emulator, but instead of an interpreter loop, every instruction has been "expanded" into what its case in the interpreter loop would have been. For example, the LDA $1234 above would generate the following C:
s_a = update_nz(peek(0x1234));
Here peek() is a function reading from the RAM array, while update_nz() updates the emulated CPU flags. s_a is the emulated 65002 accumulator register.

A lengthier example:
sty     $1722
dey
sty $90
lda #$5f
translates into the following C code:
/* STY */ poke(0x1722, s_y);
/* DEY */ s_y = update_nz(s_y - 1);
/* STY */ poke_zpg(0x90, s_y);
/* LDA */ s_a = update_nz(0x5f);
Note that the purpose of this backend is only to verify the correctness of our disassembly while remaining as simple as possible. The generated code is not meant to be efficient or readable. This is just a stepping stone we are using on the road to actual decompilation into readable C.

"Simple C Handling" of Branches

We hope it is pretty obvious how the above approach works for sequential code. But how can it handle branches, and in particular, indirect branches, like subroutine returns (returning from a subroutine is an indirect branch because the address to return to is not statically encoded, it is obtained dynamically from the stack)?

In other words, when the 6502 assembly executes "branch to address $1234", how do we know which C code corresponds to that address, and how do we implement the branch in C? 

In order to solve this, during disassembly we create and maintain a collection of what we call branch targets - it contains the destination addresses of every branch instruction we have encountered. With the help of that collection, we can check whether the address of particular assembler instruction we are going to generate is a branch target. If it is, we can arrange the C code in a way that it is possible to jump directly to it. (Update: in later versions we instead maintain a list of basic blocks, which is essentially the same thing.)

Specifically, we arrange the generated C code in a giant loop with a switch inside. The switch switches on the value of the emulated program counter (the variable s_pc). Cases in the switch represent all known branch targets (basic blocks) and contain uninterrupted linear sequences of instructions. So, the number of switch cases corresponds to the number of branch targets (basic blocks) we know about.

Lets look at a simple example with a loop:
        ORG     $300
LDA #$00
LDX #$10
B1: STA $1000,X
DEX
BNE B1
RTS
This is the code that the "Simple C" backend generates for it:
for(;;) {
switch (s_pc) {
case 0x0300: // [$0300..$0303] 4 bytes
/* $0300 LDA */ s_a = update_nz(0x00);
/* $0302 LDX */ s_x = update_nz(0x10);
case 0x0304: // [$0304..$0309] 6 bytes
/* $0304 STA */ poke(0x1000 + s_x, s_a);
/* $0307 DEX */ s_x = update_nz(s_x - 1);
/* $0308 BNE */ s_pc = !(s_status & STATUS_Z) ? 0x0304 : 0x030a;
break;
case 0x030a: // [$030A..$030A] 1 bytes
/* $030A RTS */ s_pc = pop16() + 1;
break;
default:
fprintf(stderr, "Unknown code address: $%04X\n", s_pc);
abort();
}
}
We have exactly three branch targets and three cases in the switch - the entry point, the loop label B1 and the instruction after BNE, in case the branch is not taken.

We also have our default switch case report an error if at runtime we encounter a branch target we did not now about . This is very important since it tells us whether our disassembly is complete and correct. If that default case is never executed, we have complete knowledge of the control flow graph of the disassembled code. On the other hand, if it gets executed, and it often does, we have more work to do.

A Real Example

To whet our appetites, here is a link to the generated code for the Apple II ROM (be patient, the file is very large). It actually works - one can type and execute BASIC programs, enter the Monitor, etc. Here is a very short video of it in action:

There is no CPU emulator in that video. It is all done by C code generated by our Apple2TC tool. In the next part of this series we explain how we actually got it to work.

To Be Continued

In Part 4 we explain in detail the challenges we faced and what Apple II programming tricks we had to deal with.


Comments

Popular posts from this blog

You Don't Like Google's Go Because You Are Small

When you look at Google's presentations about Go, they are not shy about it. Go is about very smart people at Google solving very BIG problems. They know best. If you don't like Go, then you are small and are solving small problems. If you were big (or smart), you would surely like it. For example, you might naively think that printing the greater of two numbers should be as simple as std::cout << max(b(),c()) That is because you think small. What you really should want is: t1 := b() if t2 := c(); t1 < t2 { t1 = t2 } fmt.Print( t1 ) Isn't it much better? We didn't have to type all those extra semicolons that were killing productivity before. If you don't like it, you are small. If you wanted to extract an attribute of an optional parameter, you may be used to typing something like: a = p ? p->a : 0; or even: a = p && p->a You just make me sad because obviously what you really want is: a = 0 if p != nil { a = p->

How to speed up a micro-benchmark 300x

How to speed up a ubenchmark 300x Static Hermes: How to Speed Up a Micro-benchmark by 300x Without Cheating This is the first of a series of light blog posts about Static Hermes, covering topics that I find interesting or amusing. It is not intended to be a systematic introduction to Static Hermes design.Consider it more like a backstage pass to the quirks, features, and “aha!” moments that make Static Hermes unique. If you’re not familiar with Static Hermes, it’s an evolving project we’re working on to explore the confluence of static typing and JavaScript. It is work in progress, but we’re excited about the possibilities it opens up for performance improvements and more. For more background: Tweet with the slide deck of the Static Hermes announcement Previous talk about Hermes Contents: Meet interp-dispatch.js Let’s Run It! Static Hermes with Untyped Code Helping the Compiler Other Ways to Help the Compiler Some Observations Revisiting The Origina

Apple2TC: an Apple II Binary to C Decompiler - Part 1

This is a series of blog posts to serve as a log documenting my work on Apple2TC - an open source hobby project developed on GitHub:  https://github.com/tmikov/apple2tc . There are various interesting things that come up all the time and I thought it might be useful to record them for posterity. Part 2  describes the different components of the project Part 3 shows how we validate our disassembly by generating a working C representation of it. Part 4 shows the Apple II tricks we had to deal with to get to correct running code. Part 5 describes decompiling and running Robotron 2084 and Snake Byte. Part 6 introduces our SSA-based intermediate representation . Part 7 shows how to transform the IR to discover the original intent of the code.   What is Apple2TC? Apple2TC is an open source project to decompile original Apple II binaries, mostly games, into understandable and working modern C code,  completely automatically  by analyzing the runtime behavior of the code via custom softw