Skip to main content

Posts

Apple2TC: an Apple II Binary to C Decompiler - Part 7, Using The IR

 This is part 7 in a series of posts describing Apple2TC.  The start is in  Part 1 . Part 6 introduced our SSA-based Intermediate Representation. In this part we look at a running IR example and how it is transformed by our decompiler pipeline. Introducing The Example To demonstrate the utility of our IR, we have devised a slightly contrived example that nevertheless illustrates challenges encountered in real code.  The source of the example can be seen here:  https://github.com/tmikov/apple2tc/blob/master/blog/part6/ex.s . We assembled it with a6502: a6502 ex.s ex.b33 and disassembled it: apple2tc ex.b33 --asm > ex.lst Producing the following listing: /*0300*/ TSX /*0301*/ INX /*0302*/ INX /*0303*/ LDY M_0101,X ; $0101 /*0306*/ INY /*0307*/ STY M_90 ; $0090 /*0309*/ LDA M_91 ; $0091 /*030B*/ STA M_92 ; $0092 /*030D*/ L
Recent posts

Apple2TC: an Apple II Binary to C Decompiler - Part 6, The IR

 This is part 6 in a series of posts describing Apple2TC.  The start is in  Part 1 . Part 5 described getting two Apple II games to work when decompiled in "Simple C" mode. In this part, we introduce our new Intermediate Representation (IR) of the disassembled code, which unlocks much more sophisticated analysis and finally puts us on the road to real  decompilation. What is an IR? An IR is a way to represent executable code as a data structure in memory, in a manner that makes its semantics explicit, and thus easier to analyze and transform. Typically, one or more forms of IR are used by optimizing compilers to represent and optimize the source code they are given. We are coming from the opposite direction - we need to represent code that has already been compiled - but there is little difference. The same principles apply. Wikipedia has a high level article about Intermediate Representation , which, while not very informative on its own, can be used as a starting point by

Apple2TC: an Apple II Binary to C Decompiler - Part 5

 This is part 5 in a series of posts describing Apple2TC.  The start is in  Part 1 . Part 4 described the Apple II hacks we had to deal with, in order to get the decompiled version of Applesoft BASIC working. This part is a quick update on getting two Apple II games to work - Robotron 2084 and Snake Byte. Robotron 2048 After all the effort we went through for the BASIC ROM, Robotron 2048 was shockingly easy. First, we ran a2emu to collect runtime data: a2emu --collect --limit=30000000 robotron.b33 > robotron.json In the emulator, the game title screen appeared, we eagerly pressed spacebar, getting us to the control selection screen, selected keyboard, and we were off. The game is pretty challenging, but we managed to play a little before dying and closing the emulator. With the so collected runtime data, first we decompiled the binary to 6502 assembler: apple2tc --run-data=robotron.json robotron.b33 > robotron.lst Then to C (with the "simple" backend): apple2tc --simpl

Apple2TC: an Apple II Binary to C Decompiler - Part 4

This is part 4 in a series of posts describing Apple2TC.  The start is in Part 1 .  In this part we finally get to the interesting part and describe the Apple II hacks we found while trying to run the decompiled version of Apple II BASIC. Running the Decompiled Code In the previous part we described our technique for generating a "Simple C" representation of the disassembled 6502 code. Now it is time to actually build and run the generated code! Using our a2io and decapplib libraries, it is trivial to link the generated Simple C code into an executable.  We started by decompiling and running the Apple II ROM code, containing Applesoft BASIC and Monitor, because its disassembled source is already available and heavily commented  (thanks to Andy McFadden ). We can "cheat" and look at the listing to check whether we are doing something wrong. Problem 1: Indirect Branches In our first attempt, we decompiled the ROM statically and used the Simple C backend to convert

Apple2TC: an Apple II Binary to C Decompiler - Part 3

 This is part 3 in a series of posts describing Apple2TC.  The start is in  Part 1 . In this part we get to actually generating some working C code. "Simple C" Backend As was established in the end of Part 2 , our jump tracing disassembler is successfully able to statically disassemble large portions of binaries given to it, generating a reasonable looking assembler listing. The problem is that there is no way to truly evaluate the quality of said listing - it looks reasonable, but we can't tell whether it is actually correct or complete, and most importantly whether it contains enough data to start working on decompilation to C. We could try to assemble it back into a binary and compare to the original, but that would be pointless because even a listing containing a sequence of byte values can be assembled correctly. DFB $AD, $ 34 , $ 12 ; This correctly assembles to LDA $1234, but is not really useful.   We need some way to execute  the disassembled code and ensure tha

Apple2TC: an Apple II Binary to C Decompiler - Part 2

This is part 2 of a series of posts describing Apple2TC. Check out Part 1 for an introduction. In this part we describe the different parts of Apple2TC. Apple2TC Components Apple2TC encompasses several sub-projects: id - a simple interactive assembler/disassembler/binary editor for quick exploration/patching during development. a6502 - a two pass 6502 assembler capable of assembling the Apple II+ ROM . a2emu - our custom Apple II emulator, feature complete with the exception of floppy disk support. apple2tc - our tracing disassembler/decompiler. a2emu and apple2tc are the two main components - one producing runtime data, the other consuming it. We will go into more detail about them, while just briefly describing the rest. id is a convenience tool for exploration and patching of Apple II binaries. It can load and save binary images, evaluate simple expressions, disassemble ranges of bytes or print them as words and bytes, or assemble individual instructions and patch bytes.  a6