Apple2TC: an Apple II Binary to C Decompiler

Apple2TC: an Apple II Binary to C Decompiler - Part 2

This is part 2 of a series of posts describing Apple2TC. Check out Part 1 for an introduction.

In this part we describe the different parts of Apple2TC.

Apple2TC Components

Apple2TC encompasses several sub-projects:

id - a simple interactive assembler/disassembler/binary editor for quick exploration/patching during development.
a6502 - a two pass 6502 assembler capable of assembling the Apple II+ ROM.
a2emu - our custom Apple II emulator, feature complete with the exception of floppy disk support.
apple2tc - our tracing disassembler/decompiler.

a2emu and apple2tc are the two main components - one producing runtime data, the other consuming it. We will go into more detail about them, while just briefly describing the rest.

id is a convenience tool for exploration and patching of Apple II binaries. It can load and save binary images, evaluate simple expressions, disassemble ranges of bytes or print them as words and bytes, or assemble individual instructions and patch bytes.

a6502 is a simple two pass assembler, written to give us the ability to create our own Apple II binaries for testing or for creating binary sequences for patching of existing images (see https://github.com/tmikov/apple2tc/blob/master/decoded/bolo/README.md). While strictly speaking we probably didn't need to write our own assembler, we wanted to avoid the inconvenience of finding and integrating a convenient open source one.

a2emu

a2emu is a vanilla Apple II software emulator. It really doesn't try to be clever or efficient. However it is very portable and literally has no dependencies. It is basically a loop, reading an opcode from memory, with a big switch statement for every possible instruction. Cases look something like this:

case 0x51: // EOR (ind),Y
  a_ = updateNZ(a_ ^ peek_ind_y(OP8()));
  pc_ += 2;
  break;

case 0x0A: { // ASL A
  a_ = updateNZC(a_ << 1);
  pc_ += 1;
  break;
}

Each case implement the functionality of the corresponding instruction in a very straight forward manner. Registers are global variables (if we really cared about performance we would make them local, but that would make things a little more complex - it is already more than fast enough).

a2emu: CPU Flags

There are a few helper functions like updateNZ() and updateNZC() to update the emulated CPU flags with accordance to the 6502 instruction. For example, updateNZ() sets the zero flag (Z) and the negative flag (N) depending on whether its parameter is zero or negative respectively.

a2emu: Memory Access

Other helper functions read and write memory (we call them peek and poke out of respect to Applesoft BASIC), read and write the zero page, push and pop the stack.

When accessing memory, we need to account for ROM and for IO space. We are not trying to be clever about it, so we do a straight forward check inside the implementation of peek and poke, like this:

/// Write a 16-bit into memory, iospace, swoft switches, etc.
void poke(uint16_t addr, uint8_t value) {
  if (addr >= ioRangeStart_ && addr <= ioRangeEnd_)
    ioPoke(addr, value);
  else if (addr < romStart_)
    ram_[addr] = value;
}

If the write is to I/O space, it is forwarded to a separate routine dealing with emulating Apple II I/O. If the write is to ROM space, it is simply ignored.

The range of I/O addresses and the I/O routine itself are configurable - for good design we have split the emulation of the 6502 CPU itself from the emulation of the Apple II system.

a2emu: Cycle Accuracy

Many existing Apple II emulators have cycle accuracy (or perhaps even sub-cycle accuracy), meaning they keep track of exactly how many cycles every instruction takes, depending on details like addressing mode and whether the effective address crosses a page boundary and so on.

While this is perfectly doable and not that hard, we are focused on simplicity here. We decided that we don't really need such accuracy, especially given our goal to decompile 6502 into C. So, we eyeballed the list of 6502 instructions and timings for about 5 seconds and decided to just assume that every instruction takes exactly three cycles.

To this day we have no idea how close that value is to the actual average value of instruction cycles, but sound works well in our emulator and the performance visually appears similar to other emulators, so we are not that far off. (We expected that if the value was too low we would appear to be faster and if it was too high we would appear to be slower. Neither of these appears to be the case. Yes, 6502 is that simple to emulate!)

a2emu: I/O Handling

I/O on the Apple II is entirely memory mapped, in other words accessing different addresses in the IO range produces certain results. Handling this is very simple with a switch statement that looks like this:

switch (addr & 0xCFF0) {
case A2_KBD:
  if (io->debug & A2_DEBUG_IO1)
    fprintf(stdout, "[%u] KBD\n", cycles);
  return kbd(io);
case A2_KBDSTRB:
  if (io->debug & A2_DEBUG_IO2)
    fprintf(stdout, "[%u] KBDSTRB\n", cycles);
  kbdstrb(io);
  break;
case A2_TAPEOUT:
  if (io->debug & A2_DEBUG_IO1)
    fprintf(stdout, "[%u] TAPEOUT\n", cycles);
  break;
case A2_SPKR:
  if (io->debug & A2_DEBUG_IO1)
    fprintf(stdout, "[%u] SPKR\n", cycles);
  if (io->spkr_cb)
    io->spkr_cb(io->spkr_cb_ctx, cycles);
  break;

It would be conceptually simple to add floppy disk support (ignoring the "minor" detail of understanding how the Apple II floppy controller worked).

a2emu: Text and Graphics Output

The Apple II had an ... ahem ... interesting organization of video memory. It looks like it was created in order to torture programmers (in reality it was necessary in order to keep the hardware as cheap as possible). The exact details are not relevant here, but they are described starting here for text mode.

Still video memory is just part of regular RAM, with certain address ranges corresponding to page 0 and page 1 in different video modes: text, low graphics, high graphics.

The emulator is a regular event driven GUI application running on MacOS, Windows or Linux. It is asked to refresh its screen periodically (say 30 or 60 times per second) and at that time it simply decodes the entire active Apple II video memory. The memory is so small that it literally doesn't matter performance-wise.

By the way, credit and gratitude to the amazing Sokol library at https://github.com/floooh/sokol, which our emulator is using for cross platform GUI, graphics and sound support.

a2emu: Sound

Apple II had the simplest possible sound system - it literally could not be simpler. It is controlled by a single bit with flips the speaker membrane in the opposite direction. In order to generate sound (a square wave), one has to access a certain IO address repeatedly.

Emulating that is pretty simple. As we already described, we are keeping an "approximate" emulated cycle count so we know what the emulated time is supposed to be whenever the speaker bit is accessed. We simply generate the necessary number of samples into a sound queue which is asynchronously emptied by a sound callback.

a2emu: Conclusion

This covers most of the implementation of a2emu as it relates to actual emulation. Collection of data for decompilation purposes will be covered separately.

apple2tc

The apple2tc tool started its life as a regular offline disassembler. This is the part of its functionality that we will describe here. We will describe the dynamic parts separately as they relate to a2emu as well.

As a tracing disassembler, it couldn't be simpler. It starts from a known entry point and disassembles instructions sequentially following jumps recursively when they are encountered. (The implementation is not actually recursive, it uses a work list.)

This approach mostly guarantees that the disassembler will encounter only valid code, since it always follows valid branches starting from a known valid entry point. Unfortunately the qualifier "mostly" turns out to be sufficiently bad in practice as to dramatically complicate our lives. But more on that later. For now we can assume that it identifies valid code.

It actually does a pretty good job of identifying and disassembling a lot of code, but not nearly all of it. The problem is, of course, that it can't deal with indirect branches - when it encounters a dynamic branch, it doesn't know what the destination will be at runtime, so it just stops.

This is a sample of what the disassembled code looks like:

/*FA62*/ START:                  ; xref $FA62
/*FA62*/             CLD
/*FA63*/             JSR    L_FE84           ; $FE84
/*FA66*/             JSR    L_FB2F           ; $FB2F
/*FA69*/             JSR    L_FE93           ; $FE93
/*FA6C*/             JSR    L_FE89           ; $FE89
/*FA6F*/             LDA    $C058            ; $C058
/*FA72*/             LDA    $C05A            ; $C05A
/*FA75*/             LDA    $C05D            ; $C05D
/*FA78*/             LDA    $C05F            ; $C05F
/*FA7B*/             LDA    $CFFF            ; $CFFF
/*FA7E*/             BIT    $C010            ; $C010
/*FA81*/             CLD

It is pretty exciting seeing real disassembled Apple II code, though of course it is not that useful yet. We will describe how we made it useful in the next part.

To Be Continued

This is the end of Part 2, which described the "conventional" parts of the project. Part 1 was an introduction, and Part 3 covers how we validate our disassembly by generating working C code!

Tzvetan Mikov's Blog

Search This Blog