Skip to main content

Apple2TC: an Apple II Binary to C Decompiler - Part 2

This is part 2 of a series of posts describing Apple2TC. Check out Part 1 for an introduction.

In this part we describe the different parts of Apple2TC.

Apple2TC Components

Apple2TC encompasses several sub-projects:
  • id - a simple interactive assembler/disassembler/binary editor for quick exploration/patching during development.
  • a6502 - a two pass 6502 assembler capable of assembling the Apple II+ ROM.
  • a2emu - our custom Apple II emulator, feature complete with the exception of floppy disk support.
  • apple2tc - our tracing disassembler/decompiler.
a2emu and apple2tc are the two main components - one producing runtime data, the other consuming it. We will go into more detail about them, while just briefly describing the rest.

id is a convenience tool for exploration and patching of Apple II binaries. It can load and save binary images, evaluate simple expressions, disassemble ranges of bytes or print them as words and bytes, or assemble individual instructions and patch bytes. 

a6502 is a simple two pass assembler, written to give us the ability to create our own Apple II binaries for testing or for creating binary sequences for patching of existing images (see https://github.com/tmikov/apple2tc/blob/master/decoded/bolo/README.md). While strictly speaking we probably didn't need to write our own assembler, we wanted to avoid the inconvenience of finding and integrating a convenient open source one.

a2emu

a2emu is a vanilla Apple II software emulator. It really doesn't try to be clever or efficient. However it is very portable and literally has no dependencies. It is basically a loop, reading an opcode from memory, with a big switch statement for every possible instruction. Cases look something like this:
case 0x51: // EOR (ind),Y
a_ = updateNZ(a_ ^ peek_ind_y(OP8()));
pc_ += 2;
break;

case 0x0A: { // ASL A
a_ = updateNZC(a_ << 1);
pc_ += 1;
break;
}
Each case implement the functionality of the corresponding instruction in a very straight forward manner. Registers are global variables (if we really cared about performance we would make them local, but that would make things a little more complex - it is already more than fast enough).

a2emu: CPU Flags

There are a few helper functions like updateNZ() and updateNZC() to update the emulated CPU flags with accordance to the 6502 instruction. For example, updateNZ() sets the zero flag (Z) and the negative flag (N) depending on whether its parameter is zero or negative respectively.

a2emu: Memory Access

Other helper functions read and write memory (we call them peek and poke out of respect to Applesoft BASIC), read and write the zero page, push and pop the stack. 

When accessing memory, we need to account for ROM and for IO space. We are not trying to be clever about it, so we do a straight forward check inside the implementation of peek and poke, like this:
/// Write a 16-bit into memory, iospace, swoft switches, etc.
void poke(uint16_t addr, uint8_t value) {
if (addr >= ioRangeStart_ && addr <= ioRangeEnd_)
ioPoke(addr, value);
else if (addr < romStart_)
ram_[addr] = value;
}
If the write is to I/O space, it is forwarded to a separate routine dealing with emulating Apple II I/O. If the write is to ROM space, it is simply ignored.  

The range of I/O addresses and the I/O routine itself are configurable - for good design we have split the emulation of the 6502 CPU itself from the emulation of the Apple II system.

a2emu: Cycle Accuracy

Many existing Apple II emulators have cycle accuracy (or perhaps even sub-cycle accuracy), meaning they keep track of exactly how many cycles every instruction takes, depending on details like addressing mode and whether the effective address crosses a page boundary and so on.

While this is perfectly doable and not that hard, we are focused on simplicity here. We decided that we don't really need such accuracy, especially given our goal to decompile 6502 into C. So, we eyeballed the list of 6502 instructions and timings for about 5 seconds and decided to just assume that every instruction takes exactly three cycles.

To this day we have no idea how close that value is to the actual average value of instruction cycles, but sound works well in our emulator and the performance visually appears similar to other emulators, so we are not that far off. (We expected that if the value was too low we would appear to be faster and if it was too high we would appear to be slower. Neither of these appears to be the case. Yes, 6502 is that simple to emulate!) 

a2emu: I/O Handling 

I/O on the Apple II is entirely memory mapped, in other words accessing different addresses in the IO range produces certain results. Handling this is very simple with a switch statement that looks like this:
switch (addr & 0xCFF0) {
case A2_KBD:
if (io->debug & A2_DEBUG_IO1)
fprintf(stdout, "[%u] KBD\n", cycles);
return kbd(io);
case A2_KBDSTRB:
if (io->debug & A2_DEBUG_IO2)
fprintf(stdout, "[%u] KBDSTRB\n", cycles);
kbdstrb(io);
break;
case A2_TAPEOUT:
if (io->debug & A2_DEBUG_IO1)
fprintf(stdout, "[%u] TAPEOUT\n", cycles);
break;
case A2_SPKR:
if (io->debug & A2_DEBUG_IO1)
fprintf(stdout, "[%u] SPKR\n", cycles);
if (io->spkr_cb)
io->spkr_cb(io->spkr_cb_ctx, cycles);
break
It would be conceptually simple to add floppy disk support (ignoring the "minor" detail of understanding how the Apple II floppy controller worked).

a2emu: Text and Graphics Output

The Apple II had an ... ahem ... interesting organization of video memory. It looks like it was created in order to torture programmers (in reality it was necessary in order to keep the hardware as cheap as possible). The exact details are not relevant here, but they are described starting here for text mode.

Still video memory is just part of regular RAM, with certain address ranges corresponding to page 0 and page 1 in different video modes: text, low graphics, high graphics.

The emulator is a regular event driven GUI application running on MacOS, Windows or Linux. It is asked to refresh its screen periodically (say 30 or 60 times per second) and at that time it simply decodes the entire active Apple II video memory. The memory is so small that it literally doesn't matter performance-wise.

By the way, credit and gratitude to the amazing Sokol library at https://github.com/floooh/sokol, which our emulator is using for cross platform GUI, graphics and sound support. 

a2emu: Sound  

Apple II had the simplest possible sound system - it literally could not be simpler. It is controlled by a single bit with flips the speaker membrane in the opposite direction. In order to generate sound (a square wave), one has to access a certain IO address repeatedly. 

Emulating that is pretty simple. As we already described, we are keeping an "approximate" emulated cycle count so we know what the emulated time is supposed to be whenever the speaker bit is accessed. We simply generate the necessary number of samples into a sound queue which is asynchronously emptied by a sound callback.

a2emu: Conclusion

This covers most of the implementation of a2emu as it relates to actual emulation. Collection of data for decompilation purposes will be covered separately.

apple2tc

The apple2tc tool started its life as a regular offline disassembler. This is the part of its functionality that we will describe here. We will describe the dynamic parts separately as they relate to a2emu as well.

As a tracing disassembler, it couldn't be simpler. It starts from a known entry point and disassembles instructions sequentially following jumps recursively when they are encountered. (The implementation is not actually recursive, it uses a work list.)

This approach mostly guarantees that the disassembler will encounter only valid code, since it always follows valid branches starting from a known valid entry point. Unfortunately the qualifier "mostly" turns out to be sufficiently bad in practice as to dramatically complicate our lives. But more on that later. For now we can assume that it identifies valid code.

It actually does a pretty good job of identifying and disassembling a lot of code, but not nearly all of it. The problem is, of course, that it can't deal with indirect branches - when it encounters a dynamic branch, it doesn't know what the destination will be at runtime, so it just stops.

This is a sample of what the disassembled code looks like:
/*FA62*/ START:                  ; xref $FA62
/*FA62*/ CLD
/*FA63*/ JSR L_FE84 ; $FE84
/*FA66*/ JSR L_FB2F ; $FB2F
/*FA69*/ JSR L_FE93 ; $FE93
/*FA6C*/ JSR L_FE89 ; $FE89
/*FA6F*/ LDA $C058 ; $C058
/*FA72*/ LDA $C05A ; $C05A
/*FA75*/ LDA $C05D ; $C05D
/*FA78*/ LDA $C05F ; $C05F
/*FA7B*/ LDA $CFFF ; $CFFF
/*FA7E*/ BIT $C010 ; $C010
/*FA81*/ CLD 
It is pretty exciting seeing real disassembled Apple II code, though of course it is not that useful yet. We will describe how we made it useful in the next part.

To Be Continued

This is the end of Part 2, which described the "conventional" parts of the project. Part 1 was an introduction, and Part 3 covers how we validate our disassembly by generating working C code!

Comments

Popular posts from this blog

You Don't Like Google's Go Because You Are Small

When you look at Google's presentations about Go, they are not shy about it. Go is about very smart people at Google solving very BIG problems. They know best. If you don't like Go, then you are small and are solving small problems. If you were big (or smart), you would surely like it. For example, you might naively think that printing the greater of two numbers should be as simple as std::cout << max(b(),c()) That is because you think small. What you really should want is: t1 := b() if t2 := c(); t1 < t2 { t1 = t2 } fmt.Print( t1 ) Isn't it much better? We didn't have to type all those extra semicolons that were killing productivity before. If you don't like it, you are small. If you wanted to extract an attribute of an optional parameter, you may be used to typing something like: a = p ? p->a : 0; or even: a = p && p->a You just make me sad because obviously what you really want is: a = 0 if p != nil { a = p-...

How to speed up a micro-benchmark 300x

How to speed up a ubenchmark 300x Static Hermes: How to Speed Up a Micro-benchmark by 300x Without Cheating This is the first of a series of light blog posts about Static Hermes, covering topics that I find interesting or amusing. It is not intended to be a systematic introduction to Static Hermes design.Consider it more like a backstage pass to the quirks, features, and “aha!” moments that make Static Hermes unique. If you’re not familiar with Static Hermes, it’s an evolving project we’re working on to explore the confluence of static typing and JavaScript. It is work in progress, but we’re excited about the possibilities it opens up for performance improvements and more. For more background: Tweet with the slide deck of the Static Hermes announcement Previous talk about Hermes Contents: Meet interp-dispatch.js Let’s Run It! Static Hermes with Untyped Code Helping the Compiler Other Ways to Help the Compiler Some Observations Revisiting The Origina...

Apple2TC: an Apple II Binary to C Decompiler - Part 1

This is a series of blog posts to serve as a log documenting my work on Apple2TC - an open source hobby project developed on GitHub:  https://github.com/tmikov/apple2tc . There are various interesting things that come up all the time and I thought it might be useful to record them for posterity. Part 2  describes the different components of the project Part 3 shows how we validate our disassembly by generating a working C representation of it. Part 4 shows the Apple II tricks we had to deal with to get to correct running code. Part 5 describes decompiling and running Robotron 2084 and Snake Byte. Part 6 introduces our SSA-based intermediate representation . Part 7 shows how to transform the IR to discover the original intent of the code.   What is Apple2TC? Apple2TC is an open source project to decompile original Apple II binaries, mostly games, into understandable and working modern C code,  completely automatically  by analyzing the runtime behavior...