Apple2TC: an Apple II Binary to C Decompiler

Apple2TC: an Apple II Binary to C Decompiler - Part 4

This is part 4 in a series of posts describing Apple2TC.

The start is in Part 1.

In this part we finally get to the interesting part and describe the Apple II hacks we found while trying to run the decompiled version of Apple II BASIC.

Running the Decompiled Code

In the previous part we described our technique for generating a "Simple C" representation of the disassembled 6502 code. Now it is time to actually build and run the generated code!

Using our a2io and decapplib libraries, it is trivial to link the generated Simple C code into an executable.

We started by decompiling and running the Apple II ROM code, containing Applesoft BASIC and Monitor, because its disassembled source is already available and heavily commented (thanks to Andy McFadden). We can "cheat" and look at the listing to check whether we are doing something wrong.

Problem 1: Indirect Branches

In our first attempt, we decompiled the ROM statically and used the Simple C backend to convert it to C. The generated code runs for a little bit, but crashes pretty early with

Unknown code address: $FDF0

Fortunately we have augmented the decompiled code to print the address and register state at the beginning of every basic block, so we can get an idea of what is happening. The last couple of lines printed are:

FF3A: A=00 X=38 Y=1B SP=FD SR=......Z. PC=FF3A

FDED: A=87 X=38 Y=1B SP=FD SR=N....... PC=FDED

Searching for this address in https://6502disassembly.com/a2-rom/APPLE2.ROM.html immediately reveals the problem:

                   ; ==============================================================================
                   ; Character Output Routine: Jump (Indirectly via Vector) to User Output Routine
                   ; ==============================================================================
                   ; 
FDED: 6C 36 00     COUT            JMP   (CSWL)          ;Print Accumulator (A) to Output Device

We have an indirect branch, which our disassembler couldn't trace, so we attempted to jump to a branch target that we did not know about.

This was expected - it is notoriously difficult to deal with indirect branches and jump tables when performing automatic disassembly. The disassembler has to rely on various heuristics and complicated analysis to detect jump tables with varying degrees with success. That is why the core idea of our project is to rely on data collected at runtime instead. Runtime data gives us complete accuracy since it tells us what actually happened when the code ran.

Our emulator, a2emu, was designed to collect runtime data via an optional "debug" callback, invoked before every instruction. The callback can decode the instruction, examine its operands, and decide what and when to log. We expected that we would gradually add more and more data collection to the callback, as we identified the kind of runtime data we needed.

So, to start with, we add a check whether the current instruction is any kind of branch. If it is, we evaluate the value of its operand (the branch target), and save it in a std::unordered_set<>. When execution completes, we save the set to a JSON file.

Later, the disassembler loads the file, adds the recorded branch targets to its existing set, and traces the new ones recursively, as them may lead to identifying new targets and code blocks. It tells us that 22 new labels were identified at runtime, so our efforts weren't for naught!

To our complete shock, running the newly generated code proceeds much farther, as can be seen in this screenshot. It prints the Apple II name and we can actually type on the input line!

The problem is, unfortunately, as soon as we press Enter, we crash and burn with another error...

Problem 2: Dynamically Generated Code

This is what our log shows:

D444: A=00 X=FF Y=01 SP=F8 SR=........ PC=D444

00B1: A=00 X=FF Y=01 SP=F6 SR=.......C PC=00B1

Warning: INVALID at $00B1

In other words, we jumped to address $B1 in the zero page, where naturally we had not disassembled any code! We were genuinely surprised. What is Applesoft BASIC doing branching to the zero page? Isn't the zero page supposed to a very precious scarce resource for important variables? Are we putting actual code in there? Or is this a bug in our decompiler?

Looking for D444 in the disassembly shows us the code lading to it:

D444: 86 B8                        STX   TXTPTR          ;Set up CHRGET to Scan the Input Line
D446: 84 B9                        STY   TXTPTR+1        ;[(X,Y)={Low,High}]= CHRGET Next Chr/Tok Ptr 
D448: 46 D8                        LSR   ERRFLG          ;Defeat ONERR: Clear the Error Flag
D44A: 20 B1 00                     JSR   CHRGET          ;Get a Character

CHRGET itself is defined like this:

                   CHRGET          EQU   $B1             ;~$C8: Get Next Char/Token ZP-Routine (24B)

Aha, so it is not a bug! The ROM really is not only branching, but executing a subroutine at $B1, which it must have put there somehow!

This was surprising - while we expected to encounter some sorts of "shenanigans", we weren't fully mentally prepared to deal with code generated on the fly at runtime! Is all hope lost? Is this the end of our project?

Fortunately, we decided to not give up yet. Every problem has a solution. After all, we can record literally everything that happens in the emulated Apple II. Surely there is a way we can deal with this too?

In our emulator we can track when every byte of RAM is written to, and we can track when every instruction is executed. Perhaps we can track them simultaneously and detect regions of code that were first generated and then executed? Then we can record the contents of these regions and the disassembler can load them?

It is not perfect, but it is a starting point. So, we get to work and enhance the emulator to keep track of memory generations. Each generation contains all bytes that were written to and then executed without modifications in between. Here is a hypothetical sequence of events:

t1. Modify $1000..$1001

t2. Modify $0300..$0305

t3. Modify $1002..$1003

t4. Modify $0305..$0320

t5. Execute $0300..$0310

t6. Modify $1004

t7. Modify $0300

At point t7, we are modifying a byte that was previously modified and executed. This causes us to close the previous generation of memory. It contains all bytes that were both modified and executed, which in this case is the range $0300..$0310. We record that range, and start a new generation.

This approach allows us to track ranges of code that are generated at once. It would even work for dynamically loading parts of the code from disk, because first, lots of bytes are modified, then executed without modification.

We execute the enhanced emulator and lo and behold, it has automatically identified a 24 byte executable range at address $B1! It was a proud moment, since it truly happened automatically, without encoding any specific knowledge in the emulator!

"code": [
  {
    "addr": 177,
    "bytes": [
      230, 184, 208, 2, 230, 185, 173, 4, 2, 201, 58, 176, 10, 201, 32,
      240, 239, 56, 233, 48, 56, 233, 208, 96
    ]
  }
],

We update the decompiler to simply load the generated ranges, run it ... and again crash right away after pressing Enter, but this time with a different error:

DB00: A=0D X=10 Y=08 SP=F4 SR=.......C PC=DB00
D41C: A=F2 X=10 Y=08 SP=F6 SR=N......C PC=D41C
Unknown code address: $DB5A

Problem 3: Overlapping Instructions

Well, this is very strange. Examining the assembly listing produced by our own disassembler, we see that $DB5A is not a valid address!

/*DB57*/             LDA    #M_20            ; $0020
/*DB59*/             BIT    M_3FA9           ; $3FA9
/*DB5C*/ L_DB5C:                  ; xref $D423 $D810 $DAFD $DB49
/*DB5C*/             ORA    #$80             ; $0080
/*DB5E*/             CMP    #M_A0            ; $00A0
/*DB60*/             BCC    L_DB64           ; $DB64

$DB5A is in the middle of the BIT instruction! What the heck is going on? Is the entire listing incorrect? Doesn't look like it, the surrounding instructions seem legit.

Fortunately, the commented Apple II ROM listing is here to save us again!

DB57: A9 20        OUTSP           LDA   #' '            ;Print a Space via COUT
DB59: 2C                           DFB   BIT_ABS         ;Fake BIT OpCode to skip next line

                   ; ==============================================================================
                   ; Print a Question Mark Character:
                   ; ==============================================================================
DB5A: A9 3F        OUTQUES         LDA   #'?'            ;Print a Question Mark via COUT
                   ; 
                   ; ==============================================================================
                   ; Print Character in A-Reg
                   ; ==============================================================================
                   ; Note: POKE 243,32 [$20 in $F3 (FLASHBIT)] will convert output to lower case.
                   ;       This can be cancelled by: POKE 243,0 ($00), NORMAL ($00), INVERSE ($00);
                   ;       or FLASH ($40), POKE 243,64.
                   ; ==============================================================================
DB5C: 09 80        OUTDO           ORA   #%10000000      ;Set High Bit; Make Character High ASCII

Let's see what is going on there... They literally encoded the LDA inside the operand of the BIT instruction, so if you execute BIT, the LDA is skipped. However if you branch directly to the LDA, it executes normally. So, this is just a clever way to skip over the LDA when coming from one place, while executing it if coming from another.

Ordinarily dong this would require an unconditional branch, which is one or possibly two extra bytes. Discoveries like this make all of this worth it!

Fortunately, this is not that difficult to deal with in our disassembler, we don't even need runtime data, now that we know that it is happening. We simply need another disassembler pass checking for misaligned labels (labels inside an instruction). We record them, and later, when emitting Simple C code, emit both cases. Specifically for the code above:

case 0xdb57: // [$DB57..$DB59]    3 bytes
  /* $DB57 LDA */ s_a = update_nz(0x20);
  /* $DB59 BIT */ tmp = peek(0x3fa9), s_status = (s_status & ~(0xC0 | STATUS_Z)) | (tmp & 0xC0) | (s_a & tmp ? 0 : STATUS_Z);
  s_pc = 0xdb5c;
  break;
// WARNING: simple misaligned label
case 0xdb5a: // [$DB5A..$DB5B]    2 bytes
  /* $DB5A LDA */ s_a = update_nz(0x3f);
  s_pc = 0xdb5c;
  break;
case 0xdb5c: // [$DB5C..$DB61]    6 bytes
  /* $DB5C ORA */ s_a = update_nz(s_a | 0x80);

So, with this modification, we recompile the binary and run it!

This time it doesn't crash at Enter! In fact it doesn't crash at all (at least not right away), though it isn't behaving correctly either.

Still, it is progress!

Problem 4: Self-modifying Code

This problem was quite tricky to identify, though in retrospect it seems obvious. We hadn't really thought through why Applesoft BASIC chose to put the routine CHRGET in the zero page. We didn't even look closely at the code, we were so happy when we saw improvement that we didn't think twice about it.

The answer is, of course, they had a very good reason. They placed the routine in writable memory, so it can be modified! Otherwise they would have left it in ROM. Even looking at our own assembler listing makes it obvious:

/*00B1*/ L_00B1:
/*00B1*/             INC    M_B8             ; $00B8
/*00B3*/             BNE    L_00B7           ; $00B7
/*00B5*/             INC    M_B9             ; $00B9
/*00B7*/ L_00B7:
/*00B7*/             LDA    M_0204           ; $0204
/*00BA*/             CMP    #$3A             ; $003A

The instruction at address $00B1 is modifying the instruction at address $00B7! The subroutine is modifying itself! We have to admit, we were ecstatic to discover this. Those folks really knew what they were doing!

Examining the documented listing makes it even more clear - the variable pointing to the next BASIC character is at address $00B8. It is literally inside the LDA instruction.

                   TXTPTR          EQU   $B8             ;CHRGET's Next Char/Token Pointer      (2B)

One can't help but admire it. It wastes no cycles, no extra registers. That is how they were able to make software run on such a slow computer.

Fortunately for us, this is also fairly easy to handle. It turns out that we can statically identify when instructions are modifying the operands of other instructions.

The distinction between modifying the first byte versus the operands is important! If it is the former, we are in trouble, since we can't (easily and reliably) predict what the modified instruction is. But if it is the latter, it is possible to deal with. Since we have identified the instruction whose operand is being modified, we can generate slightly different code for it.

In this case, the instruction in question is:

/*00B7*/             LDA    M_0204           ; $0204

Before we knew it was modified, we generated this:

/* $00B7 LDA */ s_a = update_nz(peek(0x0204));

Now that we have identified it, we generate this:

// WARNING: operand self modification.
/* $00B7 LDA */ s_a = update_nz(peek(ram_peek16(0x00b8)));

Instead of embedding the disassembled operand as a literal in the emitted code, we fetch it from RAM where it would have been modified. Simple!

Armed with this self-modification detection, we build it an run it again.

Amazing! This is actual BASIC running without emulation, performing calculations.

Really excited, we try typing a FOR loop, and it crashes right away. In fact, entering almost any other command, besides HOME and PRINT, crashes.

Problem 5: Jump to a Subroutine

When we enter the command HGR, we get the following log:

D832: A=11 X=04 Y=06 SP=F6 SR=N....... PC=D832

00B1: A=E1 X=04 Y=22 SP=F4 SR=N....... PC=00B1

00B7: A=E1 X=04 Y=22 SP=F4 SR=........ PC=00B7

00BE: A=00 X=04 Y=22 SP=F4 SR=N....... PC=00BE

00C2: A=00 X=04 Y=22 SP=F4 SR=N....... PC=00C2

Unknown code address: $F3E2

Looking at the documented disassembly, we see that $F3E2 is indeed the address of the HGR subroutine:

F3E2: A9 20        HGR             LDA   #>HGR1SCRN      ;Get Hi-Res Screen Pg.1 Base-Address, High
F3E4: 2C 54 C0                     BIT   TXTPAGE1        ;Display Text Page1; R/W Main V-RAM

But why don't we know its address? When we "trained" our emulator, we made sure to execute almost every BASIC command, to allow the emulator to "collect" all addresses for the decompiler.

The error log provides an explanation. The error occurs when executing the RTS instruction of CHRGET (the "notorious" self-modifying routine in the zero page). So, we don't know the address to return to. But why? The previous address in the log is $D832. What do we have there?

D832: 0A                           ASL   A               ;YES, Double A-Reg to get Index
D833: A8                           TAY                   ;Set Indirect Addressing Index
D834: B9 01 D0                     LDA   TKADTBL+1,Y     ;Get Routine Address, High
D837: 48                           PHA                   ;Push Routine Address, High     [High 1st]
D838: B9 00 D0                     LDA   TKADTBL,Y       ;Get Routine Address, Low
D83B: 48                           PHA                   ;Push Routine Address, Low       [Low 2nd]
D83C: 4C B1 00                     JMP   CHRGET          ;Go get Next Char/Token; RTS to Routine

Wow, they tricked us again! Instead of using a JSR to call the subroutine, they are explicitly pushing the address the subroutine should return to, and then calling it with a JMP! That's why we don't know the address!

Fortunately, this is surmountable. We modify our emulator to record RTS addresses as branch target, and voila! We run the newly decompiled ROM and finally we have a fully functioning Apple II BASIC environment without emulation!

It took quite a few steps, but we finally got there. Now that we have a validated disassembly, the next step is to convert it to readable C.

To Be Continued

In Part 5 we will describe successfully decompiling and running two Apple II games.

Tzvetan Mikov's Blog

Search This Blog