Apple2TC: an Apple II Binary to C Decompiler

Apple2TC: an Apple II Binary to C Decompiler - Part 7, Using The IR

This is part 7 in a series of posts describing Apple2TC.

The start is in Part 1.

Part 6 introduced our SSA-based Intermediate Representation.

In this part we look at a running IR example and how it is transformed by our decompiler pipeline.

Introducing The Example

To demonstrate the utility of our IR, we have devised a slightly contrived example that nevertheless illustrates challenges encountered in real code.

The source of the example can be seen here: https://github.com/tmikov/apple2tc/blob/master/blog/part6/ex.s . We assembled it with a6502:

a6502 ex.s ex.b33

and disassembled it:

apple2tc ex.b33 --asm > ex.lst

Producing the following listing:

/*0300*/             TSX
/*0301*/             INX
/*0302*/             INX
/*0303*/             LDY    M_0101,X         ; $0101
/*0306*/             INY
/*0307*/             STY    M_90             ; $0090
/*0309*/             LDA    M_91             ; $0091
/*030B*/             STA    M_92             ; $0092
/*030D*/             LDY    #$00             ; $0000
/*030F*/             STY    M_1721           ; $1721
/*0312*/             DEY
/*0313*/             STY    M_93             ; $0093
/*0315*/             BNE    L_0350           ; $0350

Now, let's decompile it into IR and see what we get:

apple2tc ex.b33 ex.b33 --ir > ex-1.s

The generated output is quite voluminous, since, as we saw in the previous part, 6502 instructions break into quite a few steps. It is pretty hard to figure what is happening in that 80-line file, so let's examine only what was generated by the first two instructions TSX and INX:

          %14 = LoadSP
                StoreR8       STATUS_NotZ, %14
          %16 = And8          %14, 0x80
                StoreR8       STATUS_N, %16
                StoreR8       X, %14
/*$0301*/ %19 = LoadR8        X
          %20 = Add8          %19, 0x01
                StoreR8       STATUS_NotZ, %20
          %22 = And8          %20, 0x80
                StoreR8       STATUS_N, %22
                StoreR8       X, %20

The first five IR instructions load the value of SP, update the NotZ and N flags, and store it in X. The next six IR instructions load the value of X (which is a copy of SP), add the literal constant 0x01 to it, update the NotZ and N flags, and finally store the incremented value back into X.

There is a lot of redundancy here: we are unnecessarily updating the flag registers, only to be overwritten. We are also unnecessarily updating the value of X before overwriting it again. All of this is obscuring what the code is actually doing. Let's see whether we can do better.

Local SSA Conversion

The first thing we need to do to improve the situation is SSA conversion. The topic of SSA conversion is pretty involved, as can be gleaned from the link in the previous sentence. Without going into too much detail, it basically eliminates variable loads and stores, replacing them with direct references to the value that ought to be stored in the variable.

Looking at our example above, instruction 19 is loading the value of X, which was just updated in instruction 18 with the value of instruction 14. So, instead of storing %14 into X and then loading X, we can refer to %14 directly, because that's what ended up in X anyway. Then the "StoreR8 X,%14" itself can potentially be eliminated if X is later overwritten. So can the "StoreR8 STATUS_NotZ, %14".

Full SSA conversion deals with this across the body of the entire function, which is a complicated process producing amazing results. Unfortunately, we can't apply it yet, since we haven't recovered the full control flow of the input code (that will happen later). So, for now we apply a simpler version of SSA conversion, working only within individual basic blocks (sequences of instructions without labels or branches in the middle).

Applying this "local SSA conversion" to our IR, we are even able to fit the entire listing here:

          %4  = LoadSP
          %_  = And8          %4, 0x80
/*$0301*/ %6  = Add8          %4, 0x01
          %_  = And8          %6, 0x80
/*$0302*/ %8  = Add8          %6, 0x01
          %_  = And8          %8, 0x80
/*$0303*/ %10 = ZExt8t16      %8
          %11 = Add16         0x0101, %10
          %12 = RamPeek8      %11
          %_  = And8          %12, 0x80
/*$0306*/ %14 = Add8          %12, 0x01
          %_  = And8          %14, 0x80
/*$0307*/       RamPoke8      0x0090, %14
/*$0309*/ %17 = RamPeek8      0x0091
          %_  = And8          %17, 0x80
/*$030B*/       RamPoke8      0x0092, %17
/*$030D*/ %_  = And8          0x00, 0x80
/*$030F*/       RamPoke8      0x1721, 0x00
/*$0312*/ %22 = Sub8          0x00, 0x01
          %_  = And8          %22, 0x80
/*$0313*/       RamPoke8      0x0093, %22
/*$0315*/       JTrue         %22, %BB5, %BB3

Looking at the first two instructions, there is dramatic improvement. The StoreR8 and LoadR8 are gone. The updates of STATUS_NotZ are gone, since they were just storing a copy of the value.

We notice that the calculations of the value of STATUS_N are still there, but they are unused.

Dead Code Elimination

Dead code elimination (commonly referred to as "DCE") is a simple optimization which removes instructions that have no side effects (for example, don't write to memory) and whose result is unused. In our running example, the results of all of the And8 instructions are unused, since they calculate the value of the STATUS_N flag, but that flag itself is never checked.

Applying DCE to our IR we get:

          %4  = LoadSP
/*$0301*/ %5  = Add8          %4, 0x01
/*$0302*/ %6  = Add8          %5, 0x01
/*$0303*/ %7  = ZExt8t16      %6
          %8  = Add16         0x0101, %7
          %9  = RamPeek8      %8
/*$0306*/ %10 = Add8          %9, 0x01
/*$0307*/       RamPoke8      0x0090, %10
/*$0309*/ %12 = RamPeek8      0x0091
/*$030B*/       RamPoke8      0x0092, %12
/*$030F*/       RamPoke8      0x1721, 0x00
/*$0312*/ %15 = Sub8          0x00, 0x01
/*$0313*/       RamPoke8      0x0093, %15
/*$0315*/       JTrue         %15, %BB5, %BB3

This is starting to look quite good. It is about the same length as the original 6502 code, but expresses the code intent much clearer, because many redundant computations (like updating unneeded flags) have been eliminated.

Constant Folding

The most obvious remaining flaw is instruction %15, which is subtracting two constants. It originated in the DEY 6502 instruction. The 0x00 in instruction %15 came from storing 0 in register Y. During SSA conversion we eliminated both the StoreR8 Y, 0x00 and the LoadR8 Y, replacing subsequent references to Y with the 0x00.

This is typical of how SSA conversion enables more transformations in a natural way and how transformations stack together. In this case it exposed that the operands of the Sub8 are actually constant.

Constant folding simply scans the IR for side effect-free instructions with constant operands, performs the calculation and replaces all usages of the instruction with the calculated result. It "folds" the constant operands into a single value. So, we get:

          %4  = LoadSP
/*$0301*/ %5  = Add8          %4, 0x01
/*$0302*/ %6  = Add8          %5, 0x01
/*$0303*/ %7  = ZExt8t16      %6
          %8  = Add16         0x0101, %7
          %9  = RamPeek8      %8
/*$0306*/ %10 = Add8          %9, 0x01
/*$0307*/       RamPoke8      0x0090, %10
/*$0309*/ %12 = RamPeek8      0x0091
/*$030B*/       RamPoke8      0x0092, %12
/*$030F*/       RamPoke8      0x1721, 0x00
/*$0313*/       RamPoke8      0x0093, 0xff
/*$0315*/       Jmp           %BB5

The Sub8 has been eliminated and the RamPoke8 at $0313 is directly using the calculated constant 0xff (0x00 - 0x01). Clearly this was the intent of the original code, which saved a byte by using DEY (1 byte) instead of LDY #$FF (2 bytes).

Something unexpected also happened. The conditional branch at the end of the code used the 6502 instruction BNE. It branches depending on the 6502 Z flag, which is represented in the IR by the logical inversion of the STATUS_NotZ register.

As we mentioned in Part 6, STATUS_NotZ is simply a copy of the value that is being checked for zero/not-zero. In this case, the value was the result of the DEY instruction, which we just constant folded into the constant 0xFF. Said constant was propagated into the JTrue IR instruction:

/*$0315*/       JTrue         0xFF, %BB5, %BB3

Finally, JTrue with a constant operand has been folded into an unconditional branch:

/*$0315*/       Jmp           %BB5

Unquestionably, that was the intent of the original code. It saved a byte by replacing a 3-byte unconditional instruction JMP with a 2-byte conditional BNE, knowing that Z would have been cleared.

This is another example of how our IR allows us to recover the intent of the original code. A series of IR transformations (constant propagation, constant folding, instruction simplification) revealed that the intent of a seemingly conditional branch is to branch unconditionally.

IR In Tree Form

The IR is much more compact and easier to understand in tree form:

/*$0307*/       RamPoke8      0x0090, (Add8 (RamPeek8 (Add16 0x0101 (ZExt8t16 (Add8 (Add8 (LoadSP) 0x01) 0x01)))) 0x01)
/*$030B*/       RamPoke8      0x0092, (RamPeek8 0x0091)
/*$030F*/       RamPoke8      0x1721, 0x00
/*$0313*/       RamPoke8      0x0093, 0xff
/*$0315*/       Jmp           %BB5

In this form it is easy to see that the code is initializing four memory locations. 0x1721 and 0x0093 are initialized with constants. 0x0092 is initialized with the contents of 0x0091. Finally, we can see that the first instruction is reading the contents of the stack with an offset.

More Constant Folding

Looking at the tree form, we immediately notice this:

(Add8 (Add8 (LoadSP) 0x01) 0x01)

Clearly the intent here is to calculate SP + 2 (this is typical for 6502, where increments are single byte instructions, while addition is much more restrictive, requiring the A register and the C flag).

What is needed here is a more sophisticated form of constant folding. Instead of looking only at the two operands of an instruction, we need to check one lever deeper. Basically we want to perform the following transformation (and a few more similar ones):

(Add (Add V Const1) Const2) =>

(Add V (Add Const1 Const2))

Applying this transformation, we obtain the final version of the IR for now:

/*$0307*/       RamPoke8      0x0090, (Add8 (RamPeek8 (Add16 0x0101 (ZExt8t16 (Add8 (LoadSP) 0x02)))) 0x01)
/*$030B*/       RamPoke8      0x0092, (RamPeek8 0x0091)
/*$030F*/       RamPoke8      0x1721, 0x00
/*$0313*/       RamPoke8      0x0093, 0xff
/*$0315*/       Jmp           %BB5

Now we can clearly see that the first instruction is essentially:
*(0x0090) = *(0x101 + (uint8_t)(SP + 2));

To Be Continued

Hopefully by now you are convinced in the utility of our IR. The next part will describe the recovery of flow control.

Tzvetan Mikov's Blog

Search This Blog