Skip to main content

Apple2TC: an Apple II Binary to C Decompiler - Part 7, Using The IR

 This is part 7 in a series of posts describing Apple2TC. 

The start is in Part 1.

Part 6 introduced our SSA-based Intermediate Representation.

In this part we look at a running IR example and how it is transformed by our decompiler pipeline.

Introducing The Example

To demonstrate the utility of our IR, we have devised a slightly contrived example that nevertheless illustrates challenges encountered in real code. 

The source of the example can be seen here: https://github.com/tmikov/apple2tc/blob/master/blog/part6/ex.s . We assembled it with a6502:

a6502 ex.s ex.b33

and disassembled it:

apple2tc ex.b33 --asm > ex.lst

Producing the following listing:
/*0300*/             TSX
/*0301*/ INX
/*0302*/ INX
/*0303*/ LDY M_0101,X ; $0101
/*0306*/ INY
/*0307*/ STY M_90 ; $0090
/*0309*/ LDA M_91 ; $0091
/*030B*/ STA M_92 ; $0092
/*030D*/ LDY #$00 ; $0000
/*030F*/ STY M_1721 ; $1721
/*0312*/ DEY
/*0313*/ STY M_93 ; $0093
/*0315*/ BNE L_0350 ; $0350
Now, let's decompile it into IR and see what we get:

apple2tc ex.b33 ex.b33 --ir > ex-1.s

The generated output is quite voluminous, since, as we saw in the previous part, 6502 instructions break into quite a few steps. It is pretty hard to figure what is happening in that 80-line file, so let's examine only what was generated by the first two instructions TSX and INX:
          %14 = LoadSP
StoreR8 STATUS_NotZ, %14
%16 = And8 %14, 0x80
StoreR8 STATUS_N, %16
StoreR8 X, %14
/*$0301*/ %19 = LoadR8 X
%20 = Add8 %19, 0x01
StoreR8 STATUS_NotZ, %20
%22 = And8 %20, 0x80
StoreR8 STATUS_N, %22
StoreR8 X, %20
The first five IR instructions load the value of SP, update the NotZ and N flags, and store it in X. The next six IR instructions load the value of X (which is a copy of SP), add the literal constant 0x01 to it, update the NotZ and N flags, and finally store the incremented value back into X.

There is a lot of redundancy here: we are unnecessarily updating the flag registers, only to be overwritten. We are also unnecessarily updating the value of X before overwriting it again. All of this is obscuring what the code is actually doing. Let's see whether we can do better.

Local SSA Conversion

The first thing we need to do to improve the situation is SSA conversion. The topic of SSA conversion is pretty involved, as can be gleaned from the link in the previous sentence. Without going into too much detail, it basically eliminates variable loads and stores, replacing them with direct references to the value that ought to be stored in the variable.

Looking at our example above, instruction 19 is loading the value of X, which was just updated in instruction 18 with the value of instruction 14. So, instead of storing %14 into X and then loading X, we can refer to %14 directly, because that's what ended up in X anyway. Then the "StoreR8 X,%14" itself can potentially be eliminated if X is later overwritten. So can the "StoreR8 STATUS_NotZ, %14".

Full SSA conversion deals with this across the body of the entire function, which is a complicated process producing amazing results. Unfortunately, we can't apply it yet, since we haven't recovered the full control flow of the input code (that will happen later). So, for now we apply a simpler version of SSA conversion, working only within individual basic blocks (sequences of instructions without labels or branches in the middle).

Applying this "local SSA conversion" to our IR, we are even able to fit the entire listing here:
          %4  = LoadSP
%_ = And8 %4, 0x80
/*$0301*/ %6 = Add8 %4, 0x01
%_ = And8 %6, 0x80
/*$0302*/ %8 = Add8 %6, 0x01
%_ = And8 %8, 0x80
/*$0303*/ %10 = ZExt8t16 %8
%11 = Add16 0x0101, %10
%12 = RamPeek8 %11
%_ = And8 %12, 0x80
/*$0306*/ %14 = Add8 %12, 0x01
%_ = And8 %14, 0x80
/*$0307*/ RamPoke8 0x0090, %14
/*$0309*/ %17 = RamPeek8 0x0091
%_ = And8 %17, 0x80
/*$030B*/ RamPoke8 0x0092, %17
/*$030D*/ %_ = And8 0x00, 0x80
/*$030F*/ RamPoke8 0x1721, 0x00
/*$0312*/ %22 = Sub8 0x00, 0x01
%_ = And8 %22, 0x80
/*$0313*/ RamPoke8 0x0093, %22
/*$0315*/ JTrue %22, %BB5, %BB3
Looking at the first two instructions, there is dramatic improvement. The StoreR8 and LoadR8 are gone. The updates of STATUS_NotZ are gone, since they were just storing a copy of the value.

We notice that the calculations of the value of STATUS_N are still there, but they are unused.

Dead Code Elimination

Dead code elimination (commonly referred to as "DCE") is a simple optimization which removes instructions that have no side effects (for example, don't write to memory) and whose result is unused. In our running example, the results of all of the And8 instructions are unused, since they calculate the value of the STATUS_N flag, but that flag itself is never checked.

Applying DCE to our IR we get:
          %4  = LoadSP
/*$0301*/ %5 = Add8 %4, 0x01
/*$0302*/ %6 = Add8 %5, 0x01
/*$0303*/ %7 = ZExt8t16 %6
%8 = Add16 0x0101, %7
%9 = RamPeek8 %8
/*$0306*/ %10 = Add8 %9, 0x01
/*$0307*/ RamPoke8 0x0090, %10
/*$0309*/ %12 = RamPeek8 0x0091
/*$030B*/ RamPoke8 0x0092, %12
/*$030F*/ RamPoke8 0x1721, 0x00
/*$0312*/ %15 = Sub8 0x00, 0x01
/*$0313*/ RamPoke8 0x0093, %15
/*$0315*/ JTrue %15, %BB5, %BB3
This is starting to look quite good. It is about the same length as the original 6502 code, but expresses the code intent much clearer, because many redundant computations (like updating unneeded flags) have been eliminated. 

Constant Folding

The most obvious remaining flaw is instruction %15, which is subtracting two constants. It originated in the DEY 6502 instruction. The 0x00 in instruction %15 came from storing 0 in register Y. During SSA conversion we eliminated both the StoreR8 Y, 0x00 and the LoadR8 Y, replacing subsequent references to Y with the 0x00.

This is typical of how SSA conversion enables more transformations in a natural way and how transformations stack together. In this case it exposed that the operands of the Sub8 are actually constant.

Constant folding simply scans the IR for side effect-free instructions with constant operands, performs the calculation and replaces all usages of the instruction with the calculated result. It "folds" the constant operands into a single value. So, we get:
          %4  = LoadSP
/*$0301*/ %5 = Add8 %4, 0x01
/*$0302*/ %6 = Add8 %5, 0x01
/*$0303*/ %7 = ZExt8t16 %6
%8 = Add16 0x0101, %7
%9 = RamPeek8 %8
/*$0306*/ %10 = Add8 %9, 0x01
/*$0307*/ RamPoke8 0x0090, %10
/*$0309*/ %12 = RamPeek8 0x0091
/*$030B*/ RamPoke8 0x0092, %12
/*$030F*/ RamPoke8 0x1721, 0x00
/*$0313*/ RamPoke8 0x0093, 0xff
/*$0315*/ Jmp %BB5
The Sub8 has been eliminated and the RamPoke8 at $0313 is directly using the calculated constant 0xff (0x00 - 0x01). Clearly this was the intent of the original code, which saved a byte by using DEY (1 byte) instead of LDY #$FF (2 bytes). 

Something unexpected also happened. The conditional branch at the end of the code used the 6502 instruction BNE. It branches depending on the 6502 Z flag, which is represented in the IR by the logical inversion of the STATUS_NotZ register. 

As we mentioned in Part 6, STATUS_NotZ is simply a copy of the value that is being checked for zero/not-zero. In this case, the value was the result of the DEY instruction, which we just constant folded into the constant 0xFF. Said constant was propagated into the JTrue IR instruction:
/*$0315*/       JTrue         0xFF, %BB5, %BB3
Finally, JTrue with a constant operand has been folded into an unconditional branch:
/*$0315*/       Jmp           %BB5
Unquestionably, that was the intent of the original code. It saved a byte by replacing a 3-byte unconditional instruction JMP with a 2-byte conditional BNE, knowing that Z would have been cleared.

This is another example of how our IR allows us to recover the intent of the original code. A series of IR transformations (constant propagation, constant folding, instruction simplification) revealed that the intent of a seemingly conditional branch is to branch unconditionally.

IR In Tree Form

The IR is much more compact and easier to understand in tree form:
/*$0307*/       RamPoke8      0x0090, (Add8 (RamPeek8 (Add16 0x0101 (ZExt8t16 (Add8 (Add8 (LoadSP) 0x01) 0x01)))) 0x01)
/*$030B*/ RamPoke8 0x0092, (RamPeek8 0x0091)
/*$030F*/ RamPoke8 0x1721, 0x00
/*$0313*/ RamPoke8 0x0093, 0xff
/*$0315*/ Jmp %BB5
In this form it is easy to see that the code is initializing four memory locations. 0x1721 and 0x0093 are initialized with constants. 0x0092 is initialized with the contents of 0x0091. Finally, we can see that the first instruction is reading the contents of the stack with an offset.

More Constant Folding

Looking at the tree form, we immediately notice this:
(Add8 (Add8 (LoadSP) 0x01) 0x01)
Clearly the intent here is to calculate SP + 2 (this is typical for 6502, where increments are single byte instructions, while addition is much more restrictive, requiring the A register and the C flag). 

What is needed here is a more sophisticated form of constant folding. Instead of looking only at the two operands of an instruction, we need to check one lever deeper. Basically we want to perform the following transformation (and a few more similar ones):

(Add (Add V Const1) Const2) => 
    (Add V (Add Const1 Const2))

Applying this transformation, we obtain the final version of the IR for now:
/*$0307*/       RamPoke8      0x0090, (Add8 (RamPeek8 (Add16 0x0101 (ZExt8t16 (Add8 (LoadSP) 0x02)))) 0x01)
/*$030B*/ RamPoke8 0x0092, (RamPeek8 0x0091)
/*$030F*/ RamPoke8 0x1721, 0x00
/*$0313*/ RamPoke8 0x0093, 0xff
/*$0315*/ Jmp %BB5
Now we can clearly see that the first instruction is essentially:
*(0x0090) = *(0x101 + (uint8_t)(SP + 2));

To Be Continued

Hopefully by now you are convinced in the utility of our IR. The next part will describe the recovery of flow control.

Comments

Popular posts from this blog

You Don't Like Google's Go Because You Are Small

When you look at Google's presentations about Go, they are not shy about it. Go is about very smart people at Google solving very BIG problems. They know best. If you don't like Go, then you are small and are solving small problems. If you were big (or smart), you would surely like it. For example, you might naively think that printing the greater of two numbers should be as simple as std::cout << max(b(),c()) That is because you think small. What you really should want is: t1 := b() if t2 := c(); t1 < t2 { t1 = t2 } fmt.Print( t1 ) Isn't it much better? We didn't have to type all those extra semicolons that were killing productivity before. If you don't like it, you are small. If you wanted to extract an attribute of an optional parameter, you may be used to typing something like: a = p ? p->a : 0; or even: a = p && p->a You just make me sad because obviously what you really want is: a = 0 if p != nil { a = p-...

How to speed up a micro-benchmark 300x

How to speed up a ubenchmark 300x Static Hermes: How to Speed Up a Micro-benchmark by 300x Without Cheating This is the first of a series of light blog posts about Static Hermes, covering topics that I find interesting or amusing. It is not intended to be a systematic introduction to Static Hermes design.Consider it more like a backstage pass to the quirks, features, and “aha!” moments that make Static Hermes unique. If you’re not familiar with Static Hermes, it’s an evolving project we’re working on to explore the confluence of static typing and JavaScript. It is work in progress, but we’re excited about the possibilities it opens up for performance improvements and more. For more background: Tweet with the slide deck of the Static Hermes announcement Previous talk about Hermes Contents: Meet interp-dispatch.js Let’s Run It! Static Hermes with Untyped Code Helping the Compiler Other Ways to Help the Compiler Some Observations Revisiting The Origina...

Apple2TC: an Apple II Binary to C Decompiler - Part 1

This is a series of blog posts to serve as a log documenting my work on Apple2TC - an open source hobby project developed on GitHub:  https://github.com/tmikov/apple2tc . There are various interesting things that come up all the time and I thought it might be useful to record them for posterity. Part 2  describes the different components of the project Part 3 shows how we validate our disassembly by generating a working C representation of it. Part 4 shows the Apple II tricks we had to deal with to get to correct running code. Part 5 describes decompiling and running Robotron 2084 and Snake Byte. Part 6 introduces our SSA-based intermediate representation . Part 7 shows how to transform the IR to discover the original intent of the code.   What is Apple2TC? Apple2TC is an open source project to decompile original Apple II binaries, mostly games, into understandable and working modern C code,  completely automatically  by analyzing the runtime behavior...