My initial idea for the disassembler was to do everything in one pass, but
I've decided to change to a two-pass approach. One pass would require lots
of extra memory to store everything I want to output.

Keep track of flags for every address. Possible values are EMPTY, CODE,
DATA, and LABELLED. Everything starts out as EMPTY. Everything loaded from
the hex file is DATA by default.
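
A minimal sketch of the per-address flags, assuming they are kept as bit
flags so that an address can be both CODE and LABELLED at once. The names
Flag, new_image and load_bytes are made up for illustration, and parsing of
the hex file itself is left out.

```python
from enum import IntFlag

MEM_SIZE = 0x10000  # 64K of 8051 code space, one flag byte per address


class Flag(IntFlag):
    EMPTY = 0      # nothing was loaded at this address
    CODE = 1       # reached by following code in pass 1
    DATA = 2       # loaded from the hex file, not (yet) known to be code
    LABELLED = 4   # destination of a jump (assumed combinable with CODE)


def new_image():
    """Return (memory, flags); a zeroed flag array means everything is EMPTY."""
    return bytearray(MEM_SIZE), bytearray(MEM_SIZE)


def load_bytes(memory, flags, start, payload):
    """Copy the data bytes of one hex record into memory and mark them DATA."""
    for offset, value in enumerate(payload):
        addr = start + offset
        memory[addr] = value
        flags[addr] = Flag.DATA
```

One byte of flags per address costs 64K, which is small next to buffering
the whole output of a single pass.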

Pass 1: Start at entry point 0000. Follow code through all branches. Note
any change to interrupt vectors. Decode interrupt vector entry points as
[...]

As I go, mark all decoded addresses as CODE. Mark all jump destinations as
LABELLED, and give each one a label number.
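
Pass 1 can be sketched as a worklist traversal. This is only an
illustration under stated assumptions: the opcode table is a deliberately
tiny subset of the 8051 instruction set, the flag values and the name pass1
are made up, call destinations are labelled the same way as jump
destinations, and the interrupt vector handling is left out.

```python
EMPTY, CODE, DATA, LABELLED = 0, 1, 2, 4  # assumed bit-flag encoding

# opcode -> (mnemonic, length in bytes, kind); tiny illustrative subset only
OPCODES = {
    0x00: ("NOP", 1, "plain"),
    0x02: ("LJMP", 3, "jump16"),        # unconditional, 16-bit target
    0x12: ("LCALL", 3, "call16"),       # target and fall-through both run
    0x22: ("RET", 1, "stop"),
    0x60: ("JZ", 2, "cond_rel"),        # conditional, signed 8-bit offset
    0x70: ("JNZ", 2, "cond_rel"),
    0x73: ("JMP @A+DPTR", 1, "stop"),   # indirect: target cannot be followed
    0x74: ("MOV A,#", 2, "plain"),
    0x80: ("SJMP", 2, "jump_rel"),      # unconditional, signed 8-bit offset
}


def pass1(memory, flags, entry_points=(0x0000,)):
    """Follow code from each entry point; return {address: label number}."""
    labels = {}
    work = list(entry_points)
    while work:
        pc = work.pop()
        # Only loaded, not-yet-decoded bytes still carry the DATA flag.
        while pc < len(memory) and flags[pc] & DATA:
            op = OPCODES.get(memory[pc])
            if op is None:
                break  # opcode outside the subset: leave it as DATA
            _, length, kind = op
            end = pc + length
            if end > len(memory) or not all(flags[a] & DATA for a in range(pc, end)):
                break  # instruction would run into EMPTY or decoded bytes
            for a in range(pc, end):
                flags[a] = CODE | (flags[a] & LABELLED)
            target = None
            if kind in ("jump16", "call16"):
                target = (memory[pc + 1] << 8) | memory[pc + 2]  # high byte first
            elif kind in ("jump_rel", "cond_rel"):
                offset = memory[pc + 1]
                if offset >= 0x80:
                    offset -= 0x100
                target = (end + offset) & 0xFFFF  # relative to next instruction
            if target is not None:
                flags[target] |= LABELLED
                labels.setdefault(target, len(labels) + 1)
                work.append(target)
            if kind in ("jump16", "jump_rel", "stop"):
                break  # execution never falls through to the next address
            pc = end
    return labels
```

Anything that is loaded but never reached keeps its DATA flag, so it comes
out as data in pass 2.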

Pass 2: Start at address 0000. Output "CSEG AT 0000h". Increment through
"memory" to address 65535. Everything marked as LABELLED gets a label
looked up from a label table. Everything marked as CODE gets decoded and
output. Everything marked as DATA gets a DB command. If EMPTY is
encountered, the next non-EMPTY code or data is preceded by a CSEG AT
directive for its address.

Output "END" at the end.
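
The output loop of pass 2 might look roughly like this. It is a sketch, not
the program's actual code: the flag values, hexlit, toy_decode and pass2 are
names made up here, and toy_decode stands in for the real instruction
decoder by handling only three opcodes.

```python
EMPTY, CODE, DATA, LABELLED = 0, 1, 2, 4  # assumed bit-flag encoding


def hexlit(value, width):
    """Format a hex literal; prefix a 0 if it would start with a letter."""
    digits = f"{value:0{width}X}"
    return ("0" + digits if digits[0].isalpha() else digits) + "h"


def toy_decode(memory, addr, labels):
    """Hypothetical stand-in for the real decoder: returns (text, length)."""
    op = memory[addr]
    if op == 0x02:  # LJMP addr16, high byte first
        target = (memory[addr + 1] << 8) | memory[addr + 2]
        return "LJMP " + labels.get(target, hexlit(target, 4)), 3
    if op == 0x22:
        return "RET", 1
    if op == 0x00:
        return "NOP", 1
    return "DB " + hexlit(op, 2), 1  # unknown opcode: emit the raw byte


def pass2(memory, flags, labels, decode=toy_decode):
    """Walk all of memory once and return the listing as a list of lines."""
    lines = ["CSEG AT 0000h"]
    in_gap = False
    addr = 0
    while addr < len(memory):
        flag = flags[addr]
        if not flag & (CODE | DATA):  # EMPTY: remember the gap, emit nothing
            in_gap = True
            addr += 1
            continue
        if in_gap:  # first byte after a gap needs a new CSEG AT
            lines.append("CSEG AT " + hexlit(addr, 4))
            in_gap = False
        if flag & LABELLED:
            lines.append(labels[addr] + ":")
        if flag & CODE:
            text, length = decode(memory, addr, labels)
        else:
            text, length = "DB " + hexlit(memory[addr], 2), 1
        lines.append("    " + text)
        addr += length
    lines.append("END")
    return lines
```

Here labels maps an address to its label text. A label that lands in the
middle of an instruction is skipped by this sketch, because the loop
advances by whole instructions.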

The pass1/pass2 thing worked well. The one "issue" the program has is with
indirect jumps (JMP @A+DPTR). There is no way to "follow" the code through
an indirect jump, because the destination is only computed at run time. It
also seems that at least one 8051 C compiler (Keil uVision 1) generates a
number of these indirect jumps. My program takes multiple entry points on
the command line, so someone can examine the assembly output by hand, make
an educated guess about where the jump table begins, and then
re-disassemble the program with the additional entry points.
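
As an illustration of the extra entry points, here is one way the command
line could be parsed. The argument layout (hex file first, then any number
of hex addresses) and the names parse_address and parse_args are
assumptions for this sketch, not the program's real syntax.

```python
import argparse


def parse_address(text):
    """Accept a hex address written as 1A2B, 0x1A2B or 1A2Bh."""
    cleaned = text.strip().lower()
    if cleaned.endswith("h"):
        cleaned = cleaned[:-1]
    value = int(cleaned, 16)  # base 16 also accepts a leading "0x"
    if not 0 <= value <= 0xFFFF:
        raise argparse.ArgumentTypeError(f"address out of range: {text}")
    return value


def parse_args(argv):
    """Return (hex file name, list of entry points to follow in pass 1)."""
    parser = argparse.ArgumentParser(description="8051 disassembler (sketch)")
    parser.add_argument("hexfile")
    parser.add_argument("entry", nargs="*", type=parse_address,
                        help="extra entry points, in hex")
    args = parser.parse_args(argv)
    points = [0x0000]  # the reset entry point is always followed
    for point in args.entry:
        if point not in points:
            points.append(point)
    return args.hexfile, points
```

For example, parse_args(["prog.hex", "0100", "2A3Fh"]) gives the entry
points 0000h, 0100h and 2A3Fh, which pass 1 can then follow in turn.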

I decode SFR addresses, in both byte- and bit-addressed memory, into their
symbolic names. This helps make the assembly output a bit more readable. I
think the only feature left to add is one that examines writes to the
interrupt registers and makes a guess about which interrupt vectors are
being used.
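
The SFR naming can be sketched as two table lookups. The tables below hold
only the standard 8051 registers and a handful of named bits, so they are
an assumption about coverage rather than the program's real tables; other
bits of a bit-addressable SFR fall back to the REG.n form. The function
names are made up for illustration.

```python
# Standard 8051 SFR byte addresses (derivative-specific registers omitted).
SFR_NAMES = {
    0x80: "P0", 0x81: "SP", 0x82: "DPL", 0x83: "DPH", 0x87: "PCON",
    0x88: "TCON", 0x89: "TMOD", 0x8A: "TL0", 0x8B: "TL1", 0x8C: "TH0",
    0x8D: "TH1", 0x90: "P1", 0x98: "SCON", 0x99: "SBUF", 0xA0: "P2",
    0xA8: "IE", 0xB0: "P3", 0xB8: "IP", 0xD0: "PSW", 0xE0: "ACC", 0xF0: "B",
}

# A few named bits; bit addresses live in a separate space from byte addresses.
SFR_BIT_NAMES = {
    0x8C: "TR0", 0x8E: "TR1", 0x98: "RI", 0x99: "TI", 0xAF: "EA", 0xD7: "CY",
}


def hexlit(value):
    """Format a byte as a hex literal, with a leading 0 before a letter."""
    digits = f"{value:02X}"
    return ("0" + digits if digits[0].isalpha() else digits) + "h"


def direct_name(addr):
    """Name for a direct byte address; anything unknown stays numeric."""
    return SFR_NAMES.get(addr, hexlit(addr))


def bit_name(bit_addr):
    """Name for a bit address; 80h and up map onto SFRs at multiples of 8."""
    if bit_addr in SFR_BIT_NAMES:
        return SFR_BIT_NAMES[bit_addr]
    base = bit_addr & 0xF8
    if bit_addr >= 0x80 and base in SFR_NAMES:
        return f"{SFR_NAMES[base]}.{bit_addr & 7}"
    return hexlit(bit_addr)
```

Note that the same number means different things in the two spaces: 99h is
SBUF as a byte address but TI (SCON.1) as a bit address, which is why the
lookups are kept separate.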