x86 Assembly: How do disassemblers know where to break instructions?

How does an x86 disassembler know where one instruction ends and the next begins?

I am looking at the 8088 instruction set. For example, the MOV instruction has 7 variations, which range from 2 to 4 bytes. The encodings themselves do not seem to follow any particular order. Another reason why x86 is ugly?

For instance:

                        76543210  76543210  76543210  76543210
reg/mem to/from reg     100010dw  ||regr/m  
imm to reg/mem          1100011w  ||000r/m  data      data if w=1
imm to reg              1011wreg  data      data if w=1
mem to accum            1010000w  addr-low  addr-high
accum to mem            1010001w  addr-low  addr-high
reg/mem to seg          10001110  ||0ssr/m
seg to reg/mem          10001100  ||0ssr/m

Legend:
||=mod {NO-DISP=0,DISP-LOW,DISP-HIGH,REG}
ss=seg enum{es=0,cs,ss,ds}
reg=enum{ax=0,cx,dx,bx,sp,bp,si,di} (if w=1), enum{al=0,cl,dl,bl,ah,ch,dh,bh} (if w=0)
r/m=reg or mem (mod=3 then REG, else mem)
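
To make the table concrete, here is a minimal sketch of how the length of a MOV could be worked out from the first byte and, where present, the mod-reg-r/m byte. The names `modrm_extra_bytes` and `mov_length` are made up for illustration. The sketch also assumes one detail the legend above leaves out: on the 8086/8088, mod=00 with r/m=110 is a direct 16-bit address, so two address bytes follow.

```python
def modrm_extra_bytes(modrm: int) -> int:
    """Number of displacement bytes that follow a mod-reg-r/m byte."""
    mod = modrm >> 6
    rm = modrm & 0b111
    if mod == 0b00:
        # No displacement, except r/m=110 which is a direct 16-bit address
        # (assumption not spelled out in the legend above).
        return 2 if rm == 0b110 else 0
    if mod == 0b01:
        return 1  # 8-bit displacement
    if mod == 0b10:
        return 2  # 16-bit displacement
    return 0      # mod=11: r/m names a register


def mov_length(code: bytes) -> int:
    """Total length in bytes of the MOV encoding that starts at code[0]."""
    op = code[0]
    if op & 0b11111100 == 0b10001000:   # reg/mem to/from reg (100010dw)
        return 2 + modrm_extra_bytes(code[1])
    if op & 0b11111110 == 0b11000110:   # imm to reg/mem (1100011w)
        imm = 2 if op & 1 else 1
        return 2 + modrm_extra_bytes(code[1]) + imm
    if op & 0b11110000 == 0b10110000:   # imm to reg (1011wreg)
        return 1 + (2 if (op >> 3) & 1 else 1)
    if op & 0b11111100 == 0b10100000:   # mem to accum / accum to mem
        return 3                        # opcode + addr-low + addr-high
    if op & 0b11111101 == 0b10001100:   # seg to/from reg/mem (8C and 8E)
        return 2 + modrm_extra_bytes(code[1])
    raise ValueError(f"not a MOV opcode: {op:#04x}")


# mov ax, bx  is encoded as 89 D8
print(mov_length(bytes([0x89, 0xD8])))
# mov ax, 0x1234  is encoded as B8 34 12
print(mov_length(bytes([0xB8, 0x34, 0x12])))
```

The point of the sketch is that the first byte alone selects the form, and the mod field then says how many more bytes belong to the same instruction. Under these assumptions a word immediate stored to a direct address (C7 06 followed by address and data) comes out at 6 bytes.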

Many instructions can also share the same first byte:

                        76543210  76543210  76543210  76543210
push                    11111111  ||110r/m
inc                     1111111w  ||000r/m
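
The overlap in this table can be resolved by looking at bits 5..3 of the second byte, as this rough sketch shows. The function name `decode_fe_ff` and the lookup tables are hypothetical. The entries other than inc and push are not in the table above; they are added here from the 8086 opcode map on the assumption that the whole 0xFE/0xFF group is of interest.

```python
# Bits 5..3 of the mod-reg-r/m byte pick the operation when the first byte
# is 0xFF (word operands). Value 7 is not assigned on the 8086/8088.
GROUP_FF = {
    0: "inc",
    1: "dec",
    2: "call",      # near, indirect
    3: "call far",  # far, indirect
    4: "jmp",       # near, indirect
    5: "jmp far",   # far, indirect
    6: "push",
}

# With first byte 0xFE (byte operands) only inc and dec are defined.
GROUP_FE = {0: "inc", 1: "dec"}


def decode_fe_ff(code: bytes) -> str:
    """Return the mnemonic selected by the second byte of an FE/FF opcode."""
    op, modrm = code[0], code[1]
    if op & 0xFE != 0xFE:
        raise ValueError(f"not an FE/FF opcode: {op:#04x}")
    ext = (modrm >> 3) & 0b111  # the 3-bit field that sits where reg normally is
    table = GROUP_FF if op == 0xFF else GROUP_FE
    return table.get(ext, "undefined")


# FF 36 34 12 is push word [0x1234]; FF 06 34 12 is inc word [0x1234]
print(decode_fe_ff(bytes([0xFF, 0x36, 0x34, 0x12])))
print(decode_fe_ff(bytes([0xFF, 0x06, 0x34, 0x12])))
```

So the first byte does not have to be unique: when it is not, the 3-bit field in the second byte acts as an opcode extension, and a table lookup on it is enough to tell push from inc.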

The bit fields seem to be assigned arbitrarily. How does a disassembler tell where one instruction ends and the next begins?

This question is a subset of How to write a disassembler.

+3
1 answer

8086/8088 (ISBN 1-55512-010-5), ... A 0b00000000 0b11111111. , . , , , xor, cmp .. , , , alu .

.

, , 0xFF, , , , . 8 ( - undefined) 3 .

, x86 . , . , x86 6502, , , .

:

?

, . , . , , , , , . . , ( ).

- , , , , . , , , , , , . , . , , , . , - , , , , .

, , 0 . undefined, , - .

x86 - , , LAST , , . - . , pic arm/thumb. msp430 , , , 6502 (, , ..). , , , x86, . 8088/8086, , , 386.

push vs inc , - , , msp430.

+8

Source: https://habr.com/ru/post/1770650/
