How to determine x86 opcode values based on real mode and address offsets?

I am trying to write machine code as 0s and 1s in a text file and have it executed via the BIOS.

I have trouble understanding how addressing, multiplication, offsets, operands and instructions fit together at the bit level, e.g. the difference between MOV AL, 07 and MOV BL, AL.

I mean, this makes sense in assembly, but in machine code it becomes very difficult to see how the operands are encoded.

So what I want to know is: how can I understand this better? I have found no tutorials that accurately explain the 0s and 1s of the instructions, or how the bits relate to data transfer, MMIO, addressing modes, arithmetic, etc.

The site http://ref.x86asm.net/coder32.html#x00 attempts this, but I do not understand it.

EXAMPLE: Say I want to move 5 into AL. Would I specify the literal 5 in binary as part of the opcode, chained with a binary prefix for the MOV / AL instruction, or is there one fixed binary code for each instruction, regardless of the value? This is what I want to know: how to understand the way machine code is written.

2 answers

Unfortunately, x86 instruction encoding is complex and irregular, and understanding it is hard work. The best "quick start" to the encoding is the set of HTML pages on sandpile.org (it is short, but rather thorough).

First: http://sandpile.org/x86/opc_enc.htm. The encoding table there shows about a dozen ways to encode instructions. The white cells in each row are bytes that are required in the instruction; the gray cells that follow are present (or not) depending on various fields that appear earlier in the instruction. You only need to look at the rows that begin with a white "0Fh", plus the first row. At the bottom of the same page are the bit fields that appear in the various "extended" opcode bytes; you can ignore all of them except the "modrm/sib" row (the first row).

Note that for every row except the first (the 1-byte opcodes), a ModR/M byte must follow the opcode (for 1-byte opcodes, this depends on the instruction). This byte encodes the operands of most two-operand instructions. The table at http://sandpile.org/x86/opc_rm.htm gives the values: one operand must be a register, the other can be a register or a memory reference (the "reg" field encodes the register; the "mod" and "r/m" fields encode the other operand). There is usually also a "direction" bit elsewhere in the opcode that indicates the order of the operands. The opcode also determines whether we are operating on, for example, AL, AX, EAX or RAX (that is, different sizes) or on one of the extended registers, which is why each 3-bit field value is listed as referring to several different registers.
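To make the three ModR/M fields concrete, here is a minimal Python sketch that splits a byte into "mod", "reg" and "r/m". The helper name split_modrm and the REG8 list are made up for this illustration; the list gives the standard 8-bit register numbering, which only applies when the opcode selects 8-bit operands.

```python
# 8-bit register names indexed by their 3-bit register number.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]

def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11    # bits 7..6
    reg = (byte >> 3) & 0b111   # bits 5..3
    rm = byte & 0b111           # bits 2..0
    return mod, reg, rm

mod, reg, rm = split_modrm(0xC3)
# mod == 0b11 means the r/m field names a register, not memory.
print(f"mod={mod:02b} reg={reg:03b} ({REG8[reg]}) rm={rm:03b} ({REG8[rm]})")
# prints: mod=11 reg=000 (AL) rm=011 (BL)
```

Which of the two registers is the source and which is the destination is not stored in this byte; it comes from the opcode.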

In the ModR/M byte, if the "mod" field is 11, then the "r/m" field also names a register. Otherwise, it usually refers to a memory address formed by adding the named register to an (optional) displacement that follows the ModR/M byte (this constant is 0, 1 or 4 bytes long, depending on the "mod" field). The exception is when the "r/m" field is 100 (that is, 0x4), which would normally name SP: in that case, the memory operand is described by an additional SIB byte that immediately follows the ModR/M byte (any ModR/M displacement comes after the SIB byte). These lengths apply to 32-bit addressing; 16-bit addressing, the default in real mode, uses displacements of 0, 1 or 2 bytes and has no SIB byte. For the SIB encoding, see http://sandpile.org/x86/opc_sib.htm or the ModR/M page.
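As a rough sketch of these rules, the following Python function reports what follows a ModR/M byte under 32-bit addressing: whether a SIB byte is present and how many displacement bytes come after it. The name modrm_tail is invented for this example. It also covers one special case not described above (mod 00 with r/m 101 means a bare 32-bit displacement), and it deliberately ignores the similar special case inside the SIB byte itself.

```python
def modrm_tail(modrm):
    """Return (has_sib, disp_bytes) for a ModR/M byte, 32-bit addressing only."""
    mod = (modrm >> 6) & 0b11
    rm = modrm & 0b111
    if mod == 0b11:
        return False, 0              # r/m names a register: nothing follows
    has_sib = rm == 0b100            # r/m = 100 selects a SIB byte
    if mod == 0b01:
        disp = 1                     # 8-bit displacement
    elif mod == 0b10:
        disp = 4                     # 32-bit displacement
    else:
        # mod = 00: no displacement, except r/m = 101 (disp32, no base register).
        # Not handled here: mod = 00 with a SIB base field of 101.
        disp = 4 if rm == 0b101 else 0
    return has_sib, disp

print(modrm_tail(0xC3))  # (False, 0): register-to-register form
print(modrm_tail(0x44))  # (True, 1): SIB byte, then an 8-bit displacement
```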

Finally, to see where the direction and size come from, look at the one-byte opcode table: http://sandpile.org/x86/opc_1.htm. The first four entries are all "ADD", with the operands in two different orders and with two different widths. So in this case, the low bits of the opcode encode the direction and the width.
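The four ADD opcodes 00h to 03h can be decoded with a few lines of Python. This is only a sketch for those four opcodes: the function name describe_add is made up here, and other instructions do not all follow the same bit pattern.

```python
def describe_add(opcode):
    """Describe ADD opcodes 0x00..0x03 from their two low bits."""
    if not 0x00 <= opcode <= 0x03:
        raise ValueError("only the four basic ADD opcodes are handled")
    wide = opcode & 1            # bit 0: 0 = 8-bit operands, 1 = 16/32-bit
    to_reg = (opcode >> 1) & 1   # bit 1: 0 = r/m is destination, 1 = reg is destination
    size = "16/32" if wide else "8"
    reg, rm = f"r{size}", f"r/m{size}"
    return f"ADD {reg}, {rm}" if to_reg else f"ADD {rm}, {reg}"

for op in range(4):
    print(f"{op:02X}: {describe_add(op)}")
# 00: ADD r/m8, r8
# 01: ADD r/m16/32, r16/32
# 02: ADD r8, r/m8
# 03: ADD r16/32, r/m16/32
```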


There is a (mostly) one-to-one mapping between assembler mnemonics and machine instructions. You can find these mappings in the Intel Software Developer's Manual, Volume 2, which documents the complete x86 16-, 32- and 64-bit instruction sets. You probably want to start with Chapter 2, "Instruction Format", which describes the encodings you are trying to work out.

In the case of mov al, 5, as you say, the literal goes directly into the instruction. The machine code is:

 b0 05 

This is the MOV r8, imm8 form of the MOV instruction: the opcode byte is B0 plus the register number (0 for AL), followed by the immediate byte. For mov bl, al you need the MOV r/m8, r8 form, which in your case encodes as:

 88 c3 

You can look up c3 in Table 2-2, "32-Bit Addressing Forms with the ModR/M Byte", where you will find it at the intersection of the BL row and the AL column. (There is also a 16-bit table, if you are in that mode; the value is the same in this case.)
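Both encodings can be reproduced with a small Python sketch. The helper names mov_r8_imm8 and mov_r8_r8 and the REG8 table are invented for this illustration; they cover only these two MOV forms with 8-bit registers, not a general assembler.

```python
# Register numbers for the 8-bit registers.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}

def mov_r8_imm8(dst, imm):
    """MOV r8, imm8: opcode B0 + register number, then the immediate byte."""
    return bytes([0xB0 + REG8[dst], imm & 0xFF])

def mov_r8_r8(dst, src):
    """MOV r/m8, r8 (opcode 88) with mod = 11: reg field = source, r/m = destination."""
    modrm = (0b11 << 6) | (REG8[src] << 3) | REG8[dst]
    return bytes([0x88, modrm])

def bits(code):
    """Render machine code as groups of 0s and 1s, one group per byte."""
    return " ".join(f"{b:08b}" for b in code)

print(mov_r8_imm8("al", 5).hex(" "), "=", bits(mov_r8_imm8("al", 5)))
# prints: b0 05 = 10110000 00000101
print(mov_r8_r8("bl", "al").hex(" "), "=", bits(mov_r8_r8("bl", "al")))
# prints: 88 c3 = 10001000 11000011
```

So the value 5 is a separate byte after the opcode, while the choice of AL is folded into the opcode byte itself.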


Source: https://habr.com/ru/post/1500820/

