x86-64 assembly: when to use the smaller registers

I understand that x86_64 assembly has, for example, the 64-bit rax register, which can also be accessed as a 32-bit register (eax), a 16-bit register (ax), and an 8-bit register (al). In what situation would I not just use the full 64 bits, and what would the advantage be?

As an example, take this simple hello world program:

    section .data
    msg: db "Hello World!", 0x0a, 0x00
    len: equ $-msg

    section .text
    global start
    start:
        mov rax, 0x2000004 ; System call write = 4
        mov rdi, 1         ; Write to standard out = 1
        mov rsi, msg       ; The address of hello_world string
        mov rdx, len       ; The size to write
        syscall            ; Invoke the kernel
        mov rax, 0x2000001 ; System call number for exit = 1
        mov rdi, 0         ; Exit success = 0
        syscall            ; Invoke the kernel

rdi and rdx, at least, only need 8 bits, not 64, right? But if I change them to dil and dl respectively (their lower 8-bit equivalents), the program assembles and links but does not output anything.

However, it still works if I use eax, edi and edx, so should I use them, and not the full 64-bit ones? Why or why not?

+6
5 answers

First and foremost, when loading a smaller (for example, 8-bit) value from memory (reading a char, working with a data structure, deserializing a network packet, etc.) into a register:

 MOV AL, [0x1234] 

versus

    MOV RAX, [0x1234]
    SHR RAX, 56
    ; assuming there are actually 8 accessible bytes at 0x1234,
    ; and they're the right endianness; otherwise you'd need
    ; AND RAX, 0xFF or similar...

Or, of course, writing the value back to memory.


(Edit, some 6 years later):

Since this keeps coming up:

 MOV AL, [0x1234] 
  • reads only one byte of memory at 0x1234 (the inverse, a store from AL, overwrites only one byte of memory)
  • preserves the other 56 bits of RAX
    • This creates a dependency between the past and future values of RAX, so the CPU cannot optimize the instruction using register renaming.

In contrast to this:

 MOV RAX, [0x1234] 
  • reads 8 bytes of memory, starting at 0x1234 (the inverse overwrites 8 bytes of memory)
  • overwrites all of RAX
  • assumes that the bytes in memory have the same endianness as the CPU (often not the case for network packets, hence my SHR instruction years ago)
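
To make the byte-order assumption concrete, here is a small C sketch (a model, not assembly) of what an 8-byte load does on a little-endian CPU such as x86-64. The helper names and sample bytes are made up for illustration. It suggests that after `MOV RAX, [addr]` the byte stored at `addr` sits in the low 8 bits of RAX, so `AND RAX, 0xFF` recovers it, while `SHR RAX, 56` yields the byte stored at `addr + 7`.

```c
#include <stdint.h>

/* Model of MOV RAX, [addr] on a little-endian CPU: the byte at the
   lowest address becomes the least significant byte of the result.
   Hypothetical helper, written so it behaves the same on any host. */
uint64_t load_qword_le(const uint8_t *addr)
{
    uint64_t rax = 0;
    for (int i = 7; i >= 0; i--)
        rax = (rax << 8) | addr[i];
    return rax;
}

/* AND RAX, 0xFF: keeps the byte that was stored at addr. */
uint8_t first_byte(uint64_t rax) { return (uint8_t)(rax & 0xFF); }

/* SHR RAX, 56: keeps the byte that was stored at addr + 7. */
uint8_t last_byte(uint64_t rax) { return (uint8_t)(rax >> 56); }
```

Which of the two you want depends on where the interesting byte lives in the source data, which is exactly the endianness question.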

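It is also important to note:

 MOV EAX, [0x1234]
  • reads 4 bytes of memory, starting at 0x1234
  • overwrites all of RAX: the upper 32 bits are zeroed, so there is no dependency on the previous value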

Then, as mentioned in the comments, there is:

 MOVZX EAX, byte [0x1234] 
  • reads only one byte of memory at 0x1234
  • zero-extends the value to fill all of EAX (and thus RAX), which removes the dependency and allows register renaming optimizations

In all these cases, if you want to write from register "A" into memory, you will need to select your width:

    MOV [0x1234], AL  ; write a byte (8 bits)
    MOV [0x1234], AX  ; write a word (16 bits)
    MOV [0x1234], EAX ; write a dword (32 bits)
    MOV [0x1234], RAX ; write a qword (64 bits)
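
The effect of the four store widths can be modelled in C. This is a sketch, not assembly: `store_le` is a hypothetical helper that copies the low `width` bytes of a 64-bit register value into memory in little-endian order, which is how x86-64 lays out a store from AL, AX, EAX or RAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of MOV [mem], reg for a register that is `width` bytes wide
   (1 = AL, 2 = AX, 4 = EAX, 8 = RAX). Only `width` bytes are overwritten;
   the low byte of the register goes to the lowest address. */
void store_le(uint8_t *mem, uint64_t rax, size_t width)
{
    for (size_t i = 0; i < width; i++)
        mem[i] = (uint8_t)(rax >> (8 * i));
}

/* Counts how many bytes of an 8-byte buffer differ from `fill`. */
size_t bytes_changed(const uint8_t *mem, uint8_t fill)
{
    size_t n = 0;
    for (size_t i = 0; i < 8; i++)
        if (mem[i] != fill)
            n++;
    return n;
}
```

Filling a buffer with a known value, storing with each width, and counting the changed bytes shows 1, 2, 4 and 8 bytes touched respectively.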
+2

You are asking several questions here.

If you simply load the lower 8 bits of the register, the rest of the register will retain its previous value. This may explain why your system call received the wrong parameters.
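
A rough C model of this behaviour, treating a register as a plain 64-bit integer: `write_low8` mimics `mov dl, imm8`, `write_low16` mimics `mov dx, imm16`, and `write_low32` mimics `mov edx, imm32`. The function names and the stale value used below are illustrative assumptions. Whatever happened to be in the upper bits of rdx before `mov dl, len` is still there when the kernel reads the full 64-bit register.

```c
#include <stdint.h>

/* mov dl, v: replaces bits 0-7, leaves bits 8-63 untouched. */
uint64_t write_low8(uint64_t reg, uint8_t v)
{
    return (reg & ~(uint64_t)0xFF) | v;
}

/* mov dx, v: replaces bits 0-15, leaves bits 16-63 untouched. */
uint64_t write_low16(uint64_t reg, uint16_t v)
{
    return (reg & ~(uint64_t)0xFFFF) | v;
}

/* mov edx, v: replaces bits 0-31 and zeroes bits 32-63. */
uint64_t write_low32(uint64_t reg, uint32_t v)
{
    (void)reg; /* the old value plays no part in the result */
    return (uint64_t)v;
}
```

With a leftover value such as `0x00007FFF5FBFF800` in the register, `write_low8(stale, 14)` still yields a huge number as the length, while `write_low32(stale, 14)` yields exactly 14. That matches the observation that the edx version works and the dl version does not.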

One reason to use 32 bits when that's all you need is that many instructions using EAX or EBX are one byte shorter than using RAX or RBX. It may also mean that the constants loaded into the register are shorter.
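
For illustration, here are the machine-code bytes of a few instructions, written as C arrays so their lengths can be compared. These are the standard encodings an assembler would typically pick; the array names are made up. The 64-bit forms carry an extra REX.W prefix byte (0x48), and a full 64-bit immediate costs eight bytes.

```c
#include <stdint.h>

/* xor eax, eax */
const uint8_t XOR_EAX_EAX[] = {0x31, 0xC0};
/* xor rax, rax: same opcode plus the REX.W prefix */
const uint8_t XOR_RAX_RAX[] = {0x48, 0x31, 0xC0};

/* mov eax, 1: opcode B8 followed by a 32-bit immediate */
const uint8_t MOV_EAX_1[] = {0xB8, 0x01, 0x00, 0x00, 0x00};
/* mov rax, 1 with a sign-extended 32-bit immediate */
const uint8_t MOV_RAX_1_IMM32[] = {0x48, 0xC7, 0xC0, 0x01, 0x00, 0x00, 0x00};
/* mov rax, 1 with a full 64-bit immediate */
const uint8_t MOV_RAX_1_IMM64[] = {0x48, 0xB8, 0x01, 0x00, 0x00, 0x00,
                                   0x00, 0x00, 0x00, 0x00};
```

Since writing eax zeroes the upper half of rax anyway, the 5-byte `mov eax, 1` leaves the same value in rax as the 7-byte or 10-byte forms.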

The instruction set has evolved over time and has a lot of quirks!

+5

If you only need 32-bit registers, you can safely work with them; this is fine in 64-bit mode. But if you only need 16-bit or 8-bit registers, try to avoid them, or always use movzx / movsx to overwrite the remaining bits. It is well known that on x86-64, operations with 32-bit operands clear the upper 32 bits of the 64-bit register. The main purpose of this is to avoid false dependency chains.

Please refer to the relevant section, 3.4.1.1, of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1:

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register

Breaking dependency chains allows instructions to be executed in parallel or out of order, using the out-of-order execution machinery that CPUs have implemented internally since the Pentium Pro in 1995.

A quote from the Intel® 64 and IA-32 Architectures Optimization Reference Manual, Section 3.5.1.8:

Code sequences that modify a partial register can experience some delay in their dependency chain, but this can be avoided by using dependency-breaking idioms. In processors based on the Intel Core microarchitecture, a number of instructions can help clear the execution dependency when software uses these instructions to clear register content to zero. Break dependencies on portions of registers between instructions by operating on 32-bit registers instead of partial registers. For moves, this can be accomplished with 32-bit moves or by using MOVZX.

Assembly/Compiler Coding Rule 37. (M impact, MH generality) Break dependencies on portions of registers between instructions by operating on 32-bit registers instead of partial registers. For moves, this can be accomplished with 32-bit moves or by using MOVZX.

On x64, MOVZX and MOV with 32-bit operands are equivalent in this respect: both break dependency chains.

That is why your code will execute faster if you always try to clear the upper bits of the larger registers when using the smaller ones. When the upper bits are always cleared, the result does not depend on the previous value of the register, so the CPU can rename the register internally.

Register renaming is a technique used internally by the CPU that eliminates false data dependencies arising from the reuse of registers by successive instructions that have no real data dependencies between them.

+2

If you want to work with only an 8-bit quantity, you would work with the AL register. The same goes for AX and EAX.

For example, you could have a 64-bit value that contains two 32-bit values. You can work on the low 32 bits by accessing the EAX register. When you want to work on the high 32 bits, you can swap the two 32-bit quantities (reverse the DWORDs in the register) so that the high bits are now in EAX.
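
One way to swap the two halves is a rotate by 32 bits (for example `ROR RAX, 32`). The answer does not name a specific instruction, so treat that as an assumption. A C sketch of the idea, with made-up helper names:

```c
#include <stdint.h>

/* Low 32 bits of a 64-bit value: what you see through EAX. */
uint32_t low32(uint64_t rax) { return (uint32_t)rax; }

/* Rotate by 32 bits (like ROR RAX, 32): the two 32-bit halves trade places. */
uint64_t swap_halves(uint64_t rax) { return (rax >> 32) | (rax << 32); }
```

After `swap_halves`, `low32` returns what used to be the high half, and applying the swap twice restores the original value.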

+1

64 bits is the largest piece of memory you can work with as a single unit. That does not mean it is what you have to use.

If you need 8 bits, use 8. If you need 16, use 16. If it doesn't matter how many bits, then it doesn't matter how much you use.

Admittedly, on a 64-bit processor there is very little overhead in using the full 64 bits. But if, for example, you are computing a byte value, working with a byte means the result will already be the correct size.
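
As a sketch of that last point in C (helper names are hypothetical): adding two bytes in an 8-bit variable wraps modulo 256 by itself, much like `ADD AL, BL`, whereas a 64-bit add needs an explicit mask to give the same byte.

```c
#include <stdint.h>

/* Byte-sized add, like ADD AL, BL: the result wraps modulo 256. */
uint8_t add_bytes_8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }

/* 64-bit add, like ADD RAX, RBX: the carry lands in bit 8. */
uint64_t add_bytes_64(uint64_t a, uint64_t b) { return a + b; }
```

For 200 + 100, the byte version gives 44 directly, while the 64-bit version gives 300 and has to be masked with `& 0xFF` to match.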

+1

Source: https://habr.com/ru/post/891995/