Why can I access the lower dword / word / byte in the register, but not higher?

Question

I have started learning assembly, and this does not seem logical to me.

Why can't I use a few higher bytes in the register?

I understand the historical reason rax → eax → ax , so let's focus on the new 64-bit registers. For example, I can use r8 and r8d , but why not r8dl and r8dh ? The same thing happens with r8w and r8b .

My initial thinking was that I could use 8 r8b-style registers at the same time (the way I can use al and ah at the same time). But I cannot. And using r8b makes the whole r8 register busy.

Which raises the question: why? Why would you use only part of a register if you cannot use the other parts at the same time? Why not just keep r8 and forget about the lower parts?

+5

assembly x86 x86-64 64bit cpu-registers

nikitablack Aug 4 '17 at 7:13

3 answers

The general answer is that such access is expensive in several ways and rarely needed.

Since at least the second half of the 1980s, and pervasively since the 1990s, instruction sets have been designed more for the convenience of compilers than of humans. Compiler logic is much simpler when it maps a set of variables of specific sizes (8, 16, 32, 64 bits) onto a fixed set of registers, with each register holding exactly one value at a time. Overlapping registers are very confusing. As a result, the compiler itself knows a single register "A" (or even R0), which appears as AL, AX, EAX or RAX depending on the operand size. To use AH, it would have to track the fact that AX consists of AH and AL, which it currently does not. Even when it emits an instruction involving AH (for example, LAHF), internally this is most likely treated as "an operation that sets A to LowFlags * 256". (In real life, there are some hacks that blur this clean picture, but they are very local.)

This combines with other compiler specifics. For example, GCC and Clang are deeply SSA-based. As a result, you will never see an XCHG instruction in their output; if you find one somewhere in the code, it is 100% handwritten assembly. The same goes for RCL and RCR, even where they would fit some specific case (for example, dividing a uint32 by 7), and probably for ROL and ROR. If AMD had dropped RCL and RCR from its x86-64 design, nobody would have mourned these instructions.

This does not cover the vector facilities, which are modeled on different principles and are orthogonal to the main register set. When the compiler decides to perform 4 parallel uint32 operations in an XMM register, it can use PINSR* instructions to replace part of such a register or PEXTR* to extract it, but in that case it tracks 2, 4, 8, 16 ... values at a time. This vectorization does not apply to the main register set, at least in the mainstream modern ISAs.

This trend in compilers is matched by a steady, growing trend in hardware. It is easier to build 16-32 independent architectural registers and track them separately (see register renaming), for example taking 2 source registers and producing 1 result register, than to track each part of a register separately and handle an instruction that (for the same example) takes 16 single-byte sources and produces 8 single-byte results. (That is why x86-64 is designed so that a write to a 32-bit register clears the upper 32 bits of the 64-bit register. This is not done for 8- and 16-bit operations, because the CPU already had to merge with the upper bits of the register's previous value for legacy reasons.)
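The write rules in the parenthesis above can be modeled in a few lines of C. This is only a sketch of the architectural result, not of how any CPU implements it; `write_reg` is a hypothetical helper name invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the 64-bit register value after writing `value`
   with the given operand width. `old` is the previous register content. */
static inline uint64_t write_reg(uint64_t old, uint64_t value, unsigned width_bits)
{
    assert(width_bits == 8 || width_bits == 16 ||
           width_bits == 32 || width_bits == 64);
    switch (width_bits) {
    case 64: return value;                                  /* full overwrite */
    case 32: return value & 0xFFFFFFFFu;                    /* upper 32 bits are zeroed */
    case 16: return (old & ~0xFFFFull) | (value & 0xFFFF);  /* merge with old bits 63..16 */
    default: return (old & ~0xFFull) | (value & 0xFF);      /* merge with old bits 63..8 */
    }
}
```

Only the 8- and 16-bit cases read `old`, which is the dependency on the previous register value that the 32-bit rule avoids.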

There is some chance that this will change in the future, before a radical revolution in processor design, but I regard it as minimal.

If you do need access to part of a register, for example bits 40-47 of RAX, this is quite easy to implement with a copy and a shift or rotate. To extract it:

    MOV RCX, RAX      ; expect result in CL
    SHR RCX, 40
    MOVZX RCX, CL     ; to clear all bits except 7-0

To replace a value:

    ROR RAX, 40
    MOV AL, CL        ; provided that CL is what to insert
    ROL RAX, 40

These pieces of code are linear and fast enough.
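For readers who want to check the two sequences without an assembler, here is a rough C equivalent. The function names are hypothetical, and the rotate helpers are written out by hand because standard C has no rotate operator.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate helpers; only valid for 0 < n < 64. */
static inline uint64_t ror64(uint64_t x, unsigned n)
{
    assert(n > 0 && n < 64);
    return (x >> n) | (x << (64 - n));
}

static inline uint64_t rol64(uint64_t x, unsigned n)
{
    assert(n > 0 && n < 64);
    return (x << n) | (x >> (64 - n));
}

/* Extract bits 47..40: mirrors MOV / SHR 40 / MOVZX. */
static inline uint8_t extract_bits_40_47(uint64_t rax)
{
    return (uint8_t)(rax >> 40);
}

/* Replace bits 47..40: mirrors ROR 40 / MOV AL, CL / ROL 40. */
static inline uint64_t replace_bits_40_47(uint64_t rax, uint8_t cl)
{
    uint64_t t = ror64(rax, 40);   /* target byte is now the low byte */
    t = (t & ~0xFFull) | cl;       /* like MOV AL, CL */
    return rol64(t, 40);           /* rotate it back into place */
}
```

With `rax = 0x1122334455667788`, the extracted byte is `0x33`, and replacing it with `0xAB` gives `0x1122AB4455667788`.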

+4

Netch Aug 6 '17 at 6:59

There is one more step in the history: the 8-bit 8080, which came before the 8086. Even though it is an 8-bit processor, it can use pairs of 8-bit registers to perform some 16-bit operations.

https://en.wikipedia.org/wiki/Intel_8080#Registers

So, to make it easier to convert 8080 assembly code to 8086 code, which seemed important at the time (Intel even supplied a program to do this almost automatically), the new 16-bit registers were designed to be optionally usable as pairs of 8-bit registers.

However, the 8086 had no way to use pairs of 16-bit registers for 32-bit operations, so when the 386 came along, there was no need to split 32-bit registers into two 16-bit registers.

As Johan shows, the instruction set still provides a way to access two 8-bit registers within the lower 16 bits. But this (mis)feature was not extended to the larger widths.

Similarly, when moving to 64 bits, there was no precedent for using pairs of 32-bit registers for 64-bit operations (apart from some odd double shifts). And nobody is trying to convert old assembly code anymore. It never worked that well anyway.

+3

Bo persson Aug 4 '17 at 9:22

Johan · Accepted Answer · 2017-08-04T08:51:20+0000

"Why can't I use a few higher bytes in the register?"

Every variant of an instruction must be encoded in the instruction set. The original 8086 processor supports the following variants:

    instruction      encoding     remarks
    ---------------------------------------------------------
    mov ax,value     b8 01 00     <-- whole register
    mov al,value     b0 01        <-- lower byte
    mov ah,value     b4 01        <-- upper byte

Since the 8086 is a 16-bit processor, three different versions cover all options.
The 80386 added 32-bit support. The designers had a choice: either add support for 3 additional sets of registers (x 8 registers = 24 new registers) and somehow find encodings for them, or leave things mostly the way they were.

Here is what the designers chose:

    instruction      encoding          remarks
    ---------------------------------------------------------
    mov eax,value    b8 01 00 00 00    (same encoding as mov ax,value!)
    mov ax,value     66 b8 01 00       (prefix 66 + encoding for mov eax,value)
    mov al,value                       (same as before)
    mov ah,value                       (same as before)

They simply added the 0x66 prefix to switch the operand size from the (default) 32 bits to 16 bits, plus the 0x67 prefix to switch the address size of memory operands. And they left it at that.
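The prefix scheme can be sketched as a tiny encoder for `mov reg, imm`. It assumes 32-bit protected mode (default operand size 32) and the standard `B0+r` / `B8+r` opcodes; `encode_mov_imm` is a hypothetical helper made up for this illustration, not part of any real assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode `mov reg, imm` for a 32-bit default operand
   size. `reg` is the 3-bit register number (0 = eax/ax/al; for 8-bit
   operands 4 = ah). `out` must have room for 5 bytes.
   Returns the number of bytes written, or 0 for an unsupported width. */
static inline size_t encode_mov_imm(uint8_t *out, unsigned reg,
                                    unsigned width_bits, uint32_t imm)
{
    size_t n = 0;
    unsigned imm_bytes;

    assert(out != NULL && reg < 8);
    switch (width_bits) {
    case 8:
        out[n++] = (uint8_t)(0xB0 + reg);   /* byte registers have their own opcodes */
        imm_bytes = 1;
        break;
    case 16:
        out[n++] = 0x66;                    /* operand-size override prefix */
        out[n++] = (uint8_t)(0xB8 + reg);   /* same opcode as the 32-bit form */
        imm_bytes = 2;
        break;
    case 32:
        out[n++] = (uint8_t)(0xB8 + reg);
        imm_bytes = 4;
        break;
    default:
        return 0;
    }
    for (unsigned i = 0; i < imm_bytes; i++)   /* immediate is little-endian */
        out[n++] = (uint8_t)(imm >> (8 * i));
    return n;
}
```

The 16-bit and 32-bit forms share one opcode and differ only in the prefix, which is why no new opcode space was needed for the wider registers.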

Otherwise, this would have meant doubling the number of instruction encodings, or adding six new prefixes, one for each of your "new" partial registers.
By the time the 80386 was released, all opcode bytes were already taken, so there was no room for new prefixes. That opcode space had been eaten up by useless instructions like AAA, AAD, AAM, AAS, DAA, DAS, SALC. (They were disabled in x64 mode to free up the space needed for new encodings.)

If you want to change only the higher bytes of a register, just do:

    movzx eax,cl   ; like mov al,cl, but faster
    shl eax,24     ; move al into the high byte

"But why not two (say r8dl and r8dh)?"

The original 8086 had 8 byte-sized registers:

 al,cl,dl,bl,ah,ch,dh,bh <-- in this order.

The index registers, base pointer, and stack pointer had no byte registers.

In x64, this was changed. If a REX prefix is present (denoting x64 registers), the encodings that used to select al..bh (8 registers) instead select al..r15l (16 registers, using 1 extra encoding bit from the REX prefix). This adds spl, dil, sil, bpl, but excludes every xh register. (You can still use the four xh registers if you do not use a REX prefix.)
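The selection rule can be written down as a lookup table. The sketch below assumes the standard numbering of the byte registers and uses the r8b..r15b spelling from the question (the same registers are also written r8l..r15l); `byte_reg_name` is a hypothetical function name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoder: name of the 8-bit register selected by a 3-bit
   register field. `has_rex` says whether any REX prefix is present;
   `ext` is the matching REX extension bit and must be 0 without REX. */
static inline const char *byte_reg_name(unsigned field, bool has_rex, unsigned ext)
{
    static const char *const legacy[8] = {
        "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"
    };
    static const char *const rex[16] = {
        "al",  "cl",  "dl",   "bl",   "spl",  "bpl",  "sil",  "dil",
        "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b"
    };

    assert(field < 8 && ext < 2 && (has_rex || ext == 0));
    return has_rex ? rex[(ext << 3) | field] : legacy[field];
}
```

Field values 4..7 are the ones that change meaning: they name ah, ch, dh, bh without REX and spl, bpl, sil, dil with it, so an instruction that needs a REX prefix cannot address any xh register.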

"And using r8b makes the full r8 busy"

Yes, this is called a "partial register write". Since a write to r8b changes part but not all of r8, r8 is now split into two parts: one part has changed and the other has not. The CPU must merge the two. It can either spend an extra cycle doing that work, or add more circuitry so that it can be done in a single cycle.
The latter is expensive in terms of silicon and complex in terms of design, and it also adds heat because of the extra work (more work per cycle = more heat). See "Why GCC Does Not Use Partial Registers?" for a rundown of how different x86 processors handle partial-register writes (and later reads of the full register).

"If I use r8b, I cannot access the upper 56 bits at the same time; they exist but are not available"

No, they are not inaccessible.

    mov rax,bignumber   ; random value in rax
    mov al,0            ; clear al
    xor r8d,r8d         ; r8=0
    mov r8b,16          ; set r8b
    or r8,rax           ; change r8 upper without changing r8b

You use masks plus and, or, xor, and not followed by and, to change parts of a register without affecting the rest.
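As a rough illustration of the masking idea, the same operations look like this in C. The helper names are invented for this sketch; each body corresponds to one or two of the instructions named above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting byte number `index` (0 = lowest) of a 64-bit value. */
static inline uint64_t byte_mask(unsigned index)
{
    assert(index < 8);
    return 0xFFull << (8 * index);
}

/* Replace only the bits selected by `mask`: AND with the inverted mask
   clears the field, OR puts the new bits in. */
static inline uint64_t set_field(uint64_t reg, uint64_t mask, uint64_t bits)
{
    return (reg & ~mask) | (bits & mask);
}

/* Flip only the bits selected by `mask` with XOR. */
static inline uint64_t toggle_field(uint64_t reg, uint64_t mask)
{
    return reg ^ mask;
}
```

For example, `set_field(r8, ~byte_mask(0), rax)` changes the upper 56 bits of a value while leaving its low byte alone, which is what the `or r8,rax` sequence achieves when the upper bits start out as zero.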

Strictly speaking, there was never a need for ah, but it led to more compact code on the 8086 (and more efficient use of registers). It is still sometimes useful to write EAX or RAX and then read AL and AH separately (e.g. movzx ecx, al / movzx edx, ah) as part of unpacking bytes.
