Why are rbp and rsp called general purpose registers?

According to Intel, in x64 the following registers are called general purpose registers: RAX, RBX, RCX, RDX, RBP, RSI, RDI, RSP and R8-R15. https://software.intel.com/en-us/articles/introduction-to-x64-assembly

Another article says that RBP and RSP are special purpose registers (RBP points to the base of the current stack frame, and RSP points to the top of the current stack frame). https://www.recurse.com/blog/7-understanding-c-by-learning-assembly

Now I have two contradicting statements. Intel's statement should be the trusted one, but which is correct, and why are RBP and RSP called general purpose at all?

Thanks for any help.

2 answers

General purpose means that all of these registers can be used with any instruction that does computation with general purpose registers, whereas, for example, you cannot do whatever you want with the instruction pointer (RIP) or the flags register (RFLAGS).

Some of these registers were intended for a specific use, and are commonly used that way. The most critical ones are RSP and RBP.

If you need to use them for your own purposes, you must save their contents before storing anything else in them, and restore their original values when you are done.


If a register can be an operand for add, or can be used in an addressing mode, then it is "general purpose", as opposed to registers like the FS segment register or RIP. GP registers are also called "integer registers", even though other kinds of registers can hold integers as well.

In computer architecture, it is common for CPUs to handle integer registers / instructions separately from FP / SIMD registers / instructions internally. For example, Intel Sandybridge-family CPUs have separate physical register files for renaming GP integer vs. FP / vector registers. These are simply called the integer and FP register files. (Here FP is shorthand for everything that a kernel does not need to save / restore in order to use the GP registers while leaving user-space's FPU / SIMD state untouched.) Each entry in the FP register file is 256 bits wide (to hold an AVX ymm vector), but entries in the integer register file only have to be 64 bits wide.

On CPUs that rename segment registers (Skylake does not), I guess that would be part of the integer state, along with RFLAGS + RIP. But when we say "integer register", we normally mean specifically a general purpose register.


Every register has some special-ness for some instructions, except some of the completely new registers added with x86-64: R8-R15. These special uses do not disqualify them as general purpose. The (low 16 bits of the) original 8 registers date back to the 8086, and there were implicit uses of each of them even in the original 8086.

RSP is special for push / pop / call / ret, so most code never uses it for anything else. (And in kernel mode it is used asynchronously for interrupts, so you really can't stash it somewhere to get an extra GP register the way you can in user-space code: see "Is ESP as general-purpose as EAX?")

But under controlled conditions (for example, with no signal handlers) you do not have to use RSP as a stack pointer. For example, you can use it to read an array in a loop with pop, as in this code-golf answer. (I actually used esp in 32-bit code, but the idea is the same: pop is faster than lodsd on Skylake, and both are 1 byte.)


Implicit uses and special-ness for each register:

See also "x86 Assembly - Why is [e]bx preserved in calling conventions?" for a partial list.

I am mostly limiting this to user-space instructions, especially ones that a modern compiler might actually emit from C or C++ code. I am not trying to be exhaustive for registers that have a lot of implicit uses.

  • rax : one-operand [i]mul / [i]div, cdq / cdqe, string instructions (stos), cmpxchg, etc., as well as special shorter encodings for many instructions with an immediate operand, such as 2-byte cmp al, 1 or 5-byte add eax, 12345 (no ModRM byte). See also the codegolf.SE question "Tips for golfing in x86/x64 machine code".

    There is also xchg-with-eax, which is where the 0x90 nop came from (before nop became a separately documented instruction in x86-64, because xchg eax,eax zero-extends eax into RAX and therefore cannot use the 0x90 encoding. But xchg rax,rax can still assemble to REX.W=1 0x90.)

  • rcx : shift counts, rep-string counts, the slow loop instruction.
  • rdx : rdx:rax is used by divide and multiply, and by cwd / cdq / cqo to set up for them. Also rdtsc and BMI2 mulx.
  • rbx : 8086 xlatb. cpuid uses all four of EAX..EDX. Pentium cmpxchg8b, x86-64 cmpxchg16b. Most 32-bit compilers will emit lock cmpxchg8b for std::atomic<long long>::compare_exchange_weak. (A pure load / pure store can use SSE MOVQ or x87 fild / fistp instead, if targeting Pentium or later.) 64-bit compilers will use 64-bit lock cmpxchg, not cmpxchg8b.

    Some 64-bit compilers will emit cmpxchg16b for atomic<struct_16_bytes>. RBX has the fewest implicit uses of the original 8 registers, but lock cmpxchg16b is one of the few that compilers will actually use.

  • rsi / rdi : string instructions, including rep movsb, which some compilers sometimes inline. (gcc also inlines rep cmpsb for string literals in some cases, but that is probably not optimal.)
  • rbp : leave (only 1 uop slower than mov rsp, rbp / pop rbp; gcc actually uses it in functions with a frame pointer when it cannot just pop rbp). Also the horribly slow enter, which nobody ever uses.
  • rsp : stack operations: push / pop / call / ret, and leave. (And enter.) In kernel mode (not user space) it is also used asynchronously by hardware to save interrupt context. This is why kernel code cannot have a red zone.

  • r11 : syscall / sysret use it to save / restore user-space's RFLAGS. (Along with RCX to save / restore user-space's RIP.)
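As an illustration of what "implicit use" means for rax and rdx in the list above, here is a minimal Python sketch that models the unsigned one-operand mul and div with 64-bit operands. The function names mul64 and div64 are made up for this sketch, and flags are not modeled; it only shows that the instruction names a single source operand while RDX:RAX is read and written implicitly.

```python
MASK64 = (1 << 64) - 1

def mul64(rax, src):
    """Model of unsigned `mul src`: RDX:RAX = RAX * src.

    Returns (rdx, rax), the high and low halves of the 128-bit product.
    """
    product = (rax & MASK64) * (src & MASK64)
    return product >> 64, product & MASK64

def div64(rdx, rax, src):
    """Model of unsigned `div src`: divides the 128-bit value RDX:RAX by src.

    Returns (rdx, rax) = (remainder, quotient). The real instruction raises
    a #DE exception on a zero divisor or when the quotient does not fit in
    64 bits; this sketch raises Python exceptions instead.
    """
    if src == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)
    quotient, remainder = divmod(dividend, src)
    if quotient > MASK64:
        raise OverflowError("#DE: quotient does not fit in rax")
    return remainder, quotient

if __name__ == "__main__":
    hi, lo = mul64(1 << 63, 4)      # product is 2**65
    print(hex(hi), hex(lo))
    print(div64(hi, lo, 4))         # divides 2**65 by 4
```

This is also why compilers have to zero RDX (or sign-extend into it with cqo) before a 64-bit division: whatever happens to be in RDX is the high half of the dividend.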

Addressing-mode encoding special cases:

(See also "rbp not allowed as SIB base?", which is just about addressing modes, where I copied this part of this answer.)

rbp / r13 cannot be a base register with no displacement: that encoding instead means (in ModRM) rel32 (RIP-relative), or (in SIB) disp32 with no base register. (r13 uses the same 3 bits in ModRM / SIB, so this choice simplifies decoding by not making the instruction-length decoder look at the REX.B bit to get the 4th bit of the base register.) [r13] assembles to [r13 + disp8=0]. [r13+rdx] assembles to [rdx+r13] (avoiding the problem by swapping base / index when that is an option).

rsp / r12 as a base register always needs a SIB byte. (The ModR/M encoding of base = RSP is the escape code that signals a SIB byte, and again, more of the decoder would have to care about the REX prefix if r12 were handled differently.)

rsp cannot be an index register. This makes it possible to encode [rsp], which is more useful than [rsp + rsp]. (Intel could have designed the ModRM / SIB encodings for 32-bit addressing modes (new in the 386) so that SIB with no index was only possible with base = ESP. That would make [eax + esp*4] possible and only exclude [esp + esp*1/2/4/8]. But that is not useful, so they simplified the hardware by making index = ESP the code for "no index" regardless of the base. This allows two redundant ways to encode any base or base + disp addressing mode: with or without a SIB byte.)

r12 can be an index register. Unlike the other cases, this does not affect instruction-length decoding. Also, it cannot be worked around with a longer encoding, as the other cases can. AMD wanted AMD64's register set to be as orthogonal as possible, so it makes sense that they would spend a few extra transistors to check REX.X as part of the index / no-index decoding. For example, [rsp + r12*4] requires index = r12, so having r12 not be fully general purpose would make AMD64 a worse compiler target.

   0:   41 8b 03        mov eax,DWORD PTR [r11]
   3:   41 8b 04 24     mov eax,DWORD PTR [r12]          # needs a SIB like RSP
   7:   41 8b 45 00     mov eax,DWORD PTR [r13+0x0]      # needs a disp8 like RBP
   b:   41 8b 06        mov eax,DWORD PTR [r14]
   e:   41 8b 07        mov eax,DWORD PTR [r15]
  11:   43 8b 04 e3     mov eax,DWORD PTR [r11+r12*8]    # *can* be an index
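The encoding rules described above can be checked with a small Python sketch that assembles mov eax, DWORD PTR [base + index*scale] (opcode 8B /r) by hand. This is only an illustration under narrow assumptions: the function name encode_mov_eax_mem is made up, the destination is always eax, and no displacement is supported other than the forced disp8=0 for rbp / r13. A real assembler may also swap base and index instead of adding a disp8, which this sketch does not do.

```python
# Register numbers as used by ModRM / SIB. The low 3 bits go into the
# ModRM or SIB field; bit 3 goes into a REX prefix bit (REX.B or REX.X).
REG = {name: n for n, name in enumerate(
    ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
     "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"])}

SCALE_BITS = {1: 0, 2: 1, 4: 2, 8: 3}

def encode_mov_eax_mem(base, index=None, scale=1):
    """Encode `mov eax, dword ptr [base + index*scale]` as bytes."""
    b = REG[base]
    x = None if index is None else REG[index]
    if x == 4:
        # SIB index field 100 without REX.X means "no index".
        raise ValueError("rsp cannot be an index register")

    rex = 0x40 | (b >> 3)                  # REX.B = bit 3 of the base
    if x is not None:
        rex |= (x >> 3) << 1               # REX.X = bit 3 of the index

    # Base low bits 101 (rbp / r13) with mod=00 means disp32 / RIP-relative,
    # so use mod=01 with an explicit disp8 of zero instead.
    need_disp8 = (b & 7) == 5
    mod = 1 if need_disp8 else 0
    # Base low bits 100 (rsp / r12) in ModRM.rm is the SIB escape code.
    need_sib = x is not None or (b & 7) == 4

    out = []
    if rex != 0x40:                        # omit a REX prefix with no bits set
        out.append(rex)
    out.append(0x8B)                       # mov r32, r/m32
    if need_sib:
        out.append((mod << 6) | 4)         # reg field = 000 (eax), rm = 100
        index_bits = 4 if x is None else (x & 7)
        out.append((SCALE_BITS[scale] << 6) | (index_bits << 3) | (b & 7))
    else:
        out.append((mod << 6) | (b & 7))   # reg field = 000 (eax)
    if need_disp8:
        out.append(0x00)
    return bytes(out)

if __name__ == "__main__":
    for args in [("r11",), ("r12",), ("r13",), ("r14",), ("r15",),
                 ("r11", "r12", 8)]:
        print(args, encode_mov_eax_mem(*args).hex(" "))
```

Note that r12 as an index produces the same 3-bit index field (100) as "no index"; only the REX.X bit distinguishes the two, which is the extra check the answer says AMD chose to pay for.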

Compilers like it when all registers can be used for anything, with register allocation constrained only for a few special-case operations. This is what is meant by register orthogonality.


Source: https://habr.com/ru/post/1246810/

