What are processor registers and how are they used, in particular WRT multithreading?

This question and my answer below are mainly in response to an area of confusion in another question.

At the end of the answer there are some points about "volatile" and thread synchronization that I am not entirely sure about - I welcome comments and alternative answers. The question, however, is primarily about processor registers and how they are used.

+4
2 answers

CPU registers are small data storage areas on the CPU's silicon. For most architectures, they are the main place where work actually gets done: data is loaded from memory, operated on, and written back out.

Whichever thread is running, it uses the registers and owns the instruction pointer (which says which instruction comes next). When the OS switches to another thread, the entire CPU state, including the registers and the instruction pointer, is saved somewhere, effectively freeze-drying the thread's state for when it comes back to life.
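As a rough illustration (a hypothetical sketch in C, not any real OS's actual layout), the per-thread state a scheduler saves might look something like this:

    /* Hypothetical sketch only: the per-thread CPU state an OS might save on a
       context switch. Field names and layout are illustrative, not any real OS's. */
    #include <stdint.h>

    struct thread_context {
        uint32_t eax, ebx, ecx, edx;   /* general-purpose data registers            */
        uint32_t esi, edi, ebp, esp;   /* index, frame and stack pointers           */
        uint32_t eip;                  /* instruction pointer: the next instruction */
        uint32_t eflags;               /* condition/status flags                    */
        /* ...plus FPU/SIMD state, segment registers, etc. on a real system */
    };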

Most of the documentation for all of this is, of course, out there already. Wikipedia on processor registers and Wikipedia on context switching, for starters. Edit: or read Steve314's answer. :)

+11

Registers are the "working storage" of the CPU. They are very fast but a very limited resource. Typically, a CPU has a small fixed set of named registers, the names being part of the assembly language conventions for that CPU's machine code. For example, 32-bit Intel x86 processors have four main data registers (eax, ebx, ecx and edx) along with a number of indexing and other more specialized registers.

Strictly speaking, this is not entirely true these days - register renaming, for example, is commonplace, and some processors have enough registers that it makes more sense to number them than to name them, etc. But it remains a good basic model to work with. Register renaming, for instance, is used to preserve the illusion of this basic model despite out-of-order execution.

Hand-written assembler tends to follow a simple pattern of register use. A few variables are kept purely in registers for the duration of a subroutine, or some significant part of it. Other registers are used in a read-modify-write pattern. For instance...

    mov  eax, [var1]
    add  eax, [var2]
    mov  [var1], eax

IIRC, that is valid (though probably inefficient) x86 assembler. On the Motorola 68000, I might write ...

    move.l  [var1], d0
    add.l   [var2], d0
    move.l  d0, [var1]

This time, the source is normally the left operand and the destination the right. The 68000 has eight data registers (d0..d7) and eight address registers (a0..a7), and IIRC a7 also doubles as the stack pointer.

On the 6510 (back on the good old Commodore 64) I might write ...

    lda  var1
    adc  var2
    sta  var1

Here the registers are mostly implicit in the instructions - these all use the A register (the accumulator).

Please forgive any silly mistakes in these examples - I have not written any significant amount of "real" (as opposed to virtual) assembler for at least 15 years. But the principle is the point.

Register usage is specific to a particular fragment of code. What a register holds is basically whatever the last instruction left in it. It is the programmer's responsibility to keep track of what is in each register at every point in the code.

When a subroutine is called, either the caller or the callee has to take responsibility for making sure there is no conflict, which usually means registers are saved to the stack at the start of the call and restored at the end. Similar issues arise with interrupts. Things like who is responsible for preserving which registers (caller or callee) are normally part of the documentation of each subroutine.

A compiler generally decides how to use registers in a much more sophisticated way than a human programmer, but it works on the same principles. The mapping from registers to particular variables is dynamic and varies considerably depending on which fragment of code you are looking at. Saving and restoring registers is mostly handled according to standard calling conventions, though in some cases a compiler may improvise "custom calling conventions".

As a rule, local variables in a function are thought of as living on the stack. This is the general rule for "auto" variables in C, and since "auto" is the default, that covers ordinary local variables. For instance...

    void myfunc ()
    {
        int i;  // normal (auto) local variable
        //...
        nested_call ();
        //...
    }

In the code above, "i" may well be kept mainly in a register. It may even move from one register to another and back as the function progresses. However, when nested_call is called, whatever value is in that register will almost certainly end up on the stack - either because the variable really is a stack (not register) variable, or because the register's contents are saved so that nested_call is free to use that register itself.

In a multi-threaded application, ordinary local variables are local to a particular thread. Each thread gets its own stack, and while it is running it has exclusive use of the processor registers; on a context switch those registers are saved. Whether in registers or on the stack, local variables are not shared between threads.
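As a minimal sketch (using POSIX threads, which this answer does not otherwise assume), each thread below gets its own independent copy of the local variable:

    /* Sketch: each thread has its own stack, so its own copy of "count". */
    #include <pthread.h>
    #include <stdio.h>

    static void *worker (void *arg)
    {
        int count = 0;   /* ordinary (auto) local - in a register or on this
                            thread's stack, but never shared with the other thread */
        for (int j = 0; j < 1000; j++)
            count++;
        printf ("thread %ld counted to %d\n", (long) arg, count);
        return NULL;
    }

    int main (void)
    {
        pthread_t t1, t2;
        pthread_create (&t1, NULL, worker, (void *) 1L);
        pthread_create (&t2, NULL, worker, (void *) 2L);
        pthread_join (t1, NULL);
        pthread_join (t2, NULL);
        return 0;   /* both threads print 1000 - no interference */
    }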

This basic picture still holds in a multi-core application, even though two or more threads may be active at the same time - each core has its own registers, and each thread still has its own stack.

Data held in shared memory needs more care. This includes global variables, static variables (both in classes and in functions), and heap-allocated objects. For instance...

    void myfunc ()
    {
        static int i;  // static variable
        //...
        nested_call ();
        //...
    }

In this case, the value of "i" is preserved between calls to the function. A static area of main memory is reserved to hold that value (hence the name "static"). In principle no special action is needed to preserve "i" across the call to "nested_call", and at first glance the variable can be accessed from any thread running on any core (or even on a separate CPU).

However, the compiler works hard to optimize for speed and size. Repeated reads from and writes to main memory are much slower than register accesses, so the compiler will almost certainly choose not to follow the simple read-modify-write pattern described above, and will instead keep the value in a register for a relatively long period, avoiding repeated reads and writes of the same memory.

This means that changes made in one thread may not be noticed by another thread for some time. The two threads may end up with very different ideas about the value of "i" above.
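A classic illustration of this (a hypothetical sketch, not code from the question) is a flag-polling loop - without volatile, the compiler is free to read the flag into a register once and never look at memory again:

    /* Sketch: the compiler may load "done" into a register before the loop and
       never re-read memory, so this can spin forever even after another thread
       sets done = 1. */
    static int done = 0;

    void wait_for_other_thread (void)
    {
        while (!done) {
            /* busy-wait; "done" may effectively be a stale register copy here */
        }
    }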

There is no magic hardware fix for this. For example, there is no mechanism for synchronizing registers between threads. To the CPU, the variable and the register are completely separate things - it has no idea they need to be kept in sync. There is certainly no synchronization between registers in different threads or running on different cores - there is no reason to assume that another thread is even using the same register for the same purpose at any particular moment.

A partial fix is to flag the variable as "volatile" ...

    void myfunc ()
    {
        volatile static int i;
        //...
        nested_call ();
        //...
    }

This tells the compiler not to optimize reads and writes of that variable. The processor has no concept of volatility; the keyword tells the compiler to generate different code, reading and writing memory immediately where the accesses appear in the source, rather than avoiding those accesses by keeping the value in a register.
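Applied to the polling sketch above, marking the flag volatile forces a genuine memory read on every iteration:

    /* Same sketch, with volatile: every test of "done" is a real read from memory. */
    static volatile int done = 0;

    void wait_for_other_thread (void)
    {
        while (!done) {
            /* each iteration re-reads "done" from memory */
        }
    }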

However, this is not a solution for multi-threaded synchronization, at least not by itself. A suitable multi-threaded approach is to use some kind of lock to manage access to this "shared resource". For instance...

    void myfunc ()
    {
        static int i;
        //...
        acquire_lock_on_i ();
        // do stuff with i
        release_lock_on_i ();
        //...
    }

There is more going on here than is immediately apparent. Basically, instead of writing the value of "i" back to its variable ready for the call to "release_lock_on_i", it may just be saved on the stack. As far as the compiler is concerned this is not unreasonable - it is accessing the stack anyway (e.g. to save the return address), so saving the register to the stack may well be more efficient than writing it back to "i" - more cache-friendly than accessing a completely separate block of memory.

Unfortunately, the release-lock function does not know that the variable has not been written back to memory yet, so it cannot do anything about it. After all, that function is just a library call (the real lock release may be hidden in some more deeply nested call), and the library may have been compiled years before your application - it does not know how its callers use registers or the stack. That is a large part of why we use a stack at all, and why calling conventions have to be standardized (e.g. who preserves which registers). The release-lock function cannot force its callers to "synchronize" registers.

Equally, you might relink an old application against a new library - the caller does not know what release_lock_on_i does or how; it is just a function call. It does not know that it needs to save its registers back to memory first.

To resolve this, we can bring back "volatile".

    void myfunc ()
    {
        volatile static int i;
        //...
        acquire_lock_on_i ();
        // do stuff with i
        release_lock_on_i ();
        //...
    }

We can temporarily use an ordinary local variable while the lock is held, giving the compiler a chance to use a register for that brief period. In principle the lock should be released as soon as possible anyway, so there should not be that much code in there. If we do this, though, we write our temporary back to "i" before releasing the lock, and the volatility of "i" ensures that it is written out to main memory.
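A sketch of that pattern, using the same placeholder lock functions as above (shown here just as prototypes):

    /* Placeholder lock functions from the examples above - declarations only. */
    void acquire_lock_on_i (void);
    void release_lock_on_i (void);

    void myfunc ()
    {
        volatile static int i;
        //...
        acquire_lock_on_i ();
        int tmp = i;    /* copy into an ordinary local - free to live in a register */
        tmp += 1;       /* do stuff with the temporary */
        i = tmp;        /* write back before the release; volatility forces the store */
        release_lock_on_i ();
        //...
    }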

In principle, even that is not quite enough. A write to main memory does not mean it has actually reached main memory - there are layers of cache in between, and your data can sit in any of those layers for a while. This is the "memory barrier" issue, and I know little about it - but fortunately it is handled by thread synchronization calls, such as the lock-acquire and lock-release calls above.
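For what it is worth, here is a minimal sketch of that point using POSIX mutexes (my choice of primitive, not something named in this answer): the lock and unlock calls include the memory barriers needed so that writes made inside the critical section are visible to the next thread that takes the lock.

    /* Sketch: pthread_mutex_lock/unlock act as the synchronization calls described
       above - they provide the ordering guarantees (barriers) so that changes to
       "i" made under the lock are visible to whichever thread locks it next. */
    #include <pthread.h>

    static pthread_mutex_t i_lock = PTHREAD_MUTEX_INITIALIZER;
    static int i;                        /* the shared resource */

    void update_i (void)
    {
        pthread_mutex_lock (&i_lock);    /* acquire */
        i++;                             /* do stuff with i */
        pthread_mutex_unlock (&i_lock);  /* release */
    }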

This memory barrier issue does not remove the need for the "volatile" keyword, however.

+14

Source: https://habr.com/ru/post/1303172/

