Context Switch Internals

I want to learn and fill in the gaps in my knowledge with this question.

So, the user starts a thread (a kernel-level thread), and it calls yield (a system call, I assume). The scheduler should now save the context of the current thread in its TCB (which is stored somewhere in the kernel), select another thread to run, load its context and jump to its CS:EIP. To narrow things down, I'm working on Linux on top of the x86 architecture. Now I want to go into the details:

So, first we have a system call:

1) The wrapper function for yield pushes the system call's arguments onto the stack, pushes the return address, and raises the interrupt with the system call number placed in some register (for example, EAX).

2) The interrupt switches the CPU from user mode to kernel mode, goes to the interrupt vector table, and from there to the actual system call in the kernel.

3) I assume the scheduler now gets called, and it must save the current state in the TCB. Here is my dilemma: since the scheduler uses the kernel stack rather than the user stack to do its work (which means SS and SP must be changed), how can it save the user's state without modifying any register in the process? I've read on forums that there are special hardware instructions for saving state, but then how does the scheduler get access to them, and who executes these instructions, and when?

4) The scheduler now saves the state in the TCB and loads another TCB.

5) When the scheduler runs the original thread again, control returns to the wrapper function, which cleans up the stack, and the thread resumes.
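For reference, step 1 can be seen from user space on Linux/glibc. The minimal sketch below yields once through the glibc wrapper sched_yield() and once through the generic syscall(2) entry point, which puts the system call number (SYS_sched_yield) into the syscall-number register before entering the kernel. The helper names (yield_via_libc, yield_via_raw_syscall, yield_both_ways) are hypothetical. One detail worth knowing: although a 32-bit libc wrapper receives its own arguments on the stack, the Linux/x86 kernel ABI takes system call arguments in registers (EBX, ECX, EDX, ESI, EDI, EBP on 32-bit x86), not on the user stack.

```c
#define _GNU_SOURCE              /* make glibc declare syscall() */
#include <assert.h>
#include <sched.h>               /* sched_yield() */
#include <sys/syscall.h>         /* SYS_sched_yield */
#include <unistd.h>              /* syscall() */

/* Hypothetical helper: the normal path through the glibc wrapper. */
int yield_via_libc(void)
{
    return sched_yield();
}

/* Hypothetical helper: the same request through syscall(2). glibc loads
   SYS_sched_yield into the syscall-number register (EAX on 32-bit x86,
   RAX on x86-64) and enters the kernel; sched_yield takes no arguments,
   so no argument registers need to be filled. */
long yield_via_raw_syscall(void)
{
    return syscall(SYS_sched_yield);
}

/* Hypothetical demo: yield once through each path and check the results. */
void yield_both_ways(void)
{
    int  rc_libc = yield_via_libc();
    long rc_raw  = yield_via_raw_syscall();
    assert(rc_libc == 0);        /* on Linux, sched_yield() always succeeds */
    assert(rc_raw == 0);
    (void)rc_libc;               /* avoid unused-variable warnings under NDEBUG */
    (void)rc_raw;
}
```

Both calls end up in the same kernel handler; the raw form only makes the "number in a register, then trap" step explicit.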

Side questions: Does the scheduler run as a kernel-only thread (i.e., a thread that can only run kernel code)? Is there a separate kernel stack for each kernel thread, or for each process?

linux-kernel context-switch kernel
Sep 27
3 answers

At a high level, there are two separate mechanisms to understand. The first is the kernel entry/exit mechanism: it switches a single running thread from running user-mode code to running kernel code in the context of that thread, and back again. The second is the context switch mechanism itself, which switches, in kernel mode, from running in the context of one thread to running in the context of another.

So, when Thread A calls sched_yield() and is replaced by Thread B, the following happens:

  • Thread A enters the kernel, switching from user mode to kernel mode;
  • Thread A, in the kernel, switches to Thread B in the kernel;
  • Thread B exits the kernel, switching from kernel mode back to user mode.

Each user thread has both a user-mode stack and a kernel-mode stack. When a thread enters the kernel, the current values of the user-mode stack pointer (SS:ESP) and instruction pointer (CS:EIP) are saved to the thread's kernel-mode stack, and the CPU switches to the kernel-mode stack; with the int $0x80 syscall mechanism, this is done by the CPU itself. The remaining register values and flags are then also saved to the kernel stack.

When a thread returns from the kernel to user mode, the register values and flags are popped from the kernel-mode stack, and then the user-mode stack pointer and instruction pointer are restored from the values saved on the kernel-mode stack.
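The entry/exit round trip can be modelled in plain C to see why user state survives. This is a toy simulation, not kernel code: struct cpu, struct kstack, enter_kernel, exit_kernel and kernel_clobbers_registers are hypothetical names, the "registers" are ordinary variables, and only a subset of x86 state is modelled. The CPU part pushes SS, ESP, EFLAGS, CS, EIP (the order x86 uses on a ring 3 to ring 0 transition); the entry code pushes the rest, and exit pops everything in reverse.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the user-visible CPU state (a subset of real x86 registers). */
struct cpu {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp;  /* saved by kernel entry code */
    uint32_t eip, cs, eflags, esp, ss;           /* saved by the CPU itself    */
};

#define KSTACK_WORDS 64

/* Toy per-thread kernel stack; like the real one, it grows downward. */
struct kstack {
    uint32_t words[KSTACK_WORDS];
    int top;                                     /* index of the last pushed word */
};

static void push(struct kstack *k, uint32_t v)
{
    assert(k->top > 0);                          /* no overflow in the toy */
    k->words[--k->top] = v;
}

static uint32_t pop(struct kstack *k)
{
    assert(k->top < KSTACK_WORDS);               /* no underflow in the toy */
    return k->words[k->top++];
}

/* Kernel entry: first what the CPU pushes, then what the entry code pushes. */
void enter_kernel(struct cpu *c, struct kstack *k)
{
    push(k, c->ss); push(k, c->esp); push(k, c->eflags);
    push(k, c->cs); push(k, c->eip);
    push(k, c->eax); push(k, c->ebx); push(k, c->ecx); push(k, c->edx);
    push(k, c->esi); push(k, c->edi); push(k, c->ebp);
}

/* While in the kernel, the registers may be used freely. */
void kernel_clobbers_registers(struct cpu *c)
{
    memset(c, 0xAB, sizeof *c);
}

/* Kernel exit: the kernel pops its registers in reverse order, then the
   iret step pops EIP, CS, EFLAGS, ESP, SS. */
void exit_kernel(struct cpu *c, struct kstack *k)
{
    c->ebp = pop(k); c->edi = pop(k); c->esi = pop(k);
    c->edx = pop(k); c->ecx = pop(k); c->ebx = pop(k); c->eax = pop(k);
    c->eip = pop(k); c->cs = pop(k); c->eflags = pop(k);
    c->esp = pop(k); c->ss = pop(k);
}
```

Whatever the kernel does to the registers in between, exit_kernel reproduces the exact state captured by enter_kernel, and the kernel stack is empty again afterwards.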

When a thread context-switches, it calls into the scheduler (the scheduler does not run as a separate thread; it always runs in the context of the current thread). The scheduler code selects the process to run next and calls the switch_to() function. This function essentially just switches kernel stacks: it saves the current value of the stack pointer into the TCB of the current thread (called struct task_struct in Linux) and loads the previously saved stack pointer from the TCB of the next thread. At this point it also saves and restores some other thread state that is not usually used by the kernel, such as the floating-point/SSE registers.

So you can see that the core user-mode state of a thread is not saved and restored at context switch time; it is saved to and restored from the thread's kernel stack when the thread enters and leaves the kernel. The context switch code does not have to worry about clobbering the user-mode register values, because by that point they are already safely saved on the kernel stack.
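A user-space analogy of switch_to() can be built with the POSIX ucontext API (obsolescent in POSIX, but still provided by glibc). This is not the kernel's code: like switch_to(), swapcontext() stores the current stack pointer (plus registers) in one structure and loads them from another, so execution continues on the other stack exactly where it previously stopped. The names run_ping_pong, thread_a, thread_b and log_step are hypothetical.

```c
#define _XOPEN_SOURCE 700           /* request the X/Open interfaces, incl. ucontext */
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical "TCBs": each ucontext_t holds a saved register set and stack pointer. */
static ucontext_t main_ctx, ctx_a, ctx_b;
static char stack_a[STACK_SIZE], stack_b[STACK_SIZE];

static char trace[16];              /* records which "thread" ran, in order */
static int  trace_len;

static void log_step(char c)
{
    assert(trace_len < (int)sizeof trace - 1);
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

static void thread_a(void)
{
    log_step('A');
    swapcontext(&ctx_a, &ctx_b);    /* save A's state, resume B: the "switch_to" */
    log_step('a');                  /* A resumes exactly where it left off */
}                                   /* returning continues at uc_link (B) */

static void thread_b(void)
{
    log_step('B');
    swapcontext(&ctx_b, &ctx_a);    /* save B's state, resume A */
    log_step('b');
}                                   /* returning continues at uc_link (main) */

/* Runs the two cooperative "threads" and returns the interleaving. */
const char *run_ping_pong(void)
{
    memset(trace, 0, sizeof trace);
    trace_len = 0;

    getcontext(&ctx_a);
    ctx_a.uc_stack.ss_sp = stack_a;     /* each "thread" gets its own stack */
    ctx_a.uc_stack.ss_size = sizeof stack_a;
    ctx_a.uc_link = &ctx_b;
    makecontext(&ctx_a, thread_a, 0);

    getcontext(&ctx_b);
    ctx_b.uc_stack.ss_sp = stack_b;
    ctx_b.uc_stack.ss_size = sizeof stack_b;
    ctx_b.uc_link = &main_ctx;
    makecontext(&ctx_b, thread_b, 0);

    swapcontext(&main_ctx, &ctx_a);     /* start with A */
    return trace;
}
```

Calling run_ping_pong() yields the interleaving "ABab": each side stops inside swapcontext() and later continues from that same point on its own stack, which is the same property the kernel relies on when it swaps kernel stack pointers.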

03 Oct '12 at 5:20

What you missed in step 2 is that the stack is switched from the thread's user-level stack (where you pushed the arguments) to the thread's protected-level stack. The current context of the thread interrupted by the syscall is actually saved on this protected stack. Inside the ISR, just before entering the kernel, this protected stack is switched again to the kernel stack you are talking about. Once inside the kernel, kernel functions such as the scheduler's functions eventually use the kernel stack. Later, a thread is elected by the scheduler and the system returns to the ISR; it switches back from the kernel stack to the protected-level stack of the newly elected thread (or of the former one, if no higher-priority thread is active), which ultimately contains the new thread's context. The context is then restored from this stack by code automatically (depending on the underlying architecture). Finally, a special instruction restores the last touchy registers, such as the stack pointer and the instruction pointer. Back in user land...

To summarize, a thread has (generally) two stacks, and the kernel itself has one. The kernel stack is wiped at the end of each kernel entry. It is interesting to note that since 2.6 the kernel itself is threaded for some processing, so a kernel thread has its own protected-level stack beside the general kernel stack.

Some resources:

  • 3.3.3 Performing the Process Switch, in Understanding the Linux Kernel, O'Reilly
  • 5.12.1 Exception- or Interrupt-Handler Procedures, Intel manual volume 3A (system programming). The section number may differ from one edition to another, so searching for "Stack Usage on Transfers to Interrupt and Exception-Handling Routines" should lead you to the right one.

Hope this helps!

Sep 28 '12 at 13:33

The kernel itself has no stack at all. The same is true of the process: it has no stack either. Threads are the only citizens of the system that are considered units of execution. Because of this, only threads can be scheduled, and only threads have stacks. But there is one point that kernel-mode code relies on heavily: at every moment, the system runs in the context of the currently active thread. Thanks to this, the kernel can reuse the stack of the currently active thread. Note that, at any given time, a thread executes either kernel code or user code, never both. Because of this, when the kernel is invoked, it simply reuses the thread's stack and cleans up before returning control to the interrupted activity in the thread. The same mechanism works for interrupt handlers, and the same mechanism is used by signal handlers.

In turn, the thread's stack is divided into two isolated parts: one is called the user stack (because it is used while the thread runs in user mode), and the other is called the kernel stack (because it is used while the thread runs in kernel mode). When a thread crosses the boundary between user mode and kernel mode, the CPU automatically switches it from one stack to the other. The kernel and the CPU track the two stacks differently. For the kernel stack, the CPU permanently keeps a pointer to the top of the thread's kernel stack. This is easy, because this address is constant for the thread. Each time the thread enters the kernel, it finds an empty kernel stack, and each time it returns to user mode, it cleans up the kernel stack. At the same time, the CPU does not keep track of the top of the user stack while the thread runs in kernel mode. Instead, on entry to the kernel, the CPU creates a special "interrupt" stack frame at the top of the kernel stack and stores the value of the user-mode stack pointer in that frame. When the thread exits the kernel, the CPU restores ESP from the previously created interrupt stack frame, immediately before that frame is cleaned up. (On legacy x86, the int/iret instruction pair handles entering and exiting kernel mode.)

On entry to kernel mode, immediately after the CPU has created the interrupt stack frame, the kernel pushes the contents of the remaining CPU registers onto the kernel stack. Note that it saves values only for the registers that kernel code can use. For example, the kernel does not save the contents of the SSE registers, simply because it never touches them. Similarly, just before asking the CPU to return control to user mode, the kernel pops the previously saved contents back into the registers.

Note that on systems like Windows and Linux there is the notion of a system thread (often called a kernel thread; I know this is confusing). System threads are special because they run only in kernel mode, so they have no user part of the stack. The kernel uses them for auxiliary housekeeping tasks.

A thread switch is performed only in kernel mode. This means that both threads, outgoing and incoming, are running in kernel mode, both are using their own kernel stacks, and both kernel stacks hold "interrupt" frames with pointers to the tops of the user stacks. The key point of a thread switch is the switch between the threads' kernel stacks, as simple as:

    pushad                             ; save the outgoing thread's context on top of its kernel stack
                                       ; (here the kernel is using the outgoing thread's kernel stack)
    mov [TCB_of_outgoing_thread], esp  ; store the outgoing thread's kernel stack pointer in its TCB
    mov esp, [TCB_of_incoming_thread]  ; load the incoming thread's saved kernel stack pointer
                                       ; (here the kernel is using the incoming thread's kernel stack)
    popad                              ; restore the incoming thread's context from the top of its kernel stack

Note that there is only one function in the kernel that performs thread switches. Because of this, every time it switches kernel stacks, it can find the context of the incoming thread on top of the stack, simply because every time, before switching kernel stacks, it pushes the context of the outgoing thread onto that thread's stack.
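The pushad / mov esp / popad sequence can be simulated in C to check the "incoming context is always on top" argument. This is a toy model and every name in it (struct tcb, struct cpu, tcb_init, switch_threads) is hypothetical; pushad/popad are modelled as loops over an array of eight registers, whereas the real instructions use a fixed order (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI). One extra step the argument implies: a thread that has never run must be given a fake saved context (tcb_init here), because the single switch routine assumes every incoming stack was prepared by an earlier call to itself.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 8                      /* pushad saves eight general registers */
#define KSTACK_WORDS 64

/* Hypothetical toy TCB: the saved kernel stack pointer plus the stack itself. */
struct tcb {
    uint32_t *saved_sp;
    uint32_t  kstack[KSTACK_WORDS];
};

/* Toy CPU: general registers and the current kernel stack pointer. */
struct cpu {
    uint32_t  regs[NREGS];
    uint32_t *sp;
};

static void pushad(struct cpu *c)
{
    for (int i = 0; i < NREGS; i++)
        *--c->sp = c->regs[i];
}

static void popad(struct cpu *c)
{
    for (int i = NREGS - 1; i >= 0; i--)
        c->regs[i] = *c->sp++;
}

/* Prepare a thread that has never run: push an initial register image so
   that its stack looks exactly like one left behind by switch_threads(). */
void tcb_init(struct tcb *t, const uint32_t initial[NREGS])
{
    uint32_t *sp = t->kstack + KSTACK_WORDS;   /* empty stack: one past the end */
    for (int i = 0; i < NREGS; i++)
        *--sp = initial[i];
    t->saved_sp = sp;
}

/* The single thread-switch routine, mirroring the pseudo-assembly. */
void switch_threads(struct cpu *c, struct tcb *out, struct tcb *in)
{
    pushad(c);                  /* outgoing context onto outgoing kernel stack */
    out->saved_sp = c->sp;      /* mov [TCB_of_outgoing_thread], esp */
    c->sp = in->saved_sp;       /* mov esp, [TCB_of_incoming_thread] */
    popad(c);                   /* incoming context from incoming kernel stack */
    assert(c->sp >= in->kstack && c->sp <= in->kstack + KSTACK_WORDS);
}
```

Switching A to B and back leaves the registers exactly as A had them, and each kernel stack pointer ends where it started.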

Note also that every time after switching stacks, and before returning to user mode, the kernel reloads the CPU's notion of the top of the kernel stack with the new value (on x86, the ESP0 field of the TSS). This ensures that when the new active thread enters the kernel in the future, the CPU switches it to its own kernel stack.
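The "CPU's notion of the top of the kernel stack" can be sketched as follows. On x86 the CPU reads the ESP0 field of the Task State Segment (TSS) when an interrupt or int $0x80 arrives from user mode; a real kernel typically keeps one TSS per CPU. In this toy model only that one field is kept, and the names struct thread, struct tss, load_esp0 and cpu_kernel_entry_sp are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_BYTES 8192

/* Toy thread: only its kernel stack matters here. */
struct thread {
    unsigned char kstack[KSTACK_BYTES];
};

/* Toy TSS: just the field the CPU uses for ring 3 -> ring 0 transitions. */
struct tss {
    uintptr_t esp0;
};

static struct tss cpu_tss;       /* one per CPU in a real kernel */

/* Constant for the lifetime of the thread: one past the highest address. */
static uintptr_t kstack_top(const struct thread *t)
{
    return (uintptr_t)(t->kstack + KSTACK_BYTES);
}

/* Part of the switch path: point the CPU at the incoming thread's stack. */
void load_esp0(const struct thread *incoming)
{
    cpu_tss.esp0 = kstack_top(incoming);
}

/* What the CPU does on entry from user mode: start from ESP0, i.e. an
   empty kernel stack belonging to whichever thread is current. */
uintptr_t cpu_kernel_entry_sp(void)
{
    assert(cpu_tss.esp0 != 0);   /* the kernel must have loaded it first */
    return cpu_tss.esp0;
}
```

Because ESP0 always points at the top of the current thread's (empty) kernel stack, forgetting to update it on a switch would make the next kernel entry land on the previous thread's stack.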

Note also that not all registers are saved on the stack during a thread switch; some, such as the FPU/MMX/SSE registers, are saved in a dedicated area in the TCB of the outgoing thread. The kernel uses a different strategy here for two reasons. First, not every thread in the system uses them, so pushing their contents onto the stack and popping them off again for every thread is inefficient. Second, there are special instructions for "fast" saving and loading of their contents (such as FXSAVE/FXRSTOR), and these instructions do not use the stack.
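The "save area in the TCB, only for threads that use it" strategy can be modelled like this. It is a toy sketch: memcpy stands in for FXSAVE/FXRSTOR (which do work on a 512-byte area), and struct tcb, used_fpu, thread_uses_fpu and switch_fpu are hypothetical names. Real kernels differ in the details of when this state is saved, so treat it only as an illustration of the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* FXSAVE/FXRSTOR operate on a 512-byte area; the toy keeps the same size. */
#define FPU_AREA_BYTES 512

struct fpu_state {
    unsigned char area[FPU_AREA_BYTES];
};

/* Hypothetical toy TCB with a dedicated FPU/SSE save area. */
struct tcb {
    bool used_fpu;               /* has this thread ever touched FPU/SSE? */
    struct fpu_state fpu;        /* stored in the TCB, not on the stack  */
};

static struct fpu_state cpu_fpu; /* stands in for the real x87/SSE registers */
static int fpu_saves, fpu_restores;

/* The current thread writes to its FPU/SSE registers. */
void thread_uses_fpu(struct tcb *t, unsigned char value)
{
    t->used_fpu = true;
    memset(&cpu_fpu, value, sizeof cpu_fpu);
}

/* memcpy stands in for FXSAVE / FXRSTOR in this model. */
void switch_fpu(struct tcb *out, struct tcb *in)
{
    assert(out != in);
    if (out->used_fpu) {         /* skip the copy for threads without FPU state */
        memcpy(&out->fpu, &cpu_fpu, sizeof cpu_fpu);
        fpu_saves++;
    }
    if (in->used_fpu) {
        memcpy(&cpu_fpu, &in->fpu, sizeof cpu_fpu);
        fpu_restores++;
    }
}
```

Threads that never touch the FPU cost nothing here, which is the first of the two reasons given above.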

Note also that the kernel part of the thread's stack actually has a fixed size and is allocated as part of the TCB (true for Linux, and I believe for Windows too).
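One consequence of a fixed-size kernel stack allocated together with per-thread data is that the data can be found from the stack pointer alone. Older Linux x86 kernels did something like this with a small struct thread_info (not the full task_struct) at the bottom of a THREAD_SIZE-aligned stack; newer kernels have changed the layout, so treat the following only as a sketch of the idea. THREAD_SIZE, struct tcb, union thread_block, thread_block_alloc and tcb_from_sp are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192             /* assumed block size; must be a power of two */

/* Hypothetical toy TCB living at the bottom of the kernel stack block. */
struct tcb {
    int tid;
};

/* One fixed-size block holds both: the TCB at the low end, the stack
   growing down from the high end toward it. */
union thread_block {
    struct tcb tcb;
    unsigned char stack[THREAD_SIZE];
};

static_assert(sizeof(union thread_block) == THREAD_SIZE, "block must be THREAD_SIZE");

/* Allocate the block aligned to its own size so that masking works. */
union thread_block *thread_block_alloc(int tid)
{
    union thread_block *b = aligned_alloc(THREAD_SIZE, sizeof *b);
    if (b != NULL)
        b->tcb.tid = tid;
    return b;
}

/* Recover the TCB from any address inside the stack by clearing the low
   bits, the way a kernel can find the current thread from its stack pointer. */
struct tcb *tcb_from_sp(uintptr_t sp)
{
    return (struct tcb *)(sp & ~(uintptr_t)(THREAD_SIZE - 1));
}
```

The price of this trick is that the stack cannot grow: overflowing it runs straight into the TCB, which is why such stacks are kept small and fixed.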

Oct. 25 '16 at 11:59
