http://www.informit.com/articles/printerfriendly.aspx?p=25075
Implementing threads in user space
There are two main ways to implement a thread package: in user space and in the kernel. The choice is moderately controversial, and a hybrid implementation is also possible. We will now describe these methods, along with their advantages and disadvantages.
The first method is to put the thread package entirely in user space. The kernel knows nothing about the threads; as far as the kernel is concerned, it is managing ordinary, single-threaded processes. The first and most obvious advantage is that a user-level thread package can be implemented on an operating system that does not support threads. All operating systems used to fall into this category, and even now some still do.
All of these implementations have the same general structure, illustrated in Fig. 2-8(a). The threads run on top of a run-time system, which is a collection of procedures that manage threads. We have already seen four of them: thread_create, thread_exit, thread_wait, and thread_yield, but usually there are more.
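As a rough illustration, the interface such a run-time system offers might be declared in C along the following lines. The four procedure names are the ones mentioned above; the signatures and the thread_t handle are assumptions made purely for this sketch.

/* Hypothetical declarations for the run-time system's interface; the
 * procedure names come from the text, the signatures are assumptions. */
typedef int thread_t;                                   /* hypothetical handle  */

thread_t thread_create(void (*start_routine)(void *), void *arg);
void     thread_exit(void);                             /* terminate the caller */
int      thread_wait(thread_t tid);                     /* wait for tid to exit */
void     thread_yield(void);                            /* give up the CPU      */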
When threads are managed in user space, each process needs its own private thread table to keep track of the threads in that process. This table is analogous to the kernel's process table, except that it keeps track only of the per-thread properties, such as each thread's program counter, stack pointer, registers, state, and so on. The thread table is managed by the run-time system. When a thread is moved to the ready state or the blocked state, the information needed to restart it is stored in the thread table, in exactly the same way that the kernel stores information about processes in the process table.
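One plausible layout for a thread table entry is sketched below. The field names are invented for illustration, but they mirror the per-thread properties just listed: program counter, stack pointer, registers, and state.

/* A sketch of one entry in the per-process thread table.  The exact layout
 * is an assumption; only the kinds of fields are dictated by the text. */
#include <ucontext.h>

enum thread_state { THREAD_READY, THREAD_RUNNING, THREAD_BLOCKED, THREAD_FINISHED };

struct thread_entry {
    enum thread_state state;   /* ready, running, blocked, or finished          */
    ucontext_t context;        /* saved program counter, stack pointer, and
                                  general registers                             */
    char *stack;               /* base of this thread's private stack           */
};

/* The run-time system keeps one such entry per thread, entirely in user
 * space; the kernel's process table never sees these entries. */
static struct thread_entry thread_table[64];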
When a thread does something that may cause it to become blocked locally, for example, waiting for another thread in its process to complete some work, it calls a run-time system procedure. This procedure checks whether the thread must be put into the blocked state. If so, it stores the thread's registers (that is, its own) in the thread table, looks in the table for a ready thread to run, and reloads the machine registers with the new thread's saved values. As soon as the stack pointer and program counter have been switched, the new thread comes to life again automatically. If the machine has an instruction to store all the registers and another one to load them all, the entire thread switch can be done in just a handful of instructions. Doing thread switching like this is at least an order of magnitude faster than trapping to the kernel, and is a strong argument in favor of user-level thread packages.
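On systems that provide the POSIX <ucontext.h> interface, this save-and-reload step can be demonstrated directly. The sketch below (not part of the original text) switches back and forth between two flows of control entirely in user space; each swapcontext call saves the caller's program counter, stack pointer, and registers and reloads those of the target.

/* A minimal demonstration of user-space context switching with ucontext. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];           /* private stack for the worker  */

static void worker(void)
{
    printf("worker: running on its own stack\n");
    swapcontext(&worker_ctx, &main_ctx);       /* save self, reload "main"      */
    printf("worker: resumed where it left off\n");
}

int main(void)
{
    getcontext(&worker_ctx);                   /* capture a context template    */
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;            /* return here when worker ends  */
    makecontext(&worker_ctx, worker, 0);

    printf("main: switching to worker\n");
    swapcontext(&main_ctx, &worker_ctx);       /* save main, reload worker      */
    printf("main: back again, switching once more\n");
    swapcontext(&main_ctx, &worker_ctx);       /* resume worker after its swap  */
    printf("main: done\n");
    return 0;
}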
However, there is one key difference from processes. When a thread is finished running for the moment, for example, when it calls thread_yield, the code of thread_yield can save the thread's information in the thread table itself. Furthermore, it can then call the thread scheduler to pick another thread to run. The procedure that saves the thread's state and the scheduler are just local procedures, so invoking them is far more efficient than making a kernel call. Among other things, no trap is needed, no context switch is needed, the memory cache need not be flushed, and so on. This makes thread scheduling very fast.
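Putting the two ideas together, thread_yield can be written as a purely local procedure: it saves the caller's state and lets a scheduler pick the next thread, with no trap to the kernel. The sketch below uses <ucontext.h> again with invented names and a trivially round-robin policy; a real package would consult the thread table's state fields rather than blindly rotating.

/* A self-contained sketch: thread_yield and a round-robin "scheduler" as
 * ordinary local procedures, switching between three user-level threads. */
#include <stdio.h>
#include <ucontext.h>

#define MAX_THREADS 3
#define STACK_SIZE  (64 * 1024)

static ucontext_t table[MAX_THREADS];          /* per-thread saved state        */
static char stacks[MAX_THREADS][STACK_SIZE];
static int current = 0;

static void thread_yield(void)
{
    int prev = current;
    current = (current + 1) % MAX_THREADS;      /* trivial round-robin policy   */
    swapcontext(&table[prev], &table[current]); /* save prev, reload next       */
}

static void body(void)
{
    for (int i = 0; i < 2; i++) {
        printf("thread %d, iteration %d\n", current, i);
        thread_yield();
    }
}

int main(void)
{
    /* Thread 0 runs on main's own stack; threads 1 and 2 get fresh stacks. */
    for (int t = 1; t < MAX_THREADS; t++) {
        getcontext(&table[t]);
        table[t].uc_stack.ss_sp = stacks[t];
        table[t].uc_stack.ss_size = STACK_SIZE;
        table[t].uc_link = &table[0];
        makecontext(&table[t], body, 0);
    }
    body();                                     /* run thread 0 in main         */
    return 0;
}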
User-level threads also have other advantages. They allow each process to have its own customized scheduling algorithm. For some applications, for example, those with a garbage-collector thread, not having to worry about a thread being stopped at an inconvenient moment is a plus. They also scale better, since kernel threads invariably require some table space and stack space in the kernel, which can be a problem if there are a very large number of threads.
Despite their better performance, user-level thread packages have some major problems. First among these is the problem of how blocking system calls are implemented. Suppose that a thread reads from the keyboard before any keys have been hit. Letting the thread actually make the system call is unacceptable, since this would stop all the threads. One of the main goals of having threads in the first place was to allow each one to use blocking calls, but to prevent one blocked thread from affecting the others. With blocking system calls, it is hard to see how this goal can be achieved.
The system calls could all be changed to be nonblocking (for example, a read on the keyboard would simply return 0 bytes if no characters were already buffered), but requiring changes to the operating system is unattractive. Besides, one of the arguments for user-level threads was precisely that they could run with existing operating systems. In addition, changing the semantics of read would require changes to many user programs.
Another alternative is possible if it can be told in advance whether a call will block. In some versions of UNIX there is a system call, select, which allows the caller to tell whether a prospective read will block. When this call is present, the library procedure read can be replaced with a new one that first does a select call and then does the read call only if it is safe (i.e., will not block). If the read call would block, the call is not made. Instead, another thread is run. The next time the run-time system gets control, it can check again to see whether the read is now safe. This approach requires rewriting parts of the system call library, and it is inefficient and inelegant, but there is little choice. The code placed around the system call to do the checking is called a jacket or wrapper.
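A minimal sketch of such a jacket is shown below. The name wrapped_read is invented for this illustration, and where a real thread package would mark the caller as blocked in the thread table and switch to another thread, this version simply reports that the call would have blocked.

/* A sketch of the select-before-read jacket; only standard POSIX calls used. */
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

ssize_t wrapped_read(int fd, void *buf, size_t count)
{
    fd_set readable;
    struct timeval poll_now = { 0, 0 };        /* do not block inside select    */

    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    if (select(fd + 1, &readable, NULL, NULL, &poll_now) > 0 &&
        FD_ISSET(fd, &readable)) {
        return read(fd, buf, count);           /* safe: the read will not block */
    }

    /* The call would block.  Here the run-time system would record the thread
     * as blocked and run another one; this sketch just signals the condition. */
    errno = EWOULDBLOCK;
    return -1;
}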
Somewhat analogous to the problem of blocking system calls is the problem of page faults. We will study these in Chap. 4. For the moment, it is sufficient to say that computers can be set up so that not all of a program is in main memory at once. If the program calls or jumps to an instruction that is not in memory, a page fault occurs, and the operating system goes and gets the missing instruction (and its neighbors) from disk. The process is blocked while the necessary instruction is being located and read in. If a thread causes a page fault, the kernel, not even knowing about the existence of threads, naturally blocks the entire process until the disk I/O is complete, even though other threads might be runnable.
Another problem with user-level thread packages is that if a thread starts running, no other thread in that process will ever run unless the first thread voluntarily gives up the CPU. Within a single process there are no clock interrupts, making it impossible to schedule threads in round-robin fashion (taking turns). Unless a thread enters the run-time system of its own free will, the scheduler will never get a chance.
One possible solution to the problem of threads running forever is to have the run-time system request a clock signal (interrupt) once a second to give it control, but this, too, is crude and messy to program. Periodic clock interrupts at a higher frequency are not always possible, and even if they are, the total overhead may be substantial. Furthermore, a thread might also need the clock interrupt itself, interfering with the run-time system's use of the clock.
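A sketch of that workaround is shown below (illustrative names only). The run-time system arms a once-per-second timer, and the SIGALRM handler merely sets a flag that the run-time system checks at its convenience, since actually switching threads from inside a signal handler raises exactly the kinds of complications just described.

/* A sketch of the once-per-second clock-signal workaround on POSIX systems. */
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

static volatile sig_atomic_t reschedule = 0;   /* set once per timer tick       */

static void on_tick(int sig)
{
    (void)sig;
    reschedule = 1;                            /* ask the run-time system to run
                                                  its scheduler soon             */
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval every_second = {
        .it_interval = { .tv_sec = 1, .tv_usec = 0 },
        .it_value    = { .tv_sec = 1, .tv_usec = 0 },
    };
    setitimer(ITIMER_REAL, &every_second, NULL);

    /* Stand-in for a thread that never yields: it just spins, and the timer
     * signal is the only way the run-time system ever regains control. */
    for (int ticks = 0; ticks < 3; ) {
        if (reschedule) {
            reschedule = 0;
            ticks++;
            printf("tick %d: run-time system would call its scheduler here\n",
                   ticks);
        }
    }
    return 0;
}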
Another, and probably the most devastating, argument against user-level threads is that programmers generally want threads precisely in applications where the threads block often, as, for example, in a multithreaded Web server. These threads are constantly making system calls. Once a trap has occurred to the kernel to carry out a system call, it is hardly any more work for the kernel to switch threads if the old one has blocked, and having the kernel do so eliminates the need for constantly making select system calls to check whether read system calls are safe. For applications that are essentially entirely CPU bound and rarely block, what is the point of having threads at all? No one would seriously propose computing the first n prime numbers or playing chess using threads, because there is nothing to be gained by doing it that way.