Linux system calls and kernel mode

I understand that system calls exist to provide access to functionality that is not allowed in user space, such as accessing the hard drive with the read() system call. I also understand that they are abstracted by a user-mode layer in the form of library calls such as fread(), to provide portability across hardware.

So, from the point of view of application developers, we have something like:

 //library    //syscall    //k_driver    //device_driver
 fread()   -> read()    -> k_read()   -> d_read()

My question is: what stops me from inlining all the instructions in the fread() and read() functions directly into my program? The instructions are the same, so the processor should behave the same way, right? I have not tried it, but I assume this does not work for some reason I am missing. Otherwise, any application could perform arbitrary operations in kernel mode.

TL;DR: What lets system calls 'enter' kernel mode in a way that an application cannot copy?

+6
2 answers

The system call functions you call are not kernel code themselves. More precisely, the read function that you call is still, as far as your application is concerned, a library call. What read(2) does internally is invoke the actual system call using a software interrupt or a dedicated instruction such as syscall, depending on the CPU architecture and the OS.

This is the only way for userland code to have privileged code executed, and it is an indirect one. Userland code and kernel code run in different contexts.

This means that you cannot add kernel source to your user code and expect it to do anything useful; it will simply crash. In particular, kernel code has access to the physical memory addresses needed to interact with hardware. Userland code is limited to a virtual memory space that does not have this capability. In addition, the instructions that userland code is allowed to execute are a subset of those supported by the CPU. Several instructions related to I/O, interrupts, and virtualization are examples of forbidden code. They are known as privileged instructions and require the CPU to be in a lower ring or in supervisor mode, depending on the architecture.

+8

You can inline them. You can issue system calls directly through syscall(2), but that soon gets messy. Note that the overhead of a system call (context switches back and forth, checks in the kernel, ...), not to mention the time the system call itself takes, makes any gain from inlining disappear in the noise (if there is any gain at all: more code means the cache is less effective, and performance suffers). Trust the libc and kernel developers to have studied the matter and to have done the inlining for you behind your back (in the relevant *.h file) if it really is a measurable gain.

0

Source: https://habr.com/ru/post/945768/

