Where did the _syscallN macros in <linux/unistd.h> go?
It used to be that if you needed to make a system call directly on Linux without using an existing library, you could just include <linux/unistd.h>, and it would define a macro like this:
    #define _syscall3(type,name,type1,arg1,type2,arg2,type3,arg3) \
    type name(type1 arg1,type2 arg2,type3 arg3) \
    { \
    long __res; \
    __asm__ volatile ("int $0x80" \
            : "=a" (__res) \
            : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \
              "d" ((long)(arg3))); \
    if (__res>=0) \
            return (type) __res; \
    errno=-__res; \
    return -1; \
    }

Then you could just put this somewhere in your code:

    _syscall3(ssize_t, write, int, fd, const void *, buf, size_t, count);

which would define a write function for you that performed the system call properly.
It seems that this system has been dropped in favor of something more robust (I am guessing the [vsyscall] page that every process gets).
So, what is the correct way (please be specific) for a program to make a system call directly on newer Linux kernels? I understand that I should be using libc and letting it do this work for me. But let's assume that I have a good reason for wanting to know how to do this :-).
OK, so I looked into it further, since I did not get an answer here, and found some good information. First, when an application starts on Linux, in addition to the traditional argc, argv, and envp parameters, another array called auxv is passed with some additional data. See here for more details.
One of these key/value pairs has a key equal to AT_SYSINFO, which is defined in either /usr/include/asm/auxvec.h or /usr/include/elf .
The value associated with this key is the entry point of the system call function (in the "vdso" or "vsyscall" page mapped into every process).
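To make the auxv lookup concrete, here is a minimal sketch that walks the array by hand. It assumes glibc on Linux and that environ still points at the original envp block (nothing has called setenv() or putenv() yet). The names auxv_entry, find_auxv, and print_sysinfo are my own, made up for illustration. On newer glibc (2.16 and later) getauxval() from <sys/auxv.h> should give the same values without the pointer walking.

```c
#include <elf.h>       /* Elf32_auxv_t, Elf64_auxv_t, AT_* constants */
#include <stdio.h>
#include <sys/auxv.h>  /* getauxval(), glibc 2.16 and later */

#if defined(__LP64__)
typedef Elf64_auxv_t auxv_entry;   /* hypothetical typedef for this sketch */
#else
typedef Elf32_auxv_t auxv_entry;
#endif

extern char **environ;

/* Hypothetical helper: look up one key in the auxiliary vector.
 * Returns 1 and stores the value if the key is present, 0 otherwise.
 * Assumption: environ has not been reallocated since process start. */
static int find_auxv(unsigned long key, unsigned long *value)
{
    char **p = environ;

    /* auxv starts right after the NULL that terminates envp. */
    while (*p != NULL)
        p++;

    for (const auxv_entry *e = (const auxv_entry *)(p + 1);
         e->a_type != AT_NULL; e++) {
        if (e->a_type == key) {
            *value = e->a_un.a_val;
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: report whether the two vdso-related keys exist. */
static void print_sysinfo(void)
{
    unsigned long v;

    if (find_auxv(AT_SYSINFO, &v))
        printf("AT_SYSINFO      = %#lx\n", v);
    else
        printf("AT_SYSINFO      not present\n");

    if (find_auxv(AT_SYSINFO_EHDR, &v))
        printf("AT_SYSINFO_EHDR = %#lx\n", v);
    else
        printf("AT_SYSINFO_EHDR not present\n");
}
```

Calling print_sysinfo() early in main() should show which of the two keys your kernel actually provides for your architecture.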
You could just replace the traditional int 0x80 or syscall with a call to that address, and it would actually make the system call. Unfortunately, this is ugly, so the libc people came up with a nice solution. When they allocate the TCB (thread control block) and assign it to the gs segment, they put the value of AT_SYSINFO at some fixed offset in the TCB (unfortunately, the offset is not the same across versions, so you cannot rely on it always being the same constant). So instead of the traditional int 0x80 you can simply say call *%gs:0x10 , which will call the system call routine found in the vdso section.
I assume the goal here is to make writing libc easier. This lets the libc guys write one block of code for making system calls and not have to worry about it again. The kernel guys can change the way system calls are made at any time; they just need to change the contents of the vdso page to use the new mechanism. In fact, you would not even need to recompile your libc! However, this makes things a pain in the butt for those of us who are writing inline assembly and trying to play with things under the hood.
Fortunately, the old way still works if you really want to do something manually :-).
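For reference, here is a rough sketch of what doing it manually can look like today. It assumes GCC-style inline assembly and the usual Linux register conventions (int $0x80 on i386, the syscall instruction on x86_64), and falls back to libc's syscall() on anything else. The name raw_write is made up for this example, and it follows the same error convention as the old _syscall3 macro.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() for the fallback path */

/* Hypothetical wrapper: write(2) without going through libc's write().
 * Returns the byte count, or -1 with errno set. */
static long raw_write(int fd, const void *buf, size_t count)
{
#if defined(__x86_64__)
    long res;
    /* x86_64: number in rax, arguments in rdi, rsi, rdx.
     * The syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a" (res)
                      : "0" ((long)SYS_write), "D" ((long)fd),
                        "S" (buf), "d" (count)
                      : "rcx", "r11", "memory");
#elif defined(__i386__)
    long res;
    /* i386: number in eax, arguments in ebx, ecx, edx (the old way). */
    __asm__ volatile ("int $0x80"
                      : "=a" (res)
                      : "0" ((long)SYS_write), "b" ((long)fd),
                        "c" (buf), "d" (count)
                      : "memory");
#else
    /* Other architectures: let libc pick the right instruction.
     * syscall() already returns -1 and sets errno on failure. */
    return syscall(SYS_write, fd, buf, count);
#endif
#if defined(__x86_64__) || defined(__i386__)
    /* The kernel reports failure as a negative errno value. */
    if (res < 0) {
        errno = (int)-res;
        return -1;
    }
    return res;
#endif
}
```

Something like raw_write(1, "hi\n", 3) should then behave like the write function the old macro generated.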
EDIT: One thing I noticed in my experiments is that AT_SYSINFO does not seem to be provided to the program on my x86_64 box (AT_SYSINFO_EHDR is, but I don't know how to use it). Therefore, I am not 100% sure how the address of the system call function is determined in this situation.
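As far as I can tell from the vdso(7) man page, AT_SYSINFO_EHDR is the address of the ELF header of the vdso image mapped into the process, so it can be inspected like any other in-memory ELF object. Here is a small sketch that only checks the ELF magic at that address. It assumes glibc's getauxval(), and the function name vdso_looks_like_elf is made up for illustration. Actually resolving symbols from the image takes a proper walk of its dynamic section, which this does not attempt.

```c
#include <elf.h>       /* ELFMAG, SELFMAG, AT_SYSINFO_EHDR */
#include <string.h>
#include <sys/auxv.h>  /* getauxval(), glibc 2.16 and later */

/* Hypothetical helper: returns 1 if an ELF image starts at the address
 * given by AT_SYSINFO_EHDR, 0 if the magic does not match, and -1 if
 * the kernel did not provide AT_SYSINFO_EHDR at all. */
static int vdso_looks_like_elf(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);

    if (base == 0)
        return -1;   /* no vdso advertised in auxv */

    /* The first SELFMAG bytes of any ELF object are "\177ELF". */
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```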