Get path from file descriptor if path is greater than PATH_MAX

Question


I get file system events from fanotify. Sometimes I want to get the absolute path to the file being accessed.

This is usually not a problem: fanotify_event_metadata contains a file descriptor fd, so I can call readlink on /proc/self/fd/<fd> and get my path.
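A minimal sketch of that lookup, assuming <sys/fanotify.h> is available and the caller still owns the event's fd. The function name fanotify_event_path and the use of ERANGE to flag a possibly too small buffer are conventions of this sketch, not part of any API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Resolve the path of the file behind a fanotify event through the
 * /proc/self/fd/<fd> magic link. Returns the path length, or -1 with errno
 * set: ENAMETOOLONG when the kernel cannot render the path, ERANGE (a choice
 * made by this sketch) when buf may have been too small. */
ssize_t fanotify_event_path(const struct fanotify_event_metadata *ev,
                            char *buf, size_t len)
{
    char link[64];
    ssize_t n;

    if (len < 2) {
        errno = ERANGE;
        return -1;
    }
    snprintf(link, sizeof link, "/proc/self/fd/%d", (int)ev->fd);
    n = readlink(link, buf, len - 1);   /* readlink() never NUL-terminates */
    if (n < 0)
        return -1;
    if ((size_t)n == len - 1) {         /* result may have been truncated */
        errno = ERANGE;
        return -1;
    }
    buf[n] = '\0';
    return n;
}
```

In an event loop this would be called once per struct fanotify_event_metadata read from the fanotify descriptor, before ev->fd is closed.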

However, if the path exceeds PATH_MAX, readlink can no longer be used: it fails with ENAMETOOLONG. I am wondering if there is a way to get the file path in this case.

Obviously, I could fstat the descriptor I get from fanotify and walk the entire file system looking for files with the same device and inode number. But this approach is not feasible for me performance-wise (even if I optimize it to skip paths shorter than PATH_MAX).
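For reference, the brute-force search described above (and rejected for performance reasons) might look like the sketch below. find_path_by_inode and the global target variables are illustrative names. Note that nftw() passes full path strings to the kernel, so entries deeper than PATH_MAX would most likely be reported as FTW_NS instead of matching.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() callbacks get no user pointer, so the search target lives here. */
static dev_t target_dev;
static ino_t target_ino;
static char *found_path;

static int match_inode(const char *path, const struct stat *st,
                       int type, struct FTW *ftw)
{
    (void)ftw;
    if (type == FTW_NS)                 /* stat failed: nothing to compare */
        return 0;
    if (st->st_dev == target_dev && st->st_ino == target_ino) {
        found_path = strdup(path);
        return 1;                       /* non-zero stops the walk */
    }
    return 0;
}

/* Walk everything under 'root' looking for an entry with the same device
 * and inode number as 'fd'. Returns a malloc'd path, or NULL. */
char *find_path_by_inode(int fd, const char *root)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return NULL;
    target_dev = st.st_dev;
    target_ino = st.st_ino;
    found_path = NULL;
    /* FTW_PHYS: report symlinks themselves instead of following them. */
    nftw(root, match_inode, 32, FTW_PHYS);
    return found_path;
}
```

Passing "/" as root gives the full, slow search; the cost grows with the number of entries on every mounted file system.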

I tried to get the parent directory by reopening fd with O_PATH and calling openat(fd, "..", ...). Obviously, this did not work because fd does not refer to a directory. I also tried checking the contents of the buffer after a failed readlink call (hoping that it would contain a partial path). That didn't work either.
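The failed attempt can be reproduced with the sketch below (open_parent_via_dotdot is a hypothetical name). Two caveats: O_PATH only exists since Linux 2.6.39, so it is not available on 2.6.32 anyway, and a relative lookup such as ".." must start from a directory, which is why a regular file yields ENOTDIR.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reopen 'fd' with O_PATH through its /proc magic link, then try to open
 * ".." relative to it. Returns a directory fd, or -1 with errno set.
 * For a non-directory 'fd' the lookup fails with ENOTDIR. */
int open_parent_via_dotdot(int fd)
{
    char link[64];
    int pfd, parent, saved;

    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    pfd = open(link, O_PATH);           /* follows the magic link */
    if (pfd < 0)
        return -1;
    parent = openat(pfd, "..", O_RDONLY | O_DIRECTORY);
    saved = errno;
    close(pfd);
    errno = saved;
    return parent;
}
```

The same call succeeds when fd refers to a directory, which is why this trick only helps once you already hold a directory descriptor.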

So far, I have managed to get long paths for files inside the working directory of the process that opened them (fanotify events contain the pid of the target process, so I can read /proc/<pid>/cwd and get the path to the root from there). But this is only a partial solution.
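One way the cwd-based workaround could be sketched: open /proc/<pid>/cwd as a directory (so the length of its path never matters) and look for the entry whose device and inode match the event's fd. find_name_in_cwd is a hypothetical helper; it only finds files directly inside the working directory, and the directory's own path still has to be resolved separately.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Search the working directory of 'pid' for the file behind 'fd'.
 * Copies the entry name into 'name' and returns 0, or returns -1 with
 * errno set (ENOENT if no entry matches). */
int find_name_in_cwd(pid_t pid, int fd, char *name, size_t len)
{
    char link[64];
    struct stat target, st;
    struct dirent *de;
    DIR *dir;
    int dfd;

    if (fstat(fd, &target) < 0)
        return -1;
    snprintf(link, sizeof link, "/proc/%d/cwd", (int)pid);
    dfd = open(link, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    dir = fdopendir(dfd);               /* owns dfd on success */
    if (!dir) {
        close(dfd);
        return -1;
    }
    while ((de = readdir(dir)) != NULL) {
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
            st.st_dev == target.st_dev && st.st_ino == target.st_ino) {
            snprintf(name, len, "%s", de->d_name);
            closedir(dir);
            return 0;
        }
    }
    closedir(dir);
    errno = ENOENT;
    return -1;
}
```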

Is there a way to get the absolute path from a file descriptor without walking the entire file system? Preferably one that works with kernel 2.6.32 / glibc 2.11.

Update, for the curious: I figured out why calling readlink("/proc/self/fd/<fd>", ...) with a buffer large enough to hold the entire path does not work.

Take a look at the do_proc_readlink implementation. Note that it does not use the provided buffer directly. Instead, it allocates one page and uses it as a temporary buffer when it calls d_path. In other words, no matter how large your buffer is, d_path is always limited by the page size. That is 4096 bytes on amd64, the same as PATH_MAX! The -ENAMETOOLONG itself is returned by prepend when it runs out of room in that page.
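The behaviour described in the update can be reproduced with the sketch below. make_deep_dir and proc_fd_readlink are hypothetical helpers; 25 levels of 200-character names is an arbitrary choice that merely exceeds 4096 bytes. Apart from the base directory, every path handed to the kernel is a single component, so building the tree never hits ENAMETOOLONG; the sketch does not clean the tree up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create 'depth' nested directories under 'base', each named with
 * 'namelen' 'x' characters, using only mkdirat()/openat() relative to the
 * previous level. Returns an fd for the deepest directory, or -1. */
int make_deep_dir(const char *base, int depth, size_t namelen)
{
    char name[NAME_MAX + 1];
    int fd, next, i;

    if (namelen > NAME_MAX)
        namelen = NAME_MAX;
    memset(name, 'x', namelen);
    name[namelen] = '\0';
    fd = open(base, O_RDONLY | O_DIRECTORY);
    for (i = 0; fd >= 0 && i < depth; i++) {
        if (mkdirat(fd, name, 0700) < 0 && errno != EEXIST) {
            close(fd);
            return -1;
        }
        next = openat(fd, name, O_RDONLY | O_DIRECTORY);
        close(fd);
        fd = next;
    }
    return fd;
}

/* readlink() on /proc/self/fd/<fd> into a caller-supplied buffer of any
 * size; the kernel still renders the path into a PATH_MAX-sized page. */
ssize_t proc_fd_readlink(int fd, char *buf, size_t len)
{
    char link[64];

    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    return readlink(link, buf, len);
}
```

With a 64 KiB buffer, proc_fd_readlink on the deepest directory still fails with ENAMETOOLONG, matching the explanation above.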


c linux fanotify

Nikita Kakuev Oct 26 '16 at 11:28


2 answers

Msalters · Answer 1 · 2016-10-26T12:40:47+0000

readlink can be used with link targets that are longer than PATH_MAX. There are two limitations: the name of the link itself must be shorter than PATH_MAX (check: "/proc/self/fd/<fd>" is about 20 characters), and the provided output buffer must be large enough. You can first call lstat to find out how large the output buffer should be, or just call readlink several times with growing buffers.
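The "growing buffers" variant might look like the sketch below; readlink_alloc is an illustrative name and the 128-byte starting size is arbitrary. It does not rely on lstat's st_size, which is not guaranteed to reflect the target length for /proc magic links. As the update to the question explains, this handles ordinary symlinks with long targets but cannot get past the page-sized limit the kernel applies to /proc/self/fd links.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read a symlink target into a malloc'd, NUL-terminated string. readlink()
 * silently truncates, so a result that fills the whole buffer means
 * "retry with a bigger one". Returns NULL with errno set on failure. */
char *readlink_alloc(const char *link)
{
    size_t size = 128;

    for (;;) {
        char *buf = malloc(size);
        ssize_t n;

        if (!buf)
            return NULL;
        n = readlink(link, buf, size);
        if (n < 0) {
            int saved = errno;
            free(buf);
            errno = saved;
            return NULL;
        }
        if ((size_t)n < size) {         /* fits with room for the NUL */
            buf[n] = '\0';
            return buf;
        }
        free(buf);
        if (size > SIZE_MAX / 2) {
            errno = ENAMETOOLONG;
            return NULL;
        }
        size *= 2;
    }
}
```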

Luis colorado · Answer 2 · 2016-10-26T21:22:11+0000

The PATH_MAX restriction comes from the fact that Unix (or Linux, from here on) needs to bound the size of parameters passed to the kernel. There is no limit on how deep the file hierarchy can grow, and all files remain accessible no matter how deep they are in the hierarchy. What is actually limited is the length of the string representing a file name that you can pass to or receive from the kernel. This means that you cannot create a symbolic link whose target is longer than this length (because you need to pass the target path), but you can easily walk a path beyond this limit.

When you pass a file name to the kernel, it is to name a file (or a device, socket, fifo, or something else) in order to open it, etc. Your file name first goes to a routine that converts the path into an inode (which is what the kernel actually works with). This routine starts scanning from one of two possible points in the file system hierarchy: the process's root inode or its current working directory inode. Which one is used as the starting inode depends on whether the path begins with a leading /. Each call processes at most PATH_MAX characters, but a series of such steps can take us deep enough that we cannot reach the root in a single step...

Suppose you change the current directory with a relative path and run chdir A/B/C/D/E/.../Z. After that, you create new directories and do the same, chdir AA/AB/AC/AD/AE/.../AZ, then chdir BA/BB/BC/BD/..., etc. There is nothing in the system that prevents you from going that deep into the file system (you can try it yourself; I already did and tested it). You can build a path that is much longer than PATH_MAX. But this only means that you cannot get there directly from the root of the file system. You can get there in steps, as far as the system allows, depending on where you place your current directory (using the chdir(2) syscall).
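A C version of this step-by-step descent, as a sketch: descend_in_steps is a hypothetical helper that creates and enters one relative directory at a time, so no single path argument comes near PATH_MAX even though the accumulated depth does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Go 'depth' levels below the current directory, creating each level with
 * a 'namelen'-character relative name and chdir()-ing into it. Returns the
 * number of levels entered; *added receives how many bytes the absolute
 * path of the working directory grew by. */
int descend_in_steps(int depth, size_t namelen, size_t *added)
{
    char name[NAME_MAX + 1];
    int i;

    if (namelen > NAME_MAX)
        namelen = NAME_MAX;
    memset(name, 'd', namelen);
    name[namelen] = '\0';
    *added = 0;
    for (i = 0; i < depth; i++) {
        if (mkdir(name, 0700) < 0 && errno != EEXIST)
            break;
        if (chdir(name) < 0)            /* short relative path every time */
            break;
        *added += namelen + 1;          /* "/" plus the component */
    }
    return i;
}
```

After descend_in_steps(30, 200, &added) enters all 30 levels, added is 6030 bytes, well past PATH_MAX, yet relative operations such as fopen("leaf.txt", "w") still work from the new working directory.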

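You may have noticed (or not) that classic Unix has no system call to get the path of your working directory from the root. (Linux does provide getcwd(2) as a system call, but it renders the path into a PATH_MAX-sized kernel buffer and fails with ENAMETOOLONG beyond that.) There are several reasons for this: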

The root inode and the working-directory inode are per-process concepts. Two processes on the same system can have different working directories as well as different root directories, to the point that they share nothing in common and neither can reach the other's directory at all.
The path to an inode may be ambiguous. Well, this does not apply to directories, since two hard links are not allowed to point to the same directory inode (this was possible on old systems where directories had to be created with the mknod(2) system call: if you have access to some HP-UX v6 or old Unix SysV R4 system, you can create directories with a ... entry pointing to another directory, or similar things, just by being root and knowing how to use mknod(2)). The idea is that when two links point to the same inode, which of them (or both) leads to the root? Which one is the right path from the root to the current directory inode?
The current inode and the root can be separated by a path long enough that it does not fit within the PATH_MAX limit.
Several different file systems (and file system types) may lie on the way to the root, so the path cannot be derived from the data stored on the disks alone; you also need to know the mount table.

For these reasons, there is no direct support in the kernel for finding the path from the root to a file. The only way to get the path (and this is what the pwd(1) command traditionally does) is to follow the .. entry to the parent directory, look there for the entry that refers to the inode number of the current directory, and repeat this until the parent inode is the same as the last inode visited. Only then are you at the root directory (your root directory, which in general may differ from other processes' root directories).
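The algorithm just described can be sketched in C for a directory descriptor. dir_path_by_walking and prepend_component are hypothetical names. Every lookup is relative to an open fd, so the resulting string may exceed PATH_MAX. Entries are compared with fstatat() rather than d_ino so that mount points report the mounted root's device and inode; readable ancestor directories are assumed, and hard-to-handle cases such as duplicate bind mounts are ignored.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Prepend "/name" to the heap string *path. Returns 0, or -1 on ENOMEM. */
static int prepend_component(char **path, const char *name)
{
    size_t nlen = strlen(name), plen = strlen(*path);
    char *np = malloc(nlen + plen + 2);

    if (!np)
        return -1;
    np[0] = '/';
    memcpy(np + 1, name, nlen);
    memcpy(np + 1 + nlen, *path, plen + 1);   /* copies the NUL too */
    free(*path);
    *path = np;
    return 0;
}

/* Rebuild the absolute path of directory 'dirfd' by repeatedly opening
 * "..", finding the entry that matches the current directory, and stopping
 * when ".." is the directory itself (the process root). Returns a malloc'd
 * string, or NULL with errno set. */
char *dir_path_by_walking(int dirfd)
{
    char *path = strdup("");
    int fd = openat(dirfd, ".", O_RDONLY | O_DIRECTORY);

    if (!path || fd < 0)
        goto fail;
    for (;;) {
        struct stat cur, par, st;
        struct dirent *de;
        DIR *dir;
        int pfd, dfd, found = 0;

        if (fstat(fd, &cur) < 0)
            goto fail;
        pfd = openat(fd, "..", O_RDONLY | O_DIRECTORY);
        if (pfd < 0)
            goto fail;
        if (fstat(pfd, &par) < 0) {
            close(pfd);
            goto fail;
        }
        if (par.st_dev == cur.st_dev && par.st_ino == cur.st_ino) {
            close(pfd);                 /* ".." is itself: at the root */
            break;
        }
        close(fd);
        fd = pfd;                       /* parent becomes the next level */
        dfd = dup(pfd);                 /* fdopendir() takes over the dup */
        dir = dfd >= 0 ? fdopendir(dfd) : NULL;
        if (!dir) {
            if (dfd >= 0)
                close(dfd);
            goto fail;
        }
        while (!found && (de = readdir(dir)) != NULL) {
            if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
                continue;
            if (fstatat(pfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
                st.st_dev == cur.st_dev && st.st_ino == cur.st_ino) {
                if (prepend_component(&path, de->d_name) < 0) {
                    closedir(dir);
                    goto fail;
                }
                found = 1;
            }
        }
        closedir(dir);
        if (!found) {
            errno = ENOENT;
            goto fail;
        }
    }
    close(fd);
    if (*path == '\0') {                /* dirfd was the root itself */
        free(path);
        return strdup("/");
    }
    return path;
fail:
    {
        int saved = errno;
        if (fd >= 0)
            close(fd);
        free(path);
        errno = saved;
    }
    return NULL;
}
```

This only works for directories: for a regular file there is no ".." to climb, which is the limitation the question runs into.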

Just try this exercise:

    i=0
    while [ "$i" -lt 10000 ]
    do
        mkdir dir-$i && cd dir-$i || break   # stop at the first failure
        i=$(expr "$i" + 1)
    done
    echo "reached depth $i"

and see how far you can go from the root directory in your hierarchy.

NOTE 1

Another reason you cannot get the file path from an open descriptor is that you only have access to the inode. The path you used for open(2) may have nothing to do with the actual path from the root: you can use symbolic links or paths relative to the working directory, or change the root directory between the open call and the time you want to use that path, and the path may not even exist anymore, since you may have unlink(2)ed it. The in-use inode information has no link back to a path, since there can be several (even millions of) paths to a file. In the inode you only have the link count, which is the number of paths that actually end at this inode.
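The link count is all the inode itself records, which a small sketch can make visible. describe_fd is a hypothetical helper that reports st_nlink next to what /proc/self/fd/<fd> shows; on Linux, once every name is unlinked, the count drops to 0 and the magic link shows the old path with " (deleted)" appended.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Report the hard-link count of the inode behind 'fd' (how many names
 * currently end at it) and the string the kernel shows for
 * /proc/self/fd/<fd>. Returns the readlink() length, or -1. */
ssize_t describe_fd(int fd, nlink_t *nlink, char *buf, size_t len)
{
    char link[64];
    struct stat st;
    ssize_t n;

    if (len == 0 || fstat(fd, &st) < 0)
        return -1;
    *nlink = st.st_nlink;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    n = readlink(link, buf, len - 1);   /* leave room for the NUL */
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```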
