How can I detect file access on Linux?

I have many threads and data processing applications that I sometimes need to trace; that is, I need to know which files they read. This is mainly to help package test files, but it can also be useful when debugging.

Is there a way to run an executable so that such a list is produced?

I have two thoughts on this:

  • There is a command that I can invoke, and that command runs my application, along the lines of GDB: I call GDB, give it the path to the executable and some arguments, and GDB runs it for me. Perhaps there is something similar that tells me which system resources are used.
  • Perhaps a more interesting (but unnecessarily roundabout) solution:
    • create a library called libc.so that implements fopen (and some others)
    • change LD_LIBRARY_PATH to point to the new library
    • make a copy of the real libc.so and rename fopen in it (perhaps in an editor)
    • my library loads the copy and calls the renamed function as needed to provide fopen functionality
    • run the application, which then calls my fopen proxy

Alternative #1 would certainly be preferred, but comments on how to make #2 easier are also welcome.

+10
c++ linux resources
May 18 '09 at 23:24
3 answers

One option is to use strace:

strace -o logfile -eopen yourapp 

This will log all file open events, but it imposes a performance penalty, which can be significant. The advantage of this is that it is easy to use.

Another option is to use LD_PRELOAD. This corresponds to your option #2. The basic idea is to do something like this:

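 #define _GNU_SOURCE
 #include <stdio.h>
 #include <stdarg.h>
 #include <fcntl.h>
 #include <dlfcn.h>

 /* Interposed open(): log the path, then forward to the real open(). */
 int open(const char *fn, int flags, ...)
 {
     static int (*real_open)(const char *fn, int flags, ...);
     mode_t mode = 0;

     if (!real_open) {
         real_open = dlsym(RTLD_NEXT, "open");
     }
     if (flags & O_CREAT) {
         /* The third (mode) argument is only passed along with O_CREAT. */
         va_list ap;
         va_start(ap, flags);
         mode = (mode_t)va_arg(ap, int);
         va_end(ap);
     }
     fprintf(stderr, "opened file '%s'\n", fn);
     return real_open(fn, flags, mode);
 }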

Then build with:

 gcc -fPIC -shared -o preload-example.so preload-example.c -ldl

And run your program, for example:

 $ LD_PRELOAD=$PWD/preload-example.so cat /dev/null
 opened file '/dev/null'

This has much less overhead than strace.

Note, however, that there are other entry points for opening files, for example fopen(), openat(), or one of the many legacy compatibility entry points:

 00000000000747d0 g    DF .text  000000000000071c  GLIBC_2.2.5   _IO_file_fopen
 0000000000068850 g    DF .text  000000000000000a  GLIBC_2.2.5   fopen
 000000000006fe60 g    DF .text  00000000000000e2  GLIBC_2.4     open_wmemstream
 00000000001209c0  w   DF .text  00000000000000ec  GLIBC_2.2.5   posix_openpt
 0000000000069e50 g    DF .text  00000000000003fb  GLIBC_2.2.5   _IO_proc_open
 00000000000dcf70 g    DF .text  0000000000000021  GLIBC_2.7     __open64_2
 0000000000068a10 g    DF .text  00000000000000f5  GLIBC_2.2.5   fopencookie
 000000000006a250 g    DF .text  000000000000009b  GLIBC_2.2.5   popen
 00000000000d7b10  w   DF .text  0000000000000080  GLIBC_2.2.5   __open64
 0000000000068850 g    DF .text  000000000000000a  GLIBC_2.2.5   _IO_fopen
 00000000000d7e70  w   DF .text  0000000000000020  GLIBC_2.7     __openat64_2
 00000000000e1ef0 g    DF .text  000000000000005b  GLIBC_2.2.5   openlog
 00000000000d7b10  w   DF .text  0000000000000080  GLIBC_2.2.5   open64
 0000000000370c10 g    DO .bss   0000000000000008  GLIBC_PRIVATE _dl_open_hook
 0000000000031680 g    DF .text  0000000000000240  GLIBC_2.2.5   catopen
 000000000006a250 g    DF .text  000000000000009b  GLIBC_2.2.5   _IO_popen
 0000000000071af0 g    DF .text  000000000000026a  GLIBC_2.2.5   freopen64
 00000000000723a0 g    DF .text  0000000000000183  GLIBC_2.2.5   fmemopen
 00000000000a44f0  w   DF .text  0000000000000088  GLIBC_2.4     fdopendir
 00000000000d7e70 g    DF .text  0000000000000020  GLIBC_2.7     __openat_2
 00000000000a3d00  w   DF .text  0000000000000095  GLIBC_2.2.5   opendir
 00000000000dcf40 g    DF .text  0000000000000021  GLIBC_2.7     __open_2
 00000000000d7b10  w   DF .text  0000000000000080  GLIBC_2.2.5   __open
 0000000000074370 g    DF .text  00000000000000d7  GLIBC_2.2.5   _IO_file_open
 0000000000070b40 g    DF .text  00000000000000d2  GLIBC_2.2.5   open_memstream
 0000000000070450 g    DF .text  0000000000000272  GLIBC_2.2.5   freopen
 00000000000318c0 g    DF .text  00000000000008c4  GLIBC_PRIVATE __open_catalog
 00000000000d7b10  w   DF .text  0000000000000080  GLIBC_2.2.5   open
 0000000000067e80 g    DF .text  0000000000000332  GLIBC_2.2.5   fdopen
 000000000001e9b0 g    DF .text  00000000000003f5  GLIBC_2.2.5   iconv_open
 00000000000daca0 g    DF .text  000000000000067b  GLIBC_2.2.5   fts_open
 00000000000d7d60  w   DF .text  0000000000000109  GLIBC_2.4     openat
 0000000000068850  w   DF .text  000000000000000a  GLIBC_2.2.5   fopen64
 00000000000d7d60  w   DF .text  0000000000000109  GLIBC_2.4     openat64
 00000000000d6490 g    DF .text  00000000000000b6  GLIBC_2.2.5   posix_spawn_file_actions_addopen
 0000000000121b80 g    DF .text  000000000000008a  GLIBC_PRIVATE __libc_dlopen_mode
 0000000000067e80 g    DF .text  0000000000000332  GLIBC_2.2.5   _IO_fdopen

You may need to hook all of these for completeness; at the very least, those without a _ prefix should be hooked. In particular, be sure to hook fopen separately, since libc's internal call from fopen() to open() is not intercepted by an LD_PRELOAD library.
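
As a rough sketch of what hooking additional entry points might look like, the following applies the same dlsym(RTLD_NEXT, ...) approach to fopen() and openat(). The log format and the RTLD_NEXT fallback define are illustrative choices, not part of the example above, and fopen64(), open64() and the other entry points would need the same treatment:

```c
#define _GNU_SOURCE   /* must come before any system header */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Fallback in case _GNU_SOURCE took effect too late; glibc uses this value. */
#ifndef RTLD_NEXT
#define RTLD_NEXT ((void *) -1L)
#endif

/* Interposed fopen(): log the path, then forward to the next fopen() found. */
FILE *fopen(const char *path, const char *mode)
{
    static FILE *(*real_fopen)(const char *, const char *);

    if (!real_fopen) {
        real_fopen = (FILE *(*)(const char *, const char *))
            dlsym(RTLD_NEXT, "fopen");
        if (!real_fopen)
            return NULL;    /* no underlying fopen() could be resolved */
    }
    fprintf(stderr, "fopen '%s' (mode %s)\n", path, mode);
    return real_fopen(path, mode);
}

/* Interposed openat(): the mode argument is only passed with O_CREAT. */
int openat(int dirfd, const char *path, int flags, ...)
{
    static int (*real_openat)(int, const char *, int, ...);
    int mode = 0;

    if (!real_openat) {
        real_openat = (int (*)(int, const char *, int, ...))
            dlsym(RTLD_NEXT, "openat");
        if (!real_openat)
            return -1;      /* no underlying openat() could be resolved */
    }
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, int);
        va_end(ap);
    }
    fprintf(stderr, "openat '%s'\n", path);
    return real_openat(dirfd, path, flags, mode);
}
```

These functions can live in the same source file as the open() hook and be built into the same shared object. Logging goes to stderr with fprintf, which does not itself call fopen(), so the hooks do not recurse.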

A similar caveat applies to strace: there is also an openat system call, and depending on your architecture there may be other legacy system calls as well. But there are not as many as with LD_PRELOAD hooks, so if you don't mind the performance hit, strace may be the easier option.

+13
May 18 '09 at 23:27
See man strace.

Example (suppose 2343 is the process ID):

 # logging part: with -ff, each process gets its own log, strace_log.txt.<pid>
 strace -p 2343 -ff -o strace_log.txt

 # displaying part
 grep ^open strace_log.txt.*
+4
May 18, '09 at 23:26

I use something like:

 strace -o file.txt ./command 

Then you can run

 cat file.txt | grep open 

to get a list of all files opened by the program.
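
Note that grep returns whole strace lines, including failed calls. If only the file names are wanted, a small filter along these lines could extract them. This is a minimal sketch that assumes the default strace line format (optionally with a leading PID column, as produced by -f together with -o); the function names opened_path and print_opened_paths are made up for illustration, and paths containing escaped quotes are not handled:

```c
#include <stdio.h>
#include <string.h>

/*
 * If line is a successful open()/openat() record from an strace log,
 * copy the quoted path into out and return 1. Otherwise return 0.
 */
int opened_path(const char *line, char *out, size_t outsz)
{
    const char *p = line;
    const char *start, *end, *ret;
    size_t len;

    /* Skip an optional leading PID column. */
    while (*p == ' ' || (*p >= '0' && *p <= '9'))
        p++;

    if (strncmp(p, "open(", 5) != 0 && strncmp(p, "openat(", 7) != 0)
        return 0;

    /* The path is the first double-quoted argument. */
    start = strchr(p, '"');
    if (!start)
        return 0;
    start++;
    end = strchr(start, '"');
    if (!end)
        return 0;

    /* The result follows the last '='; a digit there is a valid descriptor. */
    ret = strrchr(end, '=');
    if (!ret || ret[1] != ' ' || ret[2] < '0' || ret[2] > '9')
        return 0;

    len = (size_t)(end - start);
    if (len >= outsz)
        return 0;               /* path does not fit in the caller's buffer */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Read an strace log from in and write one opened path per line to out. */
void print_opened_paths(FILE *in, FILE *out)
{
    char line[4096];
    char path[4096];

    while (fgets(line, sizeof line, in)) {
        if (opened_path(line, path, sizeof path))
            fprintf(out, "%s\n", path);
    }
}
```

Calling print_opened_paths(stdin, stdout) from a main function and feeding it file.txt would print one path per line; piping the result through sort -u removes duplicates.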

+2
May 18, '09 at 23:28