I am writing a program that finds all the subdirectories of a parent directory containing a huge number of files, using os.File.Readdir. Running strace to count the system calls showed that the Go version calls lstat() on every file and directory in the parent directory. (I am testing this with /usr/bin.)
Go code:
    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        x, err := os.Open("/usr/bin")
        if err != nil {
            panic(err)
        }
        y, err := x.Readdir(0)
        if err != nil {
            panic(err)
        }
        for _, i := range y {
            fmt.Println(i)
        }
    }
strace output for the program (without following threads):
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     93.62    0.004110           2      2466           write
      3.46    0.000152           7        22           getdents64
      2.92    0.000128           0      2466           lstat   // this increases with increase in no. of files.
      0.00    0.000000           0        11           mmap
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0       114           rt_sigaction
      0.00    0.000000           0         8           rt_sigprocmask
      0.00    0.000000           0         1           sched_yield
      0.00    0.000000           0         3           clone
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         2           sigaltstack
      0.00    0.000000           0         1           arch_prctl
      0.00    0.000000           0         1           gettid
      0.00    0.000000           0        57           futex
      0.00    0.000000           0         1           sched_getaffinity
      0.00    0.000000           0         1           openat
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.004390                  5156           total
I tested the same thing with C's readdir() and did not see this behavior.
C code:
#include <stdio.h>
strace output for the program:
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000128           0      2468           write
      0.00    0.000000           0         1           read
      0.00    0.000000           0         3           open
      0.00    0.000000           0         3           close
      0.00    0.000000           0         4           fstat
      0.00    0.000000           0         8           mmap
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         3           brk
      0.00    0.000000           0         3         3 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         4           getdents
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000128                  2503         3 total
I know that the only fields in the dirent structure that are mandated by POSIX.1 are d_name and d_ino, but I am writing this for a specific file system.
I tried (*File).Readdirnames(), which does not use lstat and returns the names of all files and directories, but checking whether each returned name is a file or a directory would eventually require lstat again.
- I was wondering whether the Go program can be rewritten so that it avoids the unnecessary lstat() call on every file. I could see that the C program uses the following system calls:
    open("/usr/bin", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
    fstat(3, {st_mode=S_IFDIR|0755, st_size=69632, ...}) = 0
    brk(NULL) = 0x1098000
    brk(0x10c1000) = 0x10c1000
    getdents(3, /* 986 entries */, 32768) = 32752
- Is this a premature optimization that I should not worry about? I raised this question because the monitored directory will contain a huge number of small archive files, and the Go version makes almost twice as many system calls as the C version, all of which will hit the disk.
nohup