Go: os (*File).Readdir calls lstat on every file. Can it be optimized?

I am writing a program that uses os.File.Readdir to find all the subdirectories of a parent directory that contains a huge number of files. Running strace to count the system calls showed that the Go version calls lstat() on every file and directory in the parent directory. (I am testing this with /usr/bin.)

Go code:

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        x, err := os.Open("/usr/bin")
        if err != nil {
            panic(err)
        }
        // Readdir returns a FileInfo for every entry in the directory.
        y, err := x.Readdir(0)
        if err != nil {
            panic(err)
        }
        for _, i := range y {
            fmt.Println(i)
        }
    }

strace summary for the Go program (without following threads):

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     93.62    0.004110           2      2466           write
      3.46    0.000152           7        22           getdents64
      2.92    0.000128           0      2466           lstat   // this increases with the number of files
      0.00    0.000000           0        11           mmap
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0       114           rt_sigaction
      0.00    0.000000           0         8           rt_sigprocmask
      0.00    0.000000           0         1           sched_yield
      0.00    0.000000           0         3           clone
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         2           sigaltstack
      0.00    0.000000           0         1           arch_prctl
      0.00    0.000000           0         1           gettid
      0.00    0.000000           0        57           futex
      0.00    0.000000           0         1           sched_getaffinity
      0.00    0.000000           0         1           openat
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.004390                  5156           total

I tested the same thing with C's readdir() and did not see this behavior.

C code:

    #include <stdio.h>
    #include <dirent.h>

    int main (void) {
        DIR* dir_p;
        struct dirent* dir_ent;

        dir_p = opendir ("/usr/bin");

        if (dir_p != NULL) {
            // The readdir() function returns a pointer to a dirent structure representing the next
            // directory entry in the directory stream pointed to by dirp.
            // It returns NULL on reaching the end of the directory stream or if an error occurred.
            while ((dir_ent = readdir (dir_p)) != NULL) {
                // printf("%s", dir_ent->d_name);
                // printf("%d", dir_ent->d_type);
                if (dir_ent->d_type == DT_DIR) {
                    printf("%s is a directory", dir_ent->d_name);
                } else {
                    printf("%s is not a directory", dir_ent->d_name);
                }
                printf("\n");
            }
            (void) closedir(dir_p);
        }
        else
            perror ("Couldn't open the directory");

        return 0;
    }

strace summary for the C program:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000128           0      2468           write
      0.00    0.000000           0         1           read
      0.00    0.000000           0         3           open
      0.00    0.000000           0         3           close
      0.00    0.000000           0         4           fstat
      0.00    0.000000           0         8           mmap
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         3           brk
      0.00    0.000000           0         3         3 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         4           getdents
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000128                  2503         3 total

I know that the only fields of the dirent structure mandated by POSIX.1 are d_name and d_ino, so d_type is not portable, but I am writing this for a specific file system.

I tried (*File).Readdirnames(), which does not call lstat and returns the names of all files and directories. However, finding out whether each returned name is a file or a directory would end up requiring an lstat call again.

  • I was wondering whether the Go program can be rewritten so that it avoids the unnecessary lstat() on every file. I could see that the C program uses the following system calls:

        open("/usr/bin", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
        fstat(3, {st_mode=S_IFDIR|0755, st_size=69632, ...}) = 0
        brk(NULL) = 0x1098000
        brk(0x10c1000) = 0x10c1000
        getdents(3, /* 986 entries */, 32768) = 32752

  • Is this a premature optimization that I should not worry about? I am asking because the monitored directory will hold a huge number of small archive files, and the Go version makes almost twice as many system calls as the C version, all of which will hit the disk.
1 answer

The dirent package appears to do what you are looking for. Here is your C example written in Go:

    package main

    import (
        "bytes"
        "fmt"
        "io"

        "github.com/EricLagergren/go-gnulib/dirent"
        "golang.org/x/sys/unix"
    )

    // int8ToString converts a NUL-terminated []int8 name into a Go string.
    func int8ToString(s []int8) string {
        var buff bytes.Buffer
        for _, chr := range s {
            if chr == 0x00 {
                break
            }
            buff.WriteByte(byte(chr))
        }
        return buff.String()
    }

    func main() {
        stream, err := dirent.Open("/usr/bin")
        if err != nil {
            panic(err)
        }
        defer stream.Close()
        for {
            entry, err := stream.Read()
            if err != nil {
                if err == io.EOF {
                    break
                }
                panic(err)
            }

            name := int8ToString(entry.Name[:])
            // The entry type comes from the directory entry itself, as with d_type in C.
            if entry.Type == unix.DT_DIR {
                fmt.Printf("%s is a directory\n", name)
            } else {
                fmt.Printf("%s is not a directory\n", name)
            }
        }
    }

Source: https://habr.com/ru/post/1013691/

