Recording Operation Cost [Go lang]

I have a Go program that writes lines to a file. I have a loop that repeats 20,000 times , and in each iteration I write about 20-30 lines to a file. I just wanted to know which one is best written to a file.

  • Approach 1: continue to open the file pointer at the beginning of the code and write it for each line. It does 20,000 * 30 write operations.

  • Approach 2: Use bytes.Buffer Go and save everything in the buffer and write it at the end. Also in this case, the file pointer must be open from the beginning of the code or at the end of the code. Is this important?

I assume approach 2 should work better. Can someone confirm this for a reason. How to write immediately is better than writing periodically. Because the file pointer will be open. I use f.WriteString(<string>) and buffer.WriteString(<some string>) buffer is of type bytes.Buffer and f is the open file pointer.

+5
source share
4 answers

bufio package is designed specifically for this kind of tasks. Instead of making syscall for each Write call bufio.Writer , it buffers to a fixed number of bytes in internal memory before doing syscall. After syscall reuses the internal buffer for the next piece of data

Comparison with your second approach bufio.Writer

  • makes more system calls ( N/S instead of 1 )
  • uses less memory ( S bytes instead of N bytes)

where S is the size of the buffer (can be specified via bufio.NewWriterSize ), N is the total size of the data to be written.

Usage Example ( https://play.golang.org/p/AvBE1d6wpT ):

 f, err := os.Create("file.txt") if err != nil { log.Fatal(err) } defer f.Close() w := bufio.NewWriter(f) fmt.Fprint(w, "Hello, ") fmt.Fprint(w, "world!") err = w.Flush() // Don't forget to flush! if err != nil { log.Fatal(err) } 
+5
source

Operations that take time to write to files are system calls and disk I / O. The fact that the file pointer is open does not cost you anything. So naively, we could say that the second method is best.

Now, as you may know, you do not directly write files to files, it uses an internal cache in memory for files that are written, and does real I / O later. I do not know the exact details of this, and in general I do not need.

What I would suggest is a midpoint solution: make a buffer for each iteration of the loop and write it once. Thus, in order to reduce a significant part of the number of system calls and (potentially) writing to disk, but not consuming too much memory using a buffer (depending on the size of your lines, I have to take this into account).

I would suggest benchmarking for a better solution, but because of the caching performed by the system, a comparative I / O disk is a real nightmare.

+3
source

Syscalls is not cheap, so the second approach is better.

You can use the lat_syscall tool from lmbench to determine how long it takes to invoke a single write :

 $ ./lat_syscall write Simple write: 0.1522 microseconds 

So, on my system, it will take about 20,000 * 0.15 μs = 3 ms of extra time to call write for each line.

+1
source

I would question the complexity and complexity of the code using bufio.Writer, not just io.Writer, when the OS will already be a buffer before synchronizing with the disk. This is similar to buffer buffering. I think that you really need to navigate your system to make sure that it really stands forever in your code.

0
source

Source: https://habr.com/ru/post/1239970/


All Articles