How to read a large flat file in Go

I have a flat file that has 339276 lines of text with a size of 62.1 MB. I am trying to read in all the lines, parse them based on some conditions that I have, and then insert them into a database.

At first I tried a bufio.Scanner loop with Scan() and Text() to get each line as a string, but I ran out of buffer space. I switched to bufio.Reader's ReadLine / ReadString / ReadByte (I tried each) and had the same problem every time: not enough buffer space.

I tried setting the buffer size, but the documentation says it is actually a constant that can be made smaller but never larger than 64 * 1024 bytes. Then I tried File.ReadAt, where I set the starting position and moved it along as I read in each section, to no avail. I looked at the following examples and explanations (not an exhaustive list):

  • Read a text file into an array of lines (and write)
  • How to read the last lines from a large file with Go every 10 seconds
  • Reading a file line by line in Go

How can I read the entire file (either line by line or all at once) into a slice so that I can then process the lines?

Here is the code I tried:

    file, err := os.Open(feedFolder + value)
    handleError(err)
    defer file.Close()
    // fileInfo, _ := file.Stat()
    var linesInFile []string
    r := bufio.NewReader(file)
    for {
        path, err := r.ReadLine("\n") // 0x0A separator = newline

        linesInFile = append(linesInFile, path)
        if err == io.EOF {
            fmt.Printf("End Of File: %s", err)
            break
        } else if err != nil {
            handleError(err) // if you return error
        }
    }
    fmt.Println("Last Line: ", linesInFile[len(linesInFile)-1])

Here is something else I tried, using File.ReadAt:

    var fileSize int64 = fileInfo.Size()
    fmt.Printf("File Size: %d\t", fileSize)
    var bufferSize int64 = 1024 * 60
    bytes := make([]byte, bufferSize)
    var fullFile []byte
    var start int64 = 0
    var interationCounter int64 = 1
    var currentErr error = nil
    for currentErr != io.EOF {
        _, currentErr = file.ReadAt(bytes, st)
        fullFile = append(fullFile, bytes...)
        start = (bufferSize * interationCounter) + 1
        interationCounter++
    }
    fmt.Printf("Err: %s\n", currentErr)
    fmt.Printf("fullFile Size: %s\n", len(fullFile))
    fmt.Printf("Start: %d", start)

    var currentLine []string
    for _, value := range fullFile {
        if string(value) != "\n" {
            currentLine = append(currentLine, string(value))
        } else {
            singleLine := strings.Join(currentLine, "")
            linesInFile = append(linesInFile, singleLine)
            currentLine = nil
        }
    }

I'm at a loss. Either I don't understand exactly how the buffer works, or I'm misunderstanding something else. Thanks for reading.

+6
3 answers

bufio.Scan() and bufio.Text() in a loop work fine for me on much larger files, so I assume you have lines that exceed the buffer capacity. In that case:

  • check your line endings
  • which version of Go are you using? The call path, err := r.ReadLine("\n") // 0x0A separator = newline does not match the signature func (b *bufio.Reader) ReadLine() (line []byte, isPrefix bool, err error), which takes no arguments. Its isPrefix return value looks like it exists specifically for your use case: http://golang.org/pkg/bufio/#Reader.ReadLine
+5

It is not clear that you need to read all the lines before parsing them and inserting them into the database. Try to avoid that.

You have a small file: "A flat file that has 339,276 lines of text with a size of 62.1 MB." For instance,

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "io/ioutil"
    )

    func readLines(filename string) ([]string, error) {
        var lines []string
        file, err := ioutil.ReadFile(filename)
        if err != nil {
            return lines, err
        }
        buf := bytes.NewBuffer(file)
        for {
            line, err := buf.ReadString('\n')
            if len(line) == 0 {
                if err != nil {
                    if err == io.EOF {
                        break
                    }
                    return lines, err
                }
            }
            lines = append(lines, line)
            if err != nil && err != io.EOF {
                return lines, err
            }
        }
        return lines, nil
    }

    func main() {
        // a flat file that has 339276 lines of text in it for a size of 62.1 MB
        filename := "flat.file"
        lines, err := readLines(filename)
        fmt.Println(len(lines))
        if err != nil {
            fmt.Println(err)
            return
        }
    }
+3

It seems to me that this readLines variant is shorter and faster than the one peterSO proposed:

    func readLines(filename string) (map[int]string, error) {
        lines := make(map[int]string)

        data, err := ioutil.ReadFile(filename)
        if err != nil {
            return nil, err
        }

        for n, line := range strings.Split(string(data), "\n") {
            lines[n] = line
        }

        return lines, nil
    }
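
Since the question asks for a slice rather than a map, here is a hedged sketch of the same strings.Split approach that returns []string. The function name splitLines is hypothetical; it also drops the empty element that a trailing newline would otherwise produce and tolerates Windows line endings, which the version above does not do.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLines (hypothetical helper) splits data on "\n" and returns the
// lines in order as a slice.
func splitLines(data string) []string {
	// Drop one trailing newline so the file's final line break
	// does not produce an extra empty element.
	data = strings.TrimSuffix(data, "\n")
	if data == "" {
		return nil
	}
	lines := strings.Split(data, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimSuffix(line, "\r") // tolerate "\r\n" endings
	}
	return lines
}

func main() {
	lines := splitLines("first\r\nsecond\nthird\n")
	fmt.Println(len(lines), lines) // prints: 3 [first second third]
}
```

In real use the string would come from string(data) after ioutil.ReadFile, as in the function above.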
0

Source: https://habr.com/ru/post/984346/