How to split a large text file (32 GB) using C#

I tried to split a text file of approximately 32 GB using the code below, but I got an out-of-memory exception.

Please suggest how to split the file using C#.

    string[] splitFile = File.ReadAllLines(@"E:\\JKS\\ImportGenius\\0.txt");
    int cycle = 1;
    int splitSize = Convert.ToInt32(txtNoOfLines.Text);
    var chunk = splitFile.Take(splitSize);
    var rem = splitFile.Skip(splitSize);
    while (chunk.Take(1).Count() > 0)
    {
        string filename = "file" + cycle.ToString() + ".txt";
        using (StreamWriter sw = new StreamWriter(filename))
        {
            foreach (string line in chunk)
            {
                sw.WriteLine(line);
            }
        }
        chunk = rem.Take(splitSize);
        rem = rem.Skip(splitSize);
        cycle++;
    }
+6
6 answers

Well, for starters, you need to use File.ReadLines (assuming you're using .NET 4) so that it doesn't try to read the whole file into memory. Then I'd just keep calling a method that writes the "next" batch of lines to a new file:

    int splitSize = Convert.ToInt32(txtNoOfLines.Text);
    using (var lineIterator = File.ReadLines(...).GetEnumerator())
    {
        bool stillGoing = true;
        for (int chunk = 0; stillGoing; chunk++)
        {
            stillGoing = WriteChunk(lineIterator, splitSize, chunk);
        }
    }

    ...

    private static bool WriteChunk(IEnumerator<string> lineIterator,
                                   int splitSize, int chunk)
    {
        using (var writer = File.CreateText("file " + chunk + ".txt"))
        {
            for (int i = 0; i < splitSize; i++)
            {
                if (!lineIterator.MoveNext())
                {
                    return false;
                }
                writer.WriteLine(lineIterator.Current);
            }
        }
        return true;
    }
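One caveat with the loop above: the output file is created before MoveNext is checked, so an input whose line count is an exact multiple of splitSize (or an empty input) leaves an empty trailing file. Below is a minimal sketch of a variant that only creates a file once a line is available. The names LineSplitter, SplitByLines and outputPrefix are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineSplitter
{
    // Splits inputPath into files of at most linesPerFile lines each.
    // Returns the number of files written.
    public static int SplitByLines(string inputPath, int linesPerFile, string outputPrefix)
    {
        if (linesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(linesPerFile));

        int fileCount = 0;
        using (IEnumerator<string> lines = File.ReadLines(inputPath).GetEnumerator())
        {
            // Advance first, so a file is created only when a line exists.
            while (lines.MoveNext())
            {
                string outputPath = outputPrefix + fileCount + ".txt";
                using (StreamWriter writer = File.CreateText(outputPath))
                {
                    int written = 0;
                    do
                    {
                        writer.WriteLine(lines.Current);
                        written++;
                    }
                    while (written < linesPerFile && lines.MoveNext());
                }
                fileCount++;
            }
        }
        return fileCount;
    }
}
```

A call such as LineSplitter.SplitByLines(inputPath, splitSize, "file") would produce file0.txt, file1.txt and so on, holding only one line in memory at a time.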
+11

Do not read all the lines into an array at once; use the StreamReader.ReadLine method instead, for example:

    using (StreamReader sr = new StreamReader(@"E:\\JKS\\ImportGenius\\0.txt"))
    {
        while (sr.Peek() >= 0)
        {
            var fileLine = sr.ReadLine();
            // do something with line
        }
    }
+6

Instead of reading the entire file at once with File.ReadAllLines, use File.ReadLines in a foreach loop to read lines as needed.

    foreach (var line in File.ReadLines(@"E:\\JKS\\ImportGenius\\0.txt"))
    {
        // Do something
    }

Edit: On an unrelated note, you do not need to escape backslashes when prefixing a string with "@". So write either "E:\\JKS\\ImportGenius\\0.txt" or @"E:\JKS\ImportGenius\0.txt"; in @"E:\\JKS\\ImportGenius\\0.txt" the doubled backslashes are redundant, because a verbatim string keeps both of them literally.
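The difference between the three literals can be checked directly. This is a small sketch; the class and member names (PathLiterals, Escaped, Verbatim, Doubled, PrintComparison) are hypothetical.

```csharp
using System;

public static class PathLiterals
{
    // Regular literal: each \\ is an escape sequence for one backslash.
    public const string Escaped = "E:\\JKS\\ImportGenius\\0.txt";

    // Verbatim literal: backslashes are taken as written, no escaping needed.
    public const string Verbatim = @"E:\JKS\ImportGenius\0.txt";

    // Verbatim literal with doubled backslashes: both characters are kept.
    public const string Doubled = @"E:\\JKS\\ImportGenius\\0.txt";

    public static void PrintComparison()
    {
        Console.WriteLine(Escaped == Verbatim);               // True
        Console.WriteLine(Doubled == Verbatim);               // False
        Console.WriteLine(Doubled.Length - Verbatim.Length);  // 3 (one extra backslash per separator)
    }
}
```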

+3

File.ReadAllLines reads the entire file into memory.

To work with large files, read into memory only what you need right now, and discard it as soon as you are done with it.

The best option would be File.ReadLines, which returns a lazy enumerator: data is read into memory only when you request the next line from the enumerator. Provided you avoid multiple enumerations (for example, do not use Count()), only parts of the file will be read into memory at a time.
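To illustrate the point about multiple enumerations: as far as I know, each enumeration of the sequence returned by File.ReadLines starts again from the beginning of the file, so calling Count() and then looping reads the file twice, and chaining Skip calls as in the question would re-read it for every chunk. A single pass that gathers everything at once avoids this. The sketch below assumes C# 7 or later for the tuple syntax; LazyLines and Scan are hypothetical names.

```csharp
using System;
using System.IO;

public static class LazyLines
{
    // Single pass over the file: counts the lines and tracks the longest one.
    // Only the current line is held in memory at any moment.
    public static (long Count, int MaxLength) Scan(string path)
    {
        long count = 0;
        int maxLength = 0;
        foreach (string line in File.ReadLines(path))
        {
            count++;
            if (line.Length > maxLength)
            {
                maxLength = line.Length;
            }
        }
        return (count, maxLength);
    }
}
```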

+3

The problem is that you read the contents of the entire file into memory at once using File.ReadAllLines(). What you need to do is open a FileStream with File.OpenRead() and read and write smaller fragments.

Edit: Actually, for your case, ReadLine is obviously better. See the other answers. :)
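For completeness, the FileStream approach could look roughly like the sketch below. It splits by byte count rather than by line count, so a line (or a multi-byte character) may be cut at a file boundary, which is why the line-based approaches suit this question better. ByteSplitter, SplitByBytes, outputPrefix, the ".bin" suffix and the 80 KB buffer size are all assumptions made for illustration.

```csharp
using System;
using System.IO;

public static class ByteSplitter
{
    // Copies inputPath into consecutive files of at most bytesPerFile bytes.
    // Returns the number of files written.
    public static int SplitByBytes(string inputPath, long bytesPerFile, string outputPrefix)
    {
        if (bytesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(bytesPerFile));

        byte[] buffer = new byte[81920]; // the only data held in memory
        int fileCount = 0;

        using (FileStream input = File.OpenRead(inputPath))
        {
            while (input.Position < input.Length)
            {
                string outputPath = outputPrefix + fileCount + ".bin";
                using (FileStream output = File.Create(outputPath))
                {
                    long remaining = bytesPerFile;
                    while (remaining > 0)
                    {
                        int toRead = (int)Math.Min(buffer.Length, remaining);
                        int read = input.Read(buffer, 0, toRead);
                        if (read == 0) break; // end of input
                        output.Write(buffer, 0, read);
                        remaining -= read;
                    }
                }
                fileCount++;
            }
        }
        return fileCount;
    }
}
```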

0

Use StreamReader to read the file and StreamWriter to write it.
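A minimal sketch of that idea, assuming the file should be split every fixed number of lines as in the question. ReaderWriterSplitter, Split and outputPrefix are hypothetical names.

```csharp
using System;
using System.IO;

public static class ReaderWriterSplitter
{
    // Streams inputPath line by line and starts a new output file
    // every linesPerFile lines. Returns the number of files written.
    public static int Split(string inputPath, int linesPerFile, string outputPrefix)
    {
        if (linesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(linesPerFile));

        int fileCount = 0;
        int linesInCurrent = 0;
        StreamWriter writer = null;
        try
        {
            using (var reader = new StreamReader(inputPath))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (writer == null)
                    {
                        // Open the next output file only when a line is available.
                        writer = new StreamWriter(outputPrefix + fileCount + ".txt");
                        fileCount++;
                    }
                    writer.WriteLine(line);
                    linesInCurrent++;
                    if (linesInCurrent == linesPerFile)
                    {
                        writer.Dispose();
                        writer = null;
                        linesInCurrent = 0;
                    }
                }
            }
        }
        finally
        {
            writer?.Dispose(); // close a partially filled last file
        }
        return fileCount;
    }
}
```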

0

Source: https://habr.com/ru/post/921396/

