Combining two huge files in C++

I have two text files, each a couple of hundred megabytes, written with std::ofstream, and I want to concatenate them. Reading all the data into memory with fstream to build a single file usually results in a memory error, because the combined size is too large.

Is there a way to merge them faster than O(n)?

File 1 (160 MB):

0 1 3 5 7 9 11 13 ... ... 9187653 9187655 9187657 9187659 

File 2 (120 MB):

 abcdefghij abcdefghji abcdefgihj abcdefgijh ... ... jihgfedcba 

Merge (280 MB):

 0 1 3 5 7 9 11 13 ... ... 9187653 9187655 9187657 9187659 abcdefghij abcdefghji abcdefgihj abcdefgijh ... ... jihgfedcba 

File Generation:

    std::ofstream a_file("file1.txt");
    std::ofstream b_file("file2.txt");

    while (/* whatever */) {
        a_file << num << endl;
    }
    while (/* whatever */) {
        b_file << character << endl;
    }

    // merge them here; it doesn't matter if the output is one of them or a new file

    a_file.close();
    b_file.close();
+6
4 answers

Assuming you don't want to do any processing, and simply want to concatenate the two files into a third, you can do it very simply with the files' stream buffers:

    std::ifstream if_a("a.txt", std::ios_base::binary);
    std::ifstream if_b("b.txt", std::ios_base::binary);
    std::ofstream of_c("c.txt", std::ios_base::binary);

    of_c << if_a.rdbuf() << if_b.rdbuf();

I have done this with files of up to 100 MB in the past without problems. You effectively let C++ and its library handle the required buffering. It also means you don't have to worry about tracking file positions if your files get really big.

Alternatively, if you just want to append b.txt to the end of a.txt , open a.txt with the append flag and seek to the end:

    std::ofstream of_a("a.txt", std::ios_base::binary | std::ios_base::app);
    std::ifstream if_b("b.txt", std::ios_base::binary);

    of_a.seekp(0, std::ios_base::end);
    of_a << if_b.rdbuf();

Both methods work by passing the input stream's std::streambuf to the output stream's operator<< , one of whose overloads takes a streambuf pointer. In the absence of errors, the contents of the streambuf are inserted unformatted into the output stream until end of file.

+14

Is there a way to merge them faster than O(n)?

That would mean combining the data without even reading it once. You cannot merge the files without reading each of them at least once, so the short answer is: no.

For the reading itself, consider unbuffered block reads (look at std::fstream::read ).

+5
source

On Windows:

    system("copy File1+File2 OutputFile");

On Linux:

    system("cat File1 File2 > OutputFile");

But the real answer is simple: do not read the entire file into memory! Read the input files in small blocks:

    void Cat(input_file, output_file)
    {
        while ((bytes_read = read_data(input_file, buffer, buffer_size)) != 0)
        {
            write_data(output_file, buffer, bytes_read);
        }
    }

    int main()
    {
        output_file = open output file
        input_file  = open input file1
        Cat(input_file, output_file)
        close input_file
        input_file  = open input file2
        Cat(input_file, output_file)
        close input_file
    }
+2

Actually, it depends on whether you want "pure" C++ for this. Personally, at the cost of portability, I would just write:

    #include <cstdlib>
    #include <sstream>

    int main(int argc, char* argv[])
    {
        std::ostringstream command;
        command << "cat "; // Linux only; the command for Windows is slightly different

        for (int i = 2; i < argc; ++i) {
            command << argv[i] << " ";
        }

        command << "> " << argv[1];

        return system(command.str().c_str());
    }

Is this good C++ code? No, not really (it is not portable, and it does not escape the command arguments).

But it will get you past where you are right now.

As for a "real" C++ solution, with all the ugliness of handling the streams...

    #include <fstream>
    #include <string>
    #include <vector>

    static size_t const BufferSize = 8192; // 8 KB

    void appendFile(std::string const& outFile, std::string const& inFile)
    {
        std::ofstream out(outFile,
            std::ios_base::app | std::ios_base::binary | std::ios_base::out);
        std::ifstream in(inFile,
            std::ios_base::binary | std::ios_base::in);

        std::vector<char> buffer(BufferSize);

        while (in.read(&buffer[0], buffer.size())) {
            out.write(&buffer[0], buffer.size());
        }

        // "read" fails when it hits EOF,
        // but it may still have placed *some* bytes in the buffer!
        out.write(&buffer[0], in.gcount());
    }
+2

Source: https://habr.com/ru/post/956634/