How to increase the speed of my C++ program when reading delimited text files?

Below are C# and C++ programs that perform the same task: read the same text file, whose fields are separated by "|", and save it with the fields delimited by "#".

When I run the C++ program, the elapsed time is 169 seconds.

UPDATE 1: Thanks to Seth (compiling with cl /EHsc /Ox /Ob2 /Oi) and to GWW for moving the string declarations outside the loops, the elapsed time has been reduced to 53 seconds. I have also updated the code.

UPDATE 2: Do you have any other suggestions for improving the C++ code?

When I run the C# program, the elapsed time is 34 seconds!

The question is: how can I bring the speed of the C++ program up to (or beyond) that of the C# one?

C++ program:

    int main() {
        Timer t;
        cout << t.ShowStart() << endl;

        ifstream input("in.txt");
        ofstream output("out.txt", ios::out);
        char const row_delim = '\n';
        char const field_delim = '|';
        string s1, s2;

        while (input) {
            if (!getline(input, s1, row_delim))
                break;
            istringstream iss(s1);
            while (iss) {
                if (!getline(iss, s2, field_delim))
                    break;
                output << s2 << "#";
            }
            output << "\n";
        }

        t.Stop();
        cout << t.ShowEnd() << endl;
        cout << "Executed in: " << t.ElapsedSeconds() << " seconds." << endl;
        return 0;
    }

C# program:

    static void Main(string[] args)
    {
        long i;
        Stopwatch sw = new Stopwatch();
        Console.WriteLine(DateTime.Now);
        sw.Start();

        StreamReader sr = new StreamReader("in.txt", Encoding.Default);
        StreamWriter wr = new StreamWriter("out.txt", false, Encoding.Default);
        object[] cols = new object[0]; // allocates more elements automatically when filling
        string line;

        while (!string.Equals(line = sr.ReadLine(), null)) // Fastest way
        {
            cols = line.Split('|'); // Faster than using a List<>
            foreach (object col in cols)
                wr.Write(col + "#");
            wr.WriteLine();
        }

        sw.Stop();
        Console.WriteLine("Conteo tomó {0} secs", sw.Elapsed);
        Console.WriteLine(DateTime.Now);
    }

UPDATE 3:

Well, I have to say that I am very happy with the help received, and my question has been answered to my satisfaction.

I slightly altered the text of the question, and I tested the solutions kindly proposed by molbdnilo and Bo Persson.

Keeping Seth's advice for the compilation command (i.e. cl /EHsc /Ox /Ob2 /Oi pgm.cpp):

Bo Persson's solution took 18 seconds on average to complete. Really good, considering how close the code stays to the style I prefer.

molbdnilo's solution took 6 seconds on average, really awesome! (Thanks also to Konstantin.)

It is never too late to learn, and I have learned valuable things through this question.

Best wishes.

+2
5 answers

As Konstantin suggests, read large chunks at a time using read.

I reduced the time from ~25 s to ~3 s on a 129 MB file with 5M "records" (26 bytes each) spread over 100,000 lines.

    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <algorithm>

    using namespace std;

    int main() {
        ifstream input("in.txt");
        ofstream output("out.txt", ios::out);

        const size_t size = 512 * 1024;
        char buffer[size];

        while (input) {
            input.read(buffer, size);
            size_t readBytes = input.gcount();
            replace(buffer, buffer + readBytes, '|', '#');
            output.write(buffer, readBytes);
        }

        input.close();
        output.close();
        return 0;
    }
+7

How about this for the central loop:

    while (getline(input, s1, row_delim)) {
        for (string::iterator c = s1.begin(); c != s1.end(); ++c)
            if (*c == field_delim)
                *c = '#';
        output << s1 << '\n';
    }
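To make this easier to try, here is that loop dropped into a complete program. Treat it as a sketch: the surrounding setup (in.txt, out.txt, the row and field delimiters) is copied from the question rather than from this answer.

    #include <fstream>
    #include <string>
    using namespace std;

    int main() {
        ifstream input("in.txt");
        ofstream output("out.txt", ios::out);
        char const row_delim = '\n';
        char const field_delim = '|';
        string s1;

        while (getline(input, s1, row_delim)) {
            for (string::iterator c = s1.begin(); c != s1.end(); ++c)
                if (*c == field_delim)
                    *c = '#';             // replace the delimiter in place
            output << s1 << '\n';
        }
        return 0;
    }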
+4

It seems to me that the slow part is inside getline. I don't have documentation that clearly supports this idea, but that is how it looks to me. You should try using read instead. Because getline takes a delimiter, it has to check every character to see whether it has found that delimiter, so it behaves like many small input operations: the program fetches a character from the file and then hands it over to your string, and the time goes into all that per-character traffic (in other words, into the movement of the disk head). If you use the read function instead, you copy a whole block of characters and then work on them in your program's memory, which can reduce the time.

P.S. Again, I don't have documentation on getline and how it works internally, but I am sure about read. I hope this is useful.
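A minimal sketch of the "read a block, then process it in memory" idea described here; it is essentially the same approach as the read-based answer above, shown with an explicit per-byte replacement step. The 1 MiB buffer size and the in.txt/out.txt names are assumptions, not part of this answer.

    #include <fstream>
    #include <vector>

    int main() {
        std::ifstream in("in.txt", std::ios::binary);
        std::ofstream out("out.txt", std::ios::binary);
        std::vector<char> buf(1 << 20);          // 1 MiB block (assumed size)

        while (in) {
            in.read(buf.data(), buf.size());     // one bulk read
            std::streamsize n = in.gcount();     // bytes actually read
            for (std::streamsize i = 0; i < n; ++i)
                if (buf[i] == '|')
                    buf[i] = '#';                // work on the block in memory
            out.write(buf.data(), n);            // one bulk write
        }
        return 0;
    }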

+2

If you know the maximum length of a line, you can use stdio with fgets and null-terminated strings; it will fly.
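A minimal sketch of that stdio + fgets idea, assuming a fixed maximum line length (64 KB here, chosen arbitrarily) and the same in.txt/out.txt names as the question; like the other answers, it simply replaces '|' with '#' in place.

    #include <cstdio>

    int main() {
        std::FILE* in = std::fopen("in.txt", "r");
        std::FILE* out = std::fopen("out.txt", "w");
        if (!in || !out)
            return 1;

        char line[65536];                      // assumed maximum line length
        while (std::fgets(line, sizeof line, in)) {
            for (char* p = line; *p; ++p)      // walk the null-terminated line
                if (*p == '|')
                    *p = '#';                  // replace the delimiter in place
            std::fputs(line, out);
        }

        std::fclose(in);
        std::fclose(out);
        return 0;
    }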

For C#, if the file fits in memory (maybe it doesn't, given that it takes 34 seconds), I would be interested to see how IO.File.WriteAllText("out.txt",IO.File.ReadAllText("in.txt").Replace("|","#")); performs!

+1

I would be very surprised if this version beat @molbdnilo's, but it is probably the second fastest, and (I would argue) the simplest and cleanest:

    #include <fstream>
    #include <string>
    #include <sstream>
    #include <algorithm>

    int main() {
        std::ifstream in("in.txt");
        std::ostringstream buffer;
        buffer << in.rdbuf();
        std::string s(buffer.str());

        std::replace(s.begin(), s.end(), '|', '#');

        std::ofstream out("out.txt");
        out << s;
        return 0;
    }

Based on past experience with this method, I expect it to be no worse than half the speed of what @molbdnilo posted, which should still be three times faster than your C# version and more than ten times faster than your original C++ version. [Edit: I just wrote a file generator, and with a file just over 100 megabytes it is even closer than I expected: I get 4.4 seconds versus 3.5 for @molbdnilo's code.] The combination of reasonable speed with really short, simple code is often a pretty decent trade-off. Of course, all of this assumes you have enough physical memory to hold the entire contents of the file, but overall that is a pretty safe assumption.
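For anyone who wants to reproduce the timing comparison, here is a hypothetical test-file generator (not part of the original answer). It assumes roughly the shape described in the read-based answer above: 100,000 lines of 50 pipe-separated 25-character fields, i.e. about 5M records of 26 bytes each, roughly 130 MB in total.

    #include <fstream>
    #include <string>

    int main() {
        std::ofstream out("in.txt", std::ios::binary);
        const std::string field(25, 'x');          // 25 filler characters per field
        for (int line = 0; line < 100000; ++line) {
            for (int f = 0; f < 50; ++f) {
                out << field;
                out << (f == 49 ? '\n' : '|');     // last field ends the line
            }
        }
        return 0;
    }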

0

Source: https://habr.com/ru/post/895653/

