Reading a file requires a system call in any language or OS: a call into the underlying operating system, followed by waiting for the contents to be placed in memory for you (assuming you pass the OS security checks and so on). Multithreading the file read itself will likely slow you down, since you will make more system calls, each of which leaves your program and puts control in the hands of the operating system.
Thus, a single reading thread is your best bet - perhaps splitting the parsing of the file across multiple threads afterwards, if necessary. If you can parse a large file in a matter of seconds, I would say it really is not worth it. On the other hand, if you are writing a graphical application, you definitely want a separate thread for loading files, so that you do not block your interface.
As for speed, I would suggest there are two main issues. First, I suspect Python buffers file reads by default, which speeds things up. If you buffer the reads of your file in C++ as well (so that you make fewer system calls), you may see a performance boost. The other issue is which data structures you use in Python and C++ to load/parse the data. Without seeing your code I cannot suggest anything concrete, but spending a little time researching/thinking about the various data structures available to your program may be worthwhile. Keep in mind that Python and C++ data structures have very different performance profiles, so one that works well in Python may be a much poorer choice in C++.
Edit: a simple example of buffered file reading with the C++ standard library, from http://www.cplusplus.com/reference/
// read a file into buffer - sgetn() example
#include <iostream>   // std::cout, std::streambuf, std::streamsize
#include <fstream>    // std::ifstream

int main () {
  char* contents;
  std::ifstream istr ("test.txt");
  if (istr) {
    std::streambuf* pbuf = istr.rdbuf();
    // find the file size by seeking to the end...
    std::streamsize size = pbuf->pubseekoff(0, istr.end);
    pbuf->pubseekoff(0, istr.beg);   // ...then rewind
    contents = new char[size];
    pbuf->sgetn(contents, size);
    istr.close();
    std::cout.write(contents, size);
    delete[] contents;               // free the buffer (missing in the original)
  }
  return 0;
}