Reading a file requires a system call in any language or OS: a call into the underlying operating system, followed by waiting for the contents to be placed in memory for you (assuming you pass the OS security checks and so on). Multithreading the file read itself will likely slow you down, since you will make more system calls, each of which leaves your program and puts control in the hands of the operating system.
Thus, a single reading thread is your best bet - perhaps splitting the parsing of the file across multiple threads afterwards, if necessary. If you can parse a large file in a matter of seconds, I would say it really is not worth it. On the other hand, if you are writing a graphical application, you definitely want a separate thread for loading files, so that you do not block your interface.
As for speed, I would suggest there are two main issues. First, I suspect Python buffers file reads by default, which speeds things up. If you buffer the reads of your file in C++ as well (so that you make fewer system calls), you may see a performance boost. The other issue is which data structures you use in Python and C++ to load/parse the data. Without seeing your code I cannot suggest anything concrete, but spending a little time researching/thinking about the various data structures available to your program may be worthwhile. Keep in mind that Python and C++ data structures have very different performance profiles, so one that works well in Python may be a much poorer choice in C++.
Edit: a simple example of buffered file reading with the C++ standard library, from http://www.cplusplus.com/reference/
// read a file into buffer - sgetn() example
#include <iostream>   // std::cout, std::streambuf, std::streamsize
#include <fstream>    // std::ifstream

int main () {
  char* contents;
  std::ifstream istr ("test.txt");
  if (istr) {
    std::streambuf* pbuf = istr.rdbuf();
    // find the file size by seeking to the end...
    std::streamsize size = pbuf->pubseekoff(0, istr.end);
    pbuf->pubseekoff(0, istr.beg);   // ...then rewind
    contents = new char[size];
    pbuf->sgetn(contents, size);
    istr.close();
    std::cout.write(contents, size);
    delete[] contents;               // free the buffer (missing in the original)
  }
  return 0;
}