While you can use a second thread to analyze the data after reading it, you probably won't gain a huge amount by doing so. Trying to use more than one thread to read the data will almost certainly hurt speed rather than improve it. Using multiple threads to process the data is pointless: processing will be many times faster than reading, so even with only one extra thread, the limit will be the disk speed.
One (possible) way to gain significant speed is to bypass the usual iostreams. While some implementations are nearly as fast as using C FILE *, I don't know of any that is really faster, and some are substantially slower. If you are running this on a system (like Windows) whose I/O model is noticeably different from C's, you can gain considerably more with a little care.
The problem is fairly simple: the file you are reading is (potentially) larger than the cache space you have available, but you won't gain anything from caching anyway, because you are not going to re-read chunks of the file (at least if you do things sensibly). So you want to tell the system to bypass any caching and simply transfer the data as directly as possible from the disk to your memory, where you can process it. On a Unix-like system, that probably means open() and read() (and it won't gain you much). On Windows, it means CreateFile and ReadFile, passing the FILE_FLAG_NO_BUFFERING flag to CreateFile, and it will probably roughly double your speed if you do it right.
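For the Unix-like case, a minimal sketch of reading a file in large chunks with plain open() and read() might look like the following. The name read_chunks, its callback parameter, and the error convention (returning -1) are my own assumptions for illustration. Note that this does not bypass the page cache by itself; it only skips the stream layer on top of the system calls.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <vector>

// Hypothetical helper: reads `path` in chunks of up to `buf_size` bytes and
// calls sink(data, len) for each chunk. Returns the total number of bytes
// read, or -1 if the file cannot be opened or a read fails.
template <class Sink>
long long read_chunks(const char* path, std::size_t buf_size, Sink sink) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;

    std::vector<char> buf(buf_size);
    long long total = 0;
    for (;;) {
        ssize_t n = ::read(fd, buf.data(), buf.size());
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            ::close(fd);
            return -1;
        }
        if (n == 0) break;  // end of file
        sink(buf.data(), static_cast<std::size_t>(n));
        total += n;
    }
    ::close(fd);
    return total;
}
```

With a buffer of around a megabyte, the callback is where the word counting would go. On Windows the same loop would be built on CreateFile and ReadFile instead; with FILE_FLAG_NO_BUFFERING the buffer address and read sizes also have to be aligned to the volume's sector size, which is part of "doing it right".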
You have also received some answers advocating doing the processing with various parallel constructs. I think these are fundamentally mistaken. Unless you do something horribly stupid, the time to count the words in the file will be only a few milliseconds longer than it takes to simply read the file.
The structure I would use is to have two buffers of, say, a megabyte apiece. Read data into one buffer. Hand that buffer to your counting thread to count the words in it. While that is happening, read data into the second buffer. When both are done, basically swap buffers and continue. There is a little extra processing you will need to do when swapping buffers to deal with a word that crosses the boundary from one buffer to the next, but it is pretty trivial (basically, if the buffer doesn't end with white space, you are still in a word when you start working on the next buffer of data).
As long as you are sure it will only be used on a multiprocessor (multi-core) machine, using real threads is fine. If there is a chance it might ever run on a single-core machine, you would be somewhat better off using a single thread with overlapped I/O instead.