The most efficient approach is to work block by block: open all the output files at once and reuse the read buffer for writing. Based on the information given, there is no other pattern in the data that could be exploited to speed this up.
Keep each output file on its own file descriptor so you do not have to open and close a file for every line. Open them all at the start, or lazily as you go, and close them all before you finish. On most Linux distributions you only get 1,024 open files per process by default, so you may have to raise that limit, say with ulimit -n 2600, if you have permission to do so (see also /etc/security/limits.conf).
Allocate a buffer, say a few KB, and read into it raw from the source file. Iterate over it, keeping track of your scan position. Whenever you reach the end of a line or the end of the buffer, write the relevant slice of the buffer out to the correct file descriptor. There are a few edge cases you will have to handle, for example a read that ends mid-line, so you have the start of a new line but not enough of it to decide which file it belongs to.
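A sketch of that loop, building on the helpers above. key_for_line() is a hypothetical routing function (here it just routes on the first byte, purely for illustration), and the whole thing assumes no line is longer than BUF_SIZE.

    #include <string.h>

    #define BUF_SIZE (64 * 1024)      /* read-buffer size; a few KB also works */

    /* Hypothetical routing rule: which output a line belongs to is application
     * specific, so this just hashes the first byte for illustration. */
    static int key_for_line(const char *line, size_t len)
    {
        (void)len;
        return (unsigned char)line[0] % MAX_OUTPUTS;
    }

    /* Split the input on in_fd into the output files, block by block.
     * Assumes init_fd_table() has been called and no line exceeds BUF_SIZE. */
    static void split_file(int in_fd)
    {
        char buf[BUF_SIZE];
        size_t filled = 0;                    /* bytes currently held in buf */

        for (;;) {
            ssize_t n = read(in_fd, buf + filled, sizeof buf - filled);
            if (n < 0) { perror("read"); exit(EXIT_FAILURE); }
            if (n == 0) break;                /* EOF */
            filled += (size_t)n;

            /* Write out every complete line currently in the buffer. */
            size_t start = 0;
            char *nl;
            while ((nl = memchr(buf + start, '\n', filled - start)) != NULL) {
                size_t len = (size_t)(nl - (buf + start)) + 1;   /* include '\n' */
                int fd = fd_for_key(key_for_line(buf + start, len));
                if (write(fd, buf + start, len) != (ssize_t)len) {
                    perror("write");
                    exit(EXIT_FAILURE);
                }
                start += len;
            }

            /* Edge case from above: the tail is a partial line, so we cannot
             * yet tell which file it belongs to.  Keep it at the front of the
             * buffer and append the next read after it. */
            memmove(buf, buf + start, filled - start);
            filled -= start;
        }

        if (filled > 0) {
            /* Final line without a trailing newline. */
            int fd = fd_for_key(key_for_line(buf, filled));
            write(fd, buf, filled);
        }
        close_all();
    }

In a real program you would call init_fd_table() and raise_fd_limit() first, open the source file, and pass its descriptor to split_file().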
You can also avoid processing the first few bytes of the buffer if you can assume a minimum line length. It ends up being a little more complicated, but faster still.
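The original wording of that optimization is ambiguous; one plausible reading is sketched below, building on the snippet above: if every line (including its newline) is known to be at least MIN_LINE bytes long, the search for the terminating newline can start MIN_LINE - 1 bytes in. Both MIN_LINE and the helper name are assumptions.

    #define MIN_LINE 16               /* assumed minimum line length, including '\n' */

    /* Return the index of the '\n' that ends the line starting at `start`, or
     * (size_t)-1 if the buffer does not yet hold a complete line.  Because no
     * line is shorter than MIN_LINE bytes, the search can begin MIN_LINE - 1
     * bytes in, so the first bytes of each line are never inspected. */
    static size_t find_line_end(const char *buf, size_t start, size_t filled)
    {
        size_t from = start + MIN_LINE - 1;
        if (from >= filled)
            return (size_t)-1;
        const char *nl = memchr(buf + from, '\n', filled - from);
        return nl ? (size_t)(nl - buf) : (size_t)-1;
    }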
Interestingly, non-blocking I/O takes care of issues like this.