I read and process data from one large file on a hard disk (the processing itself is fast, with little overhead), and then I have to write the results back as hundreds of thousands of small files.
At first I wrote each result file to disk immediately, one at a time, which was the slowest option. I realized it goes much faster if I accumulate a certain amount of output in a vector and then write it all out in one burst, returning to processing while the hard drive is busy writing everything I poured into it (at least that seems to be what's happening). A sketch of what I mean is below.
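For reference, here is a minimal sketch of that batching idea, assuming each result is a (filename, contents) pair; the class name `BatchedWriter` and the 10 MB limit are just illustrative, not my actual code:

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <utility>
    #include <vector>

    // Illustrative sketch: accumulate results in memory and flush them
    // to disk in one burst once the batch reaches a size limit.
    class BatchedWriter {
    public:
        explicit BatchedWriter(std::size_t limitBytes) : limit_(limitBytes) {}

        void add(std::string filename, std::string contents) {
            bytes_ += contents.size();
            pending_.emplace_back(std::move(filename), std::move(contents));
            if (bytes_ >= limit_)
                flush();
        }

        // Write every buffered result, then clear the batch.
        void flush() {
            for (const auto& [name, data] : pending_) {
                std::ofstream out(name, std::ios::binary);
                out.write(data.data(), static_cast<std::streamsize>(data.size()));
            }
            pending_.clear();
            bytes_ = 0;
        }

    private:
        std::vector<std::pair<std::string, std::string>> pending_;
        std::size_t bytes_ = 0;
        std::size_t limit_;
    };

(One extra `flush()` at the end of the run writes whatever remainder is still buffered.)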
My question is: can I somehow estimate, from the hardware limitations, the optimal amount of data to buffer before writing? It seems the limiting factor is the hard drive; it has a 16 MB cache, and I measured these times (all for ~100,000 files):
Buffer size   Time (minutes)
------------  --------------
no buffer     ~ 8:30
1 MB          ~ 6:15
10 MB         ~ 5:45
50 MB         ~ 7:00
Or is it just a coincidence?
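To check whether the ~10 MB sweet spot is stable rather than a coincidence, one could sweep the batch limit and time a full run for each value; a sketch, assuming the `BatchedWriter` class from the snippet above is in scope and with the actual processing loop elided:

    #include <chrono>
    #include <cstddef>
    #include <iostream>

    // Hypothetical benchmark: time a complete processing run for several
    // batch-size limits to see which one is the fastest on this drive.
    int main() {
        const std::size_t limitsMB[] = {1, 5, 10, 20, 50};
        for (std::size_t mb : limitsMB) {
            BatchedWriter writer(mb * 1024 * 1024);
            auto start = std::chrono::steady_clock::now();
            // ... process the input file, calling writer.add(...) per result ...
            writer.flush();  // write any buffered remainder
            auto elapsed = std::chrono::steady_clock::now() - start;
            std::cout << mb << " MB: "
                      << std::chrono::duration_cast<std::chrono::seconds>(elapsed).count()
                      << " s\n";
        }
    }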
I would also be interested in general experience / rules of thumb for improving write performance, for example whether writing in large blocks to the hard disk helps, etc.
Edit:
The hardware is a pretty standard consumer drive (I'm a student, not a data center): a WD 3.5" 1 TB / 7200 rpm / 16 MB cache drive connected via USB 2.0, formatted HFS+ (journaled), on Mac OS X 10.5. I'll soon try Ext3 on Linux and an internal drive instead of the external one.