Writing data blocks during processing - is there a convergence value due to hardware limitations?

I process data from a hard disk from one large file (processing is fast and not too much overhead), and then I have to write the results back (hundreds of thousands of files).

I started writing the results right away in the files one at a time, which was the slowest option. I realized that this happens much faster if I create a vector of a certain number of files, and then write them right away, and then return to processing while the hard drive is busy writing everything that I poured into it (this at least seems to be what's happening).

My question is: can I somehow estimate the value of convergence for the amount of data that I have to write from hardware limitations? It seems to me that this is a hard drive, I have 16 MB of buffer on this hard drive and get these values ​​(all for ~ 100000 files):

Buffer size      time (minutes)
------------------------------
no Buffer        ~ 8:30
 1 MB            ~ 6:15
10 MB            ~ 5:45
50 MB            ~ 7:00

Or is it just a coincidence?

I would also be interested in experience / rules on how to improve write performance in general, for example, large blocks on the hard disk are useful, etc.

Edit:

Hardware is a pretty standard consumer drive (I'm a student, not a data center). WD 3.5 1TB / 7200 / 16MB / USB2, HFS + Log, OS - MacOS 10.5. I'll try Ext3 / Linux and the internal drive soon, not the external one).

+3
source share
4 answers

- , ?

. , , , :

  • ,

, USB , IDE, , SATA. , XFS , ext2 . . , , .

, :

  • ( ) open, write close . , .

  • , , , . . Matteo Frigo FFTW, .

, , , ​​ , ! .

+4

- , . - .

. ( , ) " ", , . - () / , , , .

+2

XML , /. , SAX .

, . 100.000s API.

, , 100 000. . , .

, , , , .

, , . .

[edit] 100K ? .

+1

Norman: , .

- std::vector . ( , - ). , . gettimeofday , . , , X%. , -X%. X .

0

Source: https://habr.com/ru/post/1727067/


All Articles