Reading a large number of small files in sequence

I have this problem: I have a set of small files of about 2000 bytes each (they are all of similar size), and there are about 100,000 of them, taking up roughly 200 megabytes. I need to be able to select a range of these files in real time, say files 1000 to 1100 (100 files in total), read them, and send them over the network quickly.

The good news is that the files will always be read sequentially, i.e. the request will always be "this file and the next hundred" and never "this file here, that file over there, and so on".

Files can also be added to this collection at run time, so this is not a fixed number of files.

The current scheme I came up with is this: no file is larger than 2000 bytes, so instead of keeping many separate files on disk, I will have one large file containing all of them at even 2048-byte intervals. The first two bytes of each 2048-byte block hold the actual size of the file stored in the following 2046 bytes (the files range from about 1800 to 1950 bytes). I then seek inside this one file instead of opening a new file for each file I need to read.
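To make the layout concrete, here is a minimal sketch of the writer side in Python (the same logic carries over to C or C#). The names `pack_slot` and `append_file` are hypothetical, and the little-endian byte order of the two-byte length prefix is an assumption; the question does not specify either.

```python
import os
import struct

SLOT_SIZE = 2048                       # fixed slot width from the question
HEADER_SIZE = 2                        # two-byte length prefix
MAX_PAYLOAD = SLOT_SIZE - HEADER_SIZE  # 2046 bytes left for the file body


def pack_slot(payload: bytes) -> bytes:
    """Build one 2048-byte slot: length prefix, payload, zero padding."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in a single slot")
    # Assumption: unsigned 16-bit little-endian length.
    header = struct.pack("<H", len(payload))
    return (header + payload).ljust(SLOT_SIZE, b"\0")


def append_file(container_path: str, payload: bytes) -> int:
    """Append one small file to the container and return its slot index."""
    slot = pack_slot(payload)
    with open(container_path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)  # current size of the container
        f.write(slot)
    return offset // SLOT_SIZE
```

Because every slot has the same width, the slot index is just the byte offset divided by 2048, so no separate index file is needed.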

So when I need the file at position X, I seek to X * 2048, read the first two bytes, and then read that many bytes starting at (X * 2048) + 2. This large 200 MB file will be append-only, so it is safe to read even while the serialized input thread/process (I have not decided which yet) appends more data to it.
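The lookup described above can be sketched as follows, again in Python for brevity. `read_range` is a hypothetical helper, and the little-endian length prefix is an assumption. Since requests are always for consecutive files, the sketch fetches the whole range with one seek and one read, then slices the payloads out of the buffer in memory.

```python
import struct

SLOT_SIZE = 2048   # fixed slot width from the question
HEADER_SIZE = 2    # two-byte length prefix


def read_range(container_path: str, first: int, count: int) -> list:
    """Return the payloads of slots first .. first + count - 1."""
    with open(container_path, "rb") as f:
        f.seek(first * SLOT_SIZE)          # X * 2048
        block = f.read(count * SLOT_SIZE)  # one contiguous read
    payloads = []
    # Walk complete slots only; a writer may be in the middle of
    # appending the last one, so a trailing partial slot is skipped.
    for start in range(0, len(block) - SLOT_SIZE + 1, SLOT_SIZE):
        # Assumption: unsigned 16-bit little-endian length.
        (size,) = struct.unpack_from("<H", block, start)
        body = start + HEADER_SIZE         # (X * 2048) + 2
        payloads.append(block[body:body + size])
    return payloads
```

For the example in the question, `read_range(path, 1000, 100)` would issue a single read of about 200 KB instead of opening 100 separate files.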

This has to run on Windows. C is an option, but I would prefer C#.

+3
9 answers

I think your idea is probably the best you can do with a reasonable amount of work.

Alternatively, you can buy a solid state drive and not worry about files.

+2

RDBMS bunch fo 2k

+3

" " (.. 2048 ) . IO.

+2

, , , 1200 1400? , ? ?

, # 1000 1100, (#) , .

+1

"dbase" - .

"index" "dbase". , , .

+1

0

: , , , 2KiB, , , , .

, , , , 150 .

- , . , , (2K) , . 64-128K , , .

0

, (. , .). , , 4096 . Afaik, , , API WIN32 #.

. SO.

0

, SO:

Java?

0

Source: https://habr.com/ru/post/1719988/

