How to increase the performance of reading files with multiple threads?

I need to read a single file from multiple threads on Linux. The workload is read-only; nothing is ever written. A request never needs the whole file, only one or more parts of it, and I have stored the offset of each part in advance. The file is too large to fit into main memory.

For example, many users want to read from such a file, and I use one thread or process per request to do the reading. What happens on Linux in that case? Are all the read operations queued, so that the OS completes them one by one? Is it possible to improve the performance of such operations?

I am trying to implement a simple inverted index for information retrieval. I keep the dictionary in memory and the posting lists in files. Each file contains one index segment. In the dictionary I can store something like an offset that gives the position of a word's posting list. When 100 users search for something within one second, they send different requests, so each read touches a different part of the file.
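To make the access pattern concrete, here is a minimal sketch of what is described above. The segment layout (comma-separated ASCII document IDs) and the names `build_segment`, `read_postings`, and `lookup_many` are assumptions made for illustration only; the real index format is different. The point it shows is that each thread reads its own byte range with `os.pread`, which takes an explicit offset and does not move the shared file position, so one file descriptor can be shared without a lock.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def build_segment(path, postings):
    """Write posting lists back to back; return term -> (offset, length).

    Hypothetical layout: each list is stored as comma-separated ASCII IDs.
    """
    dictionary = {}
    with open(path, "wb") as f:
        for term, doc_ids in postings.items():
            payload = ",".join(map(str, doc_ids)).encode("ascii")
            dictionary[term] = (f.tell(), len(payload))
            f.write(payload)
    return dictionary


def read_postings(fd, dictionary, term):
    """Read one posting list by its stored offset and length."""
    offset, length = dictionary[term]
    chunks = []
    # pread may return fewer bytes than requested, so loop until done.
    while length > 0:
        chunk = os.pread(fd, length, offset)
        if not chunk:  # unexpected end of file
            break
        chunks.append(chunk)
        offset += len(chunk)
        length -= len(chunk)
    data = b"".join(chunks)
    return [int(x) for x in data.decode("ascii").split(",")]


def lookup_many(path, dictionary, terms, workers=8):
    """Serve several lookups concurrently over one shared descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(lambda t: read_postings(fd, dictionary, t), terms))
    finally:
        os.close(fd)
```

Whether this is actually faster than a single reader depends on the storage device and on how much of the file is already in the page cache; the sketch only shows that the reads do not have to be serialized by the application.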

+3
6 answers

If the file is too large to fit in system memory and you have many threads that need to read the entire file, there is a good chance that your application will be limited by disk I/O, no matter how you read the file and no matter how smart the OS is.

, . , , . , , . , , , . ( , , . , .)

. , .

EDIT: , . ( !), . , @Jon Skeet . , , /. :

  • .
  • , .
  • , (, ), .
  • , .

. , Witten ..

0

- .. , - , . :)

, , , , .

+3

, . , . (.. ) . ( , , , ), , . ( , , , ). , , . ( ) , .

, . , ? ? , ? . , , RAM, , , , . , , ? , mmap() , , . , , . , .

, - , !

+2

, ?

/, mmap(), () , . 32- , "- 4 , , , 2 "; 64- , , .

, mmap(); .

+2

(Linux .) , , ? , . , , . , , , .

, , ( ), : (RAID 10). , , .

+1

,

  • ()

, ( )... , ,

  • Increase the priority of your process (if possible) so that other processes do not take CPU time away from it.
  • Distribute the load evenly between the threads.
  • Depending on how random the access is, enable or disable caching.
    • For example, you can disable the cache if the reads are completely random and you see that most of them are cache misses.
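
As a hedged illustration of the last point: on Linux, a process can tell the kernel that a file will be read at random offsets by calling `posix_fadvise` with `POSIX_FADV_RANDOM`. This turns off read-ahead for that descriptor; it does not bypass the page cache entirely (that would need something like `O_DIRECT`, which is not shown here). The helper names `open_for_random_reads` and `read_part` are made up for this sketch.

```python
import os


def open_for_random_reads(path):
    """Open a file read-only and advise the kernel of random access."""
    fd = os.open(path, os.O_RDONLY)
    # Offset 0 with length 0 applies the advice to the whole file.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    return fd


def read_part(fd, offset, length):
    """Read up to `length` bytes at `offset` without moving the file position."""
    return os.pread(fd, length, offset)
```

Whether the hint helps should be measured on the real workload; if lookups often hit neighbouring posting lists, read-ahead may still be useful.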
0

Source: https://habr.com/ru/post/1718439/

