How to improve concurrent SSD I/O throughput on Linux

The program below reads a bunch of lines from a file and analyzes them. It could be faster. On the other hand, if I have several cores and several files to process, this does not really matter; I can just run jobs in parallel.

Unfortunately, this does not work on my Arch machine. Running two copies of the program is only slightly (if at all) faster than running one copy (see below), and gets less than 20% of what my drive is capable of. On an Ubuntu machine with identical hardware the situation is a little better: I get linear scaling for 3-4 cores, but I still top out at about 50% of the SSD's read throughput.

What obstacles prevent linear scaling of I/O throughput as the number of cores increases, and what can be done to improve concurrent I/O on the software/OS side?

PS - For the hardware mentioned below, a single core is fast enough that reading would be I/O bound if I moved the parsing to a separate thread. There are also other optimizations to improve single-core performance. For this question, however, I would like to focus on concurrency and on how my coding and OS choices affect it.

Details:

Here are a few lines of iostat -x 1 output:

Copying a file to /dev/null using dd:

Device:         rrqm/s   wrqm/s     r/s     w/s     rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  883.00    0.00 113024.00     0.00   256.00     1.80    2.04    2.04    0.00   1.13 100.00

Running my program:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               1.00     1.00  141.00    2.00 18176.00    12.00   254.38     0.17    1.08    0.71   27.00   0.96  13.70

Running two instances of my program at once, reading different files:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              11.00     0.00  139.00    0.00 19200.00     0.00   276.26     1.16    8.16    8.16    0.00   6.96  96.70

The disk now reports almost 100% utilization, yet total throughput has barely improved over a single instance.

Running dd at the same time as my program:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               9.00     0.00  468.00    0.00 61056.00     0.00   260.92     2.07    4.37    4.37    0.00   2.14 100.00

The program:

#include <string>
#include <vector>

#include <boost/filesystem/path.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/filesystem/operations.hpp>
#include <boost/filesystem/fstream.hpp>

typedef boost::filesystem::path path;
typedef boost::filesystem::ifstream ifstream;

int main(int argc, char ** argv) {
  path p{std::string(argv[1])};
  ifstream f(p);
  std::string line;
  // Non-owning ranges into `line`; boost::split refills this on every iteration.
  std::vector<boost::iterator_range<std::string::iterator>> fields;

  // Read the file one line at a time and split each line on commas.
  for (getline(f,line); !f.eof(); getline(f,line)) {
    boost::split (fields, line, boost::is_any_of (","));
  }
  f.close();
  return 0;
}

Compiled with:

g++ -std=c++14 -lboost_filesystem -o gah.o -c gah.cxx
g++ -std=c++14 -lboost_filesystem -lboost_system -lboost_iostreams -o gah gah.o

What I have tried:

Between runs I drop the caches (page cache, dentries and inodes), so that linux is not simply serving the file from memory left over from a previous run.
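
Concretely, that means syncing and then writing 3 to /proc/sys/vm/drop_caches as root (from a shell: sync; echo 3 > /proc/sys/vm/drop_caches). A minimal C++ equivalent, just to show the mechanism:

#include <fstream>
#include <unistd.h>   // sync()

// Flush dirty pages, then ask the kernel to drop the clean page cache,
// dentries and inodes (equivalent to `echo 3 > /proc/sys/vm/drop_caches`).
// Must be run as root.
int main() {
  sync();
  std::ofstream ctl("/proc/sys/vm/drop_caches");
  ctl << "3\n";   // 1 = page cache, 2 = dentries + inodes, 3 = both
  ctl.flush();
  return ctl.good() ? 0 : 1;
}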

I have also tried other ways of reading the file, including mmap and setting the stream buffer with pubsetbuf; neither made a noticeable difference.
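
The mmap variant, as a simplified sketch (it only counts lines rather than splitting them, and error handling is minimal):

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map the whole file read-only and scan it for newlines; splitting each
// line on commas would walk the same mapped bytes.
int main(int argc, char ** argv) {
  int fd = open(argv[1], O_RDONLY);
  if (fd < 0) return 1;

  struct stat st;
  if (fstat(fd, &st) != 0) return 1;

  char * data = static_cast<char *>(
      mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
  if (data == MAP_FAILED) return 1;
  // madvise(data, st.st_size, MADV_SEQUENTIAL) can be used to hint readahead.

  std::size_t lines = 0;
  for (const char * p = data, * end = data + st.st_size; p != end; ++p)
    if (*p == '\n') ++lines;
  std::printf("%zu lines\n", lines);

  munmap(data, st.st_size);
  close(fd);
  return 0;
}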

The parsing is cheap enough that the program ought to be IO-bound, yet the disk is nowhere near saturated (as, for example, the iostat output above shows).

So what is it, in the way I read the file, in the filesystem, or in the kernel, that keeps two readers from getting more out of the disk than one? And what can I change, in my code or in the OS / scheduler settings, to fix it?

+4

You write:

Copying a file to /dev/dull with dd:

(I assume you mean /dev/null...)

int main(int argc, char ** argv) {
  path p{std::string(argv[1])};
  ifstream f(p);
  std::string line;
  std::vector<boost::iterator_range<std::string::iterator>> fields;

  for (getline(f,line); !f.eof(); getline(f,line)) {
    boost::split (fields, line, boost::is_any_of (","));
  }
  f.close();
  return 0;
}

This loop reads one line at a time, synchronously, and does not issue the next read until the current line has been parsed: each request is small and the disk sits idle while boost::split runs. That is why a single copy keeps the drive only about 14% busy.

dd, on the other hand, does nothing but read: it issues large sequential requests back to back, the queue never drains, and the drive can deliver close to its full sequential bandwidth...
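
For illustration, a minimal sketch of reading the same file in large fixed-size blocks instead of line by line; the 1 MiB chunk size is an arbitrary choice, and the comma splitting is left out:

#include <fstream>
#include <string>
#include <vector>

// Read in 1 MiB chunks instead of line by line; lines are carved out of the
// buffer, and a partial line at the end of a chunk is carried into the next.
int main(int argc, char ** argv) {
  std::ifstream f(argv[1], std::ios::binary);
  std::vector<char> buf(1 << 20);
  std::string carry;                       // partial line from the previous chunk

  while (f) {
    f.read(buf.data(), buf.size());
    std::streamsize n = f.gcount();
    if (n <= 0) break;

    std::streamsize start = 0;
    for (std::streamsize i = 0; i < n; ++i) {
      if (buf[i] == '\n') {
        std::string line = carry + std::string(buf.data() + start, i - start);
        carry.clear();
        // parse `line` here, e.g. split it on commas
        start = i + 1;
      }
    }
    carry.append(buf.data() + start, n - start);   // keep the unfinished line
  }
  // a trailing line with no final '\n' would still be in `carry` here
  return 0;
}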

+2

To summarize what I found:

1) Single-stream read throughput comes first.

Before worrying about running several copies, I had to get one copy of the program reading faster; with that sorted out, 1 instance reads at about 45 MB/s.

2) pubsetbuf does not do what I expected.

It turns out pubsetbuf only takes effect if it is called before the file is opened (as was pointed out in the comments); calling it on an already-opened stream silently does nothing. Before fixing that, strace showed the reads going to the kernel in 8191-byte chunks, i.e. the default buffer, no matter what I passed.

With the call in the right place and a buffer large enough to cover on the order of 1000 lines, the reads issued to the kernel got correspondingly larger (a sketch of the change is at the end of this answer). That gets a single instance to about 50 MB/s.

3) The kernel I/O scheduler.

By default, linux uses the cfq io scheduler, which is not a great fit for an SSD under this kind of load. Setting slice_sync to 0 (see the suggestion by Mikko Rantalainen) or switching to the noop scheduler got me to about 60 MB/s. (The scheduler is selected through /sys/block/<device>/queue/scheduler; the cfq tunables, including slice_sync, live under /sys/block/<device>/queue/iosched/.)

Of the two, I ended up just using noop, at least on this particular machine.
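
A minimal sketch of the pubsetbuf change from item 2 (the 1 MiB buffer is an arbitrary size; the only point is that pubsetbuf is called before open):

#include <fstream>
#include <string>
#include <vector>

int main(int argc, char ** argv) {
  std::vector<char> buf(1 << 20);          // 1 MiB; any "big enough" size
  std::ifstream f;

  // pubsetbuf has to be called before the file is opened; with libstdc++,
  // calling it on an already-open stream has no effect at all.
  f.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
  f.open(argv[1]);

  std::string line;
  while (getline(f, line)) {
    // parse the line here
  }
  return 0;
}

The same call works on the boost::filesystem::ifstream from the question, since it derives from std::basic_ifstream.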

+1

Source: https://habr.com/ru/post/1676071/

