What is the fastest method for high-performance sequential file I/O in C++?

Assuming the following for ...
Output:
File open ...
Data is streamed to disk. The data in memory sits in one large contiguous buffer and is written to disk in its raw form directly from that buffer. The size of the buffer is configurable, but fixed for the duration of the stream. Buffers are written to the file one after another. No seek operations are performed.
... file is closed.

Input:
A large file (written sequentially as above) is read from disk from beginning to end.




Are there generally accepted guidelines for achieving the fastest possible sequential file I/O in C++?

Some possible considerations:

  • Guidance on choosing the optimal buffer size
  • Will a portable library such as boost::asio be too abstract to expose the subtleties of a particular platform, or can it be assumed to be near-optimal?
  • Is asynchronous I/O always preferable to synchronous? What if the application is not otherwise CPU-bound?

I realize this will have platform-specific considerations. I welcome general guidelines as well as platform-specific ones.
(My most immediate interest is Win x64, but I'm also interested in comments on Solaris and Linux.)

+43
c++ performance file-io
Jul 29 '09 at 15:53
7 answers

Are there generally accepted guidelines for achieving the fastest possible sequential file I/O in C++?

Rule 0: Measure. Use all available profiling tools and get to know them. It's almost a commandment in programming that if you didn't measure it, you don't know how fast it is, and for I/O this is even more true. Make sure to test under actual work conditions if you possibly can. A process with no contention for the I/O system can be over-optimized, fine-tuned for conditions that don't exist under real loads.

  • Use memory-mapped files instead of explicit writes. This isn't always faster, but it gives the operating system the chance to optimize the I/O while remaining relatively portable: you avoid unnecessary copying and take advantage of the OS's knowledge of how the disk is actually being used. ("Portable" if you use a wrapper rather than an OS-specific API call.)

  • Try to linearize your output as much as possible. Having to jump around memory to find the buffers to write can have noticeable effects under optimized conditions, because cache lines, paging, and other memory-subsystem issues start to matter. If you have lots of buffers, look into scatter-gather I/O support, which tries to do that linearization for you.
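As a sketch of the scatter-gather idea, using POSIX writev() (the file path, function name, and buffer contents here are made up for illustration):

```cpp
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

// Sketch of scatter-gather output: three buffers that are NOT
// contiguous in memory go to disk in one writev() call, so the
// kernel does the linearization instead of us memcpy-ing them
// into a staging buffer first.
bool write_scattered(const char* path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    char a[] = "first ", b[] = "second ", c[] = "third";
    iovec iov[3] = {
        {a, sizeof(a) - 1},  // iov_base, iov_len (minus the NUL)
        {b, sizeof(b) - 1},
        {c, sizeof(c) - 1},
    };
    ssize_t n = writev(fd, iov, 3);  // one syscall, three buffers
    close(fd);
    return n == 18;  // "first second third" is 18 bytes
}
```

On Windows the analogous facility is WriteFileGather, though it comes with stricter alignment requirements.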

Some possible considerations:

  • Recommendations for choosing the optimal buffer size

Page size for starters, but be ready to tune from there.
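A minimal way to get that starting point at runtime on POSIX (the function name is mine):

```cpp
#include <unistd.h>

// Starting buffer size: one VM page, as suggested above.
// Tune upward from here with benchmarks.
long starting_buffer_size() {
    long page = sysconf(_SC_PAGESIZE);  // typically 4096 on x86
    return page > 0 ? page : 4096;      // fall back if sysconf fails
}
```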

  • Will a portable library such as boost::asio be too abstract to expose the subtleties of a particular platform, or can it be assumed to be near-optimal?

Don't assume it is optimal. It depends on how well the library is tuned for your platform and how much effort the developers put into making it fast. That said, a portable I/O library can be very fast, because fast abstractions exist on most systems and it's usually possible to come up with a common API that covers all the bases. Boost.Asio is, to the best of my knowledge, fairly well tuned for each platform it runs on: there is a whole family of OS-specific APIs for fast asynchronous I/O (e.g. epoll , /dev/epoll , kqueue , Windows overlapped I/O), and Asio wraps them all.

  • Is asynchronous I/O always preferable to synchronous? What if the application is not otherwise CPU-bound?

Asynchronous I/O is no faster than synchronous I/O. What asynchronous I/O buys you is that your code doesn't waste time waiting for the I/O to complete. It does so in a more general way than the alternative method of not wasting that time, namely using threads, because it calls back into your code when the I/O is ready and not before. There are no false starts or idle threads that need shutting down.

+29
Jul 29 '09 at 18:31

A general piece of advice is to turn off buffering and read/write in large chunks (but not too large, or you'll waste too much time waiting for the whole I/O to complete where otherwise you could already be munching on the first megabyte. It's trivial to find the sweet spot with this algorithm; there's only one knob to turn: the chunk size).

Beyond that, for input, mmap()-ing the file shared and read-only is (if not the fastest, then) the most efficient way. Call madvise() if your platform has it, to tell the kernel how you will traverse the file, so it can do readahead and drop the pages again quickly afterwards.
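A self-contained sketch of that input path on POSIX (the helper names and the sample file are made up; MADV_SEQUENTIAL is the hint being described):

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Create a small sample file so the sketch is self-contained.
bool write_sample(const char* path, const char* text) {
    FILE* f = fopen(path, "wb");
    if (!f) return false;
    fputs(text, f);
    return fclose(f) == 0;
}

// Map the file shared and read-only, hint sequential access,
// and scan it once; returns the number of bytes seen.
long scan_mapped(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st{};
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return -1; }
    size_t len = static_cast<size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after close
    if (p == MAP_FAILED) return -1;
    madvise(p, len, MADV_SEQUENTIAL);  // readahead + early page reuse
    long seen = 0;
    const char* bytes = static_cast<const char*>(p);
    for (size_t i = 0; i < len; ++i)
        seen += (bytes[i] != 0);  // touch every byte sequentially
    munmap(p, len);
    return seen;
}
```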

For output, if you already have a buffer, consider backing it with a file (also with mmap() ), so you don't have to copy the data in user space.

If mmap() is not to your liking, there's fadvise() and, for the really tough cases, asynchronous file I/O.
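The fadvise() route looks roughly like this (posix_fadvise(); the function name is mine, and the test below just points it at whatever readable file is handy):

```cpp
#include <fcntl.h>
#include <unistd.h>

// Buffered-I/O alternative to mmap()/madvise(): keep using read(),
// but tell the kernel the access pattern so it ramps up readahead.
bool hint_sequential(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    // POSIX_FADV_SEQUENTIAL: expect sequential reads, prefetch more.
    bool ok = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) == 0;
    close(fd);
    return ok;
}
```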

(All of the above is POSIX; Windows names may be different.)

+10
Jul 29 '09 at 17:52

For Windows, make sure you use FILE_FLAG_SEQUENTIAL_SCAN in your CreateFile() call, if you opt for the platform-specific Windows API. This optimizes caching for the I/O. As far as buffer sizes go, a buffer size that is a multiple of the disk sector size is typically advised. 8K is a nice starting point; there's little to be gained from going beyond that.

This article discusses a comparison between asynchronous and synchronous I/O on Windows:

http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx

+5
Jul 29 '09 at 16:38

As you note, it all depends on the machine/system/libraries you use. A fast solution on one system may be slow on another. A general guideline, though, is to write in as large chunks as possible: writing one byte at a time is usually the slowest.

The best way to know for sure is to code it several different ways and profile them.

+3
Jul 29 '09 at 16:01

You asked about C++, but it sounds like you're past that and ready to get a little platform-specific.

On Windows, FILE_FLAG_SEQUENTIAL_SCAN with file mapping is probably the fastest way. In fact, your process can exit before the file actually makes it to disk. Without an explicit blocking flush operation, it can take up to 5 minutes for Windows to begin writing those pages out.

You need to be careful if the files are not on local devices but on a network drive. Network errors will show up as SEH errors, which you will need to be prepared to handle.

On *nixes, you might get somewhat better performance by writing sequentially to a raw disk device. This is also possible on Windows, but not as well supported by the APIs. It avoids the filesystem overhead, but there may not be enough of that to be useful.

Loosely speaking, RAM is 1000 or more times faster than disk, and the CPU is faster still. There probably aren't many logical optimizations that will help, beyond avoiding movements of the disk heads (seeks) whenever possible. A dedicated disk just for this file can help a lot here.

+2
Jul 30 '09 at 2:18

You will get the absolute maximum performance by using CreateFile and ReadFile . Open the file with FILE_FLAG_SEQUENTIAL_SCAN .

Read with a buffer size that is a power of two. Only benchmarking can determine this number. I have seen it be 8K once. Another time I found it to be 8M! This varies wildly.

It depends on the size of the CPU cache, the efficiency of the OS's readahead, and the overhead associated with doing many small writes.
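A portable sketch of such a benchmark loop (iostreams stand in for ReadFile here; the helper names, path, and sizes are illustrative — wrap the read call in a timer and sweep chunk sizes from 8K up to 8M):

```cpp
#include <fstream>
#include <vector>

// Read a file in fixed-size chunks; time this for several power-of-two
// chunk sizes (8K, 64K, ..., 8M) to find the sweet spot on your box.
size_t read_in_chunks(const char* path, size_t chunk) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(chunk);
    size_t total = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(chunk));
        total += static_cast<size_t>(in.gcount());  // counts partial reads too
    }
    return total;
}

// Helper so the sketch can check itself against a known file size.
bool write_filler(const char* path, size_t n) {
    std::ofstream out(path, std::ios::binary);
    std::vector<char> buf(n, 'x');
    out.write(buf.data(), static_cast<std::streamsize>(n));
    return out.good();
}
```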

Memory mapping is not the fastest way. It has more overhead because you can't control the block size, and the OS must fault in all the pages.

+2
Feb 23 '18

On Linux, buffered reads and writes speed things up a lot, increasingly so with growing buffer size, but the returns diminish, and you generally want to use BUFSIZ (defined by stdio.h ), since larger buffer sizes won't help much.

mmap -ing provides the quickest access to files, but the mmap call itself is rather expensive. For small files (16 KiB), the read and write system calls win (see https://stackoverflow.com/a/167189/ for numbers on reading via read versus mmap ).
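Those buffered reads and writes go through stdio, where the buffer size can be raised above BUFSIZ with setvbuf() (a sketch; the function name and file path are made up):

```cpp
#include <cstddef>
#include <cstdio>

// Open a file for writing with a 64 KiB stdio buffer instead of the
// default BUFSIZ. setvbuf() must be called before any I/O on the stream.
bool write_with_big_buffer(const char* path, const char* text) {
    FILE* f = fopen(path, "wb");
    if (!f) return false;
    // nullptr: let stdio allocate the buffer; _IOFBF: fully buffered.
    if (setvbuf(f, nullptr, _IOFBF, 1u << 16) != 0) {
        fclose(f);
        return false;
    }
    bool ok = fputs(text, f) >= 0;
    return fclose(f) == 0 && ok;
}
```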

+1
Feb 21 '17 at 18:35
