Overloading operator new to store objects in an mmap'd file

I have a Linux C++ program with fairly large memory requirements. Most of the memory is consumed by objects of just a few classes, and that memory is rarely accessed. I want to move these objects from main memory to disk storage while modifying as little of the existing code as possible.

The idea is to overload operator new for these classes and allocate the objects from an mmap()'d memory region. That way my code modifications stay very limited, the rest of the program can happily access these objects without knowing that anything has changed, and the kernel makes sure that the objects I need are in memory and the rest are on disk. I know this is very similar to how swap works, but the swap partition is usually too small for my program.

Some questions that I have:

  • Is this a very bad idea? Do you know a better way to achieve the same thing?
  • Do I have to preallocate the file at its maximum size, and does all of that space have to be allocated on disk? If so, is it possible to map a sparse file?
  • I do not want to write my own heap allocator. Is it possible to use an existing one?
  • When my program finishes, the mmap'd file will be deleted. This means I do not want any pages to be written to disk unless the kernel evicts them from memory. Is there something like a lazy flag for mmap for this, or is it automatic?
+4
3 answers

Looking at each question in turn:

  • Is this a very bad idea? Do you know a better way to achieve the same thing?

It is not clear what you hope to achieve with this. Linux already backs used memory with swap space (so if your data exceeds physical memory, some of it will be swapped out to disk). Are you running out of address space, or is the program slowing down due to excessive swapping? Using mmap-backed storage will not help with either.

  • Do I have to preallocate the file at its maximum size, and does all of that space have to be allocated on disk? If so, is it possible to map a sparse file?

Yes, the file needs to be as large as the region you are mmap'ing. However, you can start with a small file and mapping and later enlarge the file (and mmap additional pages) as needed. You can also use a sparse file, so that disk space is not consumed until pages are actually written.
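As a rough illustration of the sparse-file approach, here is a minimal sketch of a class-level operator new that hands out memory from a file-backed mapping. The names `FileArena`, `big_arena` and `BigRecord`, the 64 MiB capacity and the `/tmp` location are all assumptions made for the example; the bump allocator never reuses freed space and does not grow the file, so it is only a starting point, not a real heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical bump arena backed by a sparse temporary file.
class FileArena {
public:
    explicit FileArena(std::size_t capacity) : capacity_(capacity) {
        // Assumed location; pick a directory on a real disk (not tmpfs).
        char path[] = "/tmp/arena-XXXXXX";
        int fd = mkstemp(path);
        if (fd < 0) throw std::runtime_error("mkstemp failed");
        unlink(path);  // the file disappears once the descriptor is closed
        // ftruncate sets the length without writing data, so on most
        // filesystems the file is sparse until pages are written.
        if (ftruncate(fd, static_cast<off_t>(capacity)) != 0) {
            close(fd);
            throw std::runtime_error("ftruncate failed");
        }
        void* p = mmap(nullptr, capacity, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            throw std::runtime_error("mmap failed");
        }
        base_ = static_cast<char*>(p);
        fd_ = fd;
    }
    ~FileArena() { munmap(base_, capacity_); close(fd_); }
    FileArena(const FileArena&) = delete;
    FileArena& operator=(const FileArena&) = delete;

    // Hands out the next chunk, rounded up to the strictest fundamental alignment.
    void* allocate(std::size_t size) {
        constexpr std::size_t align = alignof(std::max_align_t);
        std::size_t rounded = (size + align - 1) & ~(align - 1);
        assert(used_ <= capacity_);
        if (rounded > capacity_ - used_) throw std::bad_alloc();
        void* p = base_ + used_;
        used_ += rounded;
        return p;
    }
    char* base() const { return base_; }
    std::size_t capacity() const { return capacity_; }
    int fd() const { return fd_; }

private:
    std::size_t capacity_;
    char* base_ = nullptr;
    int fd_ = -1;
    std::size_t used_ = 0;
};

// One shared arena, created on first use (64 MiB is an assumed limit).
FileArena& big_arena() {
    static FileArena arena(std::size_t{1} << 26);
    return arena;
}

// Hypothetical large, rarely accessed class.
struct BigRecord {
    double samples[1024];
    static void* operator new(std::size_t size) {
        return big_arena().allocate(size);
    }
    // The bump arena never reuses space, so freeing is a no-op here.
    static void operator delete(void*) noexcept {}
};
```

Existing call sites such as `new BigRecord()` and `delete ptr` keep working unchanged; only the class definition gains the two operators.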

  • I do not want to write my own heap allocator. Is it possible to use an existing one?

There are heap managers that use mmap-backed storage. I have seen versions of Doug Lea's malloc and various BiBOP allocators that do this.

  • When my program finishes, the mmap'd file will be deleted. This means I do not want any pages to be written to disk unless the kernel evicts them from memory. Is there something like a lazy flag for mmap for this, or is it automatic?

In this case, you can simply use MAP_ANON and not have a file at all. However, this brings us back to the first question, since it essentially duplicates what the system malloc (and new) already does. In fact, on some OSes (Solaris?), this is exactly what the system malloc does. The main reason I have seen for custom mmap-based mallocs in the past is persistent storage (so that the file remains after the process exits and is reused when it restarts).
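For reference, an anonymous mapping looks roughly like the sketch below. The helper names `map_anonymous` and `unmap_anonymous` are made up for the example; `MAP_ANONYMOUS` is the Linux spelling of `MAP_ANON`. The pages are zero-filled and are backed by swap rather than by a file.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Returns a zero-filled anonymous region of `bytes` bytes, or nullptr on failure.
void* map_anonymous(std::size_t bytes) {
    assert(bytes > 0);  // mmap rejects a zero-length mapping
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Releases a region previously returned by map_anonymous.
void unmap_anonymous(void* p, std::size_t bytes) {
    if (p != nullptr) munmap(p, bytes);
}
```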

+2

I can think of a few problems with the approach you would like to take, so this is not really an answer as such.

  • When you "swap" something out, the underlying problem is that keeping the objects around consumes too much memory. So when do you evict them (effectively delete them)? Would you be making the same decisions as your OS's memory manager?
  • Although you can store the binary representation of a class in an mmap'd block, if the class is not a POD then the "swapping" will not do what you expect (for example, if it has members that are allocated on the heap, what happens to them?).
  • mmap'd memory will still count against your process, so your problems will not go away...
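To make the second point concrete, here is a small compile-time check, written as a sketch with made-up types (`PlainSample`, `NamedSample`) and a made-up helper (`safe_for_raw_mapping`). A `std::string` member keeps its characters in the ordinary heap, so placing the enclosing object in a mapped block moves only the small handle, not the data it points to.

```cpp
#include <string>
#include <type_traits>

// Hypothetical class whose whole state lives inside the object itself.
struct PlainSample {
    int id;
    double values[4];
};

// Hypothetical class with a member that owns heap memory elsewhere.
struct NamedSample {
    int id;
    std::string name;
};

// Rough stand-in for "is a POD": trivially copyable and standard layout.
// Note that a type passing this check can still hold raw pointers.
template <typename T>
constexpr bool safe_for_raw_mapping() {
    return std::is_trivially_copyable<T>::value &&
           std::is_standard_layout<T>::value;
}

static_assert(safe_for_raw_mapping<PlainSample>(),
              "PlainSample can live entirely inside a mapped block");
static_assert(!safe_for_raw_mapping<NamedSample>(),
              "NamedSample keeps its string data in the regular heap");
```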

I think your best bet here is to look at your design and consider whether these objects really need to be long-lived. Could you construct them, use them, and discard them when they are no longer needed? Are they expensive to construct? Maybe it would be cheaper to serialize them to a local file and restore them later (and by serialize I mean more than just a raw memory copy!).
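A minimal sketch of what "more than a raw memory copy" could mean, assuming a hypothetical `Report` class with heap-allocated members: each member is written with an explicit length prefix, so the data can be rebuilt into fresh containers on load. The format is not portable across machines with different endianness or `double` representations, and error handling is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical class with heap-allocated members.
struct Report {
    std::string title;
    std::vector<double> values;
};

// Writes each member as a 64-bit length followed by its raw contents.
void save(std::ostream& out, const Report& r) {
    std::uint64_t n = r.title.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(r.title.data(), static_cast<std::streamsize>(n));
    n = r.values.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(reinterpret_cast<const char*>(r.values.data()),
              static_cast<std::streamsize>(n * sizeof(double)));
}

// Rebuilds a Report by reading the members back in the same order.
Report load(std::istream& in) {
    Report r;
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    r.title.resize(n);
    in.read(&r.title[0], static_cast<std::streamsize>(n));
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    r.values.resize(n);
    in.read(reinterpret_cast<char*>(r.values.data()),
            static_cast<std::streamsize>(n * sizeof(double)));
    return r;
}

// Saves to an in-memory stream and loads it back; a file stream works the same way.
void check_round_trip() {
    Report original{"q3", {1.0, 2.5}};
    std::stringstream buffer;
    save(buffer, original);
    Report restored = load(buffer);
    assert(restored.title == original.title);
    assert(restored.values == original.values);
}
```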

+1

The best option would probably be to state that your program requires a minimum amount of swap to be configured, rather than trying to simulate more swap with mmap(). In particular, your last point cannot be overcome: dirty pages in file-backed mappings are usually preferred for writeout.

+1

Source: https://habr.com/ru/post/1335329/

