Looking at each question in turn:
- Is this a very bad idea? Do you know something better to achieve the same?
It is not clear what you hope to achieve with this. Linux already backs memory with swap space (so if your data exceeds physical memory, some of it will be paged out to disk). Are you having problems with running out of address space, or with slowdowns due to excessive swapping? Using mmap'd file-backed storage will not help with either.
- Do I have to pre-allocate the maximum file size, and does all this space need to be allocated on disk? If so, is it possible to map a sparse file?
Yes, the file needs to be at least as large as the region you are mmapping. However, you can start with a small file, mmap it, and then grow the file (and mmap the additional pages) as needed. You can also use a sparse file, so that disk space is not used until the pages are written.
- I do not want to write my own heap allocator. Is it possible to use an existing one?
There are heap managers that use mmap-backed storage. I have seen versions of Doug Lea's malloc and various other BiBOP (big bag of pages) allocators that do this.
- When my program finishes, the mmap'd file will be deleted. This means I do not want any pages written to disk unless the kernel has to evict them from memory. Is there something like a lazy flag for mmap for this, or is it automatic?
In this case, you can simply use MAP_ANON and not have a file at all. However, this brings us back to the first question, since it essentially duplicates what the system malloc (and operator new) already does. In fact, on some OSes (Solaris?), this is exactly what the system malloc does. The main reason I have seen mmap-based custom mallocs used in the past is persistent storage (so the file remains after the process exits and can be reloaded when the program restarts).