Based on Tim's answer, I personally would use something similar to BiBOP.
The basic idea is simple: use fixed-sized pools.
There are some clarifications.
Firstly, the size of the pools is usually fixed. It depends on your distribution procedure, as a rule, if you know which OS you are running on the card at least 4 KB at the same time, when you use malloc, then you use this value. For a memory mapped file, you can enlarge it.
The advantage of fixed-size pools is that it fights fragmentation very well. All pages are the same size, you can easily recycle a blank page with 256 bytes per 128-byte page.
There is still fragmentation for large objects that are usually allocated outside this system. But this is low, especially if you put large objects in several page sizes, so that the memory will be easily recyclable.
Secondly, how to handle pools? Using linked lists.
Pages are usually untyped (by themselves), so you have a free list of pages for preparing new pages and placing "recycled" pages.
For each size category, you have a list of "busy" pages in which memory is allocated. For each page you save:
- placement size (for this page)
- number of selected objects (for checking emptiness)
- pointer to the first free cell
- pointer to the previous and next page (may point to the "head" of the list)
Each free cell in itself is a pointer (or index, depending on your size) with the next free cell.
The list of "busy" pages of a given size is managed simply:
- when deleting: if you clear the page, remove it from the list and drag it to the processed pages, otherwise update the list of free cells of this page (note: finding the beginning of the current page is usually a simple module operation at the address)
- on insertion: search starting from the chapter, as soon as you find the page is incomplete, move it in front of the list (if it is not already specified) and insert your element
This layout is really memory efficient, and only one page is for indexing.
For applications with multi-threaded / multi-processor processes, you will need to add synchronization (usually a mutex per page) if you can get inspiration from Google tcmalloc (try to find another page instead of blocking, use the -local cache stream to remember which page you used last time).
Having said that, have you tried Boost.Interprocess? It provides dispensers.