How to cache 1000 large C++ objects

Environment: Windows 8 64-bit, Windows Server 2008 64-bit, Visual Studio 2012 Professional 64-bit

std::list<CMyObject> L; // 1000 large CMyObjects cached in our program, shared by different threads in our Windows service

For our SaaS middleware product we cache in memory 1000 large C++ objects (read-only const objects, each about 4 MB in size), which puts the system under memory pressure. Can we associate a disk file (or some other persistent, OS-managed mechanism) with our C++ objects? There is no need for data exchange or interprocess sharing.

A disk file would be sufficient as long as it persists for the lifetime of the process (our Windows service). The read-only const C++ objects are shared by different threads within that same service.

We even considered using an object database (e.g. MongoDB) to store the objects, loading and unloading them on each use. While hopefully faster than reading our serialized file, it would still ruin performance.

The goal is to keep the C++ objects cached for performance reasons and to avoid loading/unloading a serialized C++ object on every use. Ideally this disk file would be managed by the OS and require minimal setup in our code.

Thanks in advance for your answers.

3 answers

The only OS-managed mechanism of the kind you describe is the swap file. You could create a separate application (call it the "cache helper") that loads all the objects into memory and then waits for requests. Since it rarely touches its memory pages, the OS will eventually push those pages out to the page file, bringing them back in only when needed. Communication with this application could go through named pipes or sockets.
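
Such a helper might look roughly like the sketch below, assuming an invented protocol: the client writes a 4-byte object ID and the helper replies with the object's serialized bytes. The pipe name, the protocol, and the lookupObject call are hypothetical placeholders, not anything from the question.

    // Minimal "cache helper" sketch (Win32 named pipe server).
    #include <windows.h>
    #include <cstdint>

    int main() {
        HANDLE hPipe = CreateNamedPipeA(
            "\\\\.\\pipe\\ObjectCache",               // hypothetical pipe name
            PIPE_ACCESS_DUPLEX,
            PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
            1,                                         // one client at a time
            4 * 1024 * 1024, 4 * 1024 * 1024,          // buffers sized for ~4 MB objects
            0, nullptr);
        if (hPipe == INVALID_HANDLE_VALUE) return 1;

        // ... load all 1000 objects into memory here; the OS will gradually
        // push untouched pages out to the page file ...

        for (;;) {
            if (!ConnectNamedPipe(hPipe, nullptr) &&
                GetLastError() != ERROR_PIPE_CONNECTED) continue;

            uint32_t objectId = 0;
            DWORD bytesRead = 0;
            if (ReadFile(hPipe, &objectId, sizeof(objectId), &bytesRead, nullptr)) {
                // lookupObject() stands in for your own cache lookup:
                // const CMyObject* obj = lookupObject(objectId);
                // DWORD written = 0;
                // WriteFile(hPipe, obj->bytes(), obj->size(), &written, nullptr);
            }
            DisconnectNamedPipe(hPipe);
        }
    }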

The downside of this approach is that the cache's performance will be highly variable, and it may degrade the performance of the entire server.

I would recommend writing your own caching algorithm/application, since sooner or later you will need to tune its behavior.


One solution is of course to simply load every object and let the OS handle swapping them to and from disk as needed (or load them dynamically, but never discard them unless the object is actually destroyed). This approach works well when some objects are used much more often than others, and loading from swap space is almost certainly faster than anything you could write yourself. The exception is if you know in advance which objects are more or less likely to be needed later, in which case you can evict the right objects yourself when memory runs low.

You could also use a memory-mapped file: this lets you read and write the file as if it were memory (and the OS will cache the contents in RAM as memory allows). On Windows you would use CreateFileMapping or OpenFileMapping to create or open the mapping, then MapViewOfFile to map the file into memory. When done, call UnmapViewOfFile to unmap the memory and CloseHandle to close the file mapping.
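
For illustration, a minimal self-contained sketch of that call sequence; the file name and the 4 MB size are placeholders:

    #include <windows.h>

    int main() {
        // Open or create the backing file.
        HANDLE hFile = CreateFileA("objects.cache", GENERIC_READ | GENERIC_WRITE,
                                   FILE_SHARE_READ, nullptr, OPEN_ALWAYS,
                                   FILE_ATTRIBUTE_NORMAL, nullptr);
        if (hFile == INVALID_HANDLE_VALUE) return 1;

        const DWORD64 size = 4ull * 1024 * 1024;  // e.g. one 4 MB object
        HANDLE hMapping = CreateFileMappingA(hFile, nullptr, PAGE_READWRITE,
                                             (DWORD)(size >> 32), (DWORD)size, nullptr);
        if (!hMapping) { CloseHandle(hFile); return 1; }

        void* view = MapViewOfFile(hMapping, FILE_MAP_ALL_ACCESS, 0, 0, (SIZE_T)size);
        if (!view) { CloseHandle(hMapping); CloseHandle(hFile); return 1; }

        // The file contents are now ordinary addressable memory.
        unsigned char* bytes = static_cast<unsigned char*>(view);
        bytes[0] = 42;  // the OS lazily writes dirty pages back to disk

        UnmapViewOfFile(view);
        CloseHandle(hMapping);
        CloseHandle(hFile);
        return 0;
    }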

The only concern with a file mapping is that it may not appear at the same address in memory the next time around, so you cannot store raw pointers inside the file and reload the same data as a binary blob later. It works fine, of course, if you create the file's contents fresh each time.


So, your thousand massive objects have constructors, destructors, virtual functions, and pointers. That means you cannot easily swap them out yourself. The OS can do it for you, though, so your most practical approach is simply to add more physical memory, perhaps put the swap space on an SSD, and make use of that 64-bit address space. (I don't know how much your OS can actually address, but apparently enough for your ~4 GB of objects.)

The second option is to find a way to simply use less memory. That might mean a specialized allocator to reduce slack, or removing layers of indirection. You haven't provided enough information about your data for me to make specific suggestions here.

The third option, assuming you cannot fit everything in memory, is simply to speed up deserialization. Can you change the format to something you can parse more efficiently? Can you somehow deserialize objects quickly on demand?
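
As one illustration of a format that parses efficiently: if each object can be stored as a fixed-layout POD record (a big assumption; the layout below is invented), deserialization collapses to a seek plus a single read:

    #include <cstdio>
    #include <cstdint>

    #pragma pack(push, 1)
    struct ObjectRecord {        // hypothetical flat layout: no pointers, no vtable
        uint32_t id;
        uint32_t payloadSize;
        double   payload[512];
    };
    #pragma pack(pop)

    // Fixed-size records allow O(1) seeking to any object on demand.
    bool loadRecord(FILE* f, long index, ObjectRecord& out) {
        if (fseek(f, index * (long)sizeof(ObjectRecord), SEEK_SET) != 0) return false;
        return fread(&out, sizeof(ObjectRecord), 1, f) == 1;
    }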

The last option, and the biggest job, is to manage your own swapping. As a first step, it would be wise to split each of your massive polymorphic classes into two parts: a polymorphic flyweight (with one instance per concrete subtype) and a flattened aggregate context structure. That aggregate is the part you can safely relocate and move around in your address space.
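
A rough sketch of that split, with all names invented for illustration:

    #include <cstdint>

    // Flattened aggregate: plain data with no vtable and no pointers,
    // so it can be serialized, relocated, or memory-mapped freely.
    struct MyObjectData {
        uint32_t typeId;        // selects which flyweight interprets this data
        double   payload[512];  // the bulky per-object state
    };

    // Polymorphic flyweight: one shared instance per concrete subtype,
    // holding the behavior (virtual functions) but none of the bulky state.
    class MyObjectBehavior {
    public:
        virtual ~MyObjectBehavior() = default;
        virtual double evaluate(const MyObjectData& data) const = 0;
    };

    class TypeABehavior : public MyObjectBehavior {
    public:
        double evaluate(const MyObjectData& data) const override {
            return data.payload[0] * 2.0;  // placeholder behavior
        }
    };

    // A handle pairs the always-resident behavior with the swappable data.
    struct MyObjectHandle {
        const MyObjectBehavior* behavior;  // shared, stays in memory
        MyObjectData*           data;      // may be paged or mapped in on demand
    };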

Now you just need a memory-mapped swapping mechanism, some sort of cache tracking which pages are currently mapped, perhaps a smart pointer that replaces your raw pointers with a page + offset pair and can map the data in on demand, and so on. Again, you haven't provided enough information about your data structures and access patterns to make more detailed suggestions.
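
For instance, a page + offset smart pointer might look like this sketch. The toy Pager below just allocates pages in memory so the example runs standalone; a real one would map pages of the swap file on demand and evict cold ones:

    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Toy pager: tracks which pages are resident. A real implementation
    // would MapViewOfFile the requested page of the swap file instead.
    class Pager {
        static constexpr size_t kPageSize = 64 * 1024;
        std::unordered_map<uint32_t, std::vector<char>> resident_;
    public:
        void* acquirePage(uint32_t pageIndex) {
            std::vector<char>& page = resident_[pageIndex];
            if (page.empty()) page.resize(kPageSize, 0);  // "map in" on first touch
            return page.data();
        }
    };

    // Replaces a raw pointer with (page, offset); dereferencing maps on demand.
    template <typename T>
    class SwapPtr {
        Pager*   pager_;
        uint32_t page_;    // which page of the swap file
        uint32_t offset_;  // byte offset within that page
    public:
        SwapPtr(Pager* pager, uint32_t page, uint32_t offset)
            : pager_(pager), page_(page), offset_(offset) {}

        T* get() const {
            char* base = static_cast<char*>(pager_->acquirePage(page_));
            return reinterpret_cast<T*>(base + offset_);
        }
        T& operator*() const  { return *get(); }
        T* operator->() const { return get(); }
    };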


Source: https://habr.com/ru/post/1201586/

