Main Memory Strategies B + Trees

I am trying to develop a main memory index for key-value pairs using C ++. I need to make sure that the index can be restored after a failure. I am using the CSB + -Tree implementation (BSD license) that I found here The main task I am facing is to save the parent-child relationship data after re-creating the node instances. I searched for various strategies for saving and restoring the "tree structure" to / from the disk. Some of them:

  • Saving node objects in pre-order and writing NULLS for null pointers.
  • Providing IDS to nodes and storing the node identifier instead of a pointer when writing to disk and then allowing pointers during re-instantiation using identifiers.
  • Use file offset values โ€‹โ€‹(addresses in physical memory), rather than addresses of the main memory of child nodes when saving. This may mean that I have to flee from leaflets.

I also looked at a couple of serialization libraries. Google ProtocolBuffers and speed up serialization.

Now, "Nodes" in implementations have a number of pointer variables. Some of them are pointers to other nodes, while others are pointers to โ€œkey valuesโ€. The code below is a simplified version to keep the point.

struct NodeHead { NodeHead *null; // null indicates internal node char *children; // ptr to children NodeEntry entries[1]; // entry array } struct NodeEntry { uint16_t offset; // offset to NodeHead of the key in byte uint8_t next; // index of the next entry; 0xff means null uint8_t num; // [0]: number of entries in use }; 

I was thinking about writing input values โ€‹โ€‹directly to the data for the node, rather than saving the link. And giving each instance of NodeHead an identifier and use it to maintain a "child" relationship. I would like advice if this can be done better.

+4
source share
1 answer

Are data pairs (key, value) stored separately on disk, or do you need to save them along with the index? Do you save the data yourself in memory or are you only in the resident memory of the index, and the data is on disk? If the entire dataset is resident, do not save the tree structure at all. Just save an ordered list of pairs (key, value) and rebuild the tree at boot. I have never used this library, but any reasonable implementation of a B-tree should be able to very efficiently create a B-tree in memory from a pre-sorted stream of records.

+1
source

Source: https://habr.com/ru/post/1345172/


All Articles