I am trying to develop a main memory index for key-value pairs using C ++. I need to make sure that the index can be restored after a failure. I am using the CSB + -Tree implementation (BSD license) that I found here The main task I am facing is to save the parent-child relationship data after re-creating the node instances. I searched for various strategies for saving and restoring the "tree structure" to / from the disk. Some of them:
- Saving node objects in pre-order and writing NULLS for null pointers.
- Providing IDS to nodes and storing the node identifier instead of a pointer when writing to disk and then allowing pointers during re-instantiation using identifiers.
- Use file offset values โโ(addresses in physical memory), rather than addresses of the main memory of child nodes when saving. This may mean that I have to flee from leaflets.
I also looked at a couple of serialization libraries. Google ProtocolBuffers and speed up serialization.
Now, "Nodes" in implementations have a number of pointer variables. Some of them are pointers to other nodes, while others are pointers to โkey valuesโ. The code below is a simplified version to keep the point.
struct NodeHead { NodeHead *null; // null indicates internal node char *children; // ptr to children NodeEntry entries[1]; // entry array } struct NodeEntry { uint16_t offset; // offset to NodeHead of the key in byte uint8_t next; // index of the next entry; 0xff means null uint8_t num; // [0]: number of entries in use };
I was thinking about writing input values โโdirectly to the data for the node, rather than saving the link. And giving each instance of NodeHead an identifier and use it to maintain a "child" relationship. I would like advice if this can be done better.
source share