B + tree with variable length keys

In the general implementation of the B + tree, we can assume that the keys have a fixed length (for example, 25 bytes). Then we can determine that each node should have a minimum number of keys and a maximum.

If I wanted the tree to accept keys of variable length, what should I change? What if I say that the node must have at least 2 keys, but the key I'm trying to insert is so large that it does not fit into the block containing the node?

+4
source share
3 answers

A simple solution is to store the keys as pointers (wrapped in a type that overrides relative operators, etc.) rather than values, but this, of course, damages the terrain, which is part of the use point of B + trees.

However, the more elements, the less important it is that the objects are adjacent in memory. Huge elements will not correspond to any cache page, not to mention several on one page.

Another relatively simple approach is to use a type of union or placement of a new or any other to place elements in a memory type for an element that is large enough for all types of elements you could use. You still have a fixed number of bytes per element, but elements do not necessarily use all of these bytes.

If you are ready to do this work, you may have variable-sized nodes. Of course, you will have problems with these nodes, depending on how you arrange the node data structure to handle this. For example, you might have a small array of element pointers in a node, pointing to elements that are also inside the node (not separately distributed on the heap).

In addition, every time you change a node, you may need to redistribute it. Even if all you do is rebalance, it can move a huge element from one node to another, and even if the destination node takes place in the sense of free space for the element, it may not have enough bytes to store the value.

In a sense, each node will be a mini-heap in which you can allocate or make room for items large or small, but sometimes you will have to go back to the heap to replace this mini-heap with more or less.

Once again it is worth mentioning that if the elements are huge, the locality inside the node is probably irrelevant.

I implemented B + -line multi-track trees in memory in front of me, but I never reached this extreme.

+2
source

You can leave the rest of the large key on the overflow page, for example there .

+1
source

Use hashing. A hash is a fixed key size. For good hashing functions, see http://www.cse.yorku.ca/~oz/hash.html .

0
source

Source: https://habr.com/ru/post/1479183/


All Articles