Splitting a node into b + tree

im trying to figure out what exactly happens when node overflows. Information: in my b + tree there are 4 pointers to a block and 3 data sections. problem: I realized that when an overflow occurs, we split into 2 nodes in my case each with 2 keys, and insert the parent node average value without removing it from the son (unlike tree b).

however, I got into a situation:

|21|30|50| |10|20|-| |21|22|25| |30|40|-| |50|60|80| 

and I want to insert key 23 first I split | 21 | 22 | 25 | in: | 21 | 22 | - | and | 23 | 25 | - | now I need to insert key 23 in the parent | 21 | 30 | 50 | the witch causes another split. | 21 | 23 | - | and | 30 | 50 | - | but where is the pointer to 30 points? is it possible that both this pointer and one after 23 indicate | 23 | 25 | - | ?

+6
source share
4 answers

Insert 23:

  • as you said 21 | 22 | - | and | 23 | 25 | - | are being created
  • two nodes require a parent
  • parent is created in the root directory of node: | 21 | 23 | 30 | 50 |
  • The root has too many elements now
  • divide the root into 2 nodes | 21 | 23 | - and | 30 | 50 | -
  • add a new parent for 2 new nodes (which is the new root of the tree)

Basically, this insert will increase the depth of the tree by 1

+2
source

Here's how to handle pointers. This is your B + tree before insertion. (some add-ons are used to make it easier to view pointers)

  [21 30 50] / \ \ \ [10|20|-] => [21|22|25] => [30|40|-] => [50|60|80] 

After inserting 23 you will have 2 nodes. It is important to know that the left split must always be the same instance , and the right split must be the new node. this will simplify the processing of processing pointers.

Thus, the partition should look like this.

  old fresh [21|22|-] => [23|25|-] 

since the left node is the same instance, key 21 at the root has the correct right pointer. This is how the nodes look before splitting the root and after splitting the leaf. we are in the middle of the process.

  [21 30 50] / \ \ \ [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80] 

only a new item with key 23 should be added next to 21 in the root directory. since root has no space, it will also be broken. the middle key is 23 (the first element of the right split).

  [21 30] [ 50 ] / \ \ * \ [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80] 

Since root is split, we need to add our middle key left or right and find a new middle key for promotion. pay attention to the null pointer in the right section. because this node is new, it should be initialized soon!

(the root was split to the right of the fact that in the node element fewer elements are in the case of an odd number of elements, but in general it does not matter how you split up.)

our middle key was 23. so add 23.

  [21 23 30] [ 50 ] / \ \ \ * \ [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80] 

Good! If there were no separation here, our insertion would be completed by now. but since there is separation at this level, we must find a new middle key to advance. also don't forget the null pointer for fresh node.

(In the case of leaf splitting, you must keep the key values โ€‹โ€‹in the sheets and promote a copy of the middle key, but in the case of splitting the internal nodes, you can safely move and push the keys up.)

here the new middle key is 30 . allows you to put it on the left node.

  30 \ [30|40|-] 

(Itโ€™s important how to choose the middle key. Always the node that receives the middle key from the lower sections should omit one element for the new middle key , if this node is left by the node, discard the last element from it, otherwise release the first element.)

  [21 23] 30 [ 50 ] / \ \ \ * \ [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80] 

Please note that 30 is not in any of the sections. that our middle key we have to process. Always give the right node the middle key to the left of the new node.

  [21 23] 30 [50] / \ \ * / \ [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80] 

Then the right middle key pointer will point to the fresh node on the right split. finally, the middle key moves forward, a new root is created with the left partition to the left and to the right of the right bifurcation and the middle key.

  [ 30 ] / \ [21 23] [50] / \ \ / \ [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80] 

Congrats! you made your insert. it may look easy on paper, but I have to admit it is a bit of a challenge when coding.


There are several ways to represent these key-node elements in memory. my approach is that every key-node element has a right pointer with a key. therefore, each internal node will have an array of key-node s. thus, node keys and pointers are always stored together. the left most children are handled by a separate left pointer, this pointer is stored on each internal node and is never null (after full operations).

0
source

The problem that you need to understand what is happening is using the bad presentation model that you are using. You should have one data value associated with the pointer and the invariant that the data value is the smallest value in the subtree that the pointer points to.

So this is how you should represent the b-tree node before inserting.

  10|21|30|50| <--- root node 10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes 

In this view, the pointer after the value 10 in the root of the node points to the leaf node with the first value of 10, etc.

When you insert 23, it is inserted into the node sheet with the first value of 21. This gives the following tree and does not need to be split.

  10|21|30|50| <--- root node 10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes 

When you insert 24, which creates the effect you described, you get the following

  10|30| <--- new root node 10|21|24| 30|50| <--- intermediate nodes 10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes 

As you can see, there is no longer ambiguity. The node leaf has been split, and the pair of key pointers 24 | should have been inserted at the root of the node. Since it was full, it had to be divided. Now that there are 5 values, you get one node with three values โ€‹โ€‹and one node with 2. You can choose whether the left node or the right node will get three values. It is important that the basic invariant is preserved. Each key value in node is the smallest element in the subtree referenced by its associated pointer. It was necessary to add a new root node, and its pointer to the key value should now be obvious. There is no ambiguity.

It also makes it obvious that many optimization strategies are possible. Instead of splitting the root of the node, we could transfer the value 21 to the left leaf of the node that was not filled. One that has a first value of 10. This avoids breaking the root of the node and keeps the height of the b-tree unchanged and gives better filling of the b-tree. Thus, you minimize the space and time of the search. But this means that you can effectively check if side balancing is possible. Changing the key value in the root of the node is still necessary. and etc.

As you can see, the value 10 in your b-tree is really not needed in any leaf node, and it is often omitted in the b-tree view (i.e. wikipedia ), but this can be confusing and probably the reason, for which you are there. :)

-1
source

From what I was taught, for the last node, itโ€™s normal that the node is less than n / 2. So, in your example, the top nodes will become:

  |23| / \ |21|22|-| |25|-|-| 

I think that makes sense. If you think about it, you have 5 remaining nodes, so you need 5 pointers from the top level. This layout is the only way, so you can have 5 pointers, all other combinations either overflow the node or create additional pointers.

If the nodes were | 21 | 22 | - | and | 23 | 25 | - |, then the root node will be | 23 | - | - |. Then it makes no sense to have 23 in the right node, because everything that is in the right node must be equal to or greater than 23 anyway!

-1
source

Source: https://habr.com/ru/post/890276/


All Articles