Since David Eisenstat does not have time to extend his answer, I will try to add more detailed information to a similar algorithm.
Intuition
The basic intuition underlying the algorithm is based on the following statements:
operator # 1 : if BST contains values a
and b
( a < b
) and there are no values between them, then either a
(node for value a
) is (possibly indirect) the parent of b
(node for value b
) or b
is (possibly , indirect) by the parent a
.
This operator is obviously true because if their youngest common ancestor of C
is some other node than a
and b
, its C
value must be between a
and b
. Note that operator # 1 is true for any BST (balanced or unbalanced).
statement # 2 : if a simple (unbalanced) BST contains the values a
and b
( a < b
) and there are no values between them. And we are trying to add a value x
such that a < x < b
, then x
(node for the value x
) will either direct, or large (child) child of a
, or direct left (smaller) child of b
, depending on what node is lower in the tree.
Suppose that the lower of the two nodes is equal to a
(the other case is symmetric). During the insert phase, the value of x
will move along the same path as a
during insertion, because the tree does not contain any values between a
and x
, that is, for any comparison values a
and x
are indistinguishable. This means that the value x
will move through the tree to node a
and will go through node b
at some earlier stage (see Operator # 1). As x > a
it should become the right child of a
. The right right child of a
must be empty at this point because a
is in the subtree of b
, that is, all values in this subtree are less than b
, and since there are no values in the tree between a
and b
, no value can be the correct child of the node a
.
Note that statement # 2 could potentially be incorrect for some balanced BST after rebalancing, although this should be a strange case.
operator # 3 : in balanced BST for any value of x
, and not in the tree, you can find the closest and closest smaller values in O(log(N))
time.
This follows directly from the operators # 1 and # 2: all you need to do is find the potential insertion point for the x
value in BST (accepts O(log(N))
), one of the two values will be the direct parent of the insertion point and to search of the other you need to move the tree back to the root (again takes O(log(N))
).
So now the idea behind the algorithm, it becomes clear: to quickly insert into an unbalanced BST, we need to find the nodes with the smallest and smaller values. We can easily do this if we additionally maintain a balanced BST with the same keys as our target (unbalanced) BST, and with the corresponding nodes from this BST as values. Using this additional data structure, we can find the insertion point for each new value in O(log(N))
and update this data structure with a new value in O(log(N))
time.
Algorithm
Since no inner step of the cycle exceeds O(log(N))
, the total complexity is O(N*log(N))
Java implementation
I'm too lazy to implement balanced BST, so I used the standard Java TreeMap
, which implements the Red-Black tree and has useful lowerEntry
and higherEntry
that correspond to step 4 of the algorithm (you can see the source code so that both are actually O(log(N))
).
import java.util.Map; import java.util.TreeMap; public class BSTTest { static class Node { public final int value; public Node left; public Node right; public Node(int value) { this.value = value; } public boolean compareTree(Node other) { return compareTrees(this, other); } public static boolean compareTrees(Node n1, Node n2) { if ((n1 == null) && (n2 == null)) return true; if ((n1 == null) || (n2 == null)) return false; if (n1.value != n2.value) return false; return compareTrees(n1.left, n2.left) && compareTrees(n1.right, n2.right); } public void assignLeftSafe(Node child) { if (this.left != null) throw new IllegalStateException("left child is already set"); this.left = child; } public void assignRightSafe(Node child) { if (this.right != null) throw new IllegalStateException("right child is already set"); this.right = child; } @Override public String toString() { return "Node{" + "value=" + value + '}'; } } static Node insertToBst(Node root, int value) { if (root == null) root = new Node(value); else if (value < root.value) root.left = insertToBst(root.left, value); else root.right = insertToBst(root.right, value); return root; } static Node buildBstDirect(int[] values) { Node root = null; for (int v : values) { root = insertToBst(root, v); } return root; } static Node buildBstSmart(int[] values) { Node root = null; TreeMap<Integer, Node> balancedTree = new TreeMap<Integer, Node>(); for (int v : values) { Node node = new Node(v); if (balancedTree.isEmpty()) { root = node; } else { Map.Entry<Integer, Node> lowerEntry = balancedTree.lowerEntry(v); Map.Entry<Integer, Node> higherEntry = balancedTree.higherEntry(v); if (lowerEntry == null) {