Efficient bulk update of persistent data structures

I understand how persistent tree-based data structures are usually updated: create a new node and then new copies of all of its ancestors up to the root.

But what if I have a tree of 10,000 nodes and I need to change 1,000 of them? I don't want to create 1,000 new roots one update at a time; I want a single new root that results from applying all the changes at once.

For example, take a persistent binary search tree. To update a single node, you search down until you find it, create a new node with the changed value and the old children, and then create new ancestors all the way up to the root.
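The single-node case is the classic path-copying pattern. A minimal sketch in Python (the `Node` and `update` names are illustrative, not from the question):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: int
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def update(node: Optional[Node], key: int, value: int) -> Optional[Node]:
    """Return a new root; only nodes on the search path are copied,
    everything off the path is shared with the old tree."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, update(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, update(node.right, key, value))
    return Node(key, value, node.left, node.right)  # found it: replace the value
```

Updating k nodes this way costs k root-to-node path copies, which is exactly the O(k log n) allocation the question wants to avoid.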

For a bulk update, instead of updating one node at a time, we could update all 1,000 nodes in a single pass:

At the root, the current list of updates is the full list. Split this list between the updates that belong in the left subtree and those that belong in the right subtree. If no updates fall under one of the children, don't descend into it. Then go into the left child (assuming it has matches), split its list between its children, and continue. When you reach a node that has a match, apply the update and return, replacing and rebuilding the ancestors and the modified branches as needed.

This produces only one new root, no matter how many nodes were changed.
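The recursion described above can be sketched as follows, assuming a persistent BST and a list of `(key, value)` updates sorted by key (all names here are illustrative, not from the question):

```python
from collections import namedtuple
import bisect

Node = namedtuple("Node", "key value left right")

def bulk_update(node, updates):
    """updates: list of (key, value) pairs sorted by key. One pass over the
    tree; copies only nodes on paths to changed keys, reuses everything else."""
    if not updates:
        return node                      # no matches below: keep old subtree
    if node is None:
        # Keys not present yet: build a balanced subtree from the middle out.
        mid = len(updates) // 2
        k, v = updates[mid]
        return Node(k, v, bulk_update(None, updates[:mid]),
                    bulk_update(None, updates[mid + 1:]))
    keys = [k for k, _ in updates]
    lo = bisect.bisect_left(keys, node.key)
    hi = bisect.bisect_right(keys, node.key)
    left = bulk_update(node.left, updates[:lo])    # updates with key < node.key
    right = bulk_update(node.right, updates[hi:])  # updates with key > node.key
    value = updates[lo][1] if lo < hi else node.value
    if left is node.left and right is node.right and lo == hi:
        return node                      # nothing changed here: reuse old node
    return Node(node.key, value, left, right)
```

One call on the root returns exactly one new root; ancestors shared by several changed keys are copied once rather than once per key.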

+6
2 answers

Such "bulk modification" operations are sometimes called bulk updates. Naturally, the details depend on which data structure you are working with and which modifications you are trying to perform.

Typical operations include "delete all values that satisfy a certain condition" or "increment the values associated with all keys in this list." Often such operations can be performed in a single pass over the whole structure, taking O(n) time.

You seem to be concerned about the memory allocation involved in creating "1000 new roots." A typical allocation cost for performing the operations one at a time is O(k log n), where k is the number of nodes being modified. A typical allocation cost for a single walk over the whole structure is O(n). Which is better depends on k and n.

In some cases you can reduce the amount of allocation, at the cost of more complex code, by paying close attention to where changes actually occur. For example, if you have a recursive algorithm that returns a tree, you can modify it to return a tree together with a boolean indicating whether anything changed. The algorithm can then check these booleans before allocating a new node, to see whether the old node can be reused. However, people usually don't bother with this extra check unless and until they have evidence that the extra allocation is actually a problem.
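The changed-flag idea can be sketched like this, assuming an immutable tree node; the `map_values` name and `Node` shape are hypothetical, chosen for illustration:

```python
from collections import namedtuple

Node = namedtuple("Node", "key value left right")

def map_values(node, f):
    """Apply f to every value. Returns (tree, changed); any subtree whose
    contents did not change is handed back as-is, with no new allocation."""
    if node is None:
        return None, False
    left, left_changed = map_values(node.left, f)
    right, right_changed = map_values(node.right, f)
    value = f(node.value)
    changed = left_changed or right_changed or value != node.value
    if not changed:
        return node, False   # reuse the old node instead of allocating
    return Node(node.key, value, left, right), True
```

In the best case (f changes nothing) this allocates nothing at all; in the worst case it allocates the same O(n) nodes a naive rebuild would.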

+8

A concrete implementation of what you are looking for can be found in Clojure (and ClojureScript) transients.

In short, given a fully immutable, persistent data structure, its transient version makes changes via destructive (in terms of allocation) mutation, and you can turn it back into a proper persistent data structure once you are done with your performance-sensitive operations. New roots are created (for example) only on the transition back to the persistent structure, thereby amortizing the associated allocation cost over the number of logical operations you performed on the structure while it was in its transient form.
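Clojure's actual API is `transient`, `conj!`/`assoc!`, and `persistent!`. As a rough sketch of the same borrow-mutate-freeze idea in Python (a toy analogue invented here for illustration, not Clojure's implementation, which shares tree structure rather than copying):

```python
class TransientList:
    """Toy transient: batch mutations happen in place on a private list;
    persistent() freezes the result and forbids further mutation."""

    def __init__(self, items=()):
        self._items = list(items)   # one copy on entry to "transient mode"
        self._editable = True

    def append(self, x):
        if not self._editable:
            raise RuntimeError("transient used after persistent() call")
        self._items.append(x)       # in-place: no per-operation allocation
        return self

    def persistent(self):
        """Freeze: only here does an immutable value get produced."""
        self._editable = False
        return tuple(self._items)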

+3

Source: https://habr.com/ru/post/889137/

