Below is an O(n log n) algorithm that works for graphs in which each child has at most one parent (EDIT: this algorithm does not extend to the two-parent case with O(n log n) performance). It is worth noting that I believe the bound can be improved to O(n log(maximum level label)) with additional work.
One-parent case:
For each node x, in reverse topological order, build a binary search tree T_x whose entries strictly increase in both birth date and number of generations removed from x. (T_x contains the earliest-born person c1 in the subgraph of the pedigree graph rooted at x, together with the next-earliest-born person c2 in that subgraph whose level, i.e. number of generations removed from x, is strictly greater than that of c1, together with the next-earliest-born person c3 in that subgraph whose level is strictly greater than that of c2, and so on.) To build T_x, we combine the previously constructed trees T_w, where w ranges over the children of x (those trees already exist because we iterate in reverse topological order).
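To make the invariant concrete, here is a minimal list-based sketch of what T_x holds: from the (birth date, level) pairs of the descendants of x, it keeps the "staircase" that strictly increases in both coordinates. A real implementation would keep this in a balanced BST; the function name `staircase` and the plain sorted list are illustrative assumptions.

```python
def staircase(pairs):
    """Given (birth_date, generations_removed_from_x) pairs, return the
    chain that is strictly increasing in both birth date and level,
    as maintained by T_x.  List stand-in for the balanced BST."""
    chain = []
    for birth, level in sorted(pairs):
        # keep an entry only if its level strictly exceeds the last kept one
        if not chain or level > chain[-1][1]:
            chain.append((birth, level))
    return chain
```

Note that the result has at most one entry per level, which is the fact the cost analysis below relies on.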
If we are careful about how we perform these merges, we can show that the total cost of all merges across the entire ancestry graph is O(n log n). The key observation is that after each merge, the combined tree retains at most one node per level. We assign each tree T_w a potential of (log n) · h(w), where h(w) is the length of the longest path from w down to a leaf of the pedigree graph.
When we combine the child trees T_w to create T_x, we "destroy" all of the trees T_w, releasing all of the potential they store to pay for building T_x, and we create the new tree T_x with potential (log n) · h(x). Our goal is therefore to spend at most O((log n) · (sum_w h(w) − h(x) + 1)) time creating T_x from the trees T_w, so that the amortized cost of the merge is only O(log n). We achieve this by choosing the tree T_w with maximum h(w) as the starting point for T_x, and then modifying that T_w into T_x. Once this choice is made for T_x, we fold each of the remaining trees into T_x one by one, using an algorithm similar to the standard algorithm for merging two binary search trees.
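The accounting in the last two paragraphs can be written out in one line. With the potential function Φ = Σ over live trees (log n) · h(w), the amortized cost of building T_x is:

```latex
% Amortized cost of one merge step under \Phi = \sum_{\text{live } T_w} (\log n)\, h(w)
\begin{align*}
\text{amortized}(x)
  &= \underbrace{O\!\Big((\log n)\Big(\sum_w h(w) - h(x) + 1\Big)\Big)}_{\text{actual work}}
   \;+\; \underbrace{(\log n)\, h(x)}_{\text{potential created}}
   \;-\; \underbrace{(\log n) \sum_w h(w)}_{\text{potential released}}
   \;=\; O(\log n).
\end{align*}
```

The sum-of-heights terms cancel against the released potential, leaving O(log n) per node.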
Concretely, a merge iterates over each node y in T_w, searches for y's predecessor z by birth date in T_x, and inserts y into T_x if y is more generations removed from x than z; then, if y was inserted into T_x, we find the node in T_x of lowest level strictly exceeding the level of y and splice out the intermediate nodes, maintaining the invariant that T_x is strictly ordered by both birth date and level. This costs O(log n) per node of T_w, and T_w contains at most O(h(w)) nodes (one per level), so the total cost of merging all the trees is O((log n) · sum_w h(w)), summing over all children w except the child w' for which h(w') is maximal.
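Here is a hedged sketch of that merge step, again with plain sorted lists standing in for the balanced BSTs (so the O(log n)-per-operation claim does not hold for this toy version, only the logic). The name `merge_chain` and the (birth, level) tuple encoding are assumptions of the sketch.

```python
import bisect

def merge_chain(base, other):
    """Fold the chain `other` into the chain `base`, preserving the
    invariant that `base` strictly increases in both birth date and
    level.  Both inputs are sorted lists of (birth, level) tuples."""
    for birth, level in other:
        i = bisect.bisect_left(base, (birth, level))
        # insert y only if its level strictly exceeds its predecessor's
        if i == 0 or base[i - 1][1] < level:
            # splice out following entries whose level y now dominates
            j = i
            while j < len(base) and base[j][1] <= level:
                j += 1
            base[i:j] = [(birth, level)]
    return base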
We store the level associated with each element of T_x in an auxiliary field of each node in the tree. We need this value so that we can determine the actual level of x once T_x is built. (As a technical detail, we actually store each node's level as a difference from the level of its parent in T_x, so that we can quickly add a value to the levels of all nodes in a subtree. This is a standard BST trick.)
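The difference trick can be illustrated outside of any particular BST: each node stores its level as a delta from its parent, so adding a constant to every level in a subtree is an O(1) change to one delta. The class and function names here are illustrative, not part of the algorithm above.

```python
class Node:
    """Tree node storing its level as a delta from its parent's level."""

    def __init__(self, key, level, parent=None):
        self.key = key
        self.parent = parent
        # relative encoding: absolute level = sum of deltas up to the root
        self.delta = level - (parent.level() if parent else 0)

    def level(self):
        # recover the absolute level by summing deltas along the root path
        node, total = self, 0
        while node is not None:
            total += node.delta
            node = node.parent
        return total

def add_to_subtree(root, amount):
    """Add `amount` to the level of every node under `root` in O(1)."""
    root.delta += amount
```

In a real balanced BST the same idea applies, with the delta adjusted during rotations.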
That's it! Just note that the initial potential is 0 and the final potential is nonnegative, so the sum of the amortized bounds is an upper bound on the total cost of all merges across the whole graph. We find the label of each node x as soon as we have created the BST T_x, by binary searching for the last element of T_x born before x died, at a cost of O(log n).
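That final lookup is a plain predecessor search by birth date. A minimal sketch, assuming the same sorted-list stand-in for T_x and an illustrative function name:

```python
import bisect

def last_born_before(chain, death_date):
    """Return the last (birth, level) entry of the chain whose birth date
    is strictly before `death_date`, or None if there is none.
    `chain` is sorted by birth date, as the T_x invariant guarantees."""
    i = bisect.bisect_left(chain, (death_date,))
    return chain[i - 1] if i > 0 else None
```

With a balanced BST in place of the list, the same search is the promised O(log n).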
To improve the bound to O(n log(maximum level label)), you can merge the trees lazily, combining only the first few elements of each tree as needed to produce the answer for the current node. If you use a BST that exploits locality of reference, such as a splay tree, you should be able to achieve the bound above.
Hopefully the above algorithm and analysis are at least clear enough to follow. Just comment if you need any clarification.