Tree iterator, can you optimize this further?

As a continuation of my initial question about a small piece of this code, I decided to ask to see if you can improve what we have come up with so far.

The following code iterates over the binary tree (left / right = child / next). I believe that there is room for one conditional ( down boolean). The fastest answer wins!

  • The cnt statement can be multiple statements, so you can verify that it appears only once.
  • The member functions child() and next() about 30 times slower than the hasChild () and hasNext () operations.
  • Keep it iterative <- dropped this requirement, because the recursive solution was presented faster.
  • This is C ++ code.
  • The procedure for visiting sites should remain as in the example below. (first click parents, then children, then the β€œnext” nodes).
  • BaseNodePtr is boost :: shared_ptr, since assignments are slow, avoiding temporary BaseNodePtr variables.

This code currently takes 5897ms to visit 62,200,000 nodes in the test tree, calling this function 200,000 times.

 void processTree (BaseNodePtr current, unsigned int & cnt ) { bool down = true; while ( true ) { if ( down ) { while (true) { cnt++; // this can/will be multiple statesments if (!current->hasChild()) break; current = current->child(); } } if ( current->hasNext() ) { down = true; current = current->next(); } else { down = false; current = current->parent(); if (!current) return; // done. } } } 
+1
source share
5 answers

Why not a recursive solution?

 void processTree (const BaseNodePtr &current, unsigned int & cnt ) { cnt++; if (current->hasChild()) processTree(current->child()); if (current->hasNext()) processTree(current->next()); } 

Since shared_ptr seems to be your bottleneck, why not improve it? Do you use streams? If not, then define the character BOOST_HAS_THREADS . The shared_ptr reference counter is protected by a mutex, which is probably the cause of slow operation.

Why not change the data structure so as not to use shared_ptr at all? Manage source pointers yourself? Maybe use scoped_ptr ?

+5
source

For maximum speed, you need to arrange the nodes in memory so that they are stored in a continuous block in the order in which you visit them.

For example, if you have a tree defined as follows.

  1 / \ 2 3 / \ /\ 4 5 6 7 /\ / /\ 8 9 10 11 12 / \ \ 13 14 15 

Then the visit function, as described, will visit the nodes in the following order

 1 2 4 8 13 14 9 5 3 6 10 7 11 12 15 

Now, if you order nodes in memory as a continuous block of 15 distributions and store the nodes in the order shown above, you usually visit a node that has spatial locality . ”This can improve your cache hits, depending on the size of your node structure and, thus speed up the work.

To create a fast iterative method of visiting all nodes in the tree only once and without recursion.

 unsigned int g_StackDepth = 0; BaseNodePtr* g_Stack[MAX_STACK_DEPTH]; void processTree (BaseNodePtr root, unsigned int & cnt ) { g_Stack[g_StackDepth++] = root; while( g_StackDepth > 0 ) { BaseNodePtr curr = g_Stack[--g_StackDepth]; cnt++; if ( curr->HasNext() ) { g_Stack[g_StackDepth++] = curr->Next(); } if ( curr->HasChild() ) { g_Stack[g_StackDepth++] = curr->Child(); } } } 

In combination with the above order, as far as I know, you need to get the maximum speed that you CAN get.

Obviously, this has limitations, since you need to know how big your stack can grow in advance. Although you can get around this using std :: vector. However, using std :: vector will eliminate all the advantages offered by the iterative method above.

Hope this helps :)

+3
source

Create a function "nextvisit" and keep calling it to simplify the code; next to this, use const iso value-semantics links for shared pointers ... this can save valuable copies of shared-ptr:

 // define the order of visitation in here BaseNodePtr& next( const BaseNodePtr& p ) { if( p->hasChild() ) return p->child(); if( p->hasNext() ) return p->next(); BaseNodePtr ancestor = p->parent(); while( ancestor != 0 && !ancestor->hasNext() ) ancestor = ancestor->parent(); return ancestor; } void processTree( const BaseNodePtr& p, unsigned int& cnt ) { while( p != NULL ) { ++cnt; p = next(p); } } 

But for readability, clarity, maintainability ... for God's sake, use recursion. If your stack is not big enough.

+1
source

I HATE when answers dismiss the question with the help of β€œdon't do this,” but here I go ...

Tell me, is there a way to remove down bool ... will it really have a REAL difference in runtime? We are talking about a small part of the processors and a few bytes on the stack.

Focus on making child () and parent () calls faster if you need speed. Otherwise, you are wasting your time (IMOHO).

EDIT: Perhaps take a walk on the tree (with this β€œslow” code) ONCE and create an array of pointers in the tree in the desired order. Use this "index" later.

What I'm saying, I think you're approaching optimization from the wrong angle.

+1
source

Here's how to have only one recursion call instead of two:

 void processTree (const BaseNodePtr &current, unsigned int & cnt ) { for(bool gotNext = true; gotNext; current = current->next()) { cnt++; if (current->hasChild()) processTree(current->child()); gotNext = current->hasNext(); } } 
+1
source

Source: https://habr.com/ru/post/1308879/


All Articles