Tree iterator, can you optimize this further?

Question

Tree iterator, can you optimize this further?

As a continuation of my initial question about a small piece of this code, I decided to ask to see if you can improve what we have come up with so far.

The following code iterates over the binary tree (left / right = child / next). I believe that there is room for one conditional ( down boolean). The fastest answer wins!

The cnt statement can be multiple statements, so you can verify that it appears only once.
The member functions child() and next() about 30 times slower than the hasChild () and hasNext () operations.
Keep it iterative <- dropped this requirement, because the recursive solution was presented faster.
This is C ++ code.
The procedure for visiting sites should remain as in the example below. (first click parents, then children, then the “next” nodes).
BaseNodePtr is boost :: shared_ptr, since assignments are slow, avoiding temporary BaseNodePtr variables.

This code currently takes 5897ms to visit 62,200,000 nodes in the test tree, calling this function 200,000 times.

 void processTree (BaseNodePtr current, unsigned int & cnt ) { bool down = true; while ( true ) { if ( down ) { while (true) { cnt++; // this can/will be multiple statesments if (!current->hasChild()) break; current = current->child(); } } if ( current->hasNext() ) { down = true; current = current->next(); } else { down = false; current = current->parent(); if (!current) return; // done. } } }

+1

c ++ optimization iteration binary-tree

Ron Aug 19 '09 at 20:08

source share

5 answers

For maximum speed, you need to arrange the nodes in memory so that they are stored in a continuous block in the order in which you visit them.

For example, if you have a tree defined as follows.

  1 / \ 2 3 / \ /\ 4 5 6 7 /\ / /\ 8 9 10 11 12 / \ \ 13 14 15

Then the visit function, as described, will visit the nodes in the following order

 1 2 4 8 13 14 9 5 3 6 10 7 11 12 15

Now, if you order nodes in memory as a continuous block of 15 distributions and store the nodes in the order shown above, you usually visit a node that has spatial locality . ”This can improve your cache hits, depending on the size of your node structure and, thus speed up the work.

To create a fast iterative method of visiting all nodes in the tree only once and without recursion.

 unsigned int g_StackDepth = 0; BaseNodePtr* g_Stack[MAX_STACK_DEPTH]; void processTree (BaseNodePtr root, unsigned int & cnt ) { g_Stack[g_StackDepth++] = root; while( g_StackDepth > 0 ) { BaseNodePtr curr = g_Stack[--g_StackDepth]; cnt++; if ( curr->HasNext() ) { g_Stack[g_StackDepth++] = curr->Next(); } if ( curr->HasChild() ) { g_Stack[g_StackDepth++] = curr->Child(); } } }

In combination with the above order, as far as I know, you need to get the maximum speed that you CAN get.

Obviously, this has limitations, since you need to know how big your stack can grow in advance. Although you can get around this using std :: vector. However, using std :: vector will eliminate all the advantages offered by the iterative method above.

Hope this helps :)

+3

Goz Aug 19 '09 at 21:32

source share

Create a function "nextvisit" and keep calling it to simplify the code; next to this, use const iso value-semantics links for shared pointers ... this can save valuable copies of shared-ptr:

 // define the order of visitation in here BaseNodePtr& next( const BaseNodePtr& p ) { if( p->hasChild() ) return p->child(); if( p->hasNext() ) return p->next(); BaseNodePtr ancestor = p->parent(); while( ancestor != 0 && !ancestor->hasNext() ) ancestor = ancestor->parent(); return ancestor; } void processTree( const BaseNodePtr& p, unsigned int& cnt ) { while( p != NULL ) { ++cnt; p = next(p); } }

But for readability, clarity, maintainability ... for God's sake, use recursion. If your stack is not big enough.

+1

xtofl Aug 19 '09 at 20:28

source share

I HATE when answers dismiss the question with the help of “don't do this,” but here I go ...

Tell me, is there a way to remove down bool ... will it really have a REAL difference in runtime? We are talking about a small part of the processors and a few bytes on the stack.

Focus on making child () and parent () calls faster if you need speed. Otherwise, you are wasting your time (IMOHO).

EDIT: Perhaps take a walk on the tree (with this “slow” code) ONCE and create an array of pointers in the tree in the desired order. Use this "index" later.

What I'm saying, I think you're approaching optimization from the wrong angle.

+1

Aardvark Aug 19 '09 at 20:41

source share

Here's how to have only one recursion call instead of two:

 void processTree (const BaseNodePtr &current, unsigned int & cnt ) { for(bool gotNext = true; gotNext; current = current->next()) { cnt++; if (current->hasChild()) processTree(current->child()); gotNext = current->hasNext(); } }

+1

David Lehavi Aug 19 '09 at 21:22

source share

1800 INFORMATION · Accepted Answer · 2009-08-19T20:19:10+0000

Why not a recursive solution?

 void processTree (const BaseNodePtr &current, unsigned int & cnt ) { cnt++; if (current->hasChild()) processTree(current->child()); if (current->hasNext()) processTree(current->next()); }

Since shared_ptr seems to be your bottleneck, why not improve it? Do you use streams? If not, then define the character BOOST_HAS_THREADS . The shared_ptr reference counter is protected by a mutex, which is probably the cause of slow operation.

Why not change the data structure so as not to use shared_ptr at all? Manage source pointers yourself? Maybe use scoped_ptr ?

Tree iterator, can you optimize this further?

More articles: