How to optimize this postfix expression tree for speed?

Thanks to the help I received in this post:

I have a nice, concise recursive function for traversing a tree in postfix order:

    deque<char*> d;

    void Node::postfix()
    {
        if (left != __nullptr) { left->postfix(); }
        if (right != __nullptr) { right->postfix(); }
        d.push_front(cargo);
        return;
    }

This is an expression tree. Branch nodes are operators randomly selected from the ops array, and leaf nodes are values or the variable 'x', also randomly selected from the values array.

    char *values[10] = {"1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","x"};
    char *ops[4] = {"+","-","*","/"};
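For context, the node type assumed by the code above looks roughly like this (a sketch: only left, right and cargo actually appear in postfix(); the constructor is just illustrative):

    // Sketch of the node type used above. Only left, right and cargo are
    // referenced by postfix(); the constructor is for illustration only.
    class Node
    {
    public:
        Node *left;    // __nullptr for leaf nodes
        Node *right;   // __nullptr for leaf nodes
        char *cargo;   // points at an entry in values[] or ops[]

        Node(char *c) : left(__nullptr), right(__nullptr), cargo(c) {}
        void postfix();
    };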

Since it will be called billions of times during a run of the genetic algorithm it is part of, I would like to optimize it for speed. I have a number of questions on this topic, which I will ask in separate posts.

First of all, how can I access each cargo as it is found? That is, instead of pushing the cargo onto the deque and then processing the deque to get the values, I would like to start processing each cargo immediately.
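For example, something like this is what I have in mind (just a sketch; postfix_visit and the callback parameter are made-up names):

    #include <functional>

    // Sketch: hand the traversal a callback and process each cargo as it
    // is found, instead of collecting everything in the deque first.
    void Node::postfix_visit(const std::function<void(char*)> &visit)
    {
        if (left != __nullptr)  { left->postfix_visit(visit); }
        if (right != __nullptr) { right->postfix_visit(visit); }
        visit(cargo);
    }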

Edit: This question suggests that processing the deque afterwards is the best way.

I do not know about parallel processing in C++ yet, but ideally this could be done simultaneously on two different processors.

In Python, I would make the function a generator and access each cargo using .next().

See the edit above.

But I am using C++ to speed up the Python implementation. I imagine this kind of tree has been around for a long time and someone has probably already optimized it. Any ideas? Thanks.

3 answers

Assuming the cargo processing is expensive enough that locking a mutex is relatively cheap, you could use a separate thread to consume the queue while you put items on it.

Thread 1 would run your current logic, but it would lock the queue's mutex before adding an element and unlock it afterwards.

Thread 2 would just loop forever, checking the size of the queue. If it is not empty, it locks the queue, pulls out all the available cargo and processes it, then repeats. If there is no cargo, it sleeps for a short period and tries again.

If the locking is too expensive, you can make a queue of queues: first put, say, 100 cargo items into a local queue, then push that whole queue onto the locked queue (as in the first example), start a new local queue and continue.
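A minimal sketch of the first variant, assuming a std::mutex protecting the deque and a polling consumer thread (process_cargo and producer_done are made-up names):

    #include <deque>
    #include <mutex>
    #include <thread>
    #include <chrono>

    std::deque<char*> d;        // the cargo queue from the question
    std::mutex d_mutex;         // protects d
    bool producer_done = false; // set (under the mutex) when the tree walk ends

    void process_cargo(char *cargo);   // whatever expensive work you do per cargo

    // Thread 1: the existing traversal, but pushing under the lock.
    void Node::postfix()
    {
        if (left != __nullptr)  { left->postfix(); }
        if (right != __nullptr) { right->postfix(); }
        std::lock_guard<std::mutex> lock(d_mutex);
        d.push_front(cargo);
    }

    // Thread 2: loop forever, draining whatever is available.
    void consumer()
    {
        for (;;)
        {
            std::deque<char*> batch;
            {
                std::lock_guard<std::mutex> lock(d_mutex);
                batch.swap(d);              // grab all available cargo at once
                if (batch.empty() && producer_done)
                    return;
            }
            if (batch.empty())
            {
                std::this_thread::sleep_for(std::chrono::microseconds(50));
                continue;
            }
            for (char *c : batch)
                process_cargo(c);
        }
    }

    // Rough usage: std::thread t(consumer); root->postfix();
    // then set producer_done under the mutex and t.join().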


Of course, you should measure the overhead before worrying about optimization, since building the next generation and performing the mutations in your genetic algorithm may well swamp the evaluation time.

Once you decide you want to optimize ... the obvious answer is to compile the expression ("as much as possible"). Fortunately, there are many ways to “compile”.

If you were implementing this in Python, you could ask Python (I'm not an expert there) to compile the constructed abstract syntax tree into a function, and that could be much faster, especially if CPython supports it.

It appears you are implementing this in C++. In that case, I would not evaluate the expression tree in the form you have defined it, since that means a lot of tree walking, indirect function calls, etc., which is quite expensive.

One trick is to spit out the actual expression as a text string, with the appropriate C++ function text wrapped around it, into a file, and run the C++ compiler on it. You can automate the whole spit-compile-relink cycle with enough scripting magic, so if you do this infrequently it will work, and you will get expression evaluation as fast as the machine can do it.
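A rough sketch of that idea on a POSIX-style system (the file names, compiler command and the use of system()/dlopen() are all assumptions, and error handling is minimal):

    #include <cstdio>
    #include <cstdlib>
    #include <dlfcn.h>   // dlopen/dlsym: POSIX-specific

    typedef double (*expr_fn)(double);

    // expr_text is the expression rendered as C++ source, e.g. "((x*2.0)+x)-1.0"
    expr_fn compile_expression(const char *expr_text)
    {
        // 1. Spit the expression out, wrapped in a tiny C++ function.
        std::FILE *f = std::fopen("expr_gen.cpp", "w");
        std::fprintf(f, "extern \"C\" double eval_expr(double x) { return %s; }\n",
                     expr_text);
        std::fclose(f);

        // 2. Compile it into a shared library.
        if (std::system("g++ -O2 -shared -fPIC expr_gen.cpp -o expr_gen.so") != 0)
            return 0;

        // 3. "Relink": load the library and grab the compiled function.
        void *lib = dlopen("./expr_gen.so", RTLD_NOW);
        return lib ? (expr_fn)dlsym(lib, "eval_expr") : 0;
    }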

Assuming you don't want to do that, I would be tempted to walk the expression tree before the evaluation process starts and "compile" the tree into a set of actions stored in a linear array called "code". The actions would be defined by an enumeration:

    enum actions {
        // general actions first
        pushx,              // action to push x on the stack
        push1,
        push2,              // action to push 2 on the stack
        ...
        pushN,
        add,
        sub,
        mul,                // action to multiply the top two stack elements
        div,
        ...
        // optimized actions
        add1,
        sub1,
        mul1,
        div1,               // action to divide the top stack element by 1
        ...
        addN, subN,
        ...
        addx, subx,
        ...
    };

Here I have defined the actions to implement a push-down-stack expression evaluator, because it is easy to understand. Fortunately, your vocabulary is quite limited, so your set of actions can also be quite small (it would be bigger if you had arbitrary variables or constants).

The expression ((x * 2.0) + x) - 1 would be executed by the series of actions:

    pushx
    mul2
    addx
    sub1

Most likely this is already much better than evaluating the tree directly.

One could define actions that implement a register-oriented evaluator modelled on a multi-register CPU; that would give even faster execution (I would guess a factor of two, but only if the expressions get really complicated).

What you want are actions that cover the most common computations you need to perform (so you can always pick a valid sequence of actions no matter what the original expression is), plus actions that occur frequently in the expressions you actually encounter (add1 is pretty typical in machine code; I don't know what your statistics will look like, and your remark that you are doing genetic programming suggests that you don't know either, but you can measure them somehow or make an educated guess).

Now your inner evaluation loop would look like this (sloppy syntax here):

    float stack[max_depth];
    stack_depth = 0;
    for (i = 1; i < expression_length; i++)
    {
        switch (code[i])    // one case for each opcode in the enum
        {
        case pushx: stack[stack_depth++] = x; break;
        case push1: stack[stack_depth++] = 1; break;
        ...
        case add:   stack[stack_depth-1] += stack[stack_depth];
                    stack_depth--;
                    break;
        ...
        case subx:  stack[stack_depth] -= x; break;
        ...
        }
    }
    // stack[1] contains the answer here

The above code implements a very fast "stream interpreter" for the pushdown expression evaluator.

Now you just need to generate the content of the code array. You can do that from the original expression tree by running your original recursive tree walk, but instead of evaluating the expression, record the action that your current evaluator would perform into the code array, and emit the special actions when you spot an opportunity (this amounts to "peephole optimization"). This is classic compilation from trees, and you can learn a lot more about how to do it in pretty much any compiler book.
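A sketch of that compilation walk, assuming the enum above and the Node type from the question (the strcmp-based dispatch and constant mapping are illustrative, and no optimized actions are emitted here):

    #include <vector>
    #include <cstring>

    // Postfix walk of the tree, appending one general action per node.
    // Emitting the optimized actions (mul2, addx, ...) where they apply
    // would be the peephole step described above.
    // Note: the enum's "div" may need renaming (e.g. divide) to avoid
    // clashing with the C library's div().
    void compile_tree(const Node *n, std::vector<actions> &code)
    {
        if (n->left)  compile_tree(n->left, code);
        if (n->right) compile_tree(n->right, code);

        if      (std::strcmp(n->cargo, "x") == 0) code.push_back(pushx);
        else if (std::strcmp(n->cargo, "+") == 0) code.push_back(add);
        else if (std::strcmp(n->cargo, "-") == 0) code.push_back(sub);
        else if (std::strcmp(n->cargo, "*") == 0) code.push_back(mul);
        else if (std::strcmp(n->cargo, "/") == 0) code.push_back(div);
        else
            // leaves "1.0" .. "9.0" map onto the consecutive push1..push9
            code.push_back((actions)(push1 + (n->cargo[0] - '1')));
    }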

Yes, this is all a fair amount of work. But then, you decided to run a genetic algorithm, which is quite expensive.


There are many good suggestions for speeding up tree iteration in this thread:

Tree iterator, can you optimize it further?

Regarding the cargo problem, you could process the cargo in a different thread, but seeing as you are not actually doing that much with each item, you will probably spend more time on thread synchronization than on actual work.

You may find that, instead of just pushing it onto a deque, processing each cargo as you go is faster. Or you may find that handling it all in a separate loop at the end is faster. The best way to find out is to try both methods with various inputs and time them.
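For the timing itself, something simple with std::chrono is enough (a sketch; the lambda just stands in for whichever variant you are measuring):

    #include <chrono>
    #include <cstdio>

    // Run one variant of the evaluation many times and report the wall time.
    template <typename F>
    double time_it(F evaluate, int iterations)
    {
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i)
            evaluate();
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        return elapsed.count();
    }

    // Hypothetical usage:
    //   double s = time_it([&] { root->postfix(); d.clear(); }, 1000000);
    //   std::printf("deque version: %f s\n", s);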


Source: https://habr.com/ru/post/1308877/

