Evaluating an AST (abstract syntax tree) in Clojure

How can an AST be evaluated with better performance? Currently we create the AST as a tree whose leaf nodes (terminals) are functions of one argument: a map from keywords to their values. Terminals are represented by keywords, and the non-terminals can be user-defined functions or Clojure functions. The full growth method creates a tree from non-terminals and terminals:

    (defn full-growth
      "Creates individual by full growth method: root and intermediate nodes are
      randomly selected from non-terminals Ns, leaves at depth depth are randomly
      selected from terminals Ts"
      [Ns Ts arity-fn depth]
      (if (<= depth 0)
        (rand-nth Ts)
        (let [n (rand-nth Ns)]
          (cons n (repeatedly (arity-fn n) #(full-growth Ns Ts arity-fn (dec depth)))))))
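Because the full growth method puts every terminal at the same depth, a small helper can check the shape of a generated tree. The function below, leaf-depths, is a hypothetical addition and not part of the original code; it assumes a tree is any `sequential?` value whose first element is the operator, which is what full-growth produces.

```clojure
(defn leaf-depths
  "Hypothetical helper: returns the depth of every terminal in a tree whose
  nodes are sequences of (operator child ...), counting the root as depth 0."
  ([tree] (leaf-depths tree 0))
  ([tree d]
   (if (sequential? tree)
     ;; skip the operator in first position and descend into the children
     (mapcat #(leaf-depths % (inc d)) (rest tree))
     ;; a terminal (keyword or function): report its depth
     [d])))

;; A full tree of depth 2 built by hand with binary operators:
(leaf-depths (list * (list + :x :x) (list * :x :x)))
;; => (2 2 2 2)
```

For any tree returned by `(full-growth [+ *] [:x] {+ 2, * 2} 3)`, every entry should be 3 and, since both operators are binary, there should be 2^3 = 8 entries.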

An example of a generated AST:

    => (def ast (full-growth [+ *] [:x] {+ 2, * 2} 3))
    #'gpr.symb-reg/ast
    => ast
    (#object[clojure.core$_STAR_ 0x6fc90beb "clojure.core$_STAR_@6fc90beb"]
     (#object[clojure.core$_STAR_ 0x6fc90beb "clojure.core$_STAR_@6fc90beb"]
      (#object[clojure.core$_STAR_ 0x6fc90beb "clojure.core$_STAR_@6fc90beb"] :x :x)
      (#object[clojure.core$_PLUS_ 0x1b00ba1a "clojure.core$_PLUS_@1b00ba1a"] :x :x))
     (#object[clojure.core$_PLUS_ 0x1b00ba1a "clojure.core$_PLUS_@1b00ba1a"]
      (#object[clojure.core$_PLUS_ 0x1b00ba1a "clojure.core$_PLUS_@1b00ba1a"] :x :x)
      (#object[clojure.core$_PLUS_ 0x1b00ba1a "clojure.core$_PLUS_@1b00ba1a"] :x :x)))

which is equivalent to:

    `(~* (~* (~* ~:x ~:x) (~+ ~:x ~:x)) (~+ (~+ ~:x ~:x) (~+ ~:x ~:x)))

    (def ast `(~* (~* (~* ~:x ~:x) (~+ ~:x ~:x)) (~+ (~+ ~:x ~:x) (~+ ~:x ~:x))))
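A short check of what the syntax-quoted form contains; the name small-tree is only for illustration. The point is that `~*` splices in the function object itself, so the result has the same shape as a tree built by full-growth, whereas a plain `*` inside syntax-quote stays a namespace-qualified symbol.

```clojure
;; With ~, syntax-quote splices in the *value* of * (a function object),
;; so the result has the same shape as a tree built by full-growth.
(def small-tree `(~* (~+ :x :x) :x))

(fn? (first small-tree))          ; => true
(= * (first small-tree))          ; => true
(sequential? small-tree)          ; => true, so tree-apply/tree-comp recurse into it

;; Without ~, the operator stays a symbol rather than the multiplication
;; function, so tree-apply and tree-comp would not compute a product with it.
(first `(* (+ :x :x) :x))         ; => clojure.core/*
```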

We can write a function that evaluates this AST directly:

    (defn ast-fn [{x :x}]
      (* (* (* x x) (+ x x)) (+ (+ x x) (+ x x))))

    => (ast-fn {:x 3})
    648

We have two ways of creating a function from the AST, one using apply and map, the other using comp and juxt:

    => (defn tree-apply
         "((+ :x :x) in) => (apply + [(:x in) (:x in)])"
         ([tree] (fn [in] (tree-apply tree in)))
         ([tree in]
          (if (sequential? tree)
            (apply (first tree) (map #(tree-apply % in) (rest tree)))
            (tree in))))
    #'gpr.symb-reg/tree-apply
    => (defn tree-comp
         "(+ :x :x) => (comp (partial apply +) (juxt :x :x))"
         [tree]
         (if (sequential? tree)
           (comp (partial apply (first tree)) (apply juxt (map tree-comp (rest tree))))
           tree))
    #'gpr.symb-reg/tree-comp
    => ((tree-apply ast) {:x 3})
    648
    => ((tree-comp ast) {:x 3})
    648
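A further variant, sketched here as an assumption rather than taken from the question: compile the tree once into nested closures, with a dedicated closure for two-argument nodes so that evaluating a binary node calls the operator directly instead of building juxt's intermediate vector and going through apply. The names tree-closure and example-tree are hypothetical, and whether this is measurably faster than tree-comp would have to be checked with the timing function from the question.

```clojure
(defn tree-closure
  "Hypothetical variant of tree-comp: turns the tree into nested closures once.
  Two-argument nodes call the operator directly; other arities fall back to apply."
  [tree]
  (if (sequential? tree)
    (let [op   (first tree)
          kids (mapv tree-closure (rest tree))]
      (if (= 2 (count kids))
        (let [[a b] kids]
          ;; binary node: no intermediate vector, no apply
          (fn [in] (op (a in) (b in))))
        (fn [in] (apply op (map #(% in) kids)))))
    ;; terminal: a keyword or a one-argument function of the input map
    tree))

(def example-tree (list * (list + :x :x) :x))

((tree-closure example-tree) {:x 3})
;; => 18
```

It still performs a map lookup for every terminal occurrence and calls each operator through its generic function interface, so it should not be expected to match ast-fn.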

Using time, we measure the execution time of each function over a set of test cases:

    => (defn timing [f interval]
         (let [values (into [] (map (fn [x] {:x x})) interval)]
           (time (into [] (map f) values)))
         true)
    => (timing ast-fn (range -10 10 0.0001))
    "Elapsed time: 37.184583 msecs"
    true
    => (timing (tree-comp ast) (range -10 10 0.0001))
    "Elapsed time: 328.961435 msecs"
    true
    => (timing (tree-apply ast) (range -10 10 0.0001))
    "Elapsed time: 829.483138 msecs"
    true

As you can see, there is a huge performance difference between the direct function (ast-fn), the function generated by tree-comp and the function generated by tree-apply: roughly 37 ms versus 329 ms versus 829 ms on this input.
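A single `time` call on the JVM can be skewed by JIT warm-up. A hypothetical variant of timing that runs a few unmeasured passes first and then averages several measured ones is sketched below; the names timing-warm, :warmup and :runs are assumptions. For careful measurements a dedicated benchmarking library such as Criterium is the usual choice.

```clojure
(defn timing-warm
  "Hypothetical variant of the question's timing fn: runs f over all input
  maps `warmup` times without measuring (giving the JIT a chance to compile
  the hot paths), then returns the mean time of `runs` further passes in ms."
  ([f interval] (timing-warm f interval {}))
  ([f interval {:keys [warmup runs] :or {warmup 5 runs 10}}]
   (let [values   (into [] (map (fn [x] {:x x})) interval)
         run-once (fn [] (into [] (map f) values))]
     ;; unmeasured warm-up passes
     (dotimes [_ warmup] (run-once))
     ;; measured passes, averaged
     (let [start (System/nanoTime)]
       (dotimes [_ runs] (run-once))
       (/ (- (System/nanoTime) start) 1e6 runs)))))

;; Usage: (timing-warm ast-fn (range -10 10 0.0001))
```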

Is there a better way?

Edit: Madstap's answer looks very promising. I made some changes to the solution (terminals can now be other functions, not just keywords, for example a constant function that returns the same value regardless of the input):

    (defn c [v] (fn [_] v))
    (def c1 (c 1))

    (defmacro full-growth-macro
      "Creates individual by full growth method: root and intermediate nodes are
      randomly selected from non-terminals Ns, leaves at depth depth are randomly
      selected from terminals Ts"
      [Ns Ts arity-fn depth]
      (let [tree     (full-growth Ns Ts arity-fn depth)
            val-map  (gensym)
            ast2f    (fn ast2f [ast]
                       (if (sequential? ast)
                         (list* (first ast) (map #(ast2f %1) (rest ast)))
                         (list ast val-map)))
            new-tree (ast2f tree)]
        `{:ast '~tree :fn (fn [~val-map] ~new-tree)}))
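To see what code full-growth-macro generates, its ast2f transformation can be run as an ordinary function on a quoted tree. The sketch below uses a hypothetical name, ast->form, and takes the argument symbol explicitly instead of a gensym so the output is easy to read.

```clojure
(defn ast->form
  "Hypothetical helper mirroring ast2f from full-growth-macro: rewrites a tree
  of operators and terminals into code in which every terminal (keyword or
  function) is called on the input map bound to the symbol `arg`."
  [tree arg]
  (if (sequential? tree)
    ;; keep the operator, rewrite each child
    (list* (first tree) (map #(ast->form % arg) (rest tree)))
    ;; terminal: call it on the input map
    (list tree arg)))

(ast->form '(* (+ :x :x) c1) 'm)
;; => (* (+ (:x m) (:x m)) (c1 m))
```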

Now I create ast-m (with the constant function c1 as an additional terminal) and an equivalent hand-written ast-m-fn:

    => (def ast-m (full-growth-macro [+ *] [:x c1] {+ 2 * 2} 3))
    #'gpr.symb-reg/ast-m
    => ast-m
    {:fn #object[gpr.symb_reg$fn__20851 0x31802c12 "gpr.symb_reg$fn__20851@31802c12"],
     :ast (+ (* (+ :x :x) (+ :x c1)) (* (* c1 c1) (* :x c1)))}
    => (defn ast-m-fn [{x :x}]
         (+ (* (+ x x) (+ x 1)) (* (* 1 1) (* x 1))))
    #'gpr.symb-reg/ast-m-fn

The timings are very similar:

    => (timing (:fn ast-m) (range -10 10 0.0001))
    "Elapsed time: 58.478611 msecs"
    true
    => (timing (:fn ast-m) (range -10 10 0.0001))
    "Elapsed time: 53.495922 msecs"
    true
    => (timing ast-m-fn (range -10 10 0.0001))
    "Elapsed time: 74.412357 msecs"
    true
    => (timing ast-m-fn (range -10 10 0.0001))
    "Elapsed time: 59.556227 msecs"
    true
2 answers

Use a macro to generate the equivalent of ast-fn.

    (ns foo.core
      (:require [clojure.walk :as walk]))

    (defmacro ast-macro [tree]
      (let [val-map  (gensym)
            new-tree (walk/postwalk (fn [x] (if (keyword? x) (list val-map x) x))
                                    (eval tree))]
        `(fn [~val-map] ~new-tree)))

On my machine its performance comes close to that of ast-fn: 45 ms to 50 ms. It does more map lookups than ast-fn (one per occurrence of :x rather than one per call), but this can be fixed with some additional tricks.
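One such trick, sketched here as an assumption and not necessarily what the answer has in mind: collect the distinct keywords in the tree, look each one up once in a let, and use the resulting locals in the body, which is what the destructuring in ast-fn does by hand. The helper let-bound-fn-form is hypothetical; it returns a form, so it could be called from a macro or passed to eval.

```clojure
(require '[clojure.walk :as walk])

(defn let-bound-fn-form
  "Hypothetical helper: builds (fn [m] (let [x1 (:x m) ...] body)) so that each
  distinct keyword terminal is looked up in the input map only once per call,
  however many times it occurs in the tree."
  [tree]
  (let [m        (gensym "m")
        ;; every distinct keyword anywhere in the tree
        kws      (distinct (filter keyword? (tree-seq sequential? seq tree)))
        ;; one fresh local per keyword, e.g. :x -> x1234
        locals   (zipmap kws (map #(gensym (name %)) kws))
        bindings (vec (mapcat (fn [k] [(locals k) (list k m)]) kws))
        ;; replace keyword terminals by their locals
        body     (walk/postwalk #(if (keyword? %) (locals %) %) tree)]
    `(fn [~m] (let ~bindings ~body))))

;; Usage with the ast from the question:
;; ((eval (let-bound-fn-form ast)) {:x 3})
```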

Edit: I thought more about this. Calling eval on the argument during macro expansion limits how the macro can be used (the argument cannot be a local). Turning full-growth itself into a macro might work better. As amalloy says, it all comes down to what you want to do at run time versus at macro-expansion time.
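If new individuals are created while the program runs, as in a genetic-programming loop, a macro alone is not enough, because a given macro call expands only once, when the code containing it is compiled. A hypothetical run-time counterpart of ast-macro, here called compile-tree, can build the same kind of form from a tree value and compile it with eval:

```clojure
(require '[clojure.walk :as walk])

(defn compile-tree
  "Hypothetical run-time counterpart of ast-macro: builds the (fn [m] ...)
  form from a tree *value* and compiles it with eval. Only keyword terminals
  are rewritten into map lookups."
  [tree]
  (let [m    (gensym "m")
        body (walk/postwalk (fn [x] (if (keyword? x) (list x m) x)) tree)]
    (eval `(fn [~m] ~body))))

;; Usage: ((compile-tree (full-growth [+ *] [:x] {+ 2, * 2} 3)) {:x 3})
```

Each eval call compiles and loads a new class, so this only pays off when each individual is evaluated on many inputs. Function terminals such as c1 from the question's edit would need the ast2f-style recursion instead of this keyword-only postwalk.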

    (defmacro full-growth-macro
      "Creates individual by full growth method: root and intermediate nodes are
      randomly selected from non-terminals Ns, leaves at depth depth are randomly
      selected from terminals Ts"
      [Ns Ts arity-fn depth]
      (let [tree     (full-growth Ns Ts arity-fn depth)
            val-map  (gensym)
            new-tree (walk/postwalk (fn [x] (if (keyword? x) (list val-map x) x)) tree)]
        `{:ast '~tree :fn (fn [~val-map] ~new-tree)}))

You are re-implementing much of what the compiler does, in a much less efficient way: using hash maps to look up variables by name at run time. Normally the compiler can resolve locals to a known slot on the stack and load them with a single bytecode instruction, but you force it to call several functions just to find out which value to use for x. Likewise, you go through several levels of dynamic dispatch to find out what to call for *, whereas normally the compiler sees the literal * in the source code and emits a simple call to clojure.lang.Numbers/multiply.
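To illustrate what the compiler can do once the operators and locals are literally in the source: if the expression is written out and x is hinted as a primitive double, the compiler can use primitive double arithmetic inside the body, with no map lookup and no boxing of intermediate results. The function ast-fn-prim is a hypothetical example; note that it takes x directly rather than the {:x x} map that timing passes in.

```clojure
;; Hypothetical: the expression from ast-fn written out with a primitive
;; double argument and return value. With the ^double hints the compiler
;; can use primitive double arithmetic inside the body.
(defn ast-fn-prim ^double [^double x]
  (* (* (* x x) (+ x x)) (+ (+ x x) (+ x x))))

(ast-fn-prim 3.0)
;; => 648.0
```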

Deferring all of this until run time imposes an unavoidable penalty on you. I think you have done about as much as you can to speed things up.


Source: https://habr.com/ru/post/1272400/

