Improving the algorithm for enumerating binary trees

Question

Improving the algorithm for enumerating binary trees

I can currently list rooted planar unlabeled binary trees using the following Prolog prolog code.

e --> u | b | t. u --> ['[op(u),['], e, [']]']. b --> ['[op(b),['], e, [','], e, [']]']. t --> ['number(n)'].

Note. See the output listings below.

and output them in an enlarged size using

 es(S) :- length(Ls, _), phrase(e, Ls), % or e(Ls,[]), atomic_list_concat(Ls,S).

However, this is an inefficient brute force algorithm.

Is there a more efficient algorithm for enumerating flat root unmarked binary trees?

Note. Trees can be generated using the trees from the previous two iterations, think of Fibonacci numbers and adding either a unary branch or a binary branch, however this leads to duplication of trees. I myself could make this version, what I'm looking for is an algorithm that generates trees in an efficient way for the first time without duplicates.

Note. A binary tree is also known as a binary expression tree or K-arry tree with K <= 2.

Addition

results

My brute force version for M (15) took 1 hour and 27 minutes, while the effective version for M (15) took about 2 seconds.

Obviously, an efficient algorithm is much more efficient and why I asked a question.

Motzkin numbers

The number of trees having nodes N for root flat unmarked binary trees is given by Motzkin numbers. See: OEIS A001006

 Nodes Trees 1 1 2 1 3 2 4 4 5 9

The number of trees that have N internal nodes for a flat root unlabeled binary tree is given by Catalan numbers. There is a more efficient algorithm for generating flat root binary trees using Catalan numbers.

Note:
The number of trees based on catalytic numbers does not have unary branches and counts only internal nodes.

while

the number of trees based on Motzkin numbers do has unary branches and counts all the nodes.

Cm:
OEIS A000108
Catalan numbers from Tom Davis

Related Prolog List Items with Motzkin Number

 % M is Motzkin number, N is number of list elements passed to atomic_list_concat\2 m_to_n(1,1). m_to_n(2,3). m_to_n(M,N) :- M > 2, N is (M*2)-1. es_m(M,S) :- m_to_n(M,N), length(Ls,N), e(Ls,[]), atomic_list_concat(Ls,S).

Statistics with an ineffective brute force version

 es_c(M,Count) :- aggregate_all(count, es_m(M,_), Count). ?- time(es_c(1,Count)). % 57 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips) Count = 1. ?- time(es_c(2,Count)). % 141 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips) Count = 1. ?- time(es_c(3,Count)). % 571 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips) Count = 2. ?- time(es_c(4,Count)). % 2,740 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips) Count = 4. ?- time(es_c(5,Count)). % 13,780 inferences, 0.000 CPU in 0.001 seconds (0% CPU, Infinite Lips) Count = 9. ?- time(es_c(6,Count)). % 70,072 inferences, 0.000 CPU in 0.002 seconds (0% CPU, Infinite Lips) Count = 21. ?- time(es_c(7,Count)). % 357,358 inferences, 0.016 CPU in 0.012 seconds (136% CPU, 22870912 Lips) Count = 51. ?- time(es_c(8,Count)). % 1,824,082 inferences, 0.063 CPU in 0.056 seconds (111% CPU, 29185312 Lips) Count = 127. ?- time(es_c(9,Count)). % 9,313,720 inferences, 0.297 CPU in 0.290 seconds (102% CPU, 31372531 Lips) Count = 323. ?- time(es_c(10,Count)). % 47,561,878 inferences, 1.469 CPU in 1.467 seconds (100% CPU, 32382555 Lips) Count = 835. ?- time(es_c(11,Count)). % 242,896,160 inferences, 7.672 CPU in 7.665 seconds (100% CPU, 31660599 Lips) Count = 2188. ?- time(es_c(12,Count)). % 1,240,493,974 inferences, 38.797 CPU in 38.841 seconds (100% CPU, 31974069 Lips) Count = 5798. ?- time(es_c(13,Count)). % 6,335,410,822 inferences, 206.047 CPU in 213.116 seconds (97% CPU, 30747425 Lips) Count = 15511. ?- time(es_c(14,Count)). % 32,356,235,848 inferences, 1016.156 CPU in 1018.955 seconds (100% CPU, 31841792 Lips) Count = 41835. ?- time(es_c(15,Count)). % 165,250,501,417 inferences, 5231.766 CPU in 5268.363 seconds (99% CPU, 31585991 Lips) Count = 113634.

References

Free downloadable pdf book that can help Analytical Combinatorics by Philip Flagola and Robert Sedgwick

Also see links in the Catalan tag.

Motzkin numbers

Bnf

 <expression> ::= <unary expression> | <binary expression> | <terminal> <unary expression> ::= "(u" <expression> ")" <binary expression> ::= "(b" <expression> " " <expression> ")" <terminal> ::= "t"

Using answer from David Eisenstat

Think of it as notes or breadcrumbs for me if I need to use it again many months after I forget it.

To check the answer, I used WSL (Windows Subsystem for Linux) with Python 3 installed

Using Windows 10, I created a file called motzkin.py in the directory

 C:\Users\Eric\Documents\Prolog

with Python code

 def ubtrees(n): if n == 1: yield 't' elif n > 1: for t in ubtrees(n - 1): yield '(u {})'.format(t) for i in range(1, n - 1): for t1 in ubtrees(i): for t2 in ubtrees(n - 1 - i): yield '(b {} {})'.format(t1, t2)

then in WSL I created a symlink to the Windows Prolog directory

 $ ln -s "/mnt/c/Users/Eric/Documents/Prolog" /home/eric/Prolog

and changed to the WSL Prolog directory

 $ cd Prolog

then launched Python3

 ~/Prolog$ python3

and imported python code

 >>> import motzkin

and did the following with an argument: ubtrees is the Motzkin number

 >>> for value in ubtrees(1): ... print(value) ... t >>> for value in ubtrees(2): ... print(value) ... (ut) >>> for value in ubtrees(3): ... print(value) ... (u (ut)) (btt) >>> for value in ubtrees(4): ... print(value) ... (u (u (ut))) (u (btt)) (bt (ut)) (b (ut) t) >>> for value in ubtrees(5): ... print(value) ... (u (u (u (ut)))) (u (u (btt))) (u (bt (ut))) (u (b (ut) t)) (bt (u (ut))) (bt (btt)) (b (ut) (ut)) (b (u (ut)) t) (b (btt) t)

and check the Motzkin numbers

 def m_count(m): count = sum(1 for x in ubtrees(m)) print("Count: ", count) >>> m_count(1) Count: 1 >>> m_count(2) Count: 1 >>> m_count(3) Count: 2 >>> m_count(4) Count: 4 >>> m_count(5) Count: 9 >>> m_count(6) Count: 21 >>> m_count(7) Count: 51 >>> m_count(8) Count: 127 >>> m_count(9) Count: 323 >>> m_count(10) Count: 835 >>> m_count(11) Count: 2188 >>> m_count(12) Count: 5798 >>> m_count(13) Count: 15511 >>> m_count(14) Count: 41835 >>> m_count(15) Count: 113634

To exit interactive Python, use

 quit()

Duplicate Notes

As I learned about Motzkin’s numbers, I manually enumerated the trees with a pen and paper and found a duplicate using the method of adding a unary branch to the previous M (N-1) trees and binary branches to the previous previous M (N-2) trees.

This single tree was generated twice for M (5) from M (4) trees

 (b (ut) (ut))

first by adding a unary branch to

 (b (ut) t)

and second by adding a unary branch to

 (bt (ut))

After that, I had a sequence of numbers 1,2,4,9,21, which I used with OEIS search and the top result was A001006 for Motskin numbers. As soon as I got a wider list of Motzkin numbers, I used the Prolog code to generate counters for large input values, and they all agreed. Now you can add OEIS to your software box with a valid example to demonstrate to others.

Larger view

If you read this far, you will probably see that this is part of a more serious problem, which is to create a system first in Prolog, which can use rewriting of terms to solve mathematical expressions to the basic calculus, but it is more important to show the steps taken. This way, it gets one part of the way to create binary expression trees that will be used as test cases. The next step will be the ability to individually set the number of nodes that are unary and binary, instead of fixing them with the Motzkin number. I used only Motzkin numbers to make sure that I correctly generated a subset of the combinations. Now that I have a template for this, I can change it to take one argument for the number of unary nodes and one for binary nodes. See: How to List Combinations Using DCG with CLP (FD) and Several Constraints

Only when I'm stuck will I ask questions related to this, so don't expect to see all the necessary breadcrumbs.

Prolog Output

 ?- length(Ls, N), phrase(e, Ls). Ls = ['number(0)'], N = 1 ; Ls = ['[op(u),[', 'number(0)', ']]'], N = 3 ; Ls = ['[op(u),[', '[op(u),[', 'number(0)', ']]', ']]'], N = 5 ; Ls = ['[op(b),[', 'number(0)', ',', 'number(0)', ']]'], N = 5 ; Ls = ['[op(u),[', '[op(u),[', '[op(u),[', 'number(0)', ']]', ']]', ']]'], N = 7 ; Ls = ['[op(u),[', '[op(b),[', 'number(0)', ',', 'number(0)', ']]', ']]'], N = 7 ; Ls = ['[op(b),[', '[op(u),[', 'number(0)', ']]', ',', 'number(0)', ']]'], N = 7 ; Ls = ['[op(b),[', 'number(0)', ',', '[op(u),[', 'number(0)', ']]', ']]'], N = 7 ;

 ?- es(S). S = 'number(0)' ; S = '[op(u),[number(0)]]' ; S = '[op(u),[[op(u),[number(0)]]]]' ; S = '[op(b),[number(0),number(0)]]' ; S = '[op(u),[[op(u),[[op(u),[number(0)]]]]]]' ; S = '[op(u),[[op(b),[number(0),number(0)]]]]' ; S = '[op(b),[[op(u),[number(0)]],number(0)]]' ; S = '[op(b),[number(0),[op(u),[number(0)]]]]' ; S = '[op(u),[[op(u),[[op(u),[[op(u),[number(0)]]]]]]]]' ; S = '[op(u),[[op(u),[[op(b),[number(0),number(0)]]]]]]' ; S = '[op(u),[[op(b),[[op(u),[number(0)]],number(0)]]]]' ; S = '[op(u),[[op(b),[number(0),[op(u),[number(0)]]]]]]' ; S = '[op(b),[[op(u),[[op(u),[number(0)]]]],number(0)]]' ; S = '[op(b),[[op(u),[number(0)]],[op(u),[number(0)]]]]' ; S = '[op(b),[[op(b),[number(0),number(0)]],number(0)]]' ; S = '[op(b),[number(0),[op(u),[[op(u),[number(0)]]]]]]' ; S = '[op(b),[number(0),[op(b),[number(0),number(0)]]]]' ;

 ?- es_m(1,E). E = 'number(n)' ; false. ?- es_m(2,E). E = '[op(u),[number(n)]]' ; false. ?- es_m(3,E). E = '[op(u),[[op(u),[number(n)]]]]' ; E = '[op(b),[number(n),number(n)]]' ; false. ?- es_m(4,E). E = '[op(u),[[op(u),[[op(u),[number(n)]]]]]]' ; E = '[op(u),[[op(b),[number(n),number(n)]]]]' ; E = '[op(b),[[op(u),[number(n)]],number(n)]]' ; E = '[op(b),[number(n),[op(u),[number(n)]]]]' ; false. ?- es_m(5,E). E = '[op(u),[[op(u),[[op(u),[[op(u),[number(n)]]]]]]]]' ; E = '[op(u),[[op(u),[[op(b),[number(n),number(n)]]]]]]' ; E = '[op(u),[[op(b),[[op(u),[number(n)]],number(n)]]]]' ; E = '[op(u),[[op(b),[number(n),[op(u),[number(n)]]]]]]' ; E = '[op(b),[[op(u),[[op(u),[number(n)]]]],number(n)]]' ; E = '[op(b),[[op(u),[number(n)]],[op(u),[number(n)]]]]' ; E = '[op(b),[[op(b),[number(n),number(n)]],number(n)]]' ; E = '[op(b),[number(n),[op(u),[[op(u),[number(n)]]]]]]' ; E = '[op(b),[number(n),[op(b),[number(n),number(n)]]]]' ; false.

+5

language-agnostic algorithm binary-tree motzkin-numbers oeis

Guy coder Mar 6 '17 at 13:43

source share

3 answers

In addition to the solution already published, I would like to introduce the following Prolog solution for the task.

Firstly, this is a declarative description of such trees:

  e (number) -> [].
 e (u (Arg)) -> [_], e (Arg).
 e (b (Left, Right)) -> [_, _], e (Left), e (Right).

I also use dcg . However, I use it for a different purpose than the question in question: in my case, I only describe the list to limit the depth of the solutions. Think about non-terminals by indicating how many “tokens” will definitely be consumed by applying the rule. Also note that I use compound terms to naturally describe such trees.

An example query using iterative deepening:

  ? - length (Ls, _), phrase (e (E), Ls).
 Ls = [],
 E = number;
 Ls = [_5710],
 E = u (number);
 Ls = [_5710, _5716],
 E = u (u (number));
 Ls = [_5710, _5716],
 E = b (number, number);
 Ls = [_5710, _5716, _5722],
 E = u (u (u (number))).

Now I can count :

  es_count (M, Count): -
         length ([_ | Ls], M),
         findall (., phrase (e (_), Ls), Sols),
         length (Sols, Count).

Here are some more solutions:

  ? - length (_, M), 
     time (es_count (M, Count)), 
     portray_clause (m_count (M, Count)), 
     false
 % 7 inferences, 0.000 CPU in 0.000 seconds (64% CPU, 636364 Lips)
 % 28 inferences, 0.000 CPU in 0.000 seconds (81% CPU, 1120000 Lips)
 m_count (1, 1).
 % 29 inferences, 0.000 CPU in 0.000 seconds (31% CPU, 1318182 Lips)
 m_count (2, 1).
 % 33 inferences, 0.000 CPU in 0.000 seconds (76% CPU, 1736842 Lips)
 m_count (3, 2).
 % 41 inferences, 0.000 CPU in 0.000 seconds (81% CPU, 1952381 Lips)
 m_count (4, 4).
 % 61 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 2178571 Lips)
 m_count (5, 9).
 % 109 inferences, 0.000 CPU in 0.000 seconds (91% CPU, 2595238 Lips)
 m_count (6, 21).
 % 230 inferences, 0.000 CPU in 0.000 seconds (93% CPU, 2948718 Lips)
 m_count (7, 51).
 % 538 inferences, 0.000 CPU in 0.000 seconds (97% CPU, 3221557 Lips)
 m_count (8, 127).
 % 1,337 inferences, 0.000 CPU in 0.000 seconds (99% CPU, 3293103 Lips)
 m_count (9, 323).
 % 3,434 inferences, 0.001 CPU in 0.001 seconds (99% CPU, 3,400,000 Lips)
 m_count (10, 835).
 % 9,000 inferences, 0.003 CPU in 0.003 seconds (94% CPU, 3301541 Lips)
 m_count (11, 2188).
 % 23,908 inferences, 0.007 CPU in 0.024 seconds (31% CPU, 3300387 Lips)
 m_count (12, 5798).
 % 64,158 inferences, 0.019 CPU in 0.024 seconds (79% CPU, 3387792 Lips)
 m_count (13, 15511).
 % 173,579 inferences, 0.051 CPU in 0.062 seconds (83% CPU, 3397448 Lips)
 m_count (14, 41835).
 % 472,853 inferences, 0.139 CPU in 0.152 seconds (92% CPU, 3393690 Lips)
 m_count (15, 113634).

Prolog is a good language for creating decisions based on declarative descriptions!

+3

mat Mar 6 '17 at 19:02

source share

Prolog coding a recursive relation as suggested by @DavidEisenstat:

 motzkin_numbers(0, 1). motzkin_numbers(1, 1). motzkin_numbers(N, M) :- N>1, N1 is N-1, motzkin_numbers(N1, M1), N2 is N-2, aggregate_all(sum(MiP), ( between(0, N2, I), motzkin_numbers(I, M_i), I_n1i is N-2-I, motzkin_numbers(I_n1i, M_n1i), MiP is M_i * M_n1i), Ms), M is M1 + Ms.

we get

 ?- length(_,N), time(motzkin_numbers(N,M)). ... N = 14, M = 113634 ; % 4 inferences, 0.000 CPU in 0.000 seconds (89% CPU, 115724 Lips) % 1,863,722 inferences, 1.107 CPU in 1.107 seconds (100% CPU, 1683529 Lips) N = 15, M = 310572 ; % 4 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 129232 Lips) % 4,499,430 inferences, 2.645 CPU in 2.646 seconds (100% CPU, 1700821 Lips) N = 16, M = 853467 ...

but SWI-Prolog recently added a tab: just adding this ad

 :- table motzkin_numbers/2.

we get

 ... % 310 inferences, 0.001 CPU in 0.001 seconds (99% CPU, 591031 Lips) N = 14, M = 113634 ; % 331 inferences, 0.001 CPU in 0.001 seconds (100% CPU, 457731 Lips) N = 15, M = 310572 ; % 352 inferences, 0.001 CPU in 0.001 seconds (100% CPU, 310880 Lips) N = 16, M = 853467 ; % 373 inferences, 0.001 CPU in 0.001 seconds (100% CPU, 703349 Lips) N = 17, M = 2356779 ; ...

+3

Capellic Mar 6 '17 at 20:26

source share

David Eisenstat · Accepted Answer · 2017-03-06T15:30:17+0000

In Python 3:

 def ubtrees(n): if n == 1: yield 't' elif n > 1: for t in ubtrees(n - 1): yield '(u {})'.format(t) for i in range(1, n - 1): for t1 in ubtrees(i): for t2 in ubtrees(n - 1 - i): yield '(b {} {})'.format(t1, t2)

This recursive enumeration procedure matches repetition

 M_1 = 1 M_n = M_{n-1} + sum_{i=1}^{n-2} M_i M_{n-1-i},

which shifts from the repetition specified in Wikipedia by one.

Improving the algorithm for enumerating binary trees

Addition

results

Motzkin numbers

Related Prolog List Items with Motzkin Number

Statistics with an ineffective brute force version

References

Bnf

Using answer from David Eisenstat

Duplicate Notes

Larger view

Prolog Output

More articles: