Extracting a grammar rule from a parsed result

I get the following result when I run the Stanford parser from NLTK:

(S (VP (VB get) (NP (PRP me)) (ADVP (RB now)))) 

But I need it in this form:

    S -> VP
    VP -> VB NP ADVP
    VB -> get
    PRP -> me
    RB -> now

How can I get this result, possibly with a recursive function? Is there a built-in function for it?

1 answer

First, to learn how to navigate the tree, see the questions "How to iterate over all nodes of the tree?" and "How to navigate nltk.tree.Tree?":

    >>> from nltk.tree import Tree
    >>> bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"
    >>> ptree = Tree.fromstring(bracket_parse)
    >>> ptree
    Tree('S', [Tree('VP', [Tree('VB', ['get']), Tree('NP', [Tree('PRP', ['me'])]), Tree('ADVP', [Tree('RB', ['now'])])])])
    >>> for subtree in ptree.subtrees():
    ...     print(subtree)
    ...
    (S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
    (VP (VB get) (NP (PRP me)) (ADVP (RB now)))
    (VB get)
    (NP (PRP me))
    (PRP me)
    (ADVP (RB now))
    (RB now)
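
As a dependency-free illustration of the recursion involved, here is a minimal sketch in plain Python (no NLTK) that parses the bracket string into nested (label, children) tuples and walks them in pre-order, the same order in which subtrees() visits the nodes in the session above. The helpers tokenize, parse and subtrees are hypothetical names, not NLTK API, and the tokenizer assumes that words contain no spaces or parentheses.

```python
import re
from typing import Iterator, List, Tuple, Union

# A node is (label, children); a leaf child is a plain str (the word).
Node = Tuple[str, List[Union["Node", str]]]

def tokenize(s: str) -> List[str]:
    # Split into "(", ")" and runs of characters that are neither spaces nor parens.
    return re.findall(r"\(|\)|[^\s()]+", s)

def parse(tokens: List[str], pos: int = 0) -> Tuple[Node, int]:
    # Expects tokens[pos] == "(" followed by a label; recurses into child nodes.
    assert tokens[pos] == "("
    label = tokens[pos + 1]
    pos += 2
    children: List[Union[Node, str]] = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            children.append(child)
        else:
            children.append(tokens[pos])  # a word (leaf)
            pos += 1
    return (label, children), pos + 1  # skip the closing ")"

def subtrees(node: Node) -> Iterator[Node]:
    # Pre-order: the node itself, then each child subtree, left to right.
    yield node
    for child in node[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)

bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"
tree, _ = parse(tokenize(bracket_parse))
labels = [label for label, _ in subtrees(tree)]
print(labels)  # ['S', 'VP', 'VB', 'NP', 'PRP', 'ADVP', 'RB']
```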

What you are looking for is Tree.productions() (https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L341):

    >>> ptree.productions()
    [S -> VP, VP -> VB NP ADVP, VB -> 'get', NP -> PRP, PRP -> 'me', ADVP -> RB, RB -> 'now']
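
Since the question asks about a recursive function, here is a minimal sketch of how such rules can be collected by hand, assuming the tree is stored as nested (label, children) tuples written out manually rather than as an nltk.Tree. The productions function is hypothetical: it imitates the output format of Tree.productions() (pre-order, terminals quoted), but it is not NLTK's code.

```python
from typing import List, Tuple

# A node is (label, children); children are nodes or plain str words.
Node = Tuple[str, list]

def productions(node: Node) -> List[str]:
    label, children = node
    # Right-hand side: child labels for nonterminals, quoted words for
    # terminals, mirroring how NLTK prints VB -> 'get'.
    rhs = [c[0] if isinstance(c, tuple) else repr(c) for c in children]
    rules = [f"{label} -> {' '.join(rhs)}"]
    # Recurse into nonterminal children, left to right (pre-order).
    for c in children:
        if isinstance(c, tuple):
            rules.extend(productions(c))
    return rules

tree = ("S", [("VP", [("VB", ["get"]),
                      ("NP", [("PRP", ["me"])]),
                      ("ADVP", [("RB", ["now"])])])])
for rule in productions(tree):
    print(rule)
```

With a real nltk.Tree, ptree.productions() already does this, so the hand-written version is mainly useful for understanding the recursion or for trees that were not built with NLTK.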

Note that Tree.productions() returns a list of Production objects, not strings; see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L22 and https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236.

If you need the grammar rules as strings, you can print them:

    >>> for rule in ptree.productions():
    ...     print(rule)
    ...
    S -> VP
    VP -> VB NP ADVP
    VB -> 'get'
    NP -> PRP
    PRP -> 'me'
    ADVP -> RB
    RB -> 'now'

Or collect them into a list of strings:

    >>> rules = [str(p) for p in ptree.productions()]
    >>> rules
    ['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP', "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]
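
These strings still quote the terminals (VB -> 'get'), while the question asks for VB -> get. Here is a minimal sketch of stripping those quotes from the string list, assuming that terminals contain no whitespace; unquote_rule is a hypothetical helper, not part of NLTK. If you would rather avoid string manipulation, each Production also exposes lhs() and rhs(), so you can build the strings directly from those parts.

```python
def unquote_rule(rule: str) -> str:
    # Split "LHS -> RHS" once, then clean each right-hand-side token.
    lhs, rhs = rule.split(" -> ", 1)
    tokens = []
    for tok in rhs.split():
        # Strip matching single or double quotes around terminals like 'get'.
        if len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "'\"":
            tok = tok[1:-1]
        tokens.append(tok)
    return f"{lhs} -> {' '.join(tokens)}"

rules = ['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP',
         "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]
plain = [unquote_rule(r) for r in rules]
print("\n".join(plain))
```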

Source: https://habr.com/ru/post/1233754/

