Extract Chomsky Normal Form Grammar from Parse Tree

I am trying to extract the productions of a CNF grammar for a sentence from its parse tree:

(ROOT (S (NP (DT the) (NNS kids)) (VP (VBD opened) (NP (DT the) (NN box)) (PP (IN on) (NP (DT the) (NN floor)))))) 

I put the whole tree in a string named S, and then:

    from nltk import Tree

    tree = Tree.fromstring(S)
    tree.chomsky_normal_form()
    for p in tree.productions():
        print(p)

Output:

    (1) NN -> 'box'
    (2) PP -> IN NP
    (3) DT -> 'the'
    (4) ROOT -> S
    (5) NP -> DT NN
    (6) VBD -> 'opened'
    (7) VP|<NP-PP> -> NP PP
    (8) VP -> VBD VP|<NP-PP>
    (9) NP -> DT NNS
    (10) NN -> 'floor'
    (11) IN -> 'on'
    (12) NNS -> 'kids'
    (13) S -> NP VP

But some of the productions (numbers 7 and 8) do not seem to be in CNF! What is the problem?

1 answer

VP|<NP-PP> is a single nonterminal symbol. The vertical bar does not denote alternation in the traditional sense. Rather, NLTK puts it there to indicate where the rule came from, i.e. "This new nonterminal symbol was derived from a combination of VP and NP-PP." It is the LHS of a new production rule that NLTK created to convert your grammar to Chomsky normal form.

Take a look at the productions of the tree, pre-CNF:

    ROOT -> S
    S -> NP VP
    NP -> DT NNS
    DT -> 'the'
    NNS -> 'kids'
    VP -> VBD NP PP ***
    VBD -> 'opened'
    NP -> DT NN
    DT -> 'the'
    NN -> 'box'
    PP -> IN NP
    IN -> 'on'
    NP -> DT NN
    DT -> 'the'
    NN -> 'floor'

In particular, look at the VP -> VBD NP PP rule, which is NOT in CNF (the RHS of any production rule should be either exactly two nonterminal symbols or a single terminal).
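To make the CNF condition concrete, here is a minimal sketch of a shape check for a single rule. The function name `is_cnf_rule` and the hand-written `nonterminals` set are assumptions made for illustration; this is plain Python, not part of NLTK.

```python
def is_cnf_rule(rhs, nonterminals):
    """Return True if a right-hand side has a strict CNF shape."""
    if len(rhs) == 2:
        # A -> B C: both symbols must be nonterminals.
        return all(sym in nonterminals for sym in rhs)
    if len(rhs) == 1:
        # A -> 'a': the single symbol must be a terminal.
        return rhs[0] not in nonterminals
    return False


# Hypothetical symbol inventory, copied by hand from the productions in the question.
nonterminals = {"ROOT", "S", "NP", "VP", "PP", "DT", "NN", "NNS",
                "VBD", "IN", "VP|<NP-PP>"}

print(is_cnf_rule(("VBD", "NP", "PP"), nonterminals))    # False: three symbols
print(is_cnf_rule(("VBD", "VP|<NP-PP>"), nonterminals))  # True: rule (8)
print(is_cnf_rule(("NP", "PP"), nonterminals))           # True: rule (7)
print(is_cnf_rule(("box",), nonterminals))               # True: single terminal
```

Note that the check treats `VP|<NP-PP>` as one entry in the set, which is exactly why rules (7) and (8) pass.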

The two rules in your question, (7): VP|<NP-PP> -> NP PP and (8): VP -> VBD VP|<NP-PP>, are together functionally equivalent to the original rule VP -> VBD NP PP.
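The splitting of one long rule into binary rules can be sketched in a few lines of plain Python. This is only an illustration that imitates the symbol naming seen in the output above; the helper `binarize` is hypothetical and is not NLTK's actual implementation.

```python
def binarize(lhs, rhs):
    """Right-factor one rule lhs -> rhs so that no RHS has more than two symbols."""
    rules = []
    current = lhs
    remaining = list(rhs)
    while len(remaining) > 2:
        first, rest = remaining[0], remaining[1:]
        # The artificial symbol records the original parent and the children still to come.
        new_symbol = "%s|<%s>" % (lhs, "-".join(rest))
        rules.append((current, (first, new_symbol)))
        current, remaining = new_symbol, rest
    rules.append((current, tuple(remaining)))
    return rules


for left, right in binarize("VP", ["VBD", "NP", "PP"]):
    print(left, "->", " ".join(right))
# VP -> VBD VP|<NP-PP>
# VP|<NP-PP> -> NP PP
```

The new symbol is built as one string, so the `|` inside it is just part of the name and carries no grammatical meaning.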

When a VP is expanded, applying rule (8) results in:

VBD VP|<NP-PP>

And VP|<NP-PP> is the LHS of the newly created production rule (7), so expanding it leads to:

VBD NP PP
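The two derivation steps above can be replayed with a small sketch. The `rules` dictionary and the helper `expand_artificial` are assumptions made for this example; the helper simply treats any symbol containing `|` as an artificial one and rewrites it with its RHS.

```python
rules = {
    "VP": ["VBD", "VP|<NP-PP>"],  # rule (8)
    "VP|<NP-PP>": ["NP", "PP"],   # rule (7)
}


def expand_artificial(symbols, rules):
    """Replace every artificial symbol (one containing '|') by its RHS."""
    result = []
    for sym in symbols:
        if "|" in sym and sym in rules:
            result.extend(expand_artificial(rules[sym], rules))
        else:
            result.append(sym)
    return result


step1 = rules["VP"]
step2 = expand_artificial(step1, rules)
print(" ".join(step1))  # VBD VP|<NP-PP>
print(" ".join(step2))  # VBD NP PP
```

After the artificial symbol is expanded, the result is the RHS of the original rule VP -> VBD NP PP.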

In particular, if you isolate the rule itself, you can take a look at its LHS symbol (which really is a single symbol):

    >>> tree.chomsky_normal_form()
    >>> prod = tree.productions()
    >>> x = prod[7]  # VP|<NP-PP> -> NP PP
    >>> x.lhs().symbol()  # Singular!
    u'VP|<NP-PP>'

Source: https://habr.com/ru/post/1206848/

