How to iterate over all nodes of a tree?

I want to simplify my parse trees: for every node, I want to drop the first hyphen in its label and everything that comes after it. For example, if a node is NP-TMP-FG, I want to make it NP, and if it is SBAR-SBJ, I want to make it SBAR, and so on. This is an example of a single parse tree that I have:

( (S (S-TPC-2 (NP-SBJ (NP (DT The) (NN asbestos) (NN fiber) ) (, ,) (NP (NN crocidolite) ) (, ,) ) (VP (VBZ is) (ADJP-PRD (RB unusually) (JJ resilient) ) (SBAR-TMP (IN once) (S (NP-SBJ (PRP it) ) (VP (VBZ enters) (NP (DT the) (NNS lungs) )))) (, ,) (PP (IN with)(S-NOM (NP-SBJ (NP (RB even) (JJ brief) (NNS exposures) ) (PP (TO to) (NP (PRP it) ))) (VP (VBG causing) (NP (NP (NNS symptoms) ) (SBAR (WHNP-1 (WDT that) ) (S (NP-SBJ (-NONE- *T*-1) ) (VP (VBP show) (PRT (RP up) ) (ADVP-TMP (NP (NNS decades) ) (JJ later) )))))))))) (, ,) (NP-SBJ (NNS researchers) ) (VP (VBD said)(SBAR (-NONE- 0) (S (-NONE- *T*-2) ))) (. .) )) 

This is my code, but it does not work.

    import re
    import nltk
    from nltk.tree import *

    tree = Tree.fromstring(line)  # each parse tree is stored on a single line
    for subtree in tree.subtrees():
        re.sub('-.*', '', subtree.label())
    print(tree)

Edit:

I assume the problem is that subtree.label() returns the node label but cannot be used to change it, since it is a method. The output of print(subtree.label()):

    S
    S-TPC-2
    NP-SBJ
    NP
    DT
    NN
    ,

etc.

2 answers

I came up with this:

    for subtree in tree.subtrees():
        s = subtree.label()
        subtree.set_label(re.sub('-.*', "", s))
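The reason the loop in the question has no effect can be seen without NLTK at all: re.sub returns a new string and leaves its argument untouched, so the result has to be stored somewhere (here, written back with set_label). A minimal sketch using a plain string in place of a node label; the variable names are only illustrative:

```python
import re

label = "NP-TMP-FG"

# The return value is discarded here, so nothing changes.
re.sub("-.*", "", label)
unchanged = label

# Keeping the return value gives the simplified label.
simplified = re.sub("-.*", "", label)

print(unchanged)
print(simplified)
```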

You can do something like this:

    for subtree in tree.subtrees():
        first = subtree.label().split('-')[0]
        subtree.set_label(first)
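One caveat that applies to both approaches: a label that itself starts with a hyphen, such as -NONE- in the example tree, is reduced to an empty string. If those labels should be kept, something along these lines might work. This is only a sketch: simplify_label and simplify_tree are hypothetical helper names, not part of NLTK, and the tree argument is assumed to provide subtrees(), label() and set_label() as used in the snippets above.

```python
def simplify_label(label):
    """Drop the first hyphen and everything after it.

    Labels that begin with a hyphen (e.g. "-NONE-") are returned
    unchanged, since cutting them would leave an empty string.
    """
    if label.startswith("-"):
        return label
    return label.split("-", 1)[0]


def simplify_tree(tree):
    """Simplify every node label in place and return the same tree.

    Assumes `tree` offers subtrees(), label() and set_label().
    """
    for subtree in tree.subtrees():
        subtree.set_label(simplify_label(subtree.label()))
    return tree
```

With the tree from the question, the call would be simplify_tree(Tree.fromstring(line)). Leaves such as *T*-1 are not touched, because only node labels are rewritten.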

Source: https://habr.com/ru/post/1233756/