NLTK: How do I traverse a noun phrase to return a list of strings?

Question


In NLTK, how do I traverse a parsed sentence to return a list of noun phrase strings?

I have two goals:
(1) Create a list of noun phrases instead of printing them with the traverse() method. I currently use StringIO to capture the output of the existing traverse() method, which is not an acceptable solution.
(2) Unparse the noun phrase string so that '(NP Michael/NNP Jackson/NNP)' becomes 'Michael Jackson'. Is there a method in NLTK for unparsing?

The NLTK documentation recommends using traverse() to find noun phrases, but how do I use 't' in this recursive method to build a list of noun phrase strings?

import nltk
from nltk.tag import pos_tag

def traverse(t):
    try:
        t.label()
    except AttributeError:
        return
    else:
        if t.label() == 'NP':
            print(t)  # or do something else
        else:
            for child in t:
                traverse(child)

def nounPhrase(tagged_sent):
    # Tag sentence for part of speech
    tagged_sent = pos_tag(sentence.split())  # List of tuples with [(Word, PartOfSpeech)]
    # Define several tag patterns
    grammar = r"""
      NP: {<DT|PP\$>?<JJ>*<NN>}   # chunk determiner/possessive, adjectives and noun
          {<NNP>+}                # chunk sequences of proper nouns
          {<NN>+}                 # chunk consecutive nouns
      """
    cp = nltk.RegexpParser(grammar)  # Define Parser
    SentenceTree = cp.parse(tagged_sent)
    NounPhrases = traverse(SentenceTree)  # collect Noun Phrase
    return(NounPhrases)

sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
NP = nounPhrase(tagged_sent)
print(NP)

Currently prints:
(NP Michael/NNP Jackson/NNP)
(NP McDonalds/NNP)
and stores None in NP.


python parsing recursion traversal nltk

MyopicVisage Nov 19 '15 at 10:18


1 answer

alvas · Accepted Answer · 2015-11-19T23:26:04+0000

def extract_np(psent):
    for subtree in psent.subtrees():
        if subtree.label() == 'NP':
            yield ' '.join(word for word, tag in subtree.leaves())

cp = nltk.RegexpParser(grammar)
parsed_sent = cp.parse(tagged_sent)
for npstr in extract_np(parsed_sent):
    print(npstr)
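If you want an actual list rather than a generator, the same idea can be wrapped in a function that returns one. The following is only a sketch: the name noun_phrases and the hand-tagged token list are assumptions made for illustration (hand-tagging avoids needing a tagger model), while the grammar is the one from the question.

```python
import nltk

# Grammar copied from the question.
GRAMMAR = r"""
  NP: {<DT|PP\$>?<JJ>*<NN>}   # chunk determiner/possessive, adjectives and noun
      {<NNP>+}                # chunk sequences of proper nouns
      {<NN>+}                 # chunk consecutive nouns
"""


def noun_phrases(tagged_sent, grammar=GRAMMAR):
    """Return the NP chunks of a POS-tagged sentence as a list of strings.

    Hypothetical helper name; it only combines RegexpParser.parse,
    Tree.subtrees and Tree.leaves.
    """
    tree = nltk.RegexpParser(grammar).parse(tagged_sent)
    return [
        # leaves() yields (word, tag) pairs; keep only the words.
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
    ]


# Hand-tagged tokens (an assumption) standing in for pos_tag(sentence.split()).
tagged = [
    ("Michael", "NNP"), ("Jackson", "NNP"), ("likes", "VBZ"), ("to", "TO"),
    ("eat", "VB"), ("at", "IN"), ("McDonalds", "NNP"),
]

nps = noun_phrases(tagged)
print(nps)
```

Joining the words from leaves() covers goal (2) as well, since it drops both the NP label and the POS tags. In real use you would pass the output of pos_tag instead of the hand-tagged list.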
