NLTK extracting chunker parsing tree terms

Question

John Edward Gray started running now that he knows he's fat
She was listening when sent by this terrible singer

I want to extract interesting terms from a sentence. I am currently using POS tags to determine the grammatical category of each token. I then add each token to a counter (with different weights for nouns, verbs and adjectives).

Now I want to use a chunker for this. I think the leaf nodes of the parse tree contain all the interesting words and phrases. How do I extract terms from the chunker output?

python nlp nltk

aitchnyu Sep 06 '12 at 7:45

1 answer

alvas · Answer 1 · 2013-02-03T08:54:06+0000

In linguistics, “interesting words” are called open class words. The task you describe is not really a chunking / parsing task. You are looking for a kind of tagger / annotator / labeller that marks each word as "interesting" or not.

Sequence labelling

If you approach your task as a sequence labelling task, then the sentence John Edward Grey started running now that he knows he is fat will be labelled as such:

 [('John','B'),('Edward','I'),('Grey','I'),('started','O'),('running','B'), ('now','O'),('that','O'),('he','O'),('knows','O'),('he','O'), ('is','O'),('fat','B')]

So, every word labelled B marks the beginning of an “interesting” fragment, and any words labelled I that follow it continue the same fragment. The fragment ends in one of two ways:
the next word labelled O ends the “interesting” fragment, or
a subsequent B ends the previous “interesting” fragment and begins a new one.
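The rules above can be sketched in plain Python, with no NLTK required. The function name `extract_chunks` is a hypothetical helper introduced here for illustration; it assumes the input is a list of `(token, tag)` pairs with the tags B, I and O as in the example.

```python
def extract_chunks(tagged):
    """Group (token, tag) pairs into fragments: B opens one, I extends it, O closes it."""
    chunks, current = [], []
    for token, tag in tagged:
        if tag == 'B':
            # A new B closes any open fragment and starts another.
            if current:
                chunks.append(' '.join(current))
            current = [token]
        elif tag == 'I' and current:
            # I extends the fragment that is currently open.
            current.append(token)
        else:
            # O (or a stray I with no open fragment) closes the current fragment.
            if current:
                chunks.append(' '.join(current))
            current = []
    # Flush a fragment that runs to the end of the sentence.
    if current:
        chunks.append(' '.join(current))
    return chunks


tagged = [('John', 'B'), ('Edward', 'I'), ('Grey', 'I'), ('started', 'O'),
          ('running', 'B'), ('now', 'O'), ('that', 'O'), ('he', 'O'),
          ('knows', 'O'), ('he', 'O'), ('is', 'O'), ('fat', 'B')]

print(extract_chunks(tagged))  # ['John Edward Grey', 'running', 'fat']
```

The resulting strings can then go into the weighted counter mentioned in the question.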

What is interesting or not?

Actually, what is interesting or not depends on the ultimate goal of your task. For me, started running is an “interesting” fragment, because started changes the meaning of the infinitive, or of running, to give it a begin-action modality.

Closed class vs open class words

If by interesting words you mean open class words, then I suggest you build a dictionary of closed class words, and then run a script over the sentence to find the words that are not in that dictionary.
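A minimal sketch of that dictionary lookup is below. The `CLOSED_CLASS` set is a tiny hand-picked list made up for illustration, not a real linguistic resource, and splitting on whitespace is a simplifying assumption in place of a proper tokenizer.

```python
# Hypothetical, deliberately incomplete list of closed class words
# (determiners, pronouns, auxiliaries, prepositions, conjunctions).
CLOSED_CLASS = {
    'a', 'an', 'the', 'this', 'that', 'he', 'she', 'it', 'is', 'was',
    'by', 'when', 'now', "he's",
}


def open_class_words(sentence):
    """Return the tokens of `sentence` that are not in CLOSED_CLASS."""
    # Whitespace splitting is an assumption; a real tokenizer would handle punctuation.
    return [w for w in sentence.split() if w.lower() not in CLOSED_CLASS]


print(open_class_words("She was listening when sent by this terrible singer"))
# ['listening', 'sent', 'terrible', 'singer']
```

How well this works depends entirely on how complete the closed class list is.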

Machine learning approach

Another approach is to treat this as a machine learning classification task, for which you have previously annotated sample data of what is interesting and what is not. You then define some classification features and run the classifier to automatically tag the data with the tags B, I, O.
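As a rough sketch of what "defining features" could look like, the snippet below turns each token into a feature dictionary and pairs it with its annotated tag. The function `token_features` and the feature names are assumptions chosen for illustration; which features actually help depends on your data. A list of `(feature dict, label)` pairs like `train_set` is the shape that NLTK classifiers such as `nltk.NaiveBayesClassifier.train` accept as training data.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i] from the word itself and its neighbours."""
    word = tokens[i]
    return {
        'word': word.lower(),
        'is_capitalized': word[:1].isupper(),
        'suffix3': word[-3:].lower(),
        # Sentinel strings mark the sentence boundaries.
        'prev_word': tokens[i - 1].lower() if i > 0 else '<START>',
        'next_word': tokens[i + 1].lower() if i < len(tokens) - 1 else '<END>',
    }


# A hand-annotated fragment of the example sentence.
tokens = ['John', 'Edward', 'Grey', 'started', 'running']
labels = ['B', 'I', 'I', 'O', 'B']

# One (features, label) pair per token, ready to pass to a classifier.
train_set = [(token_features(tokens, i), label) for i, label in enumerate(labels)]
```

In practice you would build `train_set` from many annotated sentences, train a classifier on it, and then apply `token_features` to unseen sentences to predict their tags.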
