Matching words with the NLTK chunk parser

Question

NLTK parser regular expressions can match POS tags, but can they also match specific words?
So, suppose I want to chunk any structure consisting of a noun followed by the verb "left" (let's call this pattern L). For example, the sentence "the/DT dog/NN left/VB" should be chunked as
(S (DT the) (L (NN dog) (VB left))), but the sentence "the/DT dog/NN slept/VB" should not be chunked at all.

I was unable to find the regex chunking syntax documentation, and all the examples I saw correspond only to POS tags.

+6

python nltk

CromTheDestroyer Nov 20 '11 at 21:40

1 answer

Spaceghost · Answer 1 · 2012-03-21T01:44:27+0000

I had a similar problem. Once I realized that the chunk pattern only matches against tags, I changed the tag on the tokens I was interested in.

For example, I was trying to match a product name and version. A chunk rule such as <NNP>+<CD> works for "Internet Explorer 8.0" but fails on "Internet Explorer 8.0 SP2", where SP2 is tagged as NNP.

I could perhaps have trained the POS tagger, but I decided instead to just change the tag on SP2 to SP. A chunk rule such as <NNP>+<CD><SP>* then matches either example.
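
Applied to the question's "noun followed by left" case, the same trick might look like the sketch below. The helper names retag and chunk, and the custom tag LEFT, are made up for illustration and are not part of NLTK; the only NLTK call used is RegexpParser. Note that the word's original VB tag is overwritten, so this assumes you do not need it later.

```python
def retag(tagged, word_tags):
    """Replace the POS tag of selected words with a custom tag.

    tagged: list of (word, tag) pairs, e.g. the output of a POS tagger.
    word_tags: maps a lowercased word to the custom tag it should get.
    """
    return [(word, word_tags.get(word.lower(), tag)) for word, tag in tagged]


def chunk(tagged, grammar):
    """Chunk an already-tagged sentence with an NLTK regexp grammar."""
    # Imported here so retag() stays usable without NLTK installed.
    from nltk import RegexpParser
    return RegexpParser(grammar).parse(tagged)


# LEFT is a hypothetical custom tag that marks the word "left".
# The rule then chunks a noun immediately followed by that tag.
GRAMMAR = "L: {<NN><LEFT>}"
WORD_TAGS = {"left": "LEFT"}

sent1 = retag([("the", "DT"), ("dog", "NN"), ("left", "VB")], WORD_TAGS)
sent2 = retag([("the", "DT"), ("dog", "NN"), ("slept", "VB")], WORD_TAGS)
```

Calling chunk(sent1, GRAMMAR) should then group "dog left" under an L node, while chunk(sent2, GRAMMAR) has nothing to match because "slept" keeps its VB tag.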
