Matching words with the NLTK chunk parser

The regular expressions of the NLTK chunk parser can match POS tags, but can they also match specific words?
Suppose I want to chunk any structure consisting of a noun followed by the verb "left" (let's call this pattern L). For example, the sentence "the/DT dog/NN left/VB" should be chunked as
(S (DT the) (L (NN dog) (VB left))), but the sentence "the/DT dog/NN slept/VB" should not be chunked at all.

I was unable to find documentation for the regex chunking syntax, and all the examples I have seen match only POS tags.

1 answer

I had a similar problem. After realizing that the chunk pattern only examines tags, I changed the tag on the tokens I was interested in.

For example, I was trying to match a product name and version. A chunk rule such as <NNP>+<CD> works for "Internet Explorer 8.0" but fails on "Internet Explorer 8.0 SP2", where SP2 is tagged as NNP.

Perhaps I could have trained the POS tagger, but I decided instead to just change the tag to SP; a chunk rule such as <NNP>+<CD><SP>* then matches either example.
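The retagging idea above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the helper names `retag` and `find_chunks`, the chunk label `PRODUCT`, the replacement tags `SP` and `LEFT`, and the word patterns are all assumptions made up for the example. The tokens are tagged by hand so that no tagger model is needed.

```python
import re

from nltk import RegexpParser


def retag(tagged_tokens, word_pattern, new_tag):
    """Return a copy of the tokens, giving new_tag to every token
    whose word fully matches word_pattern (hypothetical helper)."""
    matcher = re.compile(word_pattern)
    return [
        (word, new_tag if matcher.fullmatch(word) else tag)
        for word, tag in tagged_tokens
    ]


def find_chunks(tree, label):
    """Collect the leaves of every subtree carrying the given chunk label."""
    return [sub.leaves() for sub in tree.subtrees() if sub.label() == label]


# Product name and version: retag service-pack tokens as SP (assumed tag name).
product_parser = RegexpParser("PRODUCT: {<NNP>+<CD><SP>*}")
product_tokens = [
    ("Internet", "NNP"), ("Explorer", "NNP"), ("8.0", "CD"), ("SP2", "NNP"),
]
product_tree = product_parser.parse(retag(product_tokens, r"SP\d+", "SP"))

# The question's pattern L: a noun followed by the specific word "left".
left_parser = RegexpParser("L: {<NN><LEFT>}")
left_tree = left_parser.parse(
    retag([("the", "DT"), ("dog", "NN"), ("left", "VB")], r"left", "LEFT")
)
slept_tree = left_parser.parse(
    retag([("the", "DT"), ("dog", "NN"), ("slept", "VB")], r"left", "LEFT")
)

print(find_chunks(product_tree, "PRODUCT"))
print(find_chunks(left_tree, "L"))
print(find_chunks(slept_tree, "L"))
```

Note that the chunked tokens keep the replacement tag (for example LEFT instead of VB). If the original tags matter downstream, they would have to be stored separately and restored after chunking.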


Source: https://habr.com/ru/post/901960/

