Mixing words and PoS tags in NLTK parser grammars

I have been playing with NLTK for some time now, and I am trying to define a custom parser grammar for a special kind of chunking. I am following the description at http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html , but what I'm interested in is slightly different from what is described in that chapter. For example, in Example 7.10, the following rule is used for the verb phrase: VP: {<VB.*><NP|PP|CLAUSE>+$} I would like to match only sentences that use one particular verb, not any verb. Something like: VP: {go<NP|PP|CLAUSE>+$}

In other words, I would like to match the actual word rather than its PoS tag, and to mix actual words and PoS tags freely in the regular expression.

Is this possible?

1 answer

Not with the standard PoS tags produced by the NLTK PoS tagger: the chunk grammar matches against tags only, not against the words themselves.

If you need grammars for particular verbs, a useful hack is to pre-process the tagged text and append the token to the tag of every verb. You can then use a pattern such as VP: {<VB.*-go><NP|PP|CLAUSE>+$}, where the "-go" suffix stands for whatever combined tag format your pre-processing produces.
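The hack described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the helper names `lexicalize_verbs` and `chunk_go_phrases`, the `-go` suffix convention, and the simplified grammar (no CLAUSE rule) are all assumptions made for the example. It also only catches the exact token "go"; forms such as "goes" or "went" would need lemmatizing first.

```python
def lexicalize_verbs(tagged_sentence, verbs=("go",)):
    """Append the token to the tag of selected verbs: ('go', 'VBP') -> ('go', 'VBP-go')."""
    retagged = []
    for word, tag in tagged_sentence:
        # Only verb tags (VB, VBP, VBZ, ...) for the chosen words are rewritten.
        if tag.startswith("VB") and word.lower() in verbs:
            tag = f"{tag}-{word.lower()}"
        retagged.append((word, tag))
    return retagged


# Hypothetical grammar: the VP rule matches only verb tags ending in "-go".
GRAMMAR = r"""
  NP: {<DT|JJ|NN.*>+}
  PP: {<IN><NP>}
  VP: {<VB.*-go><NP|PP>+$}
"""


def chunk_go_phrases(tagged_sentence):
    """Chunk an already PoS-tagged sentence with the lexicalized grammar."""
    import nltk  # imported here so the retagging helper works without NLTK installed

    parser = nltk.RegexpParser(GRAMMAR)
    return parser.parse(lexicalize_verbs(tagged_sentence))
```

Calling `chunk_go_phrases([("They", "PRP"), ("go", "VBP"), ("home", "NN")])` should produce a tree containing a VP chunk, whereas the same sentence with a different verb should not, because its verb tag never receives the "-go" suffix.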


Source: https://habr.com/ru/post/1438110/

