Getting multiple tags from the Stanford POS Tagger

I do POS tagging with the Stanford POS Tagger. The tagger returns only one possible tagging for the input sentence. For example, given the input sentence "The clown weeps.", the tagger produces the (erroneous) "The_DT clown_NN weeps_NNS ._.".

However, my application then tries to parse the result and may reject a tagging because it cannot be parsed. In this example, it would reject "The_DT clown_NN weeps_NNS ._." but would accept "The_DT clown_NN weeps_VBZ ._.", which I assume is a lower-confidence tagging for the tagger.

Therefore, I would like the POS tagger to provide several hypotheses for the tag of each word, each annotated with some kind of confidence value. That way, my application could choose the highest-confidence tagging that yields a valid parse for its purposes.
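The selection step described above can be sketched roughly as follows. This only illustrates the consuming side, assuming some tagger already supplies per-word candidate tags with confidences. It does not use any Stanford POS Tagger API. The class NBestTagSelector, its nested types, and the parse predicate are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper: picks the best-scoring tagging that a parser accepts. */
public class NBestTagSelector {

    /** One candidate tag for a single word, with the tagger's confidence. */
    public static final class Candidate {
        public final String tag;
        public final double confidence;
        public Candidate(String tag, double confidence) {
            this.tag = tag;
            this.confidence = confidence;
        }
    }

    /** A tagging of the whole sentence; score is the product of word confidences. */
    public static final class Hypothesis {
        public final List<String> tags;
        public final double score;
        public Hypothesis(List<String> tags, double score) {
            this.tags = tags;
            this.score = score;
        }
    }

    /**
     * Expands per-word candidates into all sentence-level hypotheses,
     * sorted by descending score. Exponential in sentence length, so this
     * is only suitable for short n-best lists.
     */
    public static List<Hypothesis> expand(List<List<Candidate>> perWord) {
        List<Hypothesis> result = new ArrayList<>();
        result.add(new Hypothesis(new ArrayList<>(), 1.0));
        for (List<Candidate> candidates : perWord) {
            List<Hypothesis> next = new ArrayList<>();
            for (Hypothesis h : result) {
                for (Candidate c : candidates) {
                    List<String> tags = new ArrayList<>(h.tags);
                    tags.add(c.tag);
                    next.add(new Hypothesis(tags, h.score * c.confidence));
                }
            }
            result = next;
        }
        result.sort((a, b) -> Double.compare(b.score, a.score));
        return result;
    }

    /** Returns the highest-scoring hypothesis accepted by the given predicate. */
    public static Optional<Hypothesis> bestAccepted(
            List<List<Candidate>> perWord, Predicate<List<String>> parses) {
        for (Hypothesis h : expand(perWord)) {
            if (parses.test(h.tags)) {
                return Optional.of(h);
            }
        }
        return Optional.empty();
    }
}
```

For "The clown weeps.", if the word "weeps" came with the candidates NNS and VBZ (with made-up confidences such as 0.6 and 0.4) and the predicate stood in for the application's parser, bestAccepted would skip the NNS tagging and return the VBZ one.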

I have not found a way to ask the Stanford POS Tagger to produce several (n-best) tagging hypotheses for each word (or even for the whole sentence). Is there any way to do this? (Alternatively, I am also fine with using another POS tagger of comparable performance that supports this.)

+6
2 answers

I do not know of a tagger that offers several POS interpretations for English sentences (this one is for Spanish). Another option for you could be to switch taggers: I ran your own example through Freeling and got the expected result.

[Image: Freeling output for the example sentence]

In addition, you can see that Freeling also shows the other possible POS interpretations of a particular word in its context.

Note: if you have used Freeling, you may know that for machine readability you can use the XML output (below the results), and for automation you can integrate Freeling with Python or Java, but I usually prefer to just call it from the command line.

0

We found that the default model for POS tagging was not good enough. It turned out that a different model works much better. We currently use wsj-0-18-bidirectional-distsim, and its performance is good enough for most tasks. I include it like this:

Properties props = new Properties();
props.put("pos.model", "edu/stanford/nlp/models/pos-tagger/wsj-bidirectional/wsj-0-18-bidirectional-distsim.tagger");
props.put("annotators", "tokenize, ssplit, pos, ...");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
-1

Source: https://habr.com/ru/post/946005/

