Why is the Stanford parser with nltk parsing a sentence incorrectly?

Question


I am using the Stanford parser with NLTK in Python, and I followed "Stanford Parser and NLTK" to set up the Stanford NLP libraries.

    from nltk.parse.stanford import StanfordParser
    from nltk.parse.stanford import StanfordDependencyParser

    parser = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
    dep_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

    one = ("John sees Bill")

    # Constituency parse
    parsed_Sentence = parser.raw_parse(one)

    # GUI
    for line in parsed_Sentence:
        print(line)
        line.draw()

    # Dependency parse
    parsed_Sentence = [parse.tree() for parse in dep_parser.raw_parse(one)]
    print(parsed_Sentence)

    # GUI
    for line in parsed_Sentence:
        print(line)
        line.draw()

I get incorrect parse and dependency trees, as shown in the example below: the parser treats "sees" as a noun instead of a verb.

What should I do? It works fine when I change the sentence, for example (one = "John sees Bill"). The correct output of the parse tree for this sentence can be seen here.

An example of the correct output is also shown below:


python parsing nlp nltk stanford-nlp

Noman dilawar Jan 23 '16 at 20:52


1 answer

alvas · Accepted Answer · 2016-01-24T03:18:03+0000

Once again, no model is perfect (see "Python NLTK pos_tag not returning the correct part-of-speech tag") ;P
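To see exactly which tag a constituency parse gave each word, you can read the preterminals off the bracketed tree. The sketch below is only an illustration: the two bracketed strings are written by hand (they are not actual parser output), and `tagged_words` is a hypothetical helper, not part of NLTK.

```python
import re

# Matches a preterminal such as "(VBZ sees)": one tag followed by one word.
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def tagged_words(bracketed):
    """Return (word, tag) pairs read off a Penn-style bracketed parse."""
    return [(word, tag) for tag, word in PRETERMINAL.findall(bracketed)]


# Hand-written parses for illustration only (assumed, not parser output).
noun_reading = "(ROOT (NP (NNP John) (NNS sees) (NNP Bill)))"
verb_reading = "(ROOT (S (NP (NNP John)) (VP (VBZ sees) (NP (NNP Bill)))))"

for label, parse in [("noun reading", noun_reading), ("verb reading", verb_reading)]:
    tags = dict(tagged_words(parse))
    print(label, "->", tags["sees"])
```

If "sees" comes back with a noun tag such as NNS rather than a verb tag such as VBZ, the tagging step is where the parse went wrong.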

You can try a "more accurate" parser using the NeuralDependencyParser.

First, configure the parser correctly with the correct environment variables (see "Stanford Parser and NLTK" and https://gist.github.com/alvations/e1df0ba227e542955a8a), then:

    >>> from nltk.internals import find_jars_within_path
    >>> from nltk.parse.stanford import StanfordNeuralDependencyParser
    >>> parser = StanfordNeuralDependencyParser(model_path="edu/stanford/nlp/models/parser/nndep/english_UD.gz")
    >>> stanford_dir = parser._classpath[0].rpartition('/')[0]
    >>> slf4j_jar = stanford_dir + '/slf4j-api.jar'
    >>> parser._classpath = list(parser._classpath) + [slf4j_jar]
    >>> parser.java_options = '-mx5000m'
    >>> sent = "John sees Bill"
    >>> [parse.tree() for parse in parser.raw_parse(sent)]
    [Tree('sees', ['John', 'Bill'])]

Note that the NeuralDependencyParser produces only dependency trees, not constituency (phrase-structure) trees.
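In a dependency tree like `Tree('sees', ['John', 'Bill'])`, every node is a word: the root is the head and its children are the dependents, with no phrase labels such as NP or VP. A rough sketch of reading it as head/dependent pairs, using a plain tuple as a stand-in for the NLTK `Tree` (the tuple layout and the helper name are assumptions made for illustration):

```python
# Stand-in for Tree('sees', ['John', 'Bill']): (head, [dependents]).
# A dependent is either a bare word or another (head, [dependents]) tuple.
dep_tree = ("sees", ["John", "Bill"])


def head_dependent_pairs(tree):
    """Flatten a (head, [dependents]) tree into (head, dependent) pairs."""
    head, children = tree
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            # The child heads its own subtree: link it, then recurse.
            pairs.append((head, child[0]))
            pairs.extend(head_dependent_pairs(child))
        else:
            pairs.append((head, child))
    return pairs


for head, dependent in head_dependent_pairs(dep_tree):
    print(head, "->", dependent)
```

Here "sees" is the root with "John" and "Bill" attached to it, which is the verb reading the question was after.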
