How to get a dependency parse exactly like the online demo?

How can I programmatically get the same dependency parse with Stanford CoreNLP as the one shown in the online demo?

I am using the CoreNLP package to get the dependency parse of the following sentence.

Second healthcare worker in Texas tests positive for Ebola, authorities say.

I am trying to get the parse programmatically using the code below:

    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    String text = "Second healthcare worker in Texas tests positive for Ebola , authorities say ."; // Add your text here!
    Annotation document = new Annotation(text);
    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        SemanticGraph dependencies = sentence.get(BasicDependenciesAnnotation.class);
        // Print the root first, then every edge as relation(governor-index, dependent-index).
        IndexedWord root = dependencies.getFirstRoot();
        System.out.printf("root(ROOT-0, %s-%d)%n", root.word(), root.index());
        for (SemanticGraphEdge e : dependencies.edgeIterable()) {
            System.out.printf("%s(%s-%d, %s-%d)%n",
                    e.getRelation().toString(),
                    e.getGovernor().word(), e.getGovernor().index(),
                    e.getDependent().word(), e.getDependent().index());
        }
    }

I get the following output with the Stanford CoreNLP 3.5.0 package:

    root(ROOT-0, worker-3)
    amod(worker-3, Second-1)
    nn(worker-3, healthcare-2)
    prep(worker-3, in-4)
    amod(worker-3, positive-7)
    dep(worker-3, say-12)
    pobj(in-4, tests-6)
    nn(tests-6, Texas-5)
    prep(positive-7, for-8)
    pobj(for-8, ebola-9)
    nsubj(say-12, authorities-11)

But the online demo gives a different analysis: it marks "say" as the root and contains other relations between words, such as ccomp.

    amod(worker-3, Second-1)
    nn(worker-3, healthcare-2)
    nsubj(tests-6, worker-3)
    prep(worker-3, in-4)
    pobj(in-4, Texas-5)
    ccomp(say-12, tests-6)
    acomp(tests-6, positive-7)
    prep(positive-7, for-8)
    pobj(for-8, Ebola-9)
    nsubj(say-12, authorities-11)
    root(ROOT-0, say-12)

How can I fix my output so that it matches the online demo?

1 answer

The reason for the different output is that the parser demo uses the standalone parser distribution, whereas your code uses the full CoreNLP distribution. Although both use the same parser and the same models, the default CoreNLP configuration runs the part-of-speech (POS) tagger before the parser, and the parser then takes that POS information into account, which can lead to different results in some cases.

To get the same results, you can disable the POS tagger by changing the annotator list:

 props.put("annotators", "tokenize, ssplit, parse, lemma, ner, dcoref"); 

Note, however, that the lemma, ner, and dcoref annotators all require POS tags, so the order of the annotators has to change as well: parse must now come before them, as in the list above.
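The ordering constraint can be sketched as a small check that runs before the pipeline is built. This is only an illustration: `AnnotatorOrder`, `firstMisplaced`, and `build` are hypothetical names, not part of CoreNLP, and the sketch assumes (as the reordered list implies) that parse supplies the POS tags when the pos annotator is absent. It uses only the Java standard library and checks nothing beyond the order of the names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper, not part of CoreNLP: it only inspects the annotator order.
final class AnnotatorOrder {
    // Annotators that need POS tags, as named in the answer.
    private static final Set<String> NEEDS_POS =
            new HashSet<>(Arrays.asList("lemma", "ner", "dcoref"));
    // Assumption: either the tagger or the parser can supply POS tags.
    private static final Set<String> PROVIDES_POS =
            new HashSet<>(Arrays.asList("pos", "parse"));

    // Returns the first annotator that needs POS tags but is listed before
    // any annotator that provides them, or null if the order is fine.
    static String firstMisplaced(String annotators) {
        boolean posAvailable = false;
        for (String name : annotators.split(",")) {
            String annotator = name.trim();
            if (PROVIDES_POS.contains(annotator)) {
                posAvailable = true;
            } else if (NEEDS_POS.contains(annotator) && !posAvailable) {
                return annotator;
            }
        }
        return null;
    }

    // Builds the Properties object, failing fast on a bad order.
    static Properties build(String annotators) {
        String misplaced = firstMisplaced(annotators);
        if (misplaced != null) {
            throw new IllegalArgumentException(
                    "'" + misplaced + "' is listed before any annotator that provides POS tags");
        }
        Properties props = new Properties();
        props.setProperty("annotators", annotators);
        return props;
    }

    public static void main(String[] args) {
        // The reordered list from the answer passes the check.
        Properties props = build("tokenize, ssplit, parse, lemma, ner, dcoref");
        System.out.println(props.getProperty("annotators"));
        // Dropping pos without moving parse forward is flagged at "lemma".
        System.out.println(firstMisplaced("tokenize, ssplit, lemma, ner, parse, dcoref"));
    }
}
```

The resulting `Properties` object could then be passed to `new StanfordCoreNLP(props)` exactly as in the question's code.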

There is also a CoreNLP demo that should always produce the same result as your code.


Source: https://habr.com/ru/post/1210752/

