CoreNLP is very slow on malformed input. It emits the warnings below and takes a long time to parse.
Input: "The Lincolns' fourth son, Thomas "Tad" Lincoln, was born on April 4, 1853, and died of heart failure at the age of 18 on July 16, 1871."
It produces these warnings:
Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder funkyFindLeafWithApproximateSpan
WARNING: RuleBasedCorefMentionFinder: Failed to find head token:
Tree is: (ROOT (S (NP (NP (NP (DT The) (NNS Lincolns) (POS ')) (JJ fourth) (NN son)) (, ,) (NP (NNP Thomas) () (NNP Tad) ('' '') (NNP Lincoln)) (, ,)) (VP (VP (VBD was) (VP (VBN born) (PP (IN on) (NP (NP (NNP April) (CD 4)) (, ,) (NP (CD 1853)) (, ,))))) (CC and) (VP (VBD died) (PP (IN of) (NP (NN heart) (NN failure))) (PP (IN at) (NP (NP (DT the) (NN age)) (PP (IN of) (NP (CD 18))))) (PP (IN on) (NP (NNP July) (CD 16))) (, ,) (NP (CD 1871)))) (. .)))
token = |NP|0|, approx=0
Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder funkyFindLeafWithApproximateSpan
WARNING: RuleBasedCorefMentionFinder: Last resort: returning as head: 1871
Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder findHead
WARNING: Invalid index for head 34=34-0: originalSpan=[The Lincolns '], head=1871-35
Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder findHead
WARNING: Setting head string to entire mention
Parsing the cleaned text of this document took 600.339 seconds: https://en.wikipedia.org/wiki/Abraham_Lincoln
Is there any way to speed this up? Does CoreNLP have an option to skip bad sentences automatically? Or is there a way to set a per-sentence time limit, after which the parser gives up on that sentence and moves on to the next one?
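To make the last question concrete, here is a minimal sketch of the behavior I am after, written with plain `java.util.concurrent` and no CoreNLP calls at all. The names `TimeLimitedParser` and `parseAll` are hypothetical, and the `parser` function is only a stand-in for whatever call annotates a single sentence. Each sentence gets a fixed time budget, and a sentence that exceeds it (or throws) is recorded as skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.*;
import java.util.function.Function;

// Hypothetical helper, not part of CoreNLP: applies `parser` to each sentence
// with a per-sentence time limit and skips sentences that run over it.
final class TimeLimitedParser {

    static <R> List<Optional<R>> parseAll(List<String> sentences,
                                          Function<String, R> parser,
                                          long timeoutMillis) {
        List<Optional<R>> results = new ArrayList<>();
        // Cached pool of daemon threads: a task that ignores interruption
        // cannot block later sentences or keep the JVM alive.
        ExecutorService executor = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        try {
            for (String sentence : sentences) {
                Callable<R> task = () -> parser.apply(sentence);
                Future<R> future = executor.submit(task);
                try {
                    R parsed = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    results.add(Optional.ofNullable(parsed));
                } catch (TimeoutException e) {
                    future.cancel(true);           // request interruption
                    results.add(Optional.empty()); // mark sentence as skipped
                } catch (ExecutionException e) {
                    results.add(Optional.empty()); // parser threw: skip as well
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } finally {
            executor.shutdownNow();
        }
        return results;
    }
}
```

Note that `future.cancel(true)` only interrupts the worker thread. If the underlying parser does not check for interruption, it keeps running in the background until it finishes, so a built-in option would still be preferable.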