CoreNLP is too slow on bad input

CoreNLP is too slow when analyzing malformed input. It emits the warnings below and takes a long time to parse.

For the input: The Lincolns' fourth son, Thomas "Tad" Lincoln, was born on April 4, 1853, and died of heart failure at the age of 18 on July 16, 1871.
it produces these warnings:


    Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder funkyFindLeafWithApproximateSpan
    WARNING: RuleBasedCorefMentionFinder: Failed to find head token:
    Tree is: (ROOT (S (NP (NP (NP (DT The) (NNS Lincolns) (POS ')) (JJ fourth) (NN son)) (, ,) (NP (NNP Thomas) () (NNP Tad) ('' '') (NNP Lincoln)) (, ,)) (VP (VP (VBD was) (VP (VBN born) (PP (IN on) (NP (NP (NNP April) (CD 4)) (, ,) (NP (CD 1853)) (, ,))))) (CC and) (VP (VBD died) (PP (IN of) (NP (NN heart) (NN failure))) (PP (IN at) (NP (NP (DT the) (NN age)) (PP (IN of) (NP (CD 18))))) (PP (IN on) (NP (NNP July) (CD 16))) (, ,) (NP (CD 1871)))) (. .)))
    token = |NP|0|, approx=0
    Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder funkyFindLeafWithApproximateSpan
    WARNING: RuleBasedCorefMentionFinder: Last resort: returning as head: 1871
    Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder findHead
    WARNING: Invalid index for head 34=34-0: originalSpan=[The Lincolns '], head=1871-35
    Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder findHead
    WARNING: Setting head string to entire mention
 


It took me 600.339 seconds to parse the cleaned text of this page: https://en.wikipedia.org/wiki/Abraham_Lincoln

Is there any way to speed this up? Is there an option in CoreNLP to automatically skip bad sentences? Or is there a way to set a time limit for parsing a sentence, after which the parser automatically skips it?
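To make the "time limit per sentence" idea concrete, here is a minimal sketch of one generic way to do it from the calling side, using only the JDK: run each sentence in a worker thread and give up on it after a deadline. It does not use any CoreNLP option. The class name `TimeLimitedParsing`, the method `parseWithTimeout`, and the `parseFn` parameter are all hypothetical; `parseFn` stands in for whatever call actually analyzes one sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

final class TimeLimitedParsing {

    /**
     * Applies parseFn to each sentence and collects the results, skipping any
     * sentence that fails or takes longer than timeoutMillis.
     * parseFn is a hypothetical stand-in for the real per-sentence analysis call.
     */
    static <R> List<R> parseWithTimeout(List<String> sentences,
                                        Function<String, R> parseFn,
                                        long timeoutMillis) {
        List<R> results = new ArrayList<>();
        for (String sentence : sentences) {
            // One daemon worker per sentence: a task that ignores interruption
            // cannot block later sentences or keep the JVM alive.
            ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });
            Callable<R> task = () -> parseFn.apply(sentence);
            Future<R> future = executor.submit(task);
            try {
                results.add(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                // Too slow: request cancellation and skip this sentence.
                future.cancel(true);
            } catch (ExecutionException e) {
                // The analysis itself threw: skip this sentence.
            } catch (InterruptedException e) {
                // The caller was interrupted: restore the flag and stop early.
                Thread.currentThread().interrupt();
                executor.shutdownNow();
                break;
            } finally {
                executor.shutdownNow();
            }
        }
        return results;
    }
}
```

Calling `TimeLimitedParsing.parseWithTimeout(sentences, s -> analyze(s), 5000)`, where `analyze` is your own per-sentence function, would return results only for sentences that finished within five seconds. The limitation is that cancellation relies on thread interruption, so a task that never checks for interrupts keeps consuming CPU in the background even though its result is discarded.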


Source: https://habr.com/ru/post/1599491/

