Why does the Stanford CoreNLP NER annotator load 3 models by default?

When I add the ner annotator to the StanfordCoreNLP object pipeline, I see that it loads 3 models, which takes a lot of time:

Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [10.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [10.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [6.5 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt

Is there a way to load only a subset that behaves the same? In particular, I'm not sure why it loads the 3-class and 4-class NER models when it already has a 7-class model, and I wonder whether those two models are needed at all.
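For reference, a minimal sketch of the kind of setup that produces the log above (the class name and exact annotator list are my own assumptions, not quoted from the question):

    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.util.Properties;

    public class NerDefault {
        public static void main(String[] args) {
            Properties props = new Properties();
            // ner needs tokenize, ssplit, pos and lemma earlier in the pipeline
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
            // With no ner.model override, all three default English CRF models are loaded
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        }
    }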

1 answer

You can set which models are loaded this way:

Command line:

-ner.model model_path1,model_path2

Java Code:

 props.put("ner.model", "model_path1,model_path2");

Where model_path1 and model_path2 should be something like this:

"edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"

The models are applied in the order you list them: model 1 runs first, then model 2, and so on, and their outputs are combined into a single set of NER labels.

"ner.combinationMode" "HIGH_RECALL", . "ner.combinationMode" "NORMAL", , .

Note that the models were trained on different data. For example, the 3-class model was trained on substantially more data than the 7-class model. Each model therefore does something different, which is why all three are loaded and their results combined by default.

