Stanford CoreNLP almost certainly does not have a language identifier at the moment. "Almost", because nonexistence is much harder to prove.
EDIT: Nevertheless, here is some circumstantial evidence:
- language identification is not mentioned on the main page, the CoreNLP page, or the FAQ (although the FAQ does include the question "How to run CoreNLP in other languages?"), nor in the 2014 articles by the CoreNLP authors;
- tools that integrate several NLP libraries, including Stanford CoreNLP, use a different library for language identification, for example DKPro Core ASL; other users discussing language identification and CoreNLP do not mention such a feature either;
- the CoreNLP source code contains Language classes, but none of them has anything to do with language identification; you can manually check all 84 occurrences of the word "language" here.
Try TIKA, TextCat, or the Java Language Detection Library (they say "99% accuracy for 53 languages").
In general, the quality depends on the size of the input text: if it is long enough (say, at least several words that were not specially chosen), the accuracy can be pretty good, about 95%.
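To give some intuition for why longer input helps, here is a toy sketch of the rank-order character n-gram technique (Cavnar and Trenkle's "out-of-place" measure) that TextCat-style identifiers are built on. It is not the API of Tika, TextCat, or any other library mentioned above: the class `NgramLanguageGuesser`, its methods, the constants, and the one-sentence training samples are all hypothetical, and a usable identifier would need far larger training texts per language.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy rank-order n-gram language guesser; every name here is illustrative. */
final class NgramLanguageGuesser {
    private static final int MAX_NGRAM = 3;      // use 1-, 2- and 3-character n-grams
    private static final int PROFILE_SIZE = 300; // keep only the most frequent n-grams

    // language code -> (n-gram -> rank), filled by train()
    private final Map<String, Map<String, Integer>> profiles = new LinkedHashMap<>();

    /** Maps the most frequent n-grams of a text to their rank (0 = most frequent). */
    static Map<String, Integer> profile(String text) {
        // Lowercase, collapse everything that is not a letter into single spaces.
        String normalized = " "
                + text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", " ").trim()
                + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= MAX_NGRAM; n++) {
            for (int i = 0; i + n <= normalized.length(); i++) {
                counts.merge(normalized.substring(i, i + n), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Descending frequency; ties broken alphabetically so ranks are deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> ranks = new HashMap<>();
        for (int r = 0; r < entries.size() && r < PROFILE_SIZE; r++) {
            ranks.put(entries.get(r).getKey(), r);
        }
        return ranks;
    }

    /** Sum of rank differences; n-grams missing from the language profile get a fixed penalty. */
    static int outOfPlace(Map<String, Integer> doc, Map<String, Integer> lang) {
        int distance = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer langRank = lang.get(e.getKey());
            distance += (langRank == null) ? PROFILE_SIZE : Math.abs(langRank - e.getValue());
        }
        return distance;
    }

    /** Builds and stores a profile for one language from a sample text. */
    void train(String language, String sampleText) {
        profiles.put(language, profile(sampleText));
    }

    /** Returns the trained language whose profile is closest, or null if none was trained. */
    String guess(String text) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            int d = outOfPlace(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NgramLanguageGuesser guesser = new NgramLanguageGuesser();
        // Placeholder samples; real profiles are built from much more text.
        guesser.train("en", "the quick brown fox jumps over the lazy dog and runs away");
        guesser.train("de", "der schnelle braune fuchs springt weit weg von dem faulen hund");
        System.out.println(guesser.guess("the dog runs over the fox"));
    }
}
```

A very short input yields only a handful of n-grams, so its ranking is noisy and the distances to different language profiles lie close together; that is one way to see why accuracy drops on inputs of just a few words.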