OpenNLP does not seem to provide built-in stop-word removal. You will need to do as Olena Vicar suggests and implement it yourself, or use another Java NLP library such as Mallet.
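A plain-Java implementation that removes stop words with a single case-insensitive regular expression is shown below. The stop-word list does not need to be sorted: the \b word boundaries force each alternative to match a whole word, so "a" cannot match the start of "able".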
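    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String testText = "This is a text you want to test";
    String[] stopWords = new String[]{"a", "able", "about", "above", "according", "accordingly",
            "across", "actually", "after", "afterwards", "again", "against", "all"};
    // Join the words into one alternation: a|able|about|...
    String stopWordsPattern = String.join("|", stopWords);
    // \b...\b matches whole words only; \s* also removes the whitespace that follows
    Pattern pattern = Pattern.compile("\\b(?:" + stopWordsPattern + ")\\b\\s*", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(testText);
    testText = matcher.replaceAll("");  // "This is text you want to test"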
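If a regular expression is not a requirement, a token-based filter backed by a HashSet is another option. This is a minimal sketch of that different technique, not the regex approach above; it assumes whitespace-separated tokens, and the class name StopWordFilter is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordFilter {

    // Returns the tokens of `text` that are not in `stopWords`, joined by single spaces.
    // Assumption: tokens are separated by whitespace; punctuation stays attached to words.
    public static String removeStopWords(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Lower-case before the lookup so "A" and "a" are both treated as stop words
            if (!token.isEmpty() && !stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Stop words are stored in lower case so the lookup above is case-insensitive
        Set<String> stopWords = new HashSet<>(Arrays.asList(
                "a", "able", "about", "above", "according", "accordingly",
                "across", "actually", "after", "afterwards", "again", "against", "all"));
        // prints: This is text you want to test
        System.out.println(removeStopWords("This is a text you want to test", stopWords));
    }
}
```

Unlike the regex version, this collapses runs of whitespace into single spaces and keeps a stop word that has punctuation attached (for example "all,"), whereas the regex removes "all" and leaves the comma. Which behaviour is preferable depends on your input.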
You can use this list of English stop words.
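Assuming the downloaded list is a plain-text file with one word per line, a minimal sketch for turning it into the same kind of pattern follows. The class StopWordList, its methods, and the file name in the comment are hypothetical. Pattern.quote is used so that any regex metacharacters in the list are matched literally.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StopWordList {

    // Reads one stop word per line, trimming whitespace and skipping blank lines.
    // Assumption: the list is plain text with one word per line.
    public static List<String> load(Reader source) {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Builds a case-insensitive whole-word pattern like the inline example;
    // Pattern.quote makes any regex metacharacters in the list match literally.
    public static Pattern toPattern(List<String> words) {
        if (words.isEmpty()) {
            // An empty alternation would match at every word boundary and strip whitespace
            throw new IllegalArgumentException("stop-word list is empty");
        }
        String alternation = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b\\s*", Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        // For a real file, e.g. Files.newBufferedReader(Paths.get("stopwords-en.txt"));
        // the file name is hypothetical.
        List<String> words = load(new StringReader("a\nable\n\nabout\nall\n"));
        Pattern pattern = toPattern(words);
        // prints: This is text you want to test
        System.out.println(pattern.matcher("This is a text about all you want to test").replaceAll(""));
    }
}
```

Compiling the pattern once and reusing it avoids rebuilding a large alternation for every document you process.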
Alternatively, with Mallet you can follow the tutorial here. Stop words are removed by a dedicated pipe added to the import pipeline:
    // caseSensitive = false, markDeletions = false
    pipeList.add(new TokenSequenceRemoveStopwords(false, false));
Mallet ships with a built-in English stop-word list, so you do not need to define one yourself, but it can be extended if necessary.
Hope this helps.