I am trying to split text into sentences using the DocumentPreprocessor class from Stanford CoreNLP.
Below is the code I'm using:
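List<String> splitSentencesList = new ArrayList<>();
Reader reader = new StringReader(inputText);
// DocumentPreprocessor tokenizes the input and iterates over it one sentence at a time
DocumentPreprocessor dp = new DocumentPreprocessor(reader);
for (List<HasWord> sentence : dp) {
    // Join the tokens with spaces, lowercase the result, and remove every " ." sequence
    splitSentencesList.add(Sentence.listToString(sentence).toLowerCase().replace(" .", ""));
}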
This works for most cases, but how do I deal with conjunctions within a sentence?
For example:
I like coffee and donuts for my breakfast.
Ideally, this should be further processed as:
I like coffee for my breakfast.
I like donuts for my breakfast.
One option is to write a regular expression rule for the further separation. Is there a built-in method in CoreNLP to achieve this?
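To make the regular expression option concrete, here is a rough sketch of what I have in mind. It is only an illustration under strong assumptions: the sentence contains a single "and" joining two single-word items, and the ConjunctionSplitter class and its split method are hypothetical names of my own, not part of CoreNLP.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: naive regex-based splitting of "X and Y".
final class ConjunctionSplitter {
    // Groups: (1) text before the first item, (2) first item, (3) second item, (4) rest of sentence.
    private static final Pattern AND_PAIR =
            Pattern.compile("^(.*?)\\b(\\w+)\\s+and\\s+(\\w+)\\b(.*)$");

    // Returns two sentences, one per item, or the input unchanged if no "X and Y" pair is found.
    static List<String> split(String sentence) {
        Matcher m = AND_PAIR.matcher(sentence);
        if (!m.matches()) {
            return List.of(sentence);
        }
        String prefix = m.group(1);
        String suffix = m.group(4);
        return List.of(prefix + m.group(2) + suffix, prefix + m.group(3) + suffix);
    }

    public static void main(String[] args) {
        for (String s : split("I like coffee and donuts for my breakfast.")) {
            System.out.println(s);
        }
    }
}
```

This handles the example above, but it has no notion of grammar, so it will also rewrite sentences where "and" joins verbs, multi-word phrases, or whole clauses. That is why I would prefer something parser-based if it exists.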
Any pointers would be appreciated.