How can I split a text into sentences? I want to split on periods, question marks, exclamation marks, etc., and get each sentence one by one, but punctuation inside quotation marks should not end a sentence.
For example, split this:
Walked. Turned back. But why? And said "Hello world. Damn this string splitting things!" without a shame.
Like this:
Walked.
Turned back.
But why?
And said "Hello world. Damn this string splitting things!" without a shame.
I am using this code:
    private List<String> FindSentencesWhichContainsWord(string text, string word)
    {
        string[] sentences = text.Split(new char[] { '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries);
        string[] wordsToMatch = { word };
        var sentenceQuery = from sentence in sentences
                            let w = sentence.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' },
                                                   StringSplitOptions.RemoveEmptyEntries)
                            where w.Distinct().Intersect(wordsToMatch).Count() == wordsToMatch.Count()
                            select sentence;
        List<String> rtn = new List<string>();
        foreach (string str in sentenceQuery)
        {
            rtn.Add(str);
        }
        return rtn;
    }
But this gives the result below, which is not what I want, because the quoted text is broken apart at the punctuation inside the quotation marks:
Walked.
Turned back.
But why?
And said "Hello world.
Damn this string splitting things!
" without a shame.
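To make the desired behavior concrete, here is a minimal sketch of the kind of splitting I am after. It is not a definitive implementation: it assumes quotes are always straight double quotes (`"`) and are balanced, and the names `SentenceSplitter` and `SplitSentences` are hypothetical, not part of my code above. It walks the text one character at a time, tracks whether it is inside quotes, and ends a sentence only on a terminator found outside them.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SentenceSplitter
{
    // Splits text on '.', '?' and '!', but ignores those characters
    // while inside a pair of straight double quotes.
    // Assumption: quotes are balanced; an unmatched quote suppresses
    // splitting for the rest of the text.
    public static List<string> SplitSentences(string text)
    {
        var sentences = new List<string>();
        var current = new StringBuilder();
        bool insideQuotes = false;

        foreach (char c in text)
        {
            // Keep every character, including the terminator itself.
            current.Append(c);

            if (c == '"')
            {
                insideQuotes = !insideQuotes;
            }
            else if (!insideQuotes && (c == '.' || c == '?' || c == '!'))
            {
                AddIfNotEmpty(sentences, current);
            }
        }

        // Keep any trailing text that has no terminator.
        AddIfNotEmpty(sentences, current);
        return sentences;
    }

    // Trims the buffered text, stores it if non-empty, and resets the buffer.
    private static void AddIfNotEmpty(List<string> sentences, StringBuilder current)
    {
        string sentence = current.ToString().Trim();
        if (sentence.Length > 0)
        {
            sentences.Add(sentence);
        }
        current.Clear();
    }
}
```

Calling `SentenceSplitter.SplitSentences(text)` on the example text above should return the four sentences in the form I listed. This sketch does not handle abbreviations such as "e.g.", ellipses, or typographic quotes; those would need extra rules.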