I want to split the text into sentences. Can anybody help me?
I also need to handle abbreviations. However, my plan is to replace them at an earlier stage. Mr. → Mr.
import re
import unittest


class Sentences:
    def __init__(self, text):
        # Current attempt: re.split discards the matched punctuation,
        # so the tests below do not pass yet.
        self.sentences = tuple(re.split(r"[.!?]\s", text))


class TestSentences(unittest.TestCase):
    def testFullStop(self):
        self.assertEqual(Sentences("X. X.").sentences, ("X.", "X."))

    def testQuestion(self):
        self.assertEqual(Sentences("X? X?").sentences, ("X?", "X?"))

    def testExclaimation(self):
        self.assertEqual(Sentences("X! X!").sentences, ("X!", "X!"))

    def testMixed(self):
        self.assertEqual(Sentences("X! X? X! X.").sentences, ("X!", "X?", "X!", "X."))


if __name__ == "__main__":
    unittest.main()
Thanks,
Barry
EDIT: For starters, I would be happy to satisfy the four tests that I included above. This will help me better understand how regular expressions work. At the moment, I can define the sentence as X. etc., as defined in my tests.
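For reference, here is a minimal sketch of one way the four tests could be satisfied. It is not the only approach. It assumes a sentence always ends in ".", "!" or "?" followed by whitespace, and it does nothing about abbreviations such as "Mr.". The idea is to split on the whitespace only, using a lookbehind so the punctuation stays attached to the sentence before it.

```python
import re


class Sentences:
    def __init__(self, text):
        # (?<=[.!?]) is a lookbehind: it requires that the whitespace is
        # preceded by ., ! or ? but does not consume that character, so
        # re.split removes only the whitespace and keeps the punctuation.
        # strip() avoids a trailing empty string when the text ends in whitespace.
        self.sentences = tuple(re.split(r"(?<=[.!?])\s+", text.strip()))
```

With this version, Sentences("X! X? X! X.").sentences evaluates to ("X!", "X?", "X!", "X."). An abbreviation followed by a space would still be treated as a sentence boundary, which is why replacing abbreviations at an earlier stage would still be needed.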