I have a list of well-parsed documents, each with several paragraphs (paragraphs are separated by \n\n and sentences by "."), which I would like to split into sentences, with each sentence tagged with the number of its paragraph within the document. For example, an input (two paragraphs):
First sentence of the 1st paragraph. Second sentence of the 1st paragraph. \n\n First sentence of the 2nd paragraph. Second sentence of the 2nd paragraph. \n\n
Ideally, the output should be:
1 First sentence of the 1st paragraph.
1 Second sentence of the 1st paragraph.
2 First sentence of the 2nd paragraph.
2 Second sentence of the 2nd paragraph.
I am familiar with the Lingua::Sentences package in Perl, which can split documents into sentences. However, it does not support paragraph numbering. Therefore, I wonder if there is an alternative way to achieve the above (there are no abbreviations in the documents, so every "." ends a sentence). Any help is appreciated. Thanks!
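
To make the desired behavior concrete, here is a minimal sketch in Python rather than Perl. It assumes that paragraphs are separated by exactly "\n\n", that every "." ends a sentence (no abbreviations), and that empty paragraphs should be skipped without being counted. The function name `number_sentences` is hypothetical and is not part of any existing package.

```python
def number_sentences(document: str) -> list[tuple[int, str]]:
    """Return (paragraph_number, sentence) pairs for one document.

    Assumptions: paragraphs are separated by a blank line ("\n\n"),
    and every "." marks the end of a sentence.
    """
    # Split into paragraphs and drop empty ones (e.g. after a trailing "\n\n").
    paragraphs = [p.strip() for p in document.split("\n\n")]
    paragraphs = [p for p in paragraphs if p]

    results = []
    for number, paragraph in enumerate(paragraphs, start=1):
        for piece in paragraph.split("."):
            sentence = piece.strip()
            if sentence:
                # Restore the period that split() removed.
                results.append((number, sentence + "."))
    return results


example = (
    "First sentence of the 1st paragraph. Second sentence of the 1st paragraph. \n\n"
    " First sentence of the 2nd paragraph. Second sentence of the 2nd paragraph. \n\n"
)

# Print one "paragraph-number sentence" pair per line.
for number, sentence in number_sentences(example):
    print(number, sentence)
```

The same two-level split (first on the paragraph separator, then on ".") should carry over to Perl's built-in `split`. If the documents did contain abbreviations, the inner split would need to be replaced by a real sentence splitter applied to each paragraph in turn.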