Parsing text into sentences?

Question

I am trying to split the text of a PDF page into sentences, but it is much more complicated than I expected. There are many special cases to consider, such as initials, decimals, and quotations, which contain periods that do not necessarily end the sentence.
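To make those special cases concrete, here is a minimal C++ sketch of a naive rule-based splitter. The names `split_sentences` and `word_before`, the abbreviation list, and the rules themselves are assumptions chosen for illustration; this is not a complete sentence boundary detector.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: the alphabetic word that ends right before `end`
// (exclusive), e.g. "Dr" for the period in "Dr.".
static std::string word_before(const std::string& text, std::size_t end) {
    std::size_t start = end;
    while (start > 0 && std::isalpha(static_cast<unsigned char>(text[start - 1]))) {
        --start;
    }
    return text.substr(start, end - start);
}

// Naive splitter. A '.', '!' or '?' ends a sentence unless one of these
// illustrative rules says otherwise:
//  - a period between two digits is a decimal point (3.50),
//  - a period after a single capital letter is an initial ("J."),
//  - a period after a listed abbreviation does not end the sentence,
//  - the terminator is not followed by whitespace,
//  - the next word starts in lowercase (e.g. after a quoted question).
std::vector<std::string> split_sentences(const std::string& text) {
    // Small, assumed abbreviation list; a real one would be much longer.
    static const std::set<std::string> abbreviations = {
        "Dr", "Mr", "Mrs", "Ms", "Prof", "etc", "vs"};

    std::vector<std::string> sentences;
    const std::size_t n = text.size();
    std::size_t begin = 0;

    for (std::size_t i = 0; i < n; ++i) {
        const char c = text[i];
        if (c != '.' && c != '!' && c != '?') continue;

        if (c == '.') {
            const bool digit_before =
                i > 0 && std::isdigit(static_cast<unsigned char>(text[i - 1]));
            const bool digit_after =
                i + 1 < n && std::isdigit(static_cast<unsigned char>(text[i + 1]));
            if (digit_before && digit_after) continue;  // decimal

            const std::string word = word_before(text, i);
            if (word.size() == 1 &&
                std::isupper(static_cast<unsigned char>(word[0]))) {
                continue;  // initial
            }
            if (abbreviations.count(word) > 0) continue;  // abbreviation
        }

        // Keep closing quotes or brackets with the sentence they close.
        std::size_t end = i + 1;
        while (end < n && (text[end] == '"' || text[end] == '\'' || text[end] == ')')) {
            ++end;
        }
        if (end < n && !std::isspace(static_cast<unsigned char>(text[end]))) continue;

        // Find the start of the next word; lowercase means "same sentence".
        std::size_t next = end;
        while (next < n && std::isspace(static_cast<unsigned char>(text[next]))) {
            ++next;
        }
        if (next < n && std::islower(static_cast<unsigned char>(text[next]))) continue;

        sentences.push_back(text.substr(begin, end - begin));
        begin = next;
        i = next - 1;  // the loop increment moves on to `next`
    }
    if (begin < n) sentences.push_back(text.substr(begin));  // unterminated tail
    return sentences;
}
```

Even this handful of rules gets things wrong: a sentence that really ends in "etc." or in the pronoun "I." is not split, which is exactly why a hand-rolled approach gets complicated so quickly.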

Is anyone here familiar with an NLP library for C or C++ that could help with this task, or can anyone offer some advice?

Thanks for any help.

c++ c parsing nlp

101010110101 Jun 09 '09 at 14:45

4 answers

Avi · Answer 1 · 2009-06-09T15:28:48+0000

, . Wikipedia , , C.

. Unicode Unicode № 29 - .

Smerity · Answer 2 · 2009-06-09T16:01:20+0000

(SBD) . , , , C ( , )

, - Unix , Windows , . , SBD , SBD, Z. ,

./pdfconvert | SBD | my_C_tool > ...
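As a rough sketch of what the last stage of such a pipeline could look like, the C++ function below reads from a stream and treats every non-blank line as one sentence. That the SBD step emits one sentence per line is an assumption, and the name `number_sentences` and the tab-separated output format are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Reads sentences from `in`, assuming the upstream SBD tool wrote one
// sentence per line, and writes "<index>\t<sentence>" to `out`.
// Returns the number of sentences seen.
std::size_t number_sentences(std::istream& in, std::ostream& out) {
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings coming from the upstream tool.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        // Skip blank lines, which some tools use as paragraph separators.
        if (line.find_first_not_of(" \t") == std::string::npos) continue;
        ++count;
        out << count << '\t' << line << '\n';
    }
    return count;
}
```

In a real tool, `main` would simply call `number_sentences(std::cin, std::cout)`, so the program can sit at the end of the pipe and leave all boundary detection to the SBD step.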

, , , , .

, ,

MXTERMINATOR, SBD , . , , sed script. SBD . , FTP .
OpenNLP Java (JavaDoc) .
Sentrick . , .

, . OpenNLP , . , , . , , , .

, SBD, . , , . , X, X . , .

, - , .

anon · Answer 3 · 2009-06-09T15:35:31+0000

, , . , . , , , , , PDF , ?

Sharmila · Answer 4 · 2011-07-20T10:27:33+0000

I had the same requirement some time ago and tried several solutions. The best one was splitta ( http://code.google.com/p/splitta/ ). It coped well with all the edge cases that I threw at it. splitta is written in Python.

I also tried Sentrick (Java): http://www.denkselbst.de/sentrick/index.html

Unfortunately, I do not have a complete list of all the options that I tried.
