Basics of language segmentation (grammar) rules for Latin-based languages

I am working on a feature that applies (grammatical) language segmentation rules to Latin-based languages, starting with English.

I am currently at the stage of breaking user input into sentences.

e.g.:

"I am working in language translation". "I have used Google MT API for this"

In the example above, I break the text at the full stop (.). That is the normal case, where a sentence ends with a dot, but there are any number of characters that can end a sentence (. ! ? etc.).
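For illustration, here is a minimal Python sketch of this kind of terminator-based splitting. The function name `naive_split` and the choice of terminators are assumptions made for the example, not part of any library. It also shows why the simple approach falls short: abbreviations such as "Dr." are split incorrectly.

```python
import re

# A sentence is taken to be the shortest run of text that ends in one or more
# terminators (. ! ?), optionally followed by closing quotes or brackets,
# and then whitespace or the end of the text. Any trailing text without a
# terminator is kept as a final segment.
_SENTENCE = re.compile(r'.+?[.!?]+["\')\]]*(?=\s|$)|.+?$', re.S)


def naive_split(text):
    """Split text into sentences using terminator characters only."""
    pieces = (m.group().strip() for m in _SENTENCE.finditer(text))
    return [p for p in pieces if p]


if __name__ == "__main__":
    sample = ("I am working in language translation. "
              "I have used Google MT API for this.")
    for sentence in naive_split(sample):
        print(sentence)

    # The weakness: the abbreviation is treated as a sentence of its own.
    print(naive_split("Dr. Smith arrived."))
```

Numbers such as "3.14" survive because the dot is not followed by whitespace, but abbreviations need explicit exception rules.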

I have SRX rules for segmentation.
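To make the SRX idea concrete: an SRX rule pairs a "before break" pattern with an "after break" pattern and says whether the position between them is a break or not, and the first matching rule at a position decides. The sketch below imitates that logic in Python. It does not parse a real SRX file, and the `Rule` class, the `segment` function, and the rule contents are all hypothetical examples.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    is_break: bool  # True: break here; False: exception, do not break
    before: str     # regex that must match the text ending at the position
    after: str      # regex that must match the text starting at the position


# Illustrative rules only; exceptions come first so that they win.
RULES = [
    Rule(False, r"\b(?:Dr|Mr|Mrs|Ms|etc|e\.g|i\.e)\.", r"\s"),
    Rule(True, r"[.!?]+[\"')\]]*", r"\s+"),
]


def segment(text, rules=RULES):
    """Split text at positions where the first matching rule is a break rule."""
    compiled = [
        (r.is_break, re.compile("(?:" + r.before + ")$"), re.compile(r.after))
        for r in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        head = text[:pos]  # quadratic, but fine for a sketch
        for is_break, before, after in compiled:
            if before.search(head) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule decides
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return [s for s in segments if s]


if __name__ == "__main__":
    for sentence in segment("Dr. Smith uses SRX rules. They work!"):
        print(sentence)
```

Putting the exception rules before the general break rule is what keeps "Dr." attached to the following words.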

Here are my questions:

1) Is there a resource (a link) I can use to work out language segmentation rules?

2) Or are there any forums on language segmentation where I can discuss this effectively?

Please let me know if anyone knows about this.

Many thanks.

2 answers

You probably want to take a look at the paper by Reynar and Ratnaparkhi, "A Maximum Entropy Approach to Identifying Sentence Boundaries" (1997).

Abstract

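The paper presents a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, the model learns to classify each occurrence of ., ? and ! as either a valid or an invalid sentence boundary. Training requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information, so the model can be trained on any genre of English and should be trainable on any other Roman-alphabet language. Its reported performance is comparable to or better than that of similar systems, with an emphasis on how simple it is to retrain for new domains.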

The implementation described in the paper is called MxTerminator.
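A classifier of this kind does not split on rules directly. It treats every ., ! or ? as a candidate boundary and decides from the surrounding context. The Python sketch below covers only that first step: finding the candidates and collecting a few simple context features that a trained classifier could consume. The feature names and the function `candidate_features` are assumptions chosen for illustration. They are not the paper's exact feature set, and no model is trained here.

```python
import re


def candidate_features(text):
    """Return one feature dict per ., ! or ? found in the text."""
    features = []
    for m in re.finditer(r"[.!?]", text):
        i = m.start()
        # Part of the token before the candidate, e.g. "Dr" in "Dr."
        prefix = re.search(r"\S*$", text[:i]).group()
        # Part of the token after the candidate, e.g. "14" in "3.14"
        suffix = re.match(r"\S*", text[i + 1:]).group()
        # First whitespace-separated word after the candidate's token
        rest = text[i + 1 + len(suffix):].split()
        next_word = rest[0] if rest else ""
        features.append({
            "index": i,
            "char": m.group(),
            "prefix": prefix,
            "suffix": suffix,
            "next_word": next_word,
            "next_is_capitalized": next_word[:1].isupper(),
            "prefix_is_short": 0 < len(prefix) <= 3,
        })
    return features


if __name__ == "__main__":
    for feats in candidate_features("Dr. Smith left. Bye"):
        print(feats)
```

Each feature dict would then be labeled as a valid or invalid boundary in an annotated corpus and passed to whatever classifier you train.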


Source: https://habr.com/ru/post/1745054/

