I want to build a language model that predicts the next words in a sentence, given the previous word(s) and/or the previous sentence.
Use case: I want to automate report writing, so the model should automatically complete the sentence I am typing. It is therefore important that nouns and words at the beginning of a sentence are capitalized correctly.
Data: The data is in German and contains a lot of technical jargon.
My text corpus is in German, and I'm currently working on preprocessing. Since my model should predict grammatically correct sentences, I decided to use / not use the following preprocessing steps:
However, I'm not sure whether the text should be converted to lowercase. Searching the Internet, I found differing opinions. Although lowercasing is fairly common, it would leave my model unable to predict the correct capitalization of nouns, sentence beginnings, etc.
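To illustrate the concern, here is a minimal sketch of what lowercasing does to a German vocabulary. The `tokenize` helper and the sample sentence are hypothetical and only meant for illustration; they are not my actual preprocessing pipeline.

```python
import re
from collections import Counter

def tokenize(text, lowercase=False):
    """Split text into word tokens and single punctuation tokens.

    Hypothetical helper for illustration only; a real pipeline would
    likely use a proper German tokenizer.
    """
    if lowercase:
        text = text.lower()
    # \w+ matches runs of letters/digits (including umlauts in Python 3);
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)

# "essen" is the verb (to eat), "Essen" is the noun (the food).
sample = "Wir essen das Essen. Das Essen ist gut."

cased_vocab = Counter(tokenize(sample))
lower_vocab = Counter(tokenize(sample, lowercase=True))

print("cased:", dict(cased_vocab))
print("lowercased:", dict(lower_vocab))
```

With the case preserved, the noun "Essen" and the verb "essen" stay separate vocabulary entries, so a model trained on these tokens can in principle learn when to emit the capitalized form. After lowercasing, both collapse into a single "essen" token and the capitalization information is no longer present in the training data.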