This problem is hard to solve in the general case, but it can be solved fairly easily for specific cases.
For example, if your OCR software tends to insert a lot of non-ASCII characters, and all of your legitimate documents contain only the letters A-Z and a-z, digits, and punctuation, then your job is pretty simple.
To check for that, you can loop over the characters in the document and use tests such as `if (char.IsLetter(currentChar))` and `if (char.IsDigit(currentChar))`, or switch on `char.GetUnicodeCategory(currentChar)`.
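Here is a minimal sketch of that kind of scan in C#. The method name `LooksLikeOcrOutput` and the exact set of Unicode categories treated as acceptable are assumptions for illustration, not something fixed by your requirements:

```csharp
using System;
using System.Globalization;

static class OcrHeuristics
{
    // Returns true if the text contains characters that a hand-typed,
    // ASCII-only document should never contain.
    public static bool LooksLikeOcrOutput(string text)
    {
        foreach (char currentChar in text)
        {
            // In this scenario, anything outside ASCII is the strongest signal.
            if (currentChar > 127)
                return true;

            if (char.IsLetter(currentChar) || char.IsDigit(currentChar) || char.IsWhiteSpace(currentChar))
                continue;

            // For everything else, decide by Unicode category.
            switch (char.GetUnicodeCategory(currentChar))
            {
                case UnicodeCategory.ConnectorPunctuation:
                case UnicodeCategory.DashPunctuation:
                case UnicodeCategory.OpenPunctuation:
                case UnicodeCategory.ClosePunctuation:
                case UnicodeCategory.InitialQuotePunctuation:
                case UnicodeCategory.FinalQuotePunctuation:
                case UnicodeCategory.OtherPunctuation:
                case UnicodeCategory.MathSymbol:     // '+', '=', ...
                case UnicodeCategory.CurrencySymbol: // '$', ...
                    continue;                        // expected in normal typed text
                default:
                    return true;                     // control chars, stray symbols, etc.
            }
        }
        return false;
    }
}
```

With that in place, `OcrHeuristics.LooksLikeOcrOutput(documentText)` returning true would mark the document as OCR output under the "only ASCII letters, digits, and punctuation" assumption above.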
If there are certain words or letters that the OCR always gets wrong, you can build a `Dictionary<string, bool>` and fill it with words that you know the OCR always mangles, and/or words that you know a human would never get wrong. Then loop over all the words in your document and see whether any of them match an entry in the dictionary, which tells you whether the document came from a person or from the OCR.
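As a rough sketch of that idea in C# (the sample entries and the meaning of the bool value are invented for illustration; you would fill the dictionary from the errors you actually observe):

```csharp
using System;
using System.Collections.Generic;

static class KnownWordCheck
{
    // true  = a form the OCR always produces (so: OCR output)
    // false = a form a person produces but the OCR never would (so: typed by a human)
    // The two entries below are made-up examples.
    static readonly Dictionary<string, bool> knownWords =
        new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase)
        {
            { "1earning", true },  // digit '1' substituted for 'l' - a typical OCR confusion
            { "teh",      false }, // a typo people make but OCR does not invent
        };

    // Returns true for "looks like OCR", false for "looks human",
    // or null if no known word was found and another heuristic is needed.
    public static bool? IsLikelyOcr(string document)
    {
        foreach (string word in document.Split(
                     new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (knownWords.TryGetValue(word, out bool isOcrError))
                return isOcrError;
        }
        return null;
    }
}
```

A match on either kind of entry settles the question immediately; if nothing matches, you fall back to another heuristic such as the character-level check above.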
If your OCR software does not make mistakes that are easy to detect, you will have to resort to artificial intelligence to solve this. I hope you do not have to go that route, because it is genuinely hard to program and takes a lot of work to set up and maintain properly. From your description and your comments, it sounds like a simpler solution will work for you.
No matter what software does this job, it is going to misclassify some documents. A user might type something odd or copy/paste in a non-ASCII character (for example, the word résumé), or the OCR might happen to produce no detectable errors. I hope you have a way to deal with that, or that your situation is not risky enough for it to be a problem.