Tesseract Number Recognition: what are the most common OCR options?

Here is my OCR code for recognizing numbers through the Tesseract engine:

    Tesseract* tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
    // set the tesseract variables
    [tesseract setVariableValue:@"0123456789" forKey:@"tessedit_char_whitelist"];
    NSString * temp = @"7";
    [tesseract setVariableValue:temp forKey:@"tessedit_pageseg_mode"];
    [tesseract setImage:argImage];
    [tesseract recognize];
    m_convertedText = [[tesseract recognizedText] copy];

Using the above, some images are recognized correctly. However, sometimes I get 5 instead of 8, 6 instead of 5, and so on. My input images are clean: pure black and white after binarization.

Are there any other Tesseract options that I should be setting? I see that there are over 600 options and very little documentation.

The best I could find was this site, which lists all the options but is not very clear for OCR beginners.

If anyone has achieved 100 percent accuracy recognizing digits with Tesseract, hearing how you did it would be really helpful.


Source: https://habr.com/ru/post/1501616/

