Here is my OCR code for recognizing numbers through the Tesseract engine:
```objc
Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];

// Set the Tesseract variables
// Only allow the digits 0-9 in the output
[tesseract setVariableValue:@"0123456789" forKey:@"tessedit_char_whitelist"];
// Page segmentation mode 7: treat the image as a single line of text
NSString *temp = @"7";
[tesseract setVariableValue:temp forKey:@"tessedit_pageseg_mode"];

[tesseract setImage:argImage];
[tesseract recognize];
m_convertedText = [[tesseract recognizedText] copy];
```
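Even with the digit whitelist, the recognized string can still carry whitespace such as a trailing newline, which makes direct comparisons against expected values fail. Below is a minimal sketch, in plain C so it compiles unchanged inside an Objective-C file, of keeping only the characters 0-9 from the raw result. The function name `keep_digits` is hypothetical and not part of Tesseract or the iOS wrapper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy only the ASCII digits 0-9 from `in` into `out`.
 * `out` must have room for at least strlen(in) + 1 bytes.
 * Returns the number of digits copied; `out` is always NUL-terminated. */
size_t keep_digits(const char *in, char *out)
{
    size_t n = 0;
    for (const char *p = in; *p != '\0'; ++p) {
        if (*p >= '0' && *p <= '9') {
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return n;
}
```

From Objective-C, the input could come from `[m_convertedText UTF8String]`, with the output buffer sized from `strlen` of that string plus one.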
With this code, some images are recognized correctly. However, sometimes I get 5 instead of 8, 6 instead of 5, and so on. My input images are nearly perfect: pure black and white after binarization.
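To see whether the misreadings follow a pattern (for example, 8 being read as 5 much more often than anything else), it can help to tally them over a set of images whose correct values are known. The following is a minimal sketch in C of a digit confusion table. It assumes the expected and recognized strings are already digit-only and aligned position by position, with no missing or extra characters; the names `DigitConfusion` and `tally_confusions` are hypothetical.

```c
#include <string.h>

/* counts[e][r]: how often expected digit e was recognized as digit r. */
typedef struct {
    unsigned counts[10][10];
} DigitConfusion;

/* Hypothetical helper: compare an expected digit string with the OCR result
 * position by position and add each pair to the table.
 * Returns -1 (recording nothing) if the lengths differ, otherwise the number
 * of mismatched positions. Non-digit characters are skipped. */
int tally_confusions(const char *expected, const char *recognized,
                     DigitConfusion *dc)
{
    size_t len = strlen(expected);
    if (strlen(recognized) != len) {
        return -1;
    }
    int errors = 0;
    for (size_t i = 0; i < len; ++i) {
        int e = expected[i] - '0';
        int r = recognized[i] - '0';
        if (e < 0 || e > 9 || r < 0 || r > 9) {
            continue; /* not a digit on one side */
        }
        dc->counts[e][r]++;
        if (e != r) {
            errors++;
        }
    }
    return errors;
}
```

Zero the table once (for example with `memset`), call the function for each labeled image, and then inspect the off-diagonal cells to see which digit pairs account for most of the errors.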
Are there any other Tesseract options that I should be setting? I see that there are over 600 options and very little documentation.
The best resource I could find was this site, which lists all the options, but it isn't very clear for OCR beginners.
If anyone has achieved 100 percent accuracy recognizing digits with Tesseract, your advice would be really helpful.