How to use OpenCV + Tesseract for accurate text recognition in Android?

I am trying to use OpenCV (Android) to process an image taken with the camera and then pass it to Tesseract to recognize text (numbers), but I only get good results when the images are already good (almost noise-free). Currently, I process the captured images as follows:

1. Apply Gaussian blur.
2. Apply an adaptive threshold to binarize the image.
3. Invert the colors to get a black background.

I then pass the processed image to Tesseract, but the results are still poor.
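For reference, here is a minimal, library-free sketch of what the three preprocessing steps compute, written in plain Python rather than with OpenCV calls so it can run anywhere. The function names, the 3x3 kernel, and the `radius` and `c` values are assumptions chosen for illustration, not settings from the actual pipeline; in OpenCV for Android the corresponding calls would be `Imgproc.GaussianBlur`, `Imgproc.adaptiveThreshold`, and `Core.bitwise_not`.

```python
# Images are lists of rows; each pixel is a grayscale int in 0..255.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))  # 3x3 Gaussian approximation, weights sum to 16


def gaussian_blur_3x3(img):
    """Step 1: smooth the image; border pixels are handled by clamping coordinates."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc // 16
    return out


def adaptive_threshold(img, radius=2, c=5):
    """Step 2: a pixel becomes 255 if it is brighter than (local mean - c), else 0.

    The window is clipped at the image border (a simplification).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(window) / len(window)
            out[y][x] = 255 if img[y][x] > local_mean - c else 0
    return out


def invert(img):
    """Step 3: swap black and white so the background becomes black."""
    return [[255 - v for v in row] for row in img]


def preprocess(img):
    return invert(adaptive_threshold(gaussian_blur_3x3(img)))


# A light 7x7 image with one dark vertical stroke in the middle column.
sample = [[50 if x == 3 else 200 for x in range(7)] for _ in range(7)]
result = preprocess(sample)
```

Raising `c` makes fewer pixels count as dark, which suppresses speckle noise but can also drop faint strokes, so `c` and the window size are probably the first parameters to tune.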

Please suggest what steps or measures I can take to process the image before passing it to Tesseract, or during Tesseract's own processing step.

Also, are there any better libraries for this on Android?

1 answer

You can isolate / detect the characters in the image first. This can be done with a powerful algorithm such as the Stroke Width Transform (SWT).

The following steps worked well for me:

  • Convert the image to grayscale.
  • Run Canny edge detection on the grayscale image.
  • Apply Gaussian blur to the grayscale image (store the result in a separate matrix).
  • Feed the matrices from steps 2 and 3 into the SWT algorithm.
  • Binarize (threshold) the resulting image.
  • Pass the image to Tesseract.
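The first and fifth steps of this list are simple enough to sketch without any library, so here is a rough illustration in plain Python. Canny and SWT are deliberately left out because they are too long for a short example. The function names are hypothetical, and Otsu's method is an assumption on my part for the binarization step: the answer does not say which thresholding method was used.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to rows of 0..255 luma values.

    Uses the standard weights 0.299, 0.587, 0.114 for red, green, blue.
    """
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb]


def otsu_threshold(gray):
    """Pick the threshold t that maximizes the between-class variance
    of the two groups 'pixels <= t' and 'pixels > t'."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue          # no pixels in the dark class yet
        if w1 == 0:
            break             # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray, t):
    """Pixels brighter than t become 255, all others 0."""
    return [[255 if v > t else 0 for v in row] for row in gray]


# Tiny example: a dark stroke on a light background.
rgb = [[(220, 220, 220), (40, 40, 40), (220, 220, 220)] for _ in range(3)]
gray = to_grayscale(rgb)
t = otsu_threshold(gray)
binary = binarize(gray, t)
```

A global threshold like this only makes sense after the SWT step has already removed most of the non-text content; on a raw camera image with uneven lighting, an adaptive threshold is usually the safer choice.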

Note: step 4 (SWT) was implemented in C++ and accessed from Android via JNI.


Source: https://habr.com/ru/post/1538560/

