Cleaning up images to help Tesseract on Android

I am trying to extract the digits from a sudoku board. After detecting the board, finding its corners and applying the perspective transform, I am left with a fairly well-aligned image of just the board. Now I am trying to recognize the digits using Tesseract on Android (Tess-Two). I divided the image into 9 parts using

currentCell = undistortedThreshed.submat(rect); 

where rect is the rectangle bounding that part of the image.
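
For context, the rectangles passed to submat could be computed along these lines. This is only a sketch in plain Java, assuming the rectified board is cut into a grid of equal cells (9 per side for a full sudoku). The class CellGrid and the helper cellRect are hypothetical names, not part of the original code, and each result would still have to be wrapped in an OpenCV Rect before calling submat.

```java
class CellGrid {
    // Returns {x, y, width, height} for the cell at (row, col) of a grid with
    // `cells` cells per side laid over a width x height image.
    // Both edges are derived from the cell index, so integer rounding never
    // leaves a gap or an overlap between neighbouring cells.
    static int[] cellRect(int width, int height, int cells, int row, int col) {
        int x0 = col * width / cells;
        int x1 = (col + 1) * width / cells;
        int y0 = row * height / cells;
        int y1 = (row + 1) * height / cells;
        return new int[] {x0, y0, x1 - x0, y1 - y0};
    }
}
```

For a 450 x 450 board, cellRect(450, 450, 9, 2, 3) describes the cell in the third row and fourth column. Shrinking each rectangle by a few pixels on every side might already keep most of the grid lines out of the cell, though that margin would have to be tuned by hand.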

Now for the digit recognition.

Some digits, such as 4, are recognized well. Others, mostly 6, 7 and 8, are recognized as 0 or not at all.

I want to help Tesseract as much as possible by cleaning up the currentCell image. At the moment it looks like this (image: inverted 6). I also tried without the inverted threshold. I want to get rid of the white lines (the sudoku grid lines). I tried something like this, taken from another source:

    Imgproc.Canny(currentCell, currentCell, 80, 90);
    Mat lines = new Mat();
    int threshold = 50;
    int minLineSize = 5;
    int lineGap = 20;
    Imgproc.HoughLinesP(currentCell, lines, 1, Math.PI / 180, threshold, minLineSize, lineGap);
    for (int x = 0; x < lines.cols() && x < 1; x++) {
        double[] vec = lines.get(0, x);
        double x1 = vec[0], y1 = vec[1], x2 = vec[2], y2 = vec[3];
        Point start = new Point(x1, y1);
        Point end = new Point(x2, y2);
        Core.line(currentCell, start, end, new Scalar(255), 10);
    }

but it draws nothing. I tried changing the width and color of the line, but still nothing. I also tried drawing a line on the full, uncropped image, and that does not work either.

Any suggestions?

EDIT

For some reason, it cannot find any lines. Here is what the image looks like after applying Canny to it (image: 6 after Canny), yet HoughLines does not detect any lines. I tried both HoughLines and HoughLinesP with different values, as shown in the OpenCV documentation, but nothing works. These are pretty obvious lines. What am I doing wrong? Thanks!

+4
2 answers

In the end, I did something else.

I used findContours to get the largest contour, which is the digit.

I got its bounding box using boundingRect.

I extracted it with submat and voilà: I was left with only the digit.
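
The three steps above can be illustrated without OpenCV. The following is a rough sketch in plain Java, not the code actually used in this answer: a flood fill over an int[][] image stands in for findContours and boundingRect, and "largest" is measured by pixel count rather than by contour area. The class LargestBlob and the method largestBlobBox are hypothetical names.

```java
import java.util.ArrayDeque;

class LargestBlob {
    // Returns {x, y, width, height} of the largest 4-connected group of
    // non-zero pixels in img (indexed img[row][col]), or null if the image
    // contains no non-zero pixel.
    static int[] largestBlobBox(int[][] img) {
        int h = img.length;
        int w = h == 0 ? 0 : img[0].length;
        boolean[][] seen = new boolean[h][w];
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        int[] best = null;
        int bestSize = 0;
        for (int sy = 0; sy < h; sy++) {
            for (int sx = 0; sx < w; sx++) {
                if (img[sy][sx] == 0 || seen[sy][sx]) continue;
                // Breadth-first flood fill from this unvisited foreground pixel.
                int size = 0, minX = sx, maxX = sx, minY = sy, maxY = sy;
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {sx, sy});
                seen[sy][sx] = true;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    size++;
                    minX = Math.min(minX, p[0]);
                    maxX = Math.max(maxX, p[0]);
                    minY = Math.min(minY, p[1]);
                    maxY = Math.max(maxY, p[1]);
                    for (int d = 0; d < 4; d++) {
                        int nx = p[0] + dx[d], ny = p[1] + dy[d];
                        if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                        if (img[ny][nx] == 0 || seen[ny][nx]) continue;
                        seen[ny][nx] = true;
                        queue.add(new int[] {nx, ny});
                    }
                }
                // Keep the bounding box of the biggest blob seen so far.
                if (size > bestSize) {
                    bestSize = size;
                    best = new int[] {minX, minY, maxX - minX + 1, maxY - minY + 1};
                }
            }
        }
        return best;
    }
}
```

The returned box plays the role of the rectangle passed to submat. Note that this only isolates the digit if the digit really is the largest blob in the cell; a long grid-line fragment could win instead.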

Unfortunately, it made no difference. Tesseract still cannot recognize the digits correctly. Sometimes it gives no result; sometimes, after dilating the digits, it recognizes 6 as 0. But that is a matter for another question.

+2

Here is an idea off the top of my head:

Keep the code that computes the Hough lines in the image. This gives you the lines that match the grid.

Now just draw these lines on the original image, but set the color to BLACK.

Most of the white lines will now be covered by the newly drawn black lines. Since the positions of the Hough lines do not exactly match the actual lines, a few small white dots may remain. Removing them with connected components (discarding components that are too small) or with some morphological operations, taking care to leave the actual digit unchanged, could deal with these leftovers.
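
The connected-components cleanup could look roughly like this. It is a plain-Java sketch operating on an int[][] image rather than an OpenCV Mat, and it covers only the final step (dropping small white specks), not the Hough detection or the black line drawing. The class SpeckleFilter, the method removeSmallBlobs and the minSize parameter are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class SpeckleFilter {
    // Sets to 0 every 4-connected group of non-zero pixels that contains
    // fewer than minSize pixels. The image (img[row][col]) is modified in place.
    static void removeSmallBlobs(int[][] img, int minSize) {
        int h = img.length;
        int w = h == 0 ? 0 : img[0].length;
        boolean[][] seen = new boolean[h][w];
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        for (int sy = 0; sy < h; sy++) {
            for (int sx = 0; sx < w; sx++) {
                if (img[sy][sx] == 0 || seen[sy][sx]) continue;
                // Collect all pixels of the blob that contains (sx, sy).
                List<int[]> blob = new ArrayList<>();
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {sx, sy});
                seen[sy][sx] = true;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    blob.add(p);
                    for (int d = 0; d < 4; d++) {
                        int nx = p[0] + dx[d], ny = p[1] + dy[d];
                        if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                        if (img[ny][nx] == 0 || seen[ny][nx]) continue;
                        seen[ny][nx] = true;
                        queue.add(new int[] {nx, ny});
                    }
                }
                // Erase the blob if it is too small to be part of a digit.
                if (blob.size() < minSize) {
                    for (int[] p : blob) img[p[1]][p[0]] = 0;
                }
            }
        }
    }
}
```

The value of minSize is a guess that depends on the cell resolution: too low and specks survive, too high and thin strokes or detached parts of a digit could be erased.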

Give it a try and let me know. Hope this helps you.

0

Source: https://habr.com/ru/post/1448104/

