Removing text during image processing

I am working on an application that needs a feature like CamScanner's, where a document has to be detected in an image. For this, I use Canny edge detection followed by the Hough transform.

The results look promising, but the text in the document causes problems, as the images below show:

[Image: source image]

[Image: after Canny edge detection]

[Image: after the Hough transform]

My problem is in the third image: the text near the bottom of the original image caused the Hough transform to detect a horizontal line (the second cluster from the bottom).

I know I could take the largest quadrangle, and in most cases that would work fine, but I would still like to know whether there are other ways to make this processing ignore the effect of the text near the edges.
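For reference, the "largest quadrangle" fallback can be sketched like this, assuming you already have candidate quadrilaterals (for example, from intersecting pairs of Hough lines) with their four corners listed in order around the shape. `polygon_area` and `largest_quad` are hypothetical helper names, and the candidate coordinates are made up for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(pts: Sequence[Point]) -> float:
    """Shoelace formula; vertices must be ordered (clockwise or counter-clockwise)."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def largest_quad(quads: List[Sequence[Point]]) -> Sequence[Point]:
    """Return the candidate quadrilateral with the largest area."""
    return max(quads, key=polygon_area)


# Hypothetical candidates: the whole page, and a smaller quad whose bottom
# edge is the false horizontal line produced by the text.
page = [(10, 10), (490, 12), (492, 690), (8, 688)]
cut_by_text = [(10, 10), (490, 12), (491, 600), (9, 598)]
print(largest_quad([page, cut_by_text]) == page)  # True
```

This only works when the corner ordering is consistent; unordered corners can produce a self-intersecting polygon and a wrong area.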

Any help would be appreciated.

2 answers

I solved the text problem by applying a 15x15 median filter to a 500x700 image.

The median filter does not affect the edges of the paper, but it can remove the text completely.

With this, I got much cleaner boundaries.
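A minimal NumPy sketch of the median-filter idea, assuming an 8-bit grayscale image. `median_filter` is a hypothetical helper, and the synthetic image below stands in for a real photo; in OpenCV the equivalent call would be `cv2.medianBlur(gray, 15)`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def median_filter(img: np.ndarray, size: int = 15) -> np.ndarray:
    """Square size x size median filter for a 2-D grayscale image.

    Borders are handled by replicating the edge pixels.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    # One size x size window per output pixel: shape (H, W, size, size).
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)


# Synthetic example: white "paper" on a dark background with a thin dark "text" stroke.
img = np.zeros((100, 120), dtype=np.uint8)
img[20:80, 20:100] = 255
img[50:52, 35:85] = 0  # 2-pixel-high stroke standing in for a line of text

smoothed = median_filter(img, 15)
print(smoothed[50, 60])                    # 255 (stroke removed)
print(smoothed[20, 60], smoothed[19, 60])  # 255 0 (top edge still at row 20)
```

The stroke disappears because it covers far less than half of every 15x15 window, while a straight paper edge stays exactly where it was; only the corners get slightly rounded, which does not matter when fitting straight Hough lines. This version trades memory for brevity, since `np.median` copies every window.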


Another approach you can try is to use a threshold to find the paper's borders. This produces a binary image. Then you can look at the blobs of white pixels and check whether any of them is large enough to be the paper and has the right proportions. If a blob meets these criteria, you can use its minimum/maximum points to represent the paper.
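A minimal sketch of this approach in Python with NumPy, using a hand-written breadth-first search to find the blobs so it needs no other library. The threshold value is passed in as a parameter, the size and shape criteria (`min_area_frac`, `min_aspect`, `max_aspect`) are hypothetical values chosen for illustration, and "minimum/maximum points" is interpreted here as the blob's axis-aligned bounding box.

```python
from collections import deque
from typing import List, Optional, Tuple

import numpy as np


def largest_blob(binary: np.ndarray) -> List[Tuple[int, int]]:
    """Return the (row, col) pixels of the largest 4-connected True region.

    Plain breadth-first search; returns an empty list if nothing is True.
    """
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    best: List[Tuple[int, int]] = []
    for r0, c0 in zip(*np.nonzero(binary)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue = deque([(int(r0), int(c0))])
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < h and 0 <= nc < w and binary[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        if len(pixels) > len(best):
            best = pixels
    return best


def find_paper(gray: np.ndarray, thresh: int, min_area_frac: float = 0.2,
               min_aspect: float = 0.5, max_aspect: float = 2.0
               ) -> Optional[Tuple[int, int, int, int]]:
    """Threshold, keep the largest white blob, check its size and shape,
    and return its bounding box (r_min, c_min, r_max, c_max), or None."""
    blob = largest_blob(gray > thresh)
    if not blob:
        return None
    pts = np.array(blob)
    r_min, c_min = pts.min(axis=0)
    r_max, c_max = pts.max(axis=0)
    height, width = r_max - r_min + 1, c_max - c_min + 1
    if len(blob) / gray.size < min_area_frac:
        return None  # too small to be the sheet of paper
    if not (min_aspect <= height / width <= max_aspect):
        return None  # wrong proportions for a page
    return int(r_min), int(c_min), int(r_max), int(c_max)


gray = np.zeros((60, 80), dtype=np.uint8)
gray[10:50, 15:65] = 200  # the "paper"
gray[2:4, 2:4] = 220      # a small bright speck that should be ignored
print(find_paper(gray, thresh=128))  # (10, 15, 49, 64)
```

For a rotated sheet the axis-aligned box is only a rough outline of the page. In OpenCV, `cv2.connectedComponentsWithStats` does the labeling and area bookkeeping in a single call.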

There are several ways to choose the threshold, including iterative, Otsu, and adaptive thresholding.
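As an example of one of these, here is a sketch of Otsu's method in plain NumPy, assuming an 8-bit grayscale image. It tries every threshold and keeps the one that maximizes the between-class variance. `otsu_threshold` is a hypothetical helper name; OpenCV exposes the same method through `cv2.threshold` with the `cv2.THRESH_OTSU` flag, and adaptive thresholding through `cv2.adaptiveThreshold`.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's method for an 8-bit image.

    Returns the t in 0..255 that maximizes the between-class variance when
    pixels <= t form one class and pixels > t form the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                # weight of the class <= t
    mu = np.cumsum(p * np.arange(256))  # unnormalized mean of the class <= t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


# Dark table surface (50) and bright paper (200) in one 8-bit image.
gray = np.full((40, 40), 50, dtype=np.uint8)
gray[10:30, 10:30] = 200
t = otsu_threshold(gray)
paper_mask = gray > t
print(paper_mask.sum())  # 400
```

A single global threshold like this assumes fairly even lighting; with strong shadows across the page, adaptive thresholding is usually the safer choice.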

In addition, for best results you may need to dilate the binary image to close the black lines in the table, as in your example.
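A small NumPy sketch of dilation with a square structuring element, assuming the binary image is a boolean mask in which the paper is `True` and a thin dark line cuts across it. `dilate` and the 3x3 size are illustrative choices, not taken from the answer; OpenCV's `cv2.dilate` does the same job.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def dilate(binary: np.ndarray, size: int = 3) -> np.ndarray:
    """Binary dilation with a size x size square structuring element.

    A pixel becomes True if any pixel in its size x size neighbourhood is True;
    pixels outside the image count as False.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    r = size // 2
    padded = np.pad(binary.astype(bool), r, mode="constant", constant_values=False)
    windows = sliding_window_view(padded, (size, size))
    return windows.any(axis=(-2, -1))


# Paper mask cut in two by a 1-pixel black line at row 20.
mask = np.zeros((40, 40), dtype=bool)
mask[5:35, 5:35] = True
mask[20, 5:35] = False

closed_gap = dilate(mask, 3)
print(mask[20, 20], closed_gap[20, 20])  # False True
```

Dilation also grows the blob outward by `size // 2` pixels on every side. Following it with an erosion of the same size (a morphological closing, `cv2.morphologyEx` with `cv2.MORPH_CLOSE` in OpenCV) fills the gap while approximately restoring the original outline.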


Source: https://habr.com/ru/post/1261917/

