Segmentation for connected characters

How can I segment characters when they are connected to each other? I tried watershed with a distance transform (http://opencv-code.com/tutorials/count-and-segment-overlapping-objects-with-watershed-and-distance-transform/ ) to find the number of components, but it does not work well.

  • It requires the objects to already be separated after thresholding in order to work well.

Having said that, how can I segment the characters effectively? I need help or ideas.

[image: slightly connected] An example binary image.

[image: heavily connected] An example that is heavily connected.

Ans:

@mmgp this is my output (o/p):

[image: BP o/p]

+5
3 answers

I believe there are two approaches here: 1) redo the binarization step that produced the images you have right now; 2) consider different possibilities based on image size. Let's focus on the second approach, since that is what the question presents.

Only two digits are connected in your smaller image, and this happens only under 8-connectivity. If you process the image as 4-connected, there is nothing to do, because no two digits form a single connected component that has to be separated. This is shown below. The image on the right can be obtained simply by finding the points that are connected to another point only under 8-connectivity. In this case there are only two such points, and by deleting them we disconnect the two "1" digits.

[image] [image]
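The difference between 4- and 8-connectivity described above can be sketched on a toy array. This is only an illustration, not the code used for the images: the helper names `count_components` and `diagonal_only_links` are made up here, and the 4x5 array stands in for a real binarized image.

```python
import numpy as np
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(img, offsets):
    """Count foreground components by flood fill with the given neighbour offsets."""
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y, x] and not seen[y, x]:
                count += 1
                seen[y, x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in offsets:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count

def diagonal_only_links(img):
    """Find pixel pairs that touch diagonally while both shared 4-neighbours are empty."""
    h, w = img.shape
    points = set()
    for y in range(h - 1):
        for x in range(w - 1):
            a, b = img[y, x], img[y, x + 1]
            c, d = img[y + 1, x], img[y + 1, x + 1]
            if a and d and not b and not c:
                points.update([(y, x), (y + 1, x + 1)])
            if b and c and not a and not d:
                points.update([(y, x + 1), (y + 1, x)])
    return points

# Toy image: two blobs that touch only at one diagonal.
img = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
], dtype=np.uint8)

links = diagonal_only_links(img)

# Delete the linking pixels so the blobs separate even under 8-connectivity.
cleaned = img.copy()
for y, x in links:
    cleaned[y, x] = 0
```

On this toy input the two blobs count as one component under 8-connectivity and as two under 4-connectivity; after the two linking pixels are deleted they are separate under both.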

In your other image this is no longer the case, and I don't have a simple method for it that could also be applied to the smaller image without making it worse. But we could scale both images to some common size using nearest-neighbor interpolation, so that we do not leave the binary representation. By resizing both images to 200 while keeping the aspect ratio, we can apply the same morphological method to both of them. First do a thinning:

[image]
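The nearest-neighbor resizing step mentioned before the thinning can be sketched as follows. The function name `resize_nearest` is hypothetical, and treating the "200" as a target width is an assumption, since the text does not say which dimension is meant. The sketch only shows why this interpolation keeps the image binary: every output pixel is a copy of some input pixel.

```python
import numpy as np

def resize_nearest(img, new_width):
    """Resize a 2-D array to new_width, keeping the aspect ratio, by index lookup."""
    h, w = img.shape
    new_height = max(1, round(h * new_width / w))
    # For each output row/column, pick the source row/column it falls into.
    rows = np.arange(new_height) * h // new_height
    cols = np.arange(new_width) * w // new_width
    return img[rows[:, None], cols]

small = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
], dtype=np.uint8)

# Assumption: a target width of 8 here; the answer uses 200 on the real images.
big = resize_nearest(small, 8)
```

No new gray levels appear, so no re-thresholding is needed after scaling.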

Now, as you can see, the morphological branch points are the ones that connect your digits (there is another one in the digit six, which will be handled). We can extract these branch points and apply a morphological closing with a vertical line of length 2 * height + 1 (height being that of your image), so that no matter where the point is, closing it results in a full vertical line. Since your image is not that small, this line does not need to be 1 pixel wide; in fact I used a line 6 pixels wide. Since some branch points are horizontally close to each other, this closing operation will join them in the same vertical line. If a branch point is not close to another one, then an erosion will remove its vertical line. By doing this, we eliminate the branch point belonging to the digit six on the left. After applying these steps, we get the image on the left below. Subtracting the original image from it, we get the image on the right.

[image] [image]
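Extracting branch points from a thinned image can be sketched as below. This is one common heuristic and not necessarily the operator used for the images above: a skeleton pixel is treated as a branch point when walking around its 8 neighbors crosses from background to foreground at least three times. The name `branch_points` is made up, and the input is assumed to already be a one-pixel-wide skeleton; the closing and erosion steps are not covered.

```python
import numpy as np

def branch_points(skel):
    """Mark skeleton pixels whose 8-neighbour ring has >= 3 background-to-foreground steps."""
    h, w = skel.shape
    s = np.pad(skel.astype(np.uint8), 1, mode="constant")
    # Neighbours in clockwise order, starting from north.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    ring = [s[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] for dy, dx in offsets]
    crossings = np.zeros((h, w), dtype=np.uint8)
    for k in range(8):
        crossings += (ring[k] == 0) & (ring[(k + 1) % 8] == 1)
    return skel.astype(bool) & (crossings >= 3)

# Toy skeleton: a plus sign, whose only branch point is the centre.
skel = np.zeros((5, 5), dtype=np.uint8)
skel[2, :] = 1
skel[:, 2] = 1

bp = branch_points(skel)
```

Counting crossings rather than raw neighbors avoids flagging the pixels right next to a junction, which also have three or more foreground neighbors.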

If we apply the same steps to the "8011" image, we end up with exactly the same image we started with. That is still fine, because by then applying the simple method that removes points connected only under 8-connectivity, we get the separated components, as before.

+6

Typically, "smearing algorithms" are used for this, also known as the Run Length Smoothing Algorithm (RLSA). It is a method that segments black-and-white images into blocks. You can find it here or search the Internet for an implementation of the algorithm.
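A minimal sketch of the horizontal pass of RLSA, under stated assumptions: foreground pixels are 1, and a background run is filled only when it lies between two foreground pixels in the same row and is at most `max_gap` long. The function name and the threshold value are illustrative, and published variants differ in details such as how border runs are treated.

```python
import numpy as np

def rlsa_horizontal(img, max_gap):
    """Fill short background runs between foreground pixels, row by row."""
    out = img.copy()
    for row in out:  # each row is a view, so edits land in `out`
        fg = np.flatnonzero(row)
        for left, right in zip(fg[:-1], fg[1:]):
            gap = right - left - 1
            if 0 < gap <= max_gap:
                row[left + 1:right] = 1
    return out

img = np.array([[1, 0, 0, 1, 0, 0, 0, 0, 1]], dtype=np.uint8)

# The 2-pixel gap is smeared shut; the 4-pixel gap is left open.
smeared = rlsa_horizontal(img, max_gap=2)
```

A vertical pass can be obtained by applying the same function to the transposed image.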

+2

I'm not sure I want to help you solve a captcha, but one idea is to use erosion. Depending on how many pixels you have to work with, it may be enough to separate the characters without destroying them. It is probably best used as a preprocessing step for another segmentation algorithm.
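A small sketch of binary erosion with a 3x3 square structuring element, written with plain NumPy so that it stands alone. The function name `erode3x3` and the toy image are made up for illustration, and pixels outside the image are treated as background.

```python
import numpy as np

def erode3x3(img):
    """Keep a pixel only if it and all 8 of its neighbours are foreground."""
    h, w = img.shape
    padded = np.pad(img.astype(bool), 1, mode="constant")
    out = np.ones((h, w), dtype=bool)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out.astype(np.uint8)

# Two 5x5 blobs joined by a one-pixel bridge at row 3, column 6.
img = np.zeros((7, 13), dtype=np.uint8)
img[1:6, 1:6] = 1
img[1:6, 7:12] = 1
img[3, 6] = 1

eroded = erode3x3(img)
```

The bridge disappears and each blob shrinks to its 3x3 core, which also shows the cost: thin strokes of the characters themselves are eroded just as easily as the unwanted connections.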

0

Source: https://habr.com/ru/post/976490/

