How to extract only characters from an image?

I have images like the one below, from which I want to extract only the characters.

[image: source photo]

After binarizing it with the code below, I get the following image:

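import cv2
img = cv2.imread('the_image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Adaptive mean threshold: 11x11 neighbourhood, constant 9 subtracted from the local mean.
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 9)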

[image: binarized result]

Then I find the contours in this image:

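# [-2] picks the contour list in both OpenCV 3.x (three return values) and 4.x (two).
cnts = cv2.findContours(thresh.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
# Largest contours first; only the first 2000 are drawn.
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for contour in cnts[:2000]:
    x, y, w, h = cv2.boundingRect(contour)
    aspect_ratio = h / float(w)  # float division on Python 2 as well
    area = cv2.contourArea(contour)
    cv2.drawContours(img, [contour], -1, (0, 255, 0), 2)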

This gives:

[image: all contours drawn on the photo]

I need a way to filter the contours so that only the characters are selected. Then I can find their bounding boxes and extract the ROIs.
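For the bounding-box and ROI step, a minimal sketch could look like the following. The helper name `padded_box` and the `pad` margin are illustrative assumptions, not part of the code above; the function only computes a crop rectangle clamped to the image borders, so it can be checked without OpenCV.

```python
def padded_box(box, img_w, img_h, pad=2):
    """Expand an (x, y, w, h) box by `pad` pixels and clamp it to the image.

    Returns (x0, y0, x1, y1) with x1 and y1 exclusive, ready for slicing.
    """
    x, y, w, h = box
    x0 = max(x - pad, 0)
    y0 = max(y - pad, 0)
    x1 = min(x + w + pad, img_w)
    y1 = min(y + h + pad, img_h)
    return x0, y0, x1, y1


# Hypothetical box near the top-left corner of a 100x50 image.
print(padded_box((1, 0, 10, 20), img_w=100, img_h=50))  # (0, 0, 13, 22)
```

With the variables from the question, `x0, y0, x1, y1 = padded_box(cv2.boundingRect(contour), gray.shape[1], gray.shape[0])` followed by `roi = gray[y0:y1, x0:x1]` would give the crop for one contour.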

I can find the contours and filter them by area, but the resolution of the source images is not consistent, because the images are taken with mobile cameras.
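One possible way around the varying resolution is to express each area as a fraction of the whole image instead of in pixels. The sketch below assumes the characters occupy a roughly constant share of the frame from photo to photo; the function name and both thresholds are placeholders that would need tuning, and bounding-box area is used here as a stand-in for `cv2.contourArea`.

```python
def keep_by_relative_area(boxes, img_w, img_h, min_frac=1e-4, max_frac=1e-2):
    """Keep (x, y, w, h) boxes whose area is within a fraction range of the image area."""
    image_area = float(img_w * img_h)
    kept = []
    for x, y, w, h in boxes:
        frac = (w * h) / image_area
        if min_frac <= frac <= max_frac:
            kept.append((x, y, w, h))
    return kept


# Made-up boxes: a character-sized box, a 2x2 speck, and a large form box.
boxes = [(10, 10, 8, 12), (5, 5, 2, 2), (50, 40, 200, 150)]
low_res = keep_by_relative_area(boxes, img_w=400, img_h=200)

# The same scene at three times the resolution selects the same object.
boxes_3x = [tuple(3 * v for v in b) for b in boxes]
high_res = keep_by_relative_area(boxes_3x, img_w=1200, img_h=600)

print(low_res)
print(high_res)
```

Only the character-sized box survives in both cases, because the fractions do not depend on the pixel count.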

Also, because the borders of the boxes are broken, I can't detect the boxes accurately.

Edit:

If I discard contours whose aspect ratio (h/w) is less than 0.4, it works to some extent. But I don't know whether this will work for different image resolutions.

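for contour in cnts[:2000]:
    x, y, w, h = cv2.boundingRect(contour)
    aspect_ratio = h / float(w)  # float division on Python 2 as well
    area = cv2.contourArea(contour)

    # Skip contours that are much wider than they are tall (e.g. horizontal lines).
    if aspect_ratio < 0.4:
        continue
    print(aspect_ratio)
    cv2.drawContours(img, [contour], -1, (0, 255, 0), 2)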
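On the resolution question: under a uniform resize, the ratio h/w of a bounding box stays the same, whereas its area grows with the square of the scale factor. The short check below illustrates this on a made-up box. It ignores rounding of pixel coordinates, and it does not account for the fixed 11-pixel block size passed to `adaptiveThreshold`, which could still make the contours themselves differ between resolutions.

```python
def aspect_ratio(box):
    """Height divided by width of an (x, y, w, h) box."""
    x, y, w, h = box
    return h / float(w)


def scale_box(box, k):
    """Scale every coordinate of an (x, y, w, h) box by the factor k."""
    return tuple(v * k for v in box)


box = (10, 20, 8, 12)  # hypothetical character box
ratios = []
areas = []
for k in (1, 2, 4):
    scaled = scale_box(box, k)
    ratios.append(aspect_ratio(scaled))
    areas.append(scaled[2] * scaled[3])

print(ratios)  # the ratio is identical at every scale
print(areas)   # the area is multiplied by k squared
```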

[image: contours remaining after the aspect-ratio filter]


Source: https://habr.com/ru/post/1688273/

