Word segmentation using OpenCV

I am working with scanned text images, and I need to highlight every word in each image. As I understand it, the problem is equivalent to finding sub-images that are surrounded by extra whitespace.

OCR cannot be used; I just need to outline each word with a border. Can anyone suggest how this can be done using OpenCV?

I tried reading about thresholds and segmentation. I'm just looking for someone to point me to some relevant material.

1 answer

I think your image has multi-line text. In this case, you first need to detect these lines.

To do this, first binarize the image using Otsu's method or an adaptive threshold.

Then you can use something called a horizontal histogram. It looks like an ordinary histogram, but it counts the text pixels in each row, so it shows where the text lines are and where the blank space between them is. Split the image at the blank rows and you will get each line. Below is a horizontal histogram.

[Image: horizontal histogram]
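The line-splitting idea can be sketched with NumPy alone. This assumes a binarized image in which text pixels are nonzero; the helper names `find_runs` and `split_lines` and the `min_count` parameter are hypothetical and only serve the illustration.

```python
import numpy as np

def find_runs(profile, min_count=1):
    """Return (start, end) pairs of consecutive entries with profile >= min_count.

    `end` is exclusive, so a run can be used directly as a slice.
    """
    filled = profile >= min_count
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], filled, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    # Even positions are run starts, odd positions are exclusive run ends.
    return list(zip(changes[0::2].tolist(), changes[1::2].tolist()))

def split_lines(binary):
    """Return the row ranges of the text lines in a binary image."""
    # Horizontal histogram: number of text pixels in each row.
    row_counts = np.count_nonzero(binary, axis=1)
    return find_runs(row_counts)

# Synthetic binary page with two "text lines".
page = np.zeros((60, 200), dtype=np.uint8)
page[10:20, 10:80] = 255
page[35:45, 10:120] = 255
lines = split_lines(page)
line_images = [page[top:bottom] for top, bottom in lines]
```

On noisy scans a few stray pixels can bridge two lines, so raising `min_count` above 1 may be needed; the right value depends on the image.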

Now, for each line, find the vertical histogram (the same count taken per column), whose gaps mark the spaces between words. Before doing this, try dilation and erosion so that all the letters of a word are joined together. You can then find the connected components in each line to get each word. Then draw the borders.

The horizontal and vertical histograms are shown below:

[Image: horizontal and vertical histograms]
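The vertical histogram can also be used on its own to separate words, without any morphology: count the text pixels in each column of a line and treat every sufficiently wide empty run as a word space. The sketch below assumes a binarized line with nonzero text pixels; `split_words` and the `min_gap` threshold of 5 columns are hypothetical and would need tuning to the font size of the scan.

```python
import numpy as np

def split_words(line_binary, min_gap=5):
    """Return (x_start, x_end) column ranges of words; x_end is exclusive."""
    # Vertical histogram: number of text pixels in each column.
    col_counts = np.count_nonzero(line_binary, axis=0)
    words = []
    start = None     # first column of the word being built
    last_ink = None  # most recent column that contained text
    for x, count in enumerate(col_counts):
        if count == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The empty run since the last inked column is a word space.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words

# Synthetic line: two "words", each made of two "letters" 3 pixels apart.
line = np.zeros((20, 100), dtype=np.uint8)
for x0 in (5, 13, 40, 48):
    line[5:15, x0:x0 + 5] = 255
words = split_words(line)
```

Gaps narrower than `min_gap` are treated as spacing between letters, which is the same role the dilation kernel width plays in the connected-components approach.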

This Stack Overflow question may help: How to convert an image to character segments?


Source: https://habr.com/ru/post/976469/

