Opencv - crop handwritten lines (line segmentation)

Question

I am trying to build a handwriting recognition system using Python and OpenCV. Character recognition is not the problem; segmentation is. So far I have successfully:

segmented words into individual characters
segmented a single sentence into words in the required order.

But I could not segment the different lines in the document. I tried sorting contours (to avoid line segmentation and use only word segmentation), but that didn't work. I used the following code to segment the words in a handwritten document, but it returns the words out of reading order (it sorts them from left to right across the whole page, ignoring which line they belong to):

    import cv2
    import numpy as np

    # import image
    image = cv2.imread('input.jpg')
    # cv2.imshow('orig', image)
    # cv2.waitKey(0)

    # grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cv2.imshow('gray', gray)
    cv2.waitKey(0)

    # binary (inverted, so the ink becomes white on black)
    ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
    cv2.imshow('second', thresh)
    cv2.waitKey(0)

    # dilation
    kernel = np.ones((5, 5), np.uint8)
    img_dilation = cv2.dilate(thresh, kernel, iterations=1)
    cv2.imshow('dilated', img_dilation)
    cv2.waitKey(0)

    # find contours ([-2:] keeps this working on both OpenCV 3.x and 4.x,
    # which return a different number of values)
    ctrs, hier = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]

    # sort contours by the x coordinate of their bounding box
    sorted_ctrs = sorted(ctrs, key=lambda ctr: cv2.boundingRect(ctr)[0])

    for i, ctr in enumerate(sorted_ctrs):
        # get bounding box
        x, y, w, h = cv2.boundingRect(ctr)

        # get ROI
        roi = image[y:y+h, x:x+w]

        # show ROI
        cv2.imshow('segment no:' + str(i), roi)
        cv2.rectangle(image, (x, y), (x + w, y + h), (90, 0, 255), 2)
        cv2.waitKey(0)

    cv2.imshow('marked areas', image)
    cv2.waitKey(0)

Please note that I can segment all the words, but they are displayed out of order. Is there a way to sort these contours from top to bottom

OR

segment the image into separate lines so that each line can be segmented into words using the above code?

python opencv text-segmentation handwriting-recognition

Sidharth ramesh Sep 18 '17 at 15:09

2 answers

I tried to use your code, but ran into several errors. Can you share your final code?

Sandeep k Jun 18 '19 at 15:42

Sidharth ramesh · Accepted Answer · 2017-09-19T13:45:30+0000

I got the required segmentation by changing one line of the code above:

 kernel = np.ones((5,5), np.uint8)

I changed this to:

 kernel = np.ones((5,100), np.uint8)

Now the words on each line merge into a single contour, so I get one segment per line. It also works with handwritten text images where the lines are not perfectly horizontal.

EDIT: To get individual characters from a word, follow these steps:

Resize the cropped image containing the word as follows.

 im = cv2.resize(image,None,fx=4, fy=4, interpolation = cv2.INTER_CUBIC)

Apply the same contour detection process as for line segmentation, but with a kernel size of (5, 5), i.e.
```
kernel = np.ones((5, 5), np.uint8)
img_dilation = cv2.dilate(im_th, kernel, iterations=1)
```
