How can I detect edges in a document image and cut sections into separate images?

The task is to take an image of a document and use the straight lines surrounding its sections to split the image into separate images for further analysis. The size of the "sections" varies completely from page to page (we are dealing with several thousand pages). Here is what one of these pages looks like:

[Image: an example of how the documents are laid out]

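I have gone through the image processing and segmentation tutorials for Scikit-image, along with other image-processing references. With those I worked out how to do edge detection (Canny, Hough), but what they do not cover is how to break the page into separate "blocks" and then save each block as its own image. Any help is appreciated, as there seem to be few resources for this kind of task.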

Any suggestions? Thanks!


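I am sure there are other ways to do this, for example with OpenCV, but I tried it with ImageMagick at the command line. ImageMagick is installed on most Linux distributions and is available for MacOS and Windows. The same techniques should work in OpenCV, if someone wants to translate them.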

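First, I replace each pixel with the mean of its 5x5 neighbourhood, threshold the result at 80%, and then negate it, so that the dark features of the page (text, rules, box borders) become white on a black background.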

convert news.jpg -depth 16 -statistic mean 5x5 -threshold 80% -negate z.png


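Then I repeat that and go on to a "Connected Components Analysis", with any blob smaller than 2000 pixels in area merged into its surroundings: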

convert news.jpg -depth 16 -statistic mean 5x5 -threshold 80% -negate  \
   -define connected-components:verbose=true                           \
   -define connected-components:area-threshold=2000                    \
   -connected-components 4 -auto-level output.png

Objects (id: bounding-box centroid area mean-color):
  110: 1254x723+59+174 686.3,536.0 901824 srgb(0,0,0)
  2328: 935x723+59+910 526.0,1271.0 676005 srgb(0,0,0)
  0: 1370x1692+0+0 685.2,712.7 399651 srgb(0,0,0)
  2329: 303x722+1007+911 1158.0,1271.5 218766 srgb(0,0,0)
  25: 1262x40+54+121 685.2,140.5 49820 srgb(255,255,255)
  109: 1265x735+54+168 708.3,535.0 20601 srgb(255,255,255)
  1: 1274x64+48+48 675.9,54.5 16825 srgb(255,255,255)
  2326: 945x733+54+905 526.0,1271.0 16660 srgb(255,255,255)  
  2327: 312x732+1003+906 1169.9,1271.5 9606 srgb(255,255,255)  <--- THIS ONE
  421: 403x15+328+342 528.6,350.1 4816 srgb(255,255,255)
  7: 141x23+614+74 685.5,85.2 2831 srgb(255,255,255)

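The fields are named in the header line, and the interesting ones are the first (the blob id) and the second (the bounding box). There are 11 lines, so 11 blobs were found in the image. The bounding box AxB+C+D describes a rectangle A pixels wide and B pixels tall whose top-left corner is C pixels from the left edge of the image and D pixels down from the top.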

Take the one marked with an arrow, which starts 2327: 312x732+1003+906, and draw it on the original image:

convert news.jpg -fill "rgba(255,0,0,0.5)" -draw "rectangle 1003,906 1315,1638" oneArticle.png

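The corner coordinates passed to -draw above follow from the bounding box: the top-left corner is (C, D) and the bottom-right corner is (C + A, D + B). As a minimal sketch, assuming non-negative offsets, a small shell helper could do that conversion. The name geometry_to_corners is hypothetical and is not part of ImageMagick:

```shell
# geometry_to_corners is a hypothetical helper, not an ImageMagick command.
# It turns WxH+X+Y into "X,Y X+W,Y+H", the form -draw "rectangle ..." expects.
# Assumes non-negative offsets (no "-X" or "-Y" in the geometry).
geometry_to_corners() (
    # The function body is a subshell, so the IFS change stays local.
    IFS='x+'
    set -- $1          # 312x732+1003+906 -> $1=312 $2=732 $3=1003 $4=906
    echo "$3,$4 $(($3 + $1)),$(($4 + $2))"
)

geometry_to_corners 312x732+1003+906   # prints: 1003,906 1315,1638
```

The printed pair matches the rectangle used in the -draw command for blob 2327.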

To crop it out as a separate image:

convert news.jpg -crop 312x732+1003+906 article.jpg

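With several thousand pages, the same crop would need to be repeated for every box rather than typed by hand. The following is only a sketch of one way to do that: it reads a saved copy of the connected-components listing and prints one crop command per white blob whose width and height both reach a minimum size, so that thin rules and headers are skipped. The function name emit_crops, the blob-ID.jpg output names, and the size filter are assumptions made for illustration, not part of the original answer. It only prints commands and executes nothing:

```shell
# emit_crops is a hypothetical helper; it prints crop commands (dry run).
#   $1 = source image, e.g. news.jpg
#   $2 = file holding the verbose connected-components listing
#   $3 = minimum width and height, in pixels, for a blob to be kept
emit_crops() {
    awk -v img="$1" -v min="$3" '
        # Data lines look like:
        #   2327: 312x732+1003+906 1169.9,1271.5 9606 srgb(255,255,255)
        # Keep only white blobs; the header line does not match.
        $1 ~ /^[0-9]+:$/ && $5 == "srgb(255,255,255)" {
            id = $1
            sub(/:$/, "", id)            # "2327:" -> "2327"
            split($2, g, "[x+]")         # g[1]=width g[2]=height g[3]=x g[4]=y
            if (g[1] + 0 >= min && g[2] + 0 >= min)
                printf "convert %s -crop %s blob-%s.jpg\n", img, $2, id
        }
    ' "$2"
}
```

Assuming the listing above was saved to a file such as listing.txt (the verbose listing should be written to standard output, so redirecting the earlier command is one way to capture it), emit_crops news.jpg listing.txt 100 would print crop commands for blobs 109, 2326 and 2327. After reviewing them, piping the output into sh would run them. The threshold of 100 is a guess and would need tuning per document set.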



Source: https://habr.com/ru/post/1671562/

