Remove image border using ImageMagick

I use ImageMagick to preprocess receipt images before extracting text with Tesseract OCR. I need to remove the background from the receipts. I have read about using a mask to remove the border here, but I cannot create a mask for my receipts.

However, I did manage to remove the shadows from the receipt images.

The initial image (a sample receipt):

enter image description here

 convert input.png -colorspace gray \
   \( +clone -blur 0x2 \) +swap -compose divide -composite \
   -linear-stretch 5%x0% photocopy.png

After applying the code:

enter image description here

I tried the command below to turn every color except white into black, but it does not completely black out the background of photocopy.png.

 convert receipt.jpg -fill black -fuzz 20% +opaque "#ffffff" black_border.jpg 

enter image description here

Is there a way to remove the border of the receipt image, or to create a mask from the image? Note: I need to remove noise and borders from multiple images with different backgrounds.

2 answers

To answer your question

"Is there a way to remove the border of the receipt image or create any masks from the image?"

The following command (based on your own code) creates an image from which you can read off the dimensions of the mask you need:

 convert origscan.jpg \
   -colorspace gray \
   \( +clone -blur 0x2 \) \
   +swap \
   -compose divide \
   -composite \
   -linear-stretch 5%x0% \
   -threshold 5% \
   -trim \
   mask-image.png

You can extend the same pipeline to create a monochrome (black) mask in a single command:

 convert origscan.jpg \
   -colorspace gray \
   \( +clone -blur 0x2 \) \
   +swap \
   -compose divide \
   -composite \
   -linear-stretch 5%x0% \
   -threshold 5% \
   \( -clone 0 -fill '#000000' -colorize 100 \) \
   -delete 0 \
   black-mask.png

Below are the results of the two commands above:

[images: m43Iv.png, PxxNu.png]
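Since the question mentions multiple images, the trimmed-mask command can be wrapped in a small loop. This is only a sketch in POSIX shell: mask_name, make_mask and make_all_masks are hypothetical helper names, the output naming scheme is an assumption, and the blur and threshold values are copied from the command above and may need tuning for other backgrounds. On ImageMagick 7 the convert command is typically invoked as magick.

```shell
# Hypothetical helpers (not part of ImageMagick) that apply the
# mask-image.png pipeline from this answer to several scans.

# Derive an output name: "scans/receipt1.jpg" -> "scans/receipt1-mask.png"
mask_name() {
  printf '%s-mask.png\n' "${1%.*}"
}

# Run the same operations as the mask-image.png command on one file.
# $1 = input scan, $2 = output mask
make_mask() {
  convert "$1" -colorspace gray \
    \( +clone -blur 0x2 \) +swap -compose divide -composite \
    -linear-stretch 5%x0% -threshold 5% -trim \
    "$2"
}

# Process every file passed as an argument; report failures and keep going.
make_all_masks() {
  for scan in "$@"; do
    make_mask "$scan" "$(mask_name "$scan")" || echo "failed: $scan" >&2
  done
}
```

After sourcing these functions, a call such as make_all_masks receipts/*.jpg would write one mask next to each scan.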

You can use identify to get the geometry of mask-image.png as well as black-mask.png:

 identify -format "%g\n" *mask*.png
 2322x4128+366+144
 2322x4128+366+144

Thus, the image canvases have a width of 2322 pixels and a height of 4128 pixels. The visible parts of both images are, of course, smaller, following our -trim operation. (The +366+144 part is the horizontal/vertical offset of the trimmed region from the upper-left corner of the original image.)
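To use these numbers in a script, the geometry string can be split into its parts. The following is a minimal sketch in POSIX shell; parse_geometry is a hypothetical helper name, and it assumes the WIDTHxHEIGHT+X+Y form with non-negative offsets, as in the output above.

```shell
# Hypothetical helper: split a geometry string such as 2322x4128+366+144
# (as printed by identify -format "%g") into canvas size and offsets.
# Assumes non-negative offsets, i.e. the form WxH+X+Y.
parse_geometry() {
  old_ifs=$IFS
  IFS='x+'          # treat 'x' and '+' as field separators
  set -- $1         # unquoted on purpose: split into $1..$4
  IFS=$old_ifs
  printf 'canvas=%sx%s offset_x=%s offset_y=%s\n' "$1" "$2" "$3" "$4"
}

parse_geometry "2322x4128+366+144"
# prints: canvas=2322x4128 offset_x=366 offset_y=144
```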


Additional comment: Having said all this, you should really look into taking better photos of your receipts. (If you have a camera that can create images 4,128 pixels high, this should not be a problem. If you have as many receipts to process as you say, it might be a good idea to get a small plate of glass that you can place on top of the paper so that it is flattened while you photograph it.)


If you use ImageMagick on a Unix-like system, you can try my textcleaner script.

 textcleaner -f 20 -o 10 -e normalize UhSV6.jpg result.jpg 

enter image description here


Source: https://habr.com/ru/post/1210469/

