Separating Background / Foreground Layers in a Scanned Document

Question


I need to automatically remove the softly colored background of a scanned document image before running OCR on it.

ScanTailor is an open-source C++ application that performs background separation as part of its processing, but I cannot figure out how to run only the last step, the one that actually removes the background.

Ideally, I could find the code that does this, and then either:

Port that part to C#
Modify the C++ so that it can be run from the command line and performs only this step on a given image

Can you help me understand how I can do this?
Or do you know of other libraries that can do it? (Any language or platform is acceptable.)


c++ c# image-processing background bitmap

Robin rodricks Dec 01 '10 at 17:33


2 answers

Perhaps the algorithm is roughly:

Determine what the background color is
Scan the bitmap for pixels whose color is the same as (or fairly similar to) the background color
Convert those pixels to white or transparent
Possibly (especially if the page contains pictures and not just text) ignore isolated pixels that have the background color but are not adjacent to other background pixels

If the image has low color resolution (for example, a high-resolution black-and-white image), you need to apply this algorithm to groups of pixels rather than to individual pixels.
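The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code taken from ScanTailor: the image is assumed to be a list of rows of (R, G, B) tuples, the background is guessed as the most common coarsely quantized color, and the names estimate_background, remove_background, tolerance and min_neighbors are made up for this example.

```python
from collections import Counter

WHITE = (255, 255, 255)

def estimate_background(pixels, step=16):
    """Guess the background as the most common color after coarse quantization.

    Assumption: the background covers more of the page than any other color.
    """
    counts = Counter(
        tuple((c // step) * step + step // 2 for c in px)
        for row in pixels
        for px in row
    )
    return counts.most_common(1)[0][0]

def is_similar(a, b, tolerance):
    """True if every channel of a is within `tolerance` of the same channel of b."""
    return all(abs(x - y) <= tolerance for x, y in zip(a, b))

def remove_background(pixels, tolerance=24, min_neighbors=1):
    """Return a copy of the image with background-like pixels turned white.

    A background-like pixel is left untouched when fewer than `min_neighbors`
    of its 4 direct neighbors are also background-like (the "isolated pixel" rule).
    """
    background = estimate_background(pixels)
    height, width = len(pixels), len(pixels[0])
    mask = [[is_similar(px, background, tolerance) for px in row] for row in pixels]
    result = [list(row) for row in pixels]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            neighbors = sum(
                mask[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            )
            if neighbors >= min_neighbors:
                result[y][x] = WHITE
    return result
```

The tolerance and the quantization step would need tuning per scan; a real page with gradients or stains will probably need a locally estimated background rather than a single global color.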


Chrisw Dec 01 '10 at 17:40


Andrew Cash · Accepted Answer · 2010-12-02T01:31:37+0000

You are referring to the thresholding, despeckling, and noise-removal methods that OCR applications need.
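To make those terms concrete, here is a minimal Python sketch of a global threshold followed by a very simple despeckle pass. It is an assumption-laden illustration, not what any particular engine does: the threshold is chosen with Otsu's method (one common choice, not named in this answer), the image is assumed to be a list of rows of 0-255 gray values, and the function names are made up for this example.

```python
def otsu_threshold(gray):
    """Pick the gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels with value <= t (dark class)
    sum_low = 0  # sum of their gray values
    for t in range(256):
        w_low += hist[t]
        if w_low == 0:
            continue
        w_high = total - w_low
        if w_high == 0:
            break
        sum_low += t * hist[t]
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var = w_low * w_high * (mean_low - mean_high) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray, threshold):
    """Map pixels at or below the threshold to black (0), the rest to white (255)."""
    return [[0 if v <= threshold else 255 for v in row] for row in gray]

def despeckle(binary):
    """Turn black pixels white when none of their 8 neighbors is black."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 0:
                continue
            has_dark_neighbor = any(
                binary[ny][nx] == 0
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_dark_neighbor:
                out[y][x] = 255
    return out
```

A single global threshold tends to fail on unevenly lit or patterned backgrounds, and a one-pixel despeckle can also erase small punctuation marks, so both steps usually need to be adapted to the scans at hand.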

The quality of the results depends on many factors:

Original print quality
Scan quality
Image resolution
Background colors and patterns used
Noise and other marks

You may find the IEvolution.NET library at http://www.hi-components.com/nievolution.asp useful. It has many image processing functions.

There are many commercial engines. No single function solves every image processing problem; you have to adapt the functions and their parameters to suit your images. http://www.recogniform.com/thresholding.htm

A search on Google will show many results.
