Extract characters from image


I'm trying to extract (not recognize!) characters from a black-and-white image, so if the image shows 123, I get an array of 3 images.

It's a duplicate question, I know, but I could not find what I want. I also tried looking at existing code, but could not find a working example.

http://www.codeproject.com/Articles/143059/Neural-Network-for-Recognition-of-Handwritten-Digi
(incomplete code)


Your help is much appreciated :)

+6
5 answers

As Kenny already mentioned, “connected component labeling” describes a family of algorithms that identify connected pixels. Connected components are also called “connected regions” or “blobs”, and are closely tied to the related concept of “contours”. Any such algorithm should be able to find not only a shape made of connected foreground pixels, but also any “holes” inside the shape, consisting of pixels of the background color.

http://en.wikipedia.org/wiki/Connected-component_labeling

This algorithm is used in several engineering fields that rely on image processing, including computer vision, machine vision, and medical imaging. If you are going to spend any time on image processing, you should become very comfortable with this algorithm and implement it at least once yourself.
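If you do want to implement it once yourself, here is a minimal sketch of the classic two-pass variant in plain Python. It assumes the image is a list of rows with 1 for foreground and 0 for background, and it uses 4-connectivity; the function name `label_components` is made up for this example and is not from any library.

```python
def label_components(image):
    """Two-pass 4-connected labeling; image is a list of rows, 1 = foreground."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # parent[k] is the union-find parent of provisional label k

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # First pass: assign provisional labels and record equivalences.
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # start a new provisional label
                labels[y][x] = len(parent) - 1
            elif up and left:
                root_up, root_left = find(up), find(left)
                low, high = min(root_up, root_left), max(root_up, root_left)
                parent[high] = low  # the two labels belong to one component
                labels[y][x] = low
            else:
                labels[y][x] = up or left

    # Second pass: replace each provisional label with a compact final label.
    final = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)


demo = [
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
]
demo_labels, demo_count = label_components(demo)
```

The U-shaped blob on the right starts out with two provisional labels that are only joined on the bottom row, which is exactly the case the equivalence table exists for.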

The OpenCV library has a findContours() function that can be used to find contours, contours within contours, and so on.
http://opencv.willowgarage.com/wiki/

If you want to see a region labeling algorithm at work, look for references to "cell counting" with the ImageJ application. Counting biological cells is an important and often cited application of region labeling in medical imaging.

http://rsbweb.nih.gov/ij/

Consider getting a textbook on this subject rather than learning it piecemeal online. The study of connected components (aka blobs) inevitably leads to binarization (aka thresholding), which takes a grayscale or color image and generates a black-and-white image from it. If you work with images from a camera, lighting becomes critical, and getting it right takes time and practice.
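To make the thresholding step concrete, here is a rough sketch of one common global method, Otsu's method, in plain Python. It assumes an 8-bit grayscale image stored as a list of rows of integers 0..255 and dark ink on a bright background; `otsu_threshold` and `binarize` are names chosen for this example. Uneven lighting usually needs a local (adaptive) threshold instead.

```python
def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance (Otsu's method).

    gray is a list of rows of integers in 0..255.
    """
    histogram = [0] * 256
    for row in gray:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]        # pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg    # pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (total_sum - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_t = variance, t
    return best_t


def binarize(gray, threshold):
    """Dark pixels (<= threshold) become 1 (ink), bright pixels become 0."""
    return [[1 if value <= threshold else 0 for value in row] for row in gray]


gray_demo = [[250, 20, 245], [30, 25, 240]]
binary_demo = binarize(gray_demo, otsu_threshold(gray_demo))
```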

There are many other preprocessing steps that may be required to clean the image. The need for preprocessing depends on your application.

One textbook that is often recommended here gives a good overview of standard image processing methods:

Digital Image Processing by Gonzalez and Woods, 3rd Edition http://www.imageprocessingplace.com/

Go to addall.com to find cheap copies. International editions are cheaper.

If the characters (or other shapes) in the image have a consistent size and shape (for example, "A" is always 40 pixels high and 25 pixels wide and printed in the same font), then you can use "normalized cross-correlation" or another template matching method to identify the presence of one or more matching shapes. This method can work as a very crude form of OCR, but it has serious limitations.

http://en.wikipedia.org/wiki/Template_matching
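As an illustration only, a brute-force version of zero-mean normalized cross-correlation can be sketched in plain Python as below. The names `ncc` and `match_template` and the 0.9 score cutoff are assumptions made for this example; a real library implementation will be far faster.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:  # a flat patch or template has no structure to match
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def match_template(image, template, min_score=0.9):
    """Slide the template over the image; return (x, y, score) for strong matches."""
    th, tw = len(template), len(template[0])
    hits = []
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(patch, template)
            if score >= min_score:
                hits.append((x, y, score))
    return hits


template_demo = [[1, 0], [1, 1]]
image_demo = [
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
hits_demo = match_template(image_demo, template_demo)
```

Each hit is the top-left corner of a window that resembles the template. The limitation mentioned above shows up quickly: any change in scale, rotation, or font drops the score.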

+4

If your image represents black characters on a white background (or vice versa), and if the image is of reasonable quality, and if the lines of text are horizontal and if each character is separated from its neighbors, it is a relatively trivial operation to find all the small islands of black pixels in the white sea.

As each of these ifs is relaxed, the problem becomes more complicated but remains conceptually the same: find a black pixel, then find all the other black pixels it is connected to, and you have found a character. Or, bearing in mind the comments on OCR and your requirement, you have found a patch of black pixels that (you claim) represents a character.
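The "find a black pixel, then everything connected to it" idea can be sketched as a flood fill. This is a minimal plain-Python version under a few assumptions: the image is a list of rows with 1 for black and 0 for white, pixels touching diagonally count as connected (8-connectivity), and the text is a single horizontal line so sorting by left edge gives reading order. `extract_characters` is a name invented for this example.

```python
from collections import deque


def extract_characters(image):
    """Return one cropped sub-image per 8-connected island of 1s, left to right."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    islands = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from this unvisited black pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            islands.append(pixels)

    crops = []
    for pixels in sorted(islands, key=lambda p: min(x for x, _ in p)):
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        left, top = min(xs), min(ys)
        crop = [[0] * (max(xs) - left + 1) for _ in range(max(ys) - top + 1)]
        for x, y in pixels:
            crop[y - top][x - left] = 1  # copy only this island's pixels
        crops.append(crop)
    return crops


page_demo = [
    [0, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 1, 1],
]
crops_demo = extract_characters(page_demo)
```

Copying only the island's own pixels into the crop (rather than slicing the bounding box out of the image) keeps bits of a neighboring character out of the result when bounding boxes overlap.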

+3

I put code on CodeProject that does exactly what you want:
Connected component labeling and vectorization

It's one-pass contour extraction based on the paper "A linear-time component-labeling algorithm using contour tracing technique" by Fu Chang, Chun-Jen Chen and Chi-Jen Lu.

+1

You might find it helpful to read about blob analysis or connectivity analysis in machine vision. Most libraries, including free ones, have something like this. In addition, if you know the orientation, the text is black on white, and the text is well aligned, you can find the edges of the characters in the 1D projections of the image onto X and Y, or onto any angle if you have time.
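A rough sketch of the projection idea for the X axis, assuming a binary image stored as a list of rows (1 = ink), a single horizontal line of text, and at least one empty pixel column between characters. `split_on_empty_columns` is a made-up name for this example.

```python
def split_on_empty_columns(image):
    """Return (start, end) column ranges (end exclusive) that contain ink."""
    width = len(image[0])
    # 1D projection onto X: amount of ink in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]
    ranges, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                   # entering a character
        elif not ink and start is not None:
            ranges.append((start, x))   # leaving a character
            start = None
    if start is not None:
        ranges.append((start, width))   # character touches the right border
    return ranges


line_demo = [
    [0, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 1, 1],
]
columns_demo = split_on_empty_columns(line_demo)
```

Each range can then be cut out with `[row[start:end] for row in image]`. The same projection onto Y separates lines of text before the columns are split.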

0

In my opinion, the best answer so far is Rethunk's, as it indicates that you should use segmentation and connected component labeling. HighPerformanceMark basically describes an algorithm for labeling connected components (which is very simple), but I think that mentioning the name of the algorithm is important in such an answer.

Please note, however, that segmentation and connected component labeling are just the beginning of solving your problem. For example, some letters, such as lowercase "i", consist of two components, and you should also consider that you may have ligatures (that is, two letters that are joined together). That’s why I like M. Babcock’s comment: it’s hard to find a good solution to your problem without recognizing the characters.

I believe you can solve your problem using an OCR library.
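For the two-component case, one simple heuristic is to merge components whose horizontal extents overlap, so the dot of an "i" rejoins its stem. This is only a sketch: it assumes a single line of text, bounding boxes given as inclusive (left, top, right, bottom) tuples, and the name `merge_stacked_boxes` is invented here. It does nothing for ligatures, and slanted text may get merged by mistake.

```python
def merge_stacked_boxes(boxes):
    """Merge bounding boxes (left, top, right, bottom) whose x-ranges overlap."""
    merged = []
    for box in sorted(boxes):  # sorted by left edge first
        if merged and box[0] <= merged[-1][2]:  # starts before the previous box ends
            l, t, r, b = merged[-1]
            merged[-1] = (l, min(t, box[1]), max(r, box[2]), max(b, box[3]))
        else:
            merged.append(box)
    return merged


# A letter, then the dot and stem of an "i", then another letter.
boxes_demo = [(5, 0, 6, 1), (5, 3, 6, 9), (0, 2, 3, 9), (8, 2, 11, 9)]
merged_demo = merge_stacked_boxes(boxes_demo)
```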

0

Source: https://habr.com/ru/post/908780/

