Automatically find the optimal image threshold value from histogram density

I am performing optical character recognition (OCR) on digits shown on a display, and I want the program to work under different lighting conditions. To do this, I need to threshold the image so that there is no noise around each digit, which lets me find the contour of each digit and run OCR on it. The threshold value I use therefore needs to adapt to the different lighting conditions. I tried adaptive thresholding, but I could not get it to work.

My image processing is simple: load the image (i), convert i to grayscale (g), apply histogram equalization to g (h), and apply a binary threshold to h with threshold value t. I have worked with several different data sets and found that the optimal threshold for reliable OCR lies within the region of maximum density on the histogram of (h) (the only part of the plot without gaps).

Histogram of (h). The values t = [190, 220] are optimal for OCR. A more complete set of images describing my problem can be found here: http://imgur.com/a/wRgi7

My current solution, which works but is clumsy and slow, checks that:

1. There must be 3 digits.
2. The first digit must be reasonably small in size.
3. At least one contour must be recognized as a digit.
4. The digit must be found in the digit dictionary.

If not all of the checks pass, the threshold is increased by 10 (starting from a low value) and the attempt is repeated.

The fact that I can spot the optimal threshold value on the histogram of (h) may just be confirmation bias, but I would like to know whether there is a way to extract that value programmatically. This differs from how I have worked with histograms before, which was more about finding peaks and valleys.

I use cv2 for image processing and matplotlib.pyplot for histogram plotting.

+5
source share
4 answers

I would also recommend using an automatic thresholding method such as Otsu's method (here is a good explanation of the method).

The Python OpenCV tutorials explain how to do Otsu binarization.

If you want to experiment with other automatic thresholding methods, you can take a look at ImageJ/Fiji. For example, this page summarizes all the implemented methods.


Grayscale image:

Results:

If you want to reimplement the methods yourself, you can check the source code of the Auto_Threshold plugin. I used Fiji for this demo.

+1
source

At first I thought "well, just take a histogram of the indices at which data appear", and that would work, but I don't think it really solves the underlying problem you want to solve.

I think you are interpreting the histogram incorrectly. Histogram equalization stretches the histogram in highly concentrated areas, so that if you took variable-sized bins you would get roughly equal counts inside each bin. The only reason those values are dense is that they are rare in the image; equalization spreads the more common values apart. And the reason that range works well is that, as you can see in the original grayscale image, the values between 190 and 220 are right where the image starts to become bright again; that is, where there is a clear demarcation of bright values.

You can see what equalizeHist does directly by plotting histograms of the equalized image with different bin sizes. For example, here is a loop over bin counts from 3 to 20:

Multiple Histogram Values
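A sketch of that loop, using a synthetic stand-in for the equalized image h (the original images are not available):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt

# Plot histograms of the equalized image for bin counts 3 through 20.
# `h` is a synthetic stand-in for the equalized grayscale image.
rng = np.random.default_rng(1)
h = rng.integers(0, 256, size=(100, 100), dtype=np.uint8)

fig, axes = plt.subplots(3, 6, figsize=(18, 9))
for ax, bins in zip(axes.ravel(), range(3, 21)):
    ax.hist(h.ravel(), bins=bins, range=(0, 256))
    ax.set_title(f"{bins} bins")
fig.tight_layout()
fig.savefig("histograms.png")
```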

Edit: just to be clear, what you want is to find that region programmatically, between a low peak and a high peak in the original histogram. You do not need the equalized histograms for this. In fact, that is exactly what Otsu thresholding (from Otsu's method) does: it assumes the data follow a bimodal distribution and finds the point that best separates the two distributions.

+2
source

Check this out: link. It does not really depend on density; it works because you have two separated maxima. The local maxima correspond to the main classes: the left local maximum to the foreground (text pixels) and the right local maximum to the background (white paper). The optimal threshold should separate these maxima, and the optimal value lies in the region of the local minimum between the two local maxima.
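One way to sketch that valley search in numpy, under the assumption that the foreground peak sits in the lower half of the range and the background peak in the upper half (the function name and smoothing width are illustrative, not from the answer):

```python
import numpy as np

# Hypothetical sketch: pick the threshold at the valley (local minimum)
# between the two dominant histogram peaks, after smoothing.
def valley_threshold(hist, smooth=5):
    # moving-average smoothing to suppress spurious local extrema
    kernel = np.ones(smooth) / smooth
    sm = np.convolve(hist, kernel, mode="same")
    left_peak = sm[:128].argmax()          # assumed foreground mode (lower half)
    right_peak = 128 + sm[128:].argmax()   # assumed background mode (upper half)
    # threshold = minimum of the smoothed histogram between the peaks
    return left_peak + sm[left_peak:right_peak + 1].argmin()

# bimodal toy histogram: peaks near 50 and 200
x = np.arange(256)
hist = np.exp(-((x - 50) / 15.0) ** 2) + np.exp(-((x - 200) / 15.0) ** 2)
print(valley_threshold(hist))
```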

+2
source

Basically, you are asking to find the indices of the longest sequence of nonzero elements in a 256 x 1 array.

Based on this answer, the following should get what you want:

    import cv2
    import numpy as np

    # load in grayscale
    img = cv2.imread("image.png", 0)
    hist = cv2.calcHist([img], [0], None, [256], [0, 256]).ravel()

    # pad both ends so every run of nonzero bins gets a start and a stop
    # transition, then pair up the transition indices
    mask = np.hstack(([0], (hist != 0).astype(np.int8), [0]))
    non_zero_sequences = np.where(np.diff(mask))[0].reshape(-1, 2)
    longest_sequence_id = np.diff(non_zero_sequences, axis=1).argmax()
    longest_sequence_start = non_zero_sequences[longest_sequence_id, 0]
    longest_sequence_stop = non_zero_sequences[longest_sequence_id, 1]

Please note that it is not verified.

+1
source

Source: https://habr.com/ru/post/1271645/

