How do I rotate individual letters in an image to the correct orientation for optimal OCR?

In my previous question, I converted this image:

[image]

into this:

[image]

which Tesseract OCR interprets as:

1O351 

Placing a frame around the image

[image]

does improve the OCR result:

  1CB51 

However, I need all 5 characters to be recognized correctly, so as an experiment I used Paint.NET to rotate and align each individual letter to its correct orientation:

[image]

This gives the correct result:

 1CB52 

How can I make this correction in C#?

I have researched various text deskewing algorithms, but they all assume that the source image contains lines of text from which the rotation angle can be derived, and in which the letters already have the correct spacing and orientation relative to each other.

1 answer

You can use the code from the Code Project article reproduced further down to segment each individual character. However, if you try to deskew these characters individually, the results will not be very good, because there is not much information to go on.

I tried using the AForge.NET HoughLineTransformation class, and I got angles in the range of 80 to 90 degrees. So I tried the following code to deskew them:

    private static Bitmap DeskewImageByIndividualChars(Bitmap targetBitmap)
    {
        // Segment the image into one bitmap per connected component (character).
        IDictionary<Rectangle, Bitmap> characters = new CCL().Process(targetBitmap);

        using (Graphics g = Graphics.FromImage(targetBitmap))
        {
            foreach (var character in characters)
            {
                double angle;

                BitmapData bitmapData = character.Value.LockBits(
                    new Rectangle(Point.Empty, character.Value.Size),
                    ImageLockMode.ReadWrite,
                    PixelFormat.Format8bppIndexed);
                try
                {
                    // Estimate the skew from the strongest Hough lines.
                    HoughLineTransformation hlt = new HoughLineTransformation();
                    hlt.ProcessImage(bitmapData);
                    angle = hlt.GetLinesByRelativeIntensity(0.5).Average(l => l.Theta);
                }
                finally
                {
                    character.Value.UnlockBits(bitmapData);
                }

                // Rotate the character and draw it back at its original location.
                using (Bitmap bitmap = RotateImage(character.Value, 90 - angle, Color.White))
                {
                    g.DrawImage(bitmap, character.Key.Location);
                }
            }
        }

        return targetBitmap;
    }

RotateImage is a separate helper method that is not shown here. However, the results were not great. Perhaps you can try to improve on them.

Here is the code from the Code Project article for reference. I made a few changes so that it behaves more safely, for example adding try-finally around LockBits and disposing of objects properly with the using statement.
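Since the RotateImage helper is not reproduced in this answer, here is a minimal sketch of what such a helper could look like using System.Drawing. It is an assumption, not the method the answer actually used: the class name ImageRotation is hypothetical, and this version enlarges the output bitmap so that the rotated content is not clipped, which the original helper may not have done.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

public static class ImageRotation
{
    // Hypothetical stand-in for the RotateImage helper referenced above.
    // Rotates clockwise by angleDegrees around the image centre and
    // fills the uncovered corners with the given background colour.
    public static Bitmap RotateImage(Bitmap source, double angleDegrees, Color background)
    {
        double radians = angleDegrees * Math.PI / 180.0;
        double cos = Math.Abs(Math.Cos(radians));
        double sin = Math.Abs(Math.Sin(radians));

        // Size of the axis-aligned box that encloses the rotated image.
        // The small epsilon avoids rounding 20.0000000001 up to 21.
        int newWidth = (int)Math.Ceiling(source.Width * cos + source.Height * sin - 1e-9);
        int newHeight = (int)Math.Ceiling(source.Width * sin + source.Height * cos - 1e-9);

        var rotated = new Bitmap(newWidth, newHeight);
        rotated.SetResolution(source.HorizontalResolution, source.VerticalResolution);

        using (Graphics g = Graphics.FromImage(rotated))
        {
            g.Clear(background);
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;

            // Move the origin to the centre, rotate, then move back by half
            // the source size so the source is centred in the new bitmap.
            g.TranslateTransform(newWidth / 2f, newHeight / 2f);
            g.RotateTransform((float)angleDegrees);
            g.TranslateTransform(-source.Width / 2f, -source.Height / 2f);
            g.DrawImage(source, 0, 0, source.Width, source.Height);
        }

        return rotated;
    }
}
```

Because the returned bitmap can be larger than the source, drawing it at the character's original location may overlap neighbouring characters, so the drawing code has to account for the new size.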

    using System.Collections.Generic;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Linq;

    namespace ConnectedComponentLabeling
    {
        public class CCL
        {
            private Bitmap _input;
            private int[,] _board;

            public IDictionary<Rectangle, Bitmap> Process(Bitmap input)
            {
                _input = input;
                _board = new int[_input.Width, _input.Height];

                Dictionary<int, List<Pixel>> patterns = Find();
                var images = new Dictionary<Rectangle, Bitmap>();

                foreach (KeyValuePair<int, List<Pixel>> pattern in patterns)
                {
                    using (Bitmap bmp = CreateBitmap(pattern.Value))
                    {
                        images.Add(GetBounds(pattern.Value), (Bitmap)bmp.Clone());
                    }
                }

                return images;
            }

            // Pure white pixels are treated as background.
            protected virtual bool CheckIsBackGround(Pixel currentPixel)
            {
                return currentPixel.color.A == 255 && currentPixel.color.R == 255
                    && currentPixel.color.G == 255 && currentPixel.color.B == 255;
            }

            private unsafe Dictionary<int, List<Pixel>> Find()
            {
                int labelCount = 1;
                var allLabels = new Dictionary<int, Label>();

                BitmapData imageData = _input.LockBits(
                    new Rectangle(0, 0, _input.Width, _input.Height),
                    ImageLockMode.ReadOnly,
                    PixelFormat.Format24bppRgb);
                try
                {
                    int bytesPerPixel = 3;
                    byte* scan0 = (byte*)imageData.Scan0.ToPointer();
                    int stride = imageData.Stride;

                    for (int i = 0; i < _input.Height; i++)
                    {
                        byte* row = scan0 + (i * stride);

                        for (int j = 0; j < _input.Width; j++)
                        {
                            // 24bpp pixels are stored in B, G, R order.
                            int bIndex = j * bytesPerPixel;
                            int gIndex = bIndex + 1;
                            int rIndex = bIndex + 2;

                            byte pixelR = row[rIndex];
                            byte pixelG = row[gIndex];
                            byte pixelB = row[bIndex];

                            Pixel currentPixel = new Pixel(new Point(j, i), Color.FromArgb(pixelR, pixelG, pixelB));

                            if (CheckIsBackGround(currentPixel))
                            {
                                continue;
                            }

                            IEnumerable<int> neighboringLabels = GetNeighboringLabels(currentPixel);
                            int currentLabel;

                            if (!neighboringLabels.Any())
                            {
                                // No labelled neighbours: start a new component.
                                currentLabel = labelCount;
                                allLabels.Add(currentLabel, new Label(currentLabel));
                                labelCount++;
                            }
                            else
                            {
                                // Take the smallest root label and merge the other neighbours into it.
                                currentLabel = neighboringLabels.Min(n => allLabels[n].GetRoot().Name);
                                Label root = allLabels[currentLabel].GetRoot();

                                foreach (var neighbor in neighboringLabels)
                                {
                                    if (root.Name != allLabels[neighbor].GetRoot().Name)
                                    {
                                        allLabels[neighbor].Join(allLabels[currentLabel]);
                                    }
                                }
                            }

                            _board[j, i] = currentLabel;
                        }
                    }
                }
                finally
                {
                    _input.UnlockBits(imageData);
                }

                Dictionary<int, List<Pixel>> patterns = AggregatePatterns(allLabels);
                patterns = RemoveIntrusions(patterns, _input.Width, _input.Height);

                return patterns;
            }

            // Discards every component that touches the image border.
            private Dictionary<int, List<Pixel>> RemoveIntrusions(Dictionary<int, List<Pixel>> patterns, int width, int height)
            {
                var patternsCleaned = new Dictionary<int, List<Pixel>>();

                foreach (var pattern in patterns)
                {
                    bool bad = false;
                    foreach (Pixel item in pattern.Value)
                    {
                        //Horiz
                        if (item.Position.X == 0)
                            bad = true;
                        else if (item.Position.X == width - 1) // was Position.Y, which compared a row against the width
                            bad = true;
                        //Vert
                        else if (item.Position.Y == 0)
                            bad = true;
                        else if (item.Position.Y == height - 1)
                            bad = true;
                    }

                    if (!bad)
                        patternsCleaned.Add(pattern.Key, pattern.Value);
                }

                return patternsCleaned;
            }

            private IEnumerable<int> GetNeighboringLabels(Pixel pix)
            {
                var neighboringLabels = new List<int>();

                for (int i = pix.Position.Y - 1; i <= pix.Position.Y + 2 && i < _input.Height - 1; i++)
                {
                    for (int j = pix.Position.X - 1; j <= pix.Position.X + 2 && j < _input.Width - 1; j++)
                    {
                        if (i > -1 && j > -1 && _board[j, i] != 0)
                        {
                            neighboringLabels.Add(_board[j, i]);
                        }
                    }
                }

                return neighboringLabels;
            }

            // Groups all labelled pixels by the root label of their component.
            private Dictionary<int, List<Pixel>> AggregatePatterns(Dictionary<int, Label> allLabels)
            {
                var patterns = new Dictionary<int, List<Pixel>>();

                for (int i = 0; i < _input.Height; i++)
                {
                    for (int j = 0; j < _input.Width; j++)
                    {
                        int patternNumber = _board[j, i];
                        if (patternNumber != 0)
                        {
                            patternNumber = allLabels[patternNumber].GetRoot().Name;

                            if (!patterns.ContainsKey(patternNumber))
                            {
                                patterns[patternNumber] = new List<Pixel>();
                            }

                            patterns[patternNumber].Add(new Pixel(new Point(j, i), Color.Black));
                        }
                    }
                }

                return patterns;
            }

            private unsafe Bitmap CreateBitmap(List<Pixel> pattern)
            {
                int minX = pattern.Min(p => p.Position.X);
                int maxX = pattern.Max(p => p.Position.X);

                int minY = pattern.Min(p => p.Position.Y);
                int maxY = pattern.Max(p => p.Position.Y);

                int width = maxX + 1 - minX;
                int height = maxY + 1 - minY;

                Bitmap bmp = DrawFilledRectangle(width, height);

                BitmapData imageData = bmp.LockBits(
                    new Rectangle(0, 0, bmp.Width, bmp.Height),
                    ImageLockMode.ReadWrite,
                    PixelFormat.Format24bppRgb);
                try
                {
                    byte* scan0 = (byte*)imageData.Scan0.ToPointer();
                    int stride = imageData.Stride;

                    foreach (Pixel pix in pattern)
                    {
                        scan0[((pix.Position.X - minX) * 3) + (pix.Position.Y - minY) * stride] = pix.color.B;
                        scan0[((pix.Position.X - minX) * 3) + (pix.Position.Y - minY) * stride + 1] = pix.color.G;
                        scan0[((pix.Position.X - minX) * 3) + (pix.Position.Y - minY) * stride + 2] = pix.color.R;
                    }
                }
                finally
                {
                    bmp.UnlockBits(imageData);
                }

                return bmp;
            }

            // Creates a white bitmap of the given size.
            private Bitmap DrawFilledRectangle(int x, int y)
            {
                Bitmap bmp = new Bitmap(x, y);
                using (Graphics graph = Graphics.FromImage(bmp))
                {
                    Rectangle ImageSize = new Rectangle(0, 0, x, y);
                    graph.FillRectangle(Brushes.White, ImageSize);
                }

                return bmp;
            }

            private Rectangle GetBounds(List<Pixel> pattern)
            {
                var points = pattern.Select(x => x.Position);

                var x_query = points.Select(p => p.X);
                int xmin = x_query.Min();
                int xmax = x_query.Max();

                var y_query = points.Select(p => p.Y);
                int ymin = y_query.Min();
                int ymax = y_query.Max();

                return new Rectangle(xmin, ymin, xmax - xmin, ymax - ymin);
            }
        }
    }

With the above code, I got the following input and output:

[image: Input] [image: Output]

As you can see, the B is rotated pretty well, but the others are not so good.


An alternative to deskewing the characters individually is to find their locations using the segmentation routine above, then pass each character through your recognition engine separately and see whether that improves your results.


I used the following method to find the angle of a character using the List<Pixel> from the CCL class. It works by finding the angle between the "bottom left" and "bottom right" points. I have not tested whether it works when the character is rotated the other way.

    private double GetAngle(List<Pixel> pattern)
    {
        var pixels = pattern.Select(p => p.Position).ToArray();

        // Lowest pixel, leftmost on ties.
        Point bottomLeft = pixels.OrderByDescending(p => p.Y).ThenBy(p => p.X).First();
        // Rightmost pixel, lowest on ties.
        Point rightBottom = pixels.OrderByDescending(p => p.X).ThenByDescending(p => p.Y).First();

        int xDiff = rightBottom.X - bottomLeft.X;
        int yDiff = rightBottom.Y - bottomLeft.Y;

        double angle = Math.Atan2(yDiff, xDiff) * 180 / Math.PI;

        return -angle;
    }

Please note that my drawing code is a bit broken, so the 5 is cut off on the right, but this code produces the following output:

[image: Output]

Note that the B and the 5 are rotated more than you would expect because of their curvature.


With the following code, which gets the angle from both the left and the right edge and then picks the best one, the rotations seem to be better. Note that I tested it only with letters that need to be rotated clockwise, so it may not work if they need to go the other way.

It also splits the pixels into quadrants, so that each point is chosen from its own quadrant and no two chosen points end up too close together.

The idea behind picking the best angle is this: if the two angles are similar (currently within 1.5 degrees of each other, which can easily be changed), average them. Otherwise, pick the one that is closest to zero.

    private double GetAngle(List<Pixel> pattern, Rectangle bounds)
    {
        int halfWidth = bounds.X + (bounds.Width / 2);
        int halfHeight = bounds.Y + (bounds.Height / 2);

        double leftEdgeAngle = GetAngleLeftEdge(pattern, halfWidth, halfHeight);
        double rightEdgeAngle = GetAngleRightEdge(pattern, halfWidth, halfHeight);

        // Similar angles: average them.
        if (Math.Abs(leftEdgeAngle - rightEdgeAngle) <= 1.5)
        {
            return (leftEdgeAngle + rightEdgeAngle) / 2d;
        }

        // Otherwise take the one closest to zero.
        if (Math.Abs(leftEdgeAngle) > Math.Abs(rightEdgeAngle))
        {
            return rightEdgeAngle;
        }
        else
        {
            return leftEdgeAngle;
        }
    }

    private double GetAngleLeftEdge(List<Pixel> pattern, double halfWidth, double halfHeight)
    {
        var topLeftPixels = pattern.Select(p => p.Position).Where(p => p.Y < halfHeight && p.X < halfWidth).ToArray();
        var bottomLeftPixels = pattern.Select(p => p.Position).Where(p => p.Y > halfHeight && p.X < halfWidth).ToArray();

        Point topLeft = topLeftPixels.OrderBy(p => p.X).ThenBy(p => p.Y).First();
        Point bottomLeft = bottomLeftPixels.OrderByDescending(p => p.Y).ThenBy(p => p.X).First();

        int xDiff = bottomLeft.X - topLeft.X;
        int yDiff = bottomLeft.Y - topLeft.Y;

        double angle = Math.Atan2(yDiff, xDiff) * 180 / Math.PI;

        return 90 - angle;
    }

    private double GetAngleRightEdge(List<Pixel> pattern, double halfWidth, double halfHeight)
    {
        var topRightPixels = pattern.Select(p => p.Position).Where(p => p.Y < halfHeight && p.X > halfWidth).ToArray();
        var bottomRightPixels = pattern.Select(p => p.Position).Where(p => p.Y > halfHeight && p.X > halfWidth).ToArray();

        Point topRight = topRightPixels.OrderBy(p => p.Y).ThenByDescending(p => p.X).First();
        Point bottomRight = bottomRightPixels.OrderByDescending(p => p.X).ThenByDescending(p => p.Y).First();

        int xDiff = bottomRight.X - topRight.X;
        int yDiff = bottomRight.Y - topRight.Y;

        double angle = Math.Atan2(xDiff, yDiff) * 180 / Math.PI;

        return Math.Abs(angle);
    }

This now produces the following output; again, my drawing code is slightly broken. Note that the C does not look very well deskewed, but if you look closely, it is just the shape of the character that causes this.

[image: Output]


I improved the drawing code, and also tried to get the characters on the same baseline:

    private static Bitmap DeskewImageByIndividualChars(Bitmap bitmap)
    {
        // Each entry maps a character's bounds to its bitmap and its rotation angle.
        IDictionary<Rectangle, Tuple<Bitmap, double>> characters = new CCL().Process(bitmap);

        Bitmap deskewedBitmap = new Bitmap(bitmap.Width, bitmap.Height, bitmap.PixelFormat);
        deskewedBitmap.SetResolution(bitmap.HorizontalResolution, bitmap.VerticalResolution);

        using (Graphics g = Graphics.FromImage(deskewedBitmap))
        {
            g.FillRectangle(Brushes.White, new Rectangle(Point.Empty, deskewedBitmap.Size));

            // The lowest bottom edge (before rotation) is used as the common baseline.
            int baseLine = characters.Max(c => c.Key.Bottom);
            foreach (var character in characters)
            {
                int y = character.Key.Y;
                if (character.Key.Bottom != baseLine)
                {
                    y += (baseLine - character.Key.Bottom - 1);
                }

                using (Bitmap characterBitmap = RotateImage(character.Value.Item1, character.Value.Item2, Color.White))
                {
                    g.DrawImage(characterBitmap, new Point(character.Key.X, y));
                }
            }
        }

        return deskewedBitmap;
    }

This then produces the following output. Note that the characters are not on exactly the same baseline, because the baseline is worked out from each character's bottom edge before rotation. To improve this, the code would need to use the bottom edge after rotation. Thresholding the image before computing the baseline would also help.

Another improvement would be to calculate the Right of each rotated character's location, so that the next character drawn does not overlap the previous one and cut bits off. As you can see in the output, the 2 slightly cuts into the 5.
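The two improvements suggested above (taking the baseline after rotation and tracking the right edge of each placed character) could be sketched roughly as follows. This is not part of the code that produced the output; CharBox, CharacterLayout, RotatedSize and Place are hypothetical names, and the sketch assumes that the enclosing box of the rotated character rectangle is a good enough approximation of where the rotated glyph ends up. Finding the real ink bottom would mean scanning the rotated bitmap's pixels.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container: bounds of one character plus its rotation in degrees.
public struct CharBox
{
    public int X, Y, Width, Height;
    public double Angle;
}

public static class CharacterLayout
{
    // Size of the axis-aligned box enclosing a width x height rectangle
    // after rotating it by angleDegrees around its centre.
    public static void RotatedSize(int width, int height, double angleDegrees,
                                   out int rotatedWidth, out int rotatedHeight)
    {
        double radians = angleDegrees * Math.PI / 180.0;
        double cos = Math.Abs(Math.Cos(radians));
        double sin = Math.Abs(Math.Sin(radians));

        // The epsilon keeps floating-point noise from adding an extra pixel.
        rotatedWidth = (int)Math.Ceiling(width * cos + height * sin - 1e-9);
        rotatedHeight = (int)Math.Ceiling(width * sin + height * cos - 1e-9);
    }

    // Computes draw positions, left to right, so that every rotated box sits
    // with its bottom on the baseline and starts no earlier than the right
    // edge of the previously placed box plus a gap.
    public static List<CharBox> Place(IEnumerable<CharBox> characters, int baseline, int gap)
    {
        var placed = new List<CharBox>();
        int nextFreeX = int.MinValue;

        foreach (CharBox c in characters.OrderBy(ch => ch.X))
        {
            int w, h;
            RotatedSize(c.Width, c.Height, c.Angle, out w, out h);

            int x = Math.Max(c.X, nextFreeX);
            placed.Add(new CharBox { X = x, Y = baseline - h, Width = w, Height = h, Angle = c.Angle });

            nextFreeX = x + w + gap;
        }

        return placed;
    }
}
```

The returned X and Y values would then be used as the draw location for each rotated character bitmap, instead of the pre-rotation bounds.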

The output is now very similar to the one created manually in the question.

[image: Output]
