Get the correct image orientation using the Google Cloud Vision API (TEXT_DETECTION)

I tried the Google Cloud Vision API (TEXT_DETECTION) on an image rotated 90 degrees. It still returns the recognized text correctly (see image below).

This means that the engine can recognize text even if the image is rotated 90, 180, or 270 degrees.

However, the response does not contain any information about the correct orientation of the image (documentation: EntityAnnotation).

Is there any way to get not only the recognized text but also the orientation of the image?
Could Google support this in the same way FaceAnnotation does with getRollAngle?

[Screenshot: a 90-degree rotated test image with the text correctly recognized by the API]

+9
5 answers

As described in the Public Issue Tracker, our development team is now aware of this feature request, and there is currently no ETA for its implementation.

Note: orientation information may already be available in your image metadata. An example of how to extract the metadata can be seen in this third-party library.
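
If the metadata route works for your images, reading the EXIF Orientation tag is straightforward. A minimal sketch using the Pillow library (my choice for illustration; the third-party library linked above is another option):

    from PIL import Image  # pip install Pillow

    # EXIF tag 274 (0x0112) is Orientation; it may be missing entirely.
    # 1 = normal, 3 = rotated 180, 6 = rotated 90 CW, 8 = rotated 270 CW.
    with Image.open("photo.jpg") as img:
        orientation = img.getexif().get(274, 1)  # default to "normal" when absent
    print(orientation)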

A rough workaround is to check the "vertices" of the "boundingPoly" returned for each entry in "textAnnotations". By calculating the width and height of each detected word's rectangle, you can tell whether the image is upright: if a rectangle's "height" > "width", the image is on its side. A sketch of this heuristic follows below.
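
A minimal sketch of that heuristic in Python, assuming the REST JSON response has already been parsed into a dict (field names follow the REST API; word annotations start at index 1 because index 0 spans the whole page):

    def is_sideways(word_annotation):
        """True if a word's bounding box is taller than it is wide."""
        vertices = word_annotation["boundingPoly"]["vertices"]
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        return (max(ys) - min(ys)) > (max(xs) - min(xs))

    # Example: vote across the per-word annotations of an images:annotate response.
    words = response["responses"][0]["textAnnotations"][1:]
    sideways = sum(is_sideways(w) for w in words)
    print("image is sideways" if sideways > len(words) / 2 else "image is upright")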

+4
source

You can use the fact that we know the sequence of characters in a word to deduce the orientation of the word, as follows (obviously, slightly different logic is needed for non-LTR languages):

    import numpy as np

    # Skip words that are too short to infer a direction from reliably.
    MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE = 4

    # `annotation` is the fullTextAnnotation of a DOCUMENT_TEXT_DETECTION response.
    for page in annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    if len(word.symbols) < MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE:
                        continue
                    first_char = word.symbols[0]
                    last_char = word.symbols[-1]
                    first_char_center = (np.mean([v.x for v in first_char.bounding_box.vertices]),
                                         np.mean([v.y for v in first_char.bounding_box.vertices]))
                    last_char_center = (np.mean([v.x for v in last_char.bounding_box.vertices]),
                                        np.mean([v.y for v in last_char.bounding_box.vertices]))
                    # Compare the vertical distance between the first and last character
                    # centers with the height of the word box (assuming vertices 1 and 2
                    # are the top-right and bottom-right corners).
                    top_right = word.bounding_box.vertices[1]
                    bottom_right = word.bounding_box.vertices[2]
                    if np.abs(first_char_center[1] - last_char_center[1]) < np.abs(top_right.y - bottom_right.y):
                        # upright or upside down
                        if first_char_center[0] <= last_char_center[0]:
                            print(0)    # upright
                        else:
                            print(180)  # upside down
                    else:
                        # sideways
                        if first_char_center[1] <= last_char_center[1]:
                            print(90)
                        else:
                            print(270)

Then you can use the orientations of the individual words to infer the orientation of the document as a whole, for example with a simple majority vote, as sketched below.
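
A minimal sketch of that aggregation step (my own addition, assuming the per-word angles from the loop above were collected into a list):

    from collections import Counter

    def document_orientation(word_angles):
        """Return the most common per-word angle (0, 90, 180 or 270)."""
        return Counter(word_angles).most_common(1)[0][0]

    print(document_orientation([0, 0, 90, 0, 180, 0]))  # -> 0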

+3

I am posting my workaround, which really works for images rotated 90, 180, and 270 degrees. See the code below.

    GetExifOrientation(annotateImageResponse.getTextAnnotations().get(1));
    // EXIF Orientation tag values (matching the return comments below):
    public static final int EXIF_ORIENTATION_NORMAL = 1;
    public static final int EXIF_ORIENTATION_180_DEGREE = 3;
    public static final int EXIF_ORIENTATION_270_DEGREE = 6;
    public static final int EXIF_ORIENTATION_90_DEGREE = 8;

    /**
     * @param ea The input EntityAnnotation must NOT be the first EntityAnnotation of
     *           annotateImageResponse.getTextAnnotations(), because the first one spans
     *           the whole page and is not affected by image orientation.
     * @return Exif orientation (1, 3, 6 or 8)
     */
    public static int GetExifOrientation(EntityAnnotation ea) {
        List<Vertex> vertexList = ea.getBoundingPoly().getVertices();
        // Calculate the center of the bounding polygon.
        float centerX = 0, centerY = 0;
        for (int i = 0; i < 4; i++) {
            centerX += vertexList.get(i).getX();
            centerY += vertexList.get(i).getY();
        }
        centerX /= 4;
        centerY /= 4;
        // The position of vertex 0 relative to the center reveals the rotation.
        int x0 = vertexList.get(0).getX();
        int y0 = vertexList.get(0).getY();
        if (x0 < centerX) {
            if (y0 < centerY) {
                // 0 -------- 1
                // |          |
                // 3 -------- 2
                return EXIF_ORIENTATION_NORMAL; // 1
            } else {
                // 1 -------- 2
                // |          |
                // 0 -------- 3
                return EXIF_ORIENTATION_270_DEGREE; // 6
            }
        } else {
            if (y0 < centerY) {
                // 3 -------- 0
                // |          |
                // 2 -------- 1
                return EXIF_ORIENTATION_90_DEGREE; // 8
            } else {
                // 2 -------- 3
                // |          |
                // 1 -------- 0
                return EXIF_ORIENTATION_180_DEGREE; // 3
            }
        }
    }

Additional information
I found that I need to add a language hint so that annotateImageResponse.getTextAnnotations().get(1) always follows this rule.

Sample code to add a language hint:

    ImageContext imageContext = new ImageContext();
    String[] languages = { "zh-TW" };
    imageContext.setLanguageHints(Arrays.asList(languages));
    annotateImageRequest.setImageContext(imageContext);
+2

Sometimes it is not possible to get the orientation from metadata, for example if the user took the photo with a mobile device camera held the wrong way. My solution is based on Jack Fan's answer and google-api-services-vision (available via Maven).

My TextUnit class:

    public class TextUnit {
        private String text;
        // X of the lower left point
        private float llx;
        // Y of the lower left point
        private float lly;
        // X of the upper right point
        private float urx;
        // Y of the upper right point
        private float ury;

        public TextUnit(String text, float llx, float lly, float urx, float ury) {
            this.text = text;
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }
    }

The base method:

    List<TextUnit> extractData(BatchAnnotateImagesResponse response) throws AnnotateImageResponseException {
        List<TextUnit> data = new ArrayList<>();
        for (AnnotateImageResponse res : response.getResponses()) {
            if (null != res.getError()) {
                String errorMessage = res.getError().getMessage();
                logger.log(Level.WARNING, "AnnotateImageResponse ERROR: " + errorMessage);
                throw new AnnotateImageResponseException("AnnotateImageResponse ERROR: " + errorMessage);
            } else {
                List<EntityAnnotation> texts = res.getTextAnnotations();
                if (texts.size() > 0) {
                    // Get the orientation from the first word; texts.get(0) spans the whole
                    // page. Fall back to the next word if the vertices are incomplete.
                    EntityAnnotation first_word = texts.get(1);
                    int orientation;
                    try {
                        orientation = getExifOrientation(first_word);
                    } catch (NullPointerException e) {
                        try {
                            orientation = getExifOrientation(texts.get(2));
                        } catch (NullPointerException e1) {
                            orientation = EXIF_ORIENTATION_NORMAL;
                        }
                    }
                    logger.log(Level.INFO, "orientation: " + orientation);

                    // Calculate the center of the first word's bounding polygon.
                    float centerX = 0, centerY = 0;
                    for (Vertex vertex : first_word.getBoundingPoly().getVertices()) {
                        if (vertex.getX() != null) {
                            centerX += vertex.getX();
                        }
                        if (vertex.getY() != null) {
                            centerY += vertex.getY();
                        }
                    }
                    centerX /= 4;
                    centerY /= 4;

                    for (int i = 1; i < texts.size(); i++) { // exclude the first text - it contains all the text of the page
                        String blockText = texts.get(i).getDescription();
                        BoundingPoly poly = texts.get(i).getBoundingPoly();
                        try {
                            // Rotate the poly back to upright, then flip it into the
                            // bottom-left-origin coordinate system.
                            if (orientation == EXIF_ORIENTATION_NORMAL) {
                                poly = invertSymmetricallyBy0X(centerY, poly);
                            } else if (orientation == EXIF_ORIENTATION_90_DEGREE) {
                                poly = rotate(centerX, centerY, poly, Math.toRadians(-90));
                                poly = invertSymmetricallyBy0Y(centerX, poly);
                            } else if (orientation == EXIF_ORIENTATION_180_DEGREE) {
                                poly = rotate(centerX, centerY, poly, Math.toRadians(-180));
                                poly = invertSymmetricallyBy0Y(centerX, poly);
                            } else if (orientation == EXIF_ORIENTATION_270_DEGREE) {
                                poly = rotate(centerX, centerY, poly, Math.toRadians(-270));
                                poly = invertSymmetricallyBy0Y(centerX, poly);
                            }
                            float llx = getLlx(poly);
                            float lly = getLly(poly);
                            float urx = getUrx(poly);
                            float ury = getUry(poly);
                            data.add(new TextUnit(blockText, llx, lly, urx, ury));
                        } catch (NullPointerException e) {
                            // ignore - some polys have no X or Y coordinate when the text is located close to the image bounds
                        }
                    }
                }
            }
        }
        return data;
    }

Helper methods:

    private float getLlx(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();
            ArrayList<Float> xs = new ArrayList<>();
            for (Vertex v : vertices) {
                float x = 0;
                if (v.getX() != null) {
                    x = v.getX();
                }
                xs.add(x);
            }
            Collections.sort(xs);
            // the lower left X is the average of the two smallest X values
            return (xs.get(0) + xs.get(1)) / 2;
        } catch (Exception e) {
            return 0;
        }
    }

    private float getLly(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();
            ArrayList<Float> ys = new ArrayList<>();
            for (Vertex v : vertices) {
                float y = 0;
                if (v.getY() != null) {
                    y = v.getY();
                }
                ys.add(y);
            }
            Collections.sort(ys);
            // the lower left Y is the average of the two smallest Y values
            return (ys.get(0) + ys.get(1)) / 2;
        } catch (Exception e) {
            return 0;
        }
    }

    private float getUrx(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();
            ArrayList<Float> xs = new ArrayList<>();
            for (Vertex v : vertices) {
                float x = 0;
                if (v.getX() != null) {
                    x = v.getX();
                }
                xs.add(x);
            }
            Collections.sort(xs);
            // the upper right X is the average of the two largest X values
            return (xs.get(xs.size() - 1) + xs.get(xs.size() - 2)) / 2;
        } catch (Exception e) {
            return 0;
        }
    }

    private float getUry(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();
            ArrayList<Float> ys = new ArrayList<>();
            for (Vertex v : vertices) {
                float y = 0;
                if (v.getY() != null) {
                    y = v.getY();
                }
                ys.add(y);
            }
            Collections.sort(ys);
            // the upper right Y is the average of the two largest Y values
            return (ys.get(ys.size() - 1) + ys.get(ys.size() - 2)) / 2;
        } catch (Exception e) {
            return 0;
        }
    }

    /**
     * Rotates a bounding polygon clockwise around the given center.
     *
     * @param theta the angle of rotation in radians
     */
    public BoundingPoly rotate(float centerX, float centerY, BoundingPoly poly, double theta) {
        List<Vertex> vertexList = poly.getVertices();
        // rotate all vertices in the poly around (centerX, centerY)
        for (Vertex vertex : vertexList) {
            float tempX = vertex.getX() - centerX;
            float tempY = vertex.getY() - centerY;
            // now apply the rotation
            float rotatedX = (float) (centerX - tempX * Math.cos(theta) + tempY * Math.sin(theta));
            float rotatedY = (float) (centerY - tempX * Math.sin(theta) - tempY * Math.cos(theta));
            vertex.setX((int) rotatedX);
            vertex.setY((int) rotatedY);
        }
        return poly;
    }

    /**
     * The Google Vision API returns boundingPolys in a coordinate system whose origin is
     * the top left corner, but iText uses a coordinate system whose origin is the bottom
     * left corner, so the result has to be inverted before continuing to work with iText.
     *
     * @return text units inverted symmetrically about the horizontal line y = centerY.
     */
    private BoundingPoly invertSymmetricallyBy0X(float centerY, BoundingPoly poly) {
        List<Vertex> vertices = poly.getVertices();
        for (Vertex v : vertices) {
            if (v.getY() != null) {
                v.setY((int) (centerY + (centerY - v.getY())));
            }
        }
        return poly;
    }

    /**
     * @return text units inverted symmetrically about the vertical line x = centerX.
     */
    private BoundingPoly invertSymmetricallyBy0Y(float centerX, BoundingPoly poly) {
        List<Vertex> vertices = poly.getVertices();
        for (Vertex v : vertices) {
            if (v.getX() != null) {
                v.setX((int) (centerX + (centerX - v.getX())));
            }
        }
        return poly;
    }
0

Jack Fan's answer worked for me. This is my vanilla JavaScript version.

    /**
     * @param gOCR The Google Vision response
     * @return orientation (0, 90, 180 or 270)
     */
    function getOrientation(gOCR) {
        // textAnnotations[1] is the first word; [0] spans the whole page.
        var vertexList = gOCR.responses[0].textAnnotations[1].boundingPoly.vertices;

        const ORIENTATION_NORMAL = 0;
        const ORIENTATION_90_DEGREE = 90;
        const ORIENTATION_180_DEGREE = 180;
        const ORIENTATION_270_DEGREE = 270;

        // Calculate the center of the bounding polygon.
        var centerX = 0, centerY = 0;
        for (var i = 0; i < 4; i++) {
            centerX += vertexList[i].x;
            centerY += vertexList[i].y;
        }
        centerX /= 4;
        centerY /= 4;

        // The position of vertex 0 relative to the center reveals the rotation.
        var x0 = vertexList[0].x;
        var y0 = vertexList[0].y;
        if (x0 < centerX) {
            if (y0 < centerY) {
                return ORIENTATION_NORMAL;
            } else {
                return ORIENTATION_270_DEGREE;
            }
        } else {
            if (y0 < centerY) {
                return ORIENTATION_90_DEGREE;
            } else {
                return ORIENTATION_180_DEGREE;
            }
        }
    }
0
