Can I preserve text structure with Google Cloud Vision TEXT_DETECTION?

Version 1 of the Google Cloud Vision API (beta) supports optical character recognition through TEXT_DETECTION requests. Although the recognition quality is good, characters are returned without any hint of the original layout. As a result, structured text (for example, tables, receipts, or columnar data) is sometimes returned in the wrong order.

Is it possible to preserve the document structure using the Google Cloud Vision API? Similar questions have been asked about Tesseract and hOCR, for example [1] and [2]. The documentation [3] contains no information about parameters for TEXT_DETECTION.

[1] How to save document structure in tesseract
[2] Tesseract - ambiguity in space and tabs
[3] https://cloud.google.com/vision/

1 answer

Recognizing text structure is a more abstract task than recognizing the text itself (letters, words, sentences). If you already have the structure information in your file's metadata, you can do something like this:

  • Segment / split your input image into subblocks.
  • Run a TEXT_DETECTION request on each subblock.
  • Reassemble the recognized text in the correct order based on the metadata.
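The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Region` layout metadata, the pixel-grid image representation, and the `ocr` callable are all hypothetical. In practice `ocr` would wrap one TEXT_DETECTION request per cropped block, and the cropping would be done with an imaging library.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical image representation: rows of pixel values.
Image = Sequence[Sequence[int]]


@dataclass(frozen=True)
class Region:
    """One layout block taken from the file metadata (assumed format)."""
    row: int     # logical row of the block in the table/receipt
    col: int     # logical column of the block
    left: int    # pixel box; right and bottom are exclusive
    top: int
    right: int
    bottom: int


def crop(image: Image, region: Region) -> List[List[int]]:
    """Step 1: cut one subblock out of the input image."""
    rows = image[region.top:region.bottom]
    return [list(line[region.left:region.right]) for line in rows]


def recognize_structured(
    image: Image,
    regions: Sequence[Region],
    ocr: Callable[[List[List[int]]], str],
) -> List[List[str]]:
    """Run OCR per block and rebuild a row/column grid of text.

    `ocr` is a placeholder for whatever sends a single block to
    TEXT_DETECTION and returns the recognized string.
    """
    if not regions:
        return []

    # Step 2: one OCR call per subblock, keyed by its logical position.
    cells = {}
    for region in regions:
        cells[(region.row, region.col)] = ocr(crop(image, region)).strip()

    # Step 3: reassemble the text using the metadata, not the OCR order.
    n_rows = max(r.row for r in regions) + 1
    n_cols = max(r.col for r in regions) + 1
    return [
        [cells.get((r, c), "") for c in range(n_cols)]
        for r in range(n_rows)
    ]
```

Because the output order comes from the `row` and `col` values in the metadata, it does not matter in which order the blocks are sent for recognition or in which order the results come back.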

The Cloud Vision API offers text_detection and language_detection, but no text_structure_detection.



Source: https://habr.com/ru/post/1629661/

