Google Vision text detection returns text out of order

I am using the Google Vision API to perform text recognition on uploaded images. I get good results, but the order in which the text is returned is pretty unreliable. If there is a large gap between pieces of text on the same line, the output continues with the line below instead of the text next to it.

For example, with the following sample image, I get the response below:

  4x Löwenbräu Original a 3,00 12,00 1
  8x Weissbier dunkel a 3,30 26,401
  3x Hefe-Weissbier a 3,30 9,90 1
  1x Saft 0,25
  1x Grosses Wasser
  1x Vegetarische Varia
  1x Gyros
  1x Baby Kalamari Gefu
  2x Gyros Folie
  1x Schafskäse Ofen
  1x Bifteki Metaxa
  1x Schweinefilet Meta
  1x St ifado
  1x Tee
  2,50 1
  2,40 1
  9,90 1
  8,90 1
  12,90
  a 9,9019,80 1
  6,90 1
  11,90 1
  13,90 1
  14,90 1
  2,10 1

This starts well and as expected, but then becomes pretty useless when trying to match prices to items. The ideal response would be:

  4x Löwenbräu Original a 3,00 12,00 1
  8x Weissbier dunkel a 3,30 26,401
  3x Hefe-Weissbier a 3,30 9,90 1
  1x Saft 0,25 2,50 1
  1x Grosses Wasser 2,40 1
  1x Vegetarische Varia 9,90 1
  1x Gyros 8,90 1
  1x Baby Kalamari Gefu 12,90 1
  2x Gyros Folie a 9,9019,80 1
  1x Schafskäse Ofen 6,90 1
  1x Bifteki Metaxa 11,90 1
  1x Schweinefilet Meta 13,90 1
  1x St ifado 14,90 1
  1x Tee 2,10 1

Or close to that.

Is there a formatting option that can be added to the API request to get a different response layout? I have had success with Tesseract, where you can change the output format to achieve this result, and wondered whether the Vision API has anything similar.

I understand that the API returns the coordinates of the letters, which could be used for this, but I was hoping not to have to go into that much depth.

2 answers

You can set the feature type in your JSON request. For an image of a receipt like this one, DOCUMENT_TEXT_DETECTION gives good results:

 {
   "requests": [
     {
       "image": {
         "source": {
           "imageUri": "https://i.stack.imgur.com/TRTXo.png"
         }
       },
       "features": [
         {
           "type": "DOCUMENT_TEXT_DETECTION"
         }
       ]
     }
   ]
 }

You can copy the above JSON and paste it into the request body in the "Try this API" panel. Result:

 4x LOwenbräu Original a 3,00 12,00 1
 8x Weissbier dunkel a 3, 3026, 40 1
 3x Hefe-Weissbier a 3,30990 1
 1x Saft 0,25 2, 50 1
 1x Grosses Wasser 2, 40 1
 1x Vegetarische Varia 9,90 1
 1x Gyros 8,90 1
 1x Baby Kalamari Gefu 12,90 !
 2x Gyros Folie a 9,9019, 80 1
 1x Schaf skäse Ofen 6,90 1
 1x Bifteki Metaxa 11,90 1
 1x Schweinefilet Meta 13,90 1
 1x Stifado 14, 90 1
 1x Tee 2, 10 1
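If you would rather send the same request from a script than from the browser panel, something like the following Python sketch might work. It assumes you authenticate with an API key passed as the `key` query parameter against the public `images:annotate` REST endpoint; the helper names `build_request` and `annotate` are made up for this example, and there is no error handling.

```python
import json
import urllib.request

# Public REST endpoint for image annotation requests.
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_uri, feature_type="DOCUMENT_TEXT_DETECTION"):
    """Return the JSON request body from the answer as a Python dict."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": feature_type}],
            }
        ]
    }


def annotate(image_uri, api_key):
    """POST the request and return the recognized text (makes a network call).

    Assumption: api_key is a valid key for a project with the Vision API enabled.
    """
    body = json.dumps(build_request(image_uri)).encode("utf-8")
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # fullTextAnnotation.text holds the whole page as a single string;
    # it is absent if nothing was recognized or the request failed.
    first = result["responses"][0]
    return first.get("fullTextAnnotation", {}).get("text", "")
```

Calling `annotate("https://i.stack.imgur.com/TRTXo.png", "YOUR_API_KEY")` should return text similar to the result above, although the exact output may differ between API versions.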

Google Vision is much less customizable than Tesseract at the moment. Since Google is behind both projects, guess which one will get a higher priority in the future?


This may be a late answer, but I am adding it for future reference. For text that is very far apart, DOCUMENT_TEXT_DETECTION does not provide proper line segmentation either.

The following code performs simple line segmentation based on the character polygon coordinates:

https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision
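To give a rough idea of what coordinate-based line segmentation can look like, here is a minimal Python sketch. It is not the algorithm from the linked repository, just a simpler illustration: it assumes word entries shaped like the items in the response's `textAnnotations` list (a `description` plus `boundingPoly.vertices`), with the first entry, which contains the full text, already dropped. The helper names, the sample coordinates, and the `tolerance` parameter are invented for the example.

```python
def _ys(word):
    # The JSON response may omit a coordinate that is 0, hence .get().
    return [v.get("y", 0) for v in word["boundingPoly"]["vertices"]]


def center_y(word):
    ys = _ys(word)
    return sum(ys) / len(ys)


def left_x(word):
    return min(v.get("x", 0) for v in word["boundingPoly"]["vertices"])


def group_into_lines(words, tolerance=0.5):
    """Group words whose vertical centers are close, then sort each line by x."""
    lines = []  # each entry: {"y": mean center of the line, "words": [...]}
    for word in sorted(words, key=center_y):
        cy = center_y(word)
        height = max(_ys(word)) - min(_ys(word))
        if lines and abs(cy - lines[-1]["y"]) <= tolerance * height:
            line = lines[-1]
            line["words"].append(word)
            line["y"] = sum(center_y(w) for w in line["words"]) / len(line["words"])
        else:
            lines.append({"y": cy, "words": [word]})
    return [
        " ".join(w["description"] for w in sorted(line["words"], key=left_x))
        for line in lines
    ]


def box(text, x, y, w=40, h=20):
    """Build a fake word entry for the demo (made-up coordinates)."""
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return {
        "description": text,
        "boundingPoly": {"vertices": [{"x": cx, "y": cy} for cx, cy in corners]},
    }


words = [
    box("1x", 10, 100), box("Saft", 60, 101), box("0,25", 120, 99),
    box("1x", 10, 130), box("Tee", 60, 131),
    box("2,50", 400, 102), box("2,10", 400, 129),  # far-right price column
]
for line in group_into_lines(words):
    print(line)
# prints:
# 1x Saft 0,25 2,50
# 1x Tee 2,10
```

This only works for reasonably straight scans. A skewed or curved receipt would need rotation correction first, which is the kind of case a more careful implementation has to handle.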


Source: https://habr.com/ru/post/1272306/

