I am using the Google Cloud Vision API to perform text recognition on receipt images. I get good results, but the format of the returned text is pretty unreliable. When there is a large horizontal gap within a line, the API prints the text from the line below instead of the text next to it.
For example, with the following sample image, I get the result below:
4x Löwenbräu Original a 3,00 12,00 1 8x Weissbier dunkel a 3,30 26,401 3x Hefe-Weissbier a 3,30 9,90 1 1x Saft 0,25 1x Grosses Wasser 1x Vegetarische Varia 1x Gyros 1x Baby Kalamari Gefu 2x Gyros Folie 1x Schafskäse Ofen 1x Bifteki Metaxa 1x Schweinefilet Meta 1x St ifado 1x Tee 2,50 1 2,40 1 9,90 1 8,90 1 12,90 a 9,9019,80 1 6,90 1 11,90 1 13,90 1 14,90 1 2,10 1
This starts out well, as expected, but then becomes pretty useless when trying to match prices to line items. The ideal result would be:
4x Löwenbräu Original a 3,00 12,00 1 8x Weissbier dunkel a 3,30 26,401 3x Hefe-Weissbier a 3,30 9,90 1 1x Saft 0,25 2,50 1 1x Grosses Wasser 2,40 1 1x Vegetarische Varia 9,90 1 1x Gyros 8,90 1 1x Baby Kalamari Gefu 12,90 1 2x Gyros Folie a 9,9019,80 1 1x Schafskäse Ofen 6,90 1 1x Bifteki Metaxa 11,90 1 1x Schweinefilet Meta 13,90 1 1x St ifado 14,90 1 1x Tee 2,10 1
Or close to that.
Is there a formatting option that can be added to the API request to get a different output? I have had success with Tesseract, where you can change the output format to achieve this result, and I wondered whether the Vision API offers anything similar.
I understand that the API returns the coordinates of each character, which could be used to reconstruct the layout, but I was hoping not to have to go that deep.
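In case the coordinate route turns out to be necessary after all, here is a minimal sketch of the idea: group words into visual lines by their y-coordinate, then sort each line left to right by x. It assumes you have already pulled `(text, x, y)` tuples out of the Vision response (e.g. from the `bounding_poly` vertices of each word annotation); the function name, the sample words, and the `y_tolerance` value are all illustrative, not part of the API.

```python
def group_words_into_lines(words, y_tolerance=10):
    """Group OCR words into lines by vertical position.

    words: iterable of (text, x, y) tuples, where (x, y) is roughly
    the top-left corner of the word's bounding box in pixels.
    Returns a list of line strings, top to bottom.
    """
    lines = []  # each entry: (y of first word seen on that line, [(x, text), ...])
    for text, x, y in sorted(words, key=lambda w: w[2]):
        for line in lines:
            # attach the word to an existing line if it is vertically close enough
            if abs(line[0] - y) <= y_tolerance:
                line[1].append((x, text))
                break
        else:
            lines.append((y, [(x, text)]))
    # sort lines top-to-bottom, and words within each line left-to-right
    return [" ".join(t for _, t in sorted(ws)) for _, ws in sorted(lines)]

# hypothetical receipt words with made-up pixel coordinates
words = [
    ("4x", 10, 100), ("12,00", 300, 102), ("Löwenbräu", 40, 101),
    ("1x", 10, 130), ("Saft", 40, 131), ("2,50", 300, 129),
]
print(group_words_into_lines(words))
# → ['4x Löwenbräu 12,00', '1x Saft 2,50']
```

A fixed pixel tolerance is crude; on skewed or high-resolution photos you would want to scale it by the average word height, but the grouping idea stays the same.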