Microsoft Computer Vision OCR: disable text grouping by region

I use Microsoft Computer Vision to read receipts. I am trying to find an alternative to ABBYY OCR, as there is a significant price difference.

The results I get are always grouped by region. This makes it difficult to match each field with its corresponding amount.

Is there a way, through Microsoft Vision or in general, to get the same line-aligned output as ABBYY's?

Here is the image and the results I am getting:

[Image: OCR results]

1 answer

I understand that this is not a complete solution, but I think it is enough to get you started.

The computer vision API returns a JSON result with the lines property, which is just an array of objects with the boundingBox property.

These boundingBoxes hold the X, Y coordinates of the top-left and bottom-right corners of the "square" around each phrase.

You basically need to process this array and "sort" the elements based on this property.

In this JSFiddle, you will see that I sort the lines by Y coordinate and then group them.

What remains is to make the grouping smarter: if the Y coordinates are 201 and 202, you can assume that the phrases are on the same line and simply add them to the same line, sorted by increasing X coordinate.

Code:

    if (jsonResponse.status == 'Succeeded') {
      var result = '';
      var lines = jsonResponse.recognitionResult.lines;

      // Sort lines by the Y coordinate of the top-left corner
      lines.sort(function(a, b) {
        return a.boundingBox[1] - b.boundingBox[1];
      });

      // Group lines that share exactly the same Y coordinate,
      // joining their text (the original appended the objects themselves)
      var grouped = {};
      lines.forEach(function(line) {
        var topLeftYCoordinate = line.boundingBox[1];
        if (!grouped[topLeftYCoordinate]) {
          grouped[topLeftYCoordinate] = line.text;
        } else {
          grouped[topLeftYCoordinate] += ' ' + line.text;
        }
      });

      // Print one output row per Y coordinate
      Object.keys(grouped).forEach(function(yCoordinate) {
        result += yCoordinate + ' - ' + grouped[yCoordinate] + '<br>';
      });

      $(".right").html(result);
    }

Result:

[Image: result]
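
The smarter grouping mentioned earlier (treating Y coordinates such as 201 and 202 as the same line, then ordering by X) could be sketched roughly as follows. This is only an illustration, not part of the original fiddle: groupLinesIntoRows and yTolerance are hypothetical names, the sample data is made up, and it assumes boundingBox[0] and boundingBox[1] are the X and Y of the top-left corner, as the code above does for Y.

```javascript
// Sketch: group OCR lines into rows when their top-left Y values are close.
// Assumes each line looks like { text: string, boundingBox: number[] } with
// boundingBox[0] = top-left X and boundingBox[1] = top-left Y.
// groupLinesIntoRows and yTolerance are hypothetical names, not part of the API.
function groupLinesIntoRows(lines, yTolerance) {
  // Copy before sorting so the caller's array is left untouched.
  var sorted = lines.slice().sort(function (a, b) {
    return a.boundingBox[1] - b.boundingBox[1];
  });

  var rows = [];
  sorted.forEach(function (line) {
    var y = line.boundingBox[1];
    var current = rows[rows.length - 1];
    // Start a new row when the line is too far below the row's first line.
    if (!current || y - current.y > yTolerance) {
      current = { y: y, lines: [] };
      rows.push(current);
    }
    current.lines.push(line);
  });

  // Within each row, order the phrases left to right and join their text.
  return rows.map(function (row) {
    return row.lines
      .sort(function (a, b) { return a.boundingBox[0] - b.boundingBox[0]; })
      .map(function (line) { return line.text; })
      .join(' ');
  });
}

// Made-up sample data for illustration only.
var sampleLines = [
  { text: '4.50', boundingBox: [300, 202] },
  { text: 'TOTAL', boundingBox: [20, 240] },
  { text: 'Coffee', boundingBox: [20, 201] },
  { text: '4.50', boundingBox: [300, 241] }
];

console.log(groupLinesIntoRows(sampleLines, 5));
```

A suitable tolerance probably depends on the image resolution and the text height, and a skewed or crumpled receipt would likely need something more robust than a fixed threshold.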


Source: https://habr.com/ru/post/1262554/
