I understand that this is not a complete solution, but I think that is enough to get you started.
The computer vision API returns a JSON result with the lines property, which is just an array of objects with the boundingBox property.
These boundingBox es are the X, Y coordinates of the left and lower right coordinates of the "square" of each phrase.
You basically need to process this array and βsortβ the elements based on this property.
In this JSFiddle, you will see that I sort the rows by Y coordinate and then group them.
It remains to be made smarter about the grouping - if the Y coordinates are 201 and 202, you can assume that they are on the same line and simply add them to the same line, sorted by increasing X coordinate.
Code:
if (jsonResponse.status == 'Succeeded') { var result = ''; // Sort lines by Y coordinate jsonResponse.recognitionResult.lines.sort(function(a, b) { var topLeftYCoordA = a.boundingBox[1]; var topLeftYCoordB = b.boundingBox[1]; if (topLeftYCoordA > topLeftYCoordB) { return 1; } if (topLeftYCoordA < topLeftYCoordB) { return -1; } return 0; }) // group lines by Y coordinate var grouped = {}; jsonResponse.recognitionResult.lines.map(function(line) { var topLeftYcoordinate = line.boundingBox[1]; if (!grouped[topLeftYcoordinate]) { grouped[topLeftYcoordinate] = line; } else { grouped[topLeftYcoordinate] += line; } }); Object.keys(grouped).forEach(function(yCoordinate) { result += yCoordinate + ' - ' + grouped[yCoordinate].text + '</br>'; }) $(".right").html(result); }
Result:

source share