Recognize text at a specific position using the iPhone camera

I would like to develop an application that can recognize some numbers on a computer-printed card (located at fixed positions on the card) and then send them to a web service.

I know I have to use OCR, but I'm not sure which product will fit my needs. It would be great if you could suggest any APIs or products on the market (open source is not required, but it would be very nice :) that can help me with this project.

In addition, I have one more technical question: would you run the OCR on the device, or do it through a web service, calling it by sending it an image? What are the pros and cons of both models?

+6
1 answer

If you need a solution that locates specific fields in the image, then this is not just OCR but a Data Capture task. There are a few ways to solve it: write your own field-detection logic on top of the OCR output, as suggested in another answer, or use tools designed specifically for this, which offer visual tools for defining the layout structure.

The first approach requires more programming but is cheaper in terms of licensing. You can choose not only commercial but also open source OCR libraries, such as Tesseract, which may not be ideal but, with some tuning and font training, can be good enough for many tasks.
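As a rough illustration of the first approach with fixed field positions, here is a minimal Python sketch. It assumes the photo has already been cropped and straightened to the card edges, and the layout values and names (`Field`, `CARD_LAYOUT`, `pixel_box`) are made up for this example. It only computes the pixel rectangles; each rectangle would then be cropped and passed to whatever OCR library you pick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """A field at a fixed place on the card, in fractions of card width/height."""
    name: str
    left: float
    top: float
    width: float
    height: float


def pixel_box(field, image_width, image_height, margin=0.02):
    """Return (left, upper, right, lower) in pixels, padded by a margin and clamped."""
    left = max(0.0, field.left - margin)
    top = max(0.0, field.top - margin)
    right = min(1.0, field.left + field.width + margin)
    bottom = min(1.0, field.top + field.height + margin)
    return (
        round(left * image_width),
        round(top * image_height),
        round(right * image_width),
        round(bottom * image_height),
    )


# Hypothetical layout: these positions are assumptions, not measured from a real card.
CARD_LAYOUT = [
    Field("card_number", 0.10, 0.50, 0.80, 0.12),
    Field("expiry", 0.60, 0.70, 0.25, 0.10),
]


def boxes_for_image(layout, image_width, image_height):
    """Map each field name to the pixel box to crop before running OCR on it."""
    return {f.name: pixel_box(f, image_width, image_height) for f in layout}
```

The small margin is there because a phone photo is rarely aligned perfectly, so cropping slightly wider than the nominal field reduces the chance of cutting off a digit.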

When working with low-quality images (and a significant share of the images taken with a phone camera will be low quality), your field-location logic will need to handle cases where some parts of the image were not recognized or were misrecognized, and still find the fields you need. You can also cross-check several recognition variants to arrive at plausible combinations.
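One simple way to cross-check recognition variants for a numeric field is sketched below in Python. It assumes you already have several candidate strings for the same field (for example from several frames or several preprocessing settings); the confusion table and the function names are illustrative assumptions, not part of any OCR library.

```python
import re
from collections import Counter

# Assumed, incomplete table of letters that OCR often confuses with digits.
DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)


def normalize(candidate):
    """Drop whitespace and map look-alike letters to digits."""
    return re.sub(r"\s+", "", candidate).translate(DIGIT_FIXES)


def pick_field_value(candidates, pattern):
    """Return the most common normalized candidate matching the pattern, or None."""
    valid = [c for c in map(normalize, candidates) if re.fullmatch(pattern, c)]
    if not valid:
        return None  # nothing plausible: ask the user to retake the photo
    return Counter(valid).most_common(1)[0][0]
```

For a five-digit field, `pick_field_value(["12O45", "12045", "1Z045"], r"\d{5}")` discards the variant that still contains a letter after normalization and votes among the rest. If the number carries a checksum, validating it would be a stronger filter than the pattern alone.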

This is not trivial, and it will take some time to make it work reliably. But it is still feasible if your documents are not very complex and there is only one, very predictable layout. And since you own the code, it can run both on the server and on the phone.

If you are dealing with slightly more complex documents and varied layouts, this logic in pure code can become too complex. In that case it is better to look at more advanced Data Capture technologies. There are quite a few Data Capture products, but I only know of one that is offered as an API: http://www.abbyy.com/flexicapture_engine/

It consists of two components. One of them is a visual tool for creating and debugging a document description. You simply describe the logic for locating each field in the document, and the technology takes care of everything else: voting between different variants, handling recognition errors, etc. You can define several alternative document structures, as well as rules that check whether one value is consistent with another in the document layout. These rules also influence the selection of the best recognition variants.

The second component is the API itself. You just link it into your application and load the document template description. In a mobile recognition scenario it can only be used for server-side processing, since it is too powerful and heavy to fit on a mobile device. The bright side, however, is that you don't need to port it to every mobile OS, and it uses full-featured OCR technology rather than a limited version tailored to mobile resources. The toolkit also includes some advanced image processing technologies that improve results on images captured by a phone.
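In the server-side scenario, the phone only has to upload the photo to your own web service, which then runs the recognition. A minimal Python sketch of preparing such an upload follows. The endpoint URL and the `X-Template-Name` header are hypothetical and belong to a service you would write yourself; this is not the FlexiCapture Engine API.

```python
import urllib.request


def build_upload_request(image_bytes, url, template_name):
    """Prepare (but do not send) a POST that uploads a JPEG for server-side OCR."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            "Content-Type": "image/jpeg",
            # Hypothetical header telling your server which card layout to apply.
            "X-Template-Name": template_name,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The trade-off is the one described above: the server can run a heavier engine, but the app needs a network connection and has to wait for the round trip.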

Disclaimer: I work for ABBYY.

+2

Source: https://habr.com/ru/post/886654/

