If you need a solution that extracts specific fields from an image, this is not just OCR but a Data Capture task. There are several ways to solve it: write your own field-detection logic on top of OCR output, as suggested in another answer, or use tools specially designed for this, which offer visual tools for defining the layout structure.
The first approach requires more programming, but is cheaper in terms of licensing. You can choose not only commercial but also open-source OCR libraries, such as Tesseract, which may not be ideal, but with some tuning and font training can be good enough for many tasks.
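To give an idea of what the do-it-yourself approach looks like, here is a minimal sketch of field detection on top of raw OCR text. The field names and label patterns (`invoice_number`, `date`, `total`) are purely illustrative assumptions, not part of any real document format:

```python
import re

# Hypothetical field patterns applied to raw OCR output.
# In a real system these would come from your document's actual layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*[:\-]?\s*(\w+)", re.I),
    "date": re.compile(r"Date\s*[:\-]?\s*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})", re.I),
    "total": re.compile(r"Total\s*[:\-]?\s*\$?\s*(\d+(?:[.,]\d{2})?)", re.I),
}

def extract_fields(ocr_text: str) -> dict:
    """Return a dict mapping each field name to its first match (or None)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields

sample = "Invoice No: A1234\nDate: 12/05/2016\nTotal: $99.50"
print(extract_fields(sample))
# -> {'invoice_number': 'A1234', 'date': '12/05/2016', 'total': '99.50'}
```

This works while the layout is simple and the OCR output is clean; the hard part, as described below, is coping with recognition errors.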
When working with low-quality images (and images taken with a phone's camera will make up a significant share of them), your homegrown solution will need to handle cases where parts of the image were not recognized, or were recognized incorrectly, and still find the fields you need. You can also cross-check several recognition alternatives to pick plausible combinations.
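As a sketch of such cross-checking, suppose the OCR engine returns several candidate readings for a date field. A validity check can repair common character confusions and keep the first candidate that actually parses; the `pick_valid_date` helper and the DD/MM/YYYY format are assumptions for illustration:

```python
from datetime import datetime

def pick_valid_date(candidates):
    """Return the first candidate that parses as DD/MM/YYYY, else None."""
    for text in candidates:
        # Common OCR confusions: letter O for zero, letters l/I for one.
        cleaned = text.replace("O", "0").replace("l", "1").replace("I", "1")
        try:
            datetime.strptime(cleaned, "%d/%m/%Y")
            return cleaned
        except ValueError:
            continue
    return None

# "1Z/05/2016" cannot be repaired; "12/O5/2016" becomes "12/05/2016".
print(pick_valid_date(["1Z/05/2016", "12/O5/2016"]))  # -> 12/05/2016
```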
This is not trivial, and it will take some time to make it work reliably. But it is feasible if your documents are not very complex, there is only one layout, and it is very predictable. And since you own the code, it can run both on a server and on the phone.
If you are dealing with somewhat more complex documents and varied layouts, this logic in pure code can become too complicated. In that case, it is better to look at more advanced Data Capture technologies. There are quite a few Data Capture products, but I only know of one that is offered as an API: http://www.abbyy.com/flexicapture_engine/
It consists of two components. The first is a visual tool for creating and debugging a document description. You simply describe the logic of how fields are located in the document, and the technology takes care of everything else: voting between alternative readings, handling recognition errors, and so on. You can define several alternative document structures, as well as rules that check whether one value in the document matches another. These rules also influence the selection of the best recognition variants.
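To illustrate the kind of cross-field rule such tools apply (this is my own simplified sketch, not ABBYY's implementation), consider choosing among candidate readings of invoice amounts by requiring that the line items sum to the grand total:

```python
from itertools import product

def best_combination(item_candidates, total_candidates):
    """item_candidates: one list of candidate amounts per line item.
    Return (items, total) for the first combination in which the items
    sum to the total, or None if no combination is consistent."""
    for items in product(*item_candidates):
        for total in total_candidates:
            if abs(sum(items) - total) < 0.005:  # tolerate float rounding
                return list(items), total
    return None

# OCR was unsure whether the first item reads 10.00 or 18.00;
# only 10.00 + 15.50 matches the recognized total of 25.50.
result = best_combination([[10.00, 18.00], [15.50]], [25.50])
print(result)  # -> ([10.0, 15.5], 25.5)
```

In a real Data Capture product this voting happens across many fields and rules at once, which is exactly what becomes hard to maintain in hand-written code.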
The second component is the API itself. You just connect it to your application and load the document template description. In a mobile recognition scenario it can only be used for server-side processing, since it is too powerful and heavy to fit on a phone. The bright side of this is that you don't need to port it to every mobile OS, and it uses full-featured OCR technology rather than the limited versions that fit within mobile resources. The toolkit also includes advanced image preprocessing technologies that improve results on images captured with a phone camera.
Disclaimer: I work for ABBYY.