ICR for machine-printed text?

I know that ICR is mainly used to recognize handwritten (hand-printed) data, but can ICR also be used to extract distorted (poor-quality) machine-printed text?

If not, what is the best way to solve the following problem?

I have an unstructured document that can run to 2 or more pages, and it contains several date fields that are filled in by hand. I want to convert this document to a text file. I tried some OCR tools (OmniPage, ABBYY, etc.) that have ICR modules. They handle full-page OCR well, but when they encounter a handwritten date, they output garbage characters instead of invoking the ICR module. I don't want to go with form-processing tools like Parascript and A2iA, which are position-based and only work with structured documents.

Alternatively, can we use ICR to convert both machine-printed and handwritten text (either way, it would handle the handwritten dates in this case)?

My goal here is to get text-file output from an unstructured document that contains a small amount of handwritten text (e.g., dates, numbers).

1 answer

I tried some OCR tools (OmniPage and ABBYY etc.) which have ICR modules

This assumption is incorrect, which explains the poor results. If you tried the retail versions of OmniPage and ABBYY FineReader, those packages are OCR-only, without ICR support.

I do not need form processing tools

You may have to use one anyway, but there are several approaches. The solution has to marry the two technologies, either out of the box or built independently, and it will require more effort than just installing something and running it.

As of today, no unstructured-text ICR software exists that can deliver high-quality results. Full-page OCR, i.e. unstructured-text OCR for machine print, gives high-quality results on machine text and garbage on handwriting. You are right that ICR implies zoning, which makes it possible to supply data types and supporting dictionaries to improve handwriting recognition.

For the simplest and fastest approach, which may also be the most economical and least labor-intensive, I would use an unstructured form-processing package such as ABBYY FlexiCapture ( http://www.wisetrend.com/abbyy_flexicapture.shtml ). It requires some no-programming setup to locate the zones. The zones can shift position and the software will still find them, then apply the appropriate engine (OCR or ICR) to read their contents. It supports OCR, ICR, OMR (checkmarks), and BCR (barcodes), and also includes full-featured full-page OCR. I use this software myself, resell it, and have over 14 years of experience fine-tuning it.

For a potentially cheaper route, though one that requires manually stitching together at least two technologies (two purchases instead of one, plus labor, so it may not be the most economical at the end of the day), I would use an OCR SDK for the machine text and a separate ICR SDK for the handwriting zones. If the location of those zones is consistent, you can simply specify their coordinates. If they move, deeper page analysis is needed to locate the zones before handing them to the ICR engine. The ICR-recognized text then has to be inserted back in the appropriate places within the OCRed text.
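The merge step described above can be sketched in a few lines. This is only an illustration of the logic, not a real SDK integration: the OCR word boxes and the ICR zone results are hypothetical inputs standing in for whatever your chosen engines (e.g. Tesseract for OCR, a commercial ICR SDK for handwriting) actually return.

```python
def overlaps(box, zone):
    """Axis-aligned overlap test; boxes are (left, top, right, bottom)."""
    l1, t1, r1, b1 = box
    l2, t2, r2, b2 = zone
    return l1 < r2 and l2 < r1 and t1 < b2 and t2 < b1

def merge_ocr_icr(ocr_words, icr_zones):
    """Replace OCR garbage that falls inside a handwriting zone with the
    ICR result for that zone, preserving reading order.

    ocr_words: list of (text, box) from the OCR engine.
    icr_zones: list of (zone_box, recognized_text) from the ICR engine.
    """
    out, used = [], set()
    for text, box in ocr_words:
        hit = next((i for i, (z, _) in enumerate(icr_zones)
                    if overlaps(box, z)), None)
        if hit is None:
            out.append(text)          # machine text: keep the OCR result
        elif hit not in used:
            out.append(icr_zones[hit][1])  # emit each zone's ICR text once
            used.add(hit)
    return " ".join(out)

# Hypothetical OCR output: machine text reads fine, the handwritten date
# came back as junk characters.
ocr_words = [("Invoice", (0, 0, 60, 12)), ("date:", (65, 0, 100, 12)),
             ("#%@!", (105, 0, 160, 12))]
# Hypothetical ICR pass over the date zone.
icr_zones = [((100, 0, 165, 14), "12/03/2014")]
print(merge_ocr_icr(ocr_words, icr_zones))  # Invoice date: 12/03/2014
```

In a real system the zone boxes would come either from fixed coordinates (if the layout is consistent) or from a layout-analysis step, as noted above.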

In my opinion, given the number of tools that can now do this out of the box, I would use something off the shelf instead of writing it myself, because several non-trivial problems have to be solved: zone identification, integration of the two technologies, and workflow. We built this kind of integration ourselves years ago, when off-the-shelf tools were not yet available.


Source: https://habr.com/ru/post/1480881/
