Training Tesseract 3 to read a table of letters

I am trying to use Tesseract 3 OCR, with various parameters, to extract data from a table of letters in which my students marked one letter per row as their answer to multiple-choice questions, as shown below:

image with the table of letters used in tesseract

One of the best results:

EEEEEEEEEEEEEEEEEEEEEEEEE
DDDDDDDDDDDDDDDDDDDDDDDDD
CCCCCCCCCCCCCCCCCCCCCCCCC
BBBBBBBEBBBBBBBBBBBBBBBBB
AAAAAAAAAAAAAAAAAAAAAAAAA
6789012345678901234567890
2222333333333344444444445
EEEEE EEEE EE EEE EEEEEEE
DDDDDD DDD DDDDDDDDDDDD
CCCCCCCCCCCCCCCCCC CCCCC
B BEBE BB BBBBBBBBBBBBBBB
AA AAA AAAAA AAAAAAAA
1234567890123455789012345
OOOOOOOOO1111111111222222

I know that I could parse this .txt file to extract the result, but Tesseract missed a lot of information and also picked up letters from some of the colored blocks.

I would like to know what I can do to get a better result in this case.

I would also like to get a table in which the colored blocks appear as a different character. For example, for the first and second rows of the image:

01 A B C - E   26 A B C D E
02 A - C D E   27 A B C D E

If you have had a similar experience, any information would be appreciated. Thanks in advance!


-, , , , . , Tesseract .

-, :

  • -, hOCR . hOCR - HTML . , .

  • Tesseract , 90 °.

, , :

1. Preprocess the image with ImageMagick:

$ convert CDZjN.png -deskew 40% -contrast-stretch 7%x10% -filter lanczos -resize 250% ooo.png

2. Create a Tesseract config file, t.conf, with the following contents:

textord_tabfind_vertical_text 0
load_system_dawg 0
load_freq_dawg 0
load_punc_dawg 0
load_number_dawg 0
load_unambig_dawg 0
load_bigram_dawg 0
load_fixed_length_dawgs 0

3. Run Tesseract and print the result:

$ tesseract ooo.png ooo t.conf ; cat ooo.txt
Tesseract Open Source OCR Engine v3.02 with Leptonica
01ABC-E 26ABCDE
02A CDE 27ABCDE
o3 BCDE 28ABCDE
o4 BCDE 29ABCDE
o5 BCDE 30ABCDE
06ABCD. 31ABCDE
07A-CDE 32ABCDE
08ABC.E 33ABCDE
o9 BCDE 34ABCDE
10A CDE 35ABCDE
11ABCD 36ABCDE
12ABC E 37ABCDE
13ABC E 38ABCDE
14ABCD 39ABCDE
15 BCDE 40ABCDE
1s BCDE 41ABCDE
17 BCDE 42ABCDE
18ABCD_ 43ABCDE
19AB DE 44ABCDE
20AB DE 45ABCDE
21ABCDE 46ABCDE
22ABCDE 47ABCDE
23ABCDE 48ABCDE
24ABCDE 49ABCDE
25ABCDE 50ABCDE
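Output like this can be post-processed into the layout asked for in the question. The following is only a rough sketch in Python, not part of the original answer. The helper name format_entry is hypothetical. It assumes each entry is a two-character label followed by whatever letters Tesseract recognized, and it treats a choice as marked whenever its letter is missing from the entry. Misread labels such as "o3" or "1s" are passed through unchanged, so in practice the row position may be a more reliable question number than the recognized label.

```python
CHOICES = "ABCDE"


def format_entry(entry, mark="-"):
    """Turn one OCR entry such as '02A CDE' into '02 A - C D E'.

    Assumption: the first two characters are the question label and the
    rest holds the letters Tesseract recognized for that row.
    """
    label, field = entry[:2], entry[2:]
    # A choice counts as marked when its letter is absent from the field,
    # whatever the marked block was recognized as (space, '-', '.', '_').
    cells = [c if c in field else mark for c in CHOICES]
    return " ".join([label] + cells)


for entry in ["01ABC-E", "02A CDE", "11ABCD", "26ABCDE"]:
    print(format_entry(entry))
```

Testing for a missing letter, rather than comparing positions, keeps the sketch independent of how many spaces Tesseract emits for a marked block. It will still report a false mark if an unmarked letter is misrecognized as another character.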

, .


Source: https://habr.com/ru/post/1533418/

