How can I tell Tesseract which font to use during recognition (not during training)?

Question

For the downloadable English language data, I run

cat tessdata/eng.* | egrep -o ".*ttf" | sort -u

and get a list of all the fonts that were used to train the English data:

Andale_Mono.ttf
Arial_Black.ttf
Arial_Bold.ttf
Arial.ttf
buttf
Comic_Sans_MS_Bold.ttf
Comic_Sans_MS.ttf
Courier_New_Bold.ttf
Courier_New.ttf
Georgia_Bold.ttf
Georgia.ttf
Gottf
Impact.ttf
Times_New_Roman_Bold.ttf
Times_New_Roman.ttf
Trebuchet_MS_Bold.ttf
Trebuchet_MS.ttf
ttf
Verdana_Bold.ttf
Verdana.ttf
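The pipeline works because the training font names appear as plain strings inside the binary `eng.*` files. Entries such as `buttf`, `Gottf`, and the bare `ttf` are most likely accidental byte runs in that binary data rather than real fonts. A rough C++ sketch of the same extraction follows. `ExtractTtfMatches` and `ReadFilesConcatenated` are hypothetical helpers, not Tesseract functions, and the byte handling simplifies what grep does with binary input.

```cpp
#include <fstream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Rough equivalent of:  cat FILES | egrep -o ".*ttf" | sort -u
// For each newline-separated line containing "ttf", keep everything from the
// start of the line through the LAST "ttf" (".*" is greedy). Every byte except
// '\n' is treated as matchable, a simplification of grep's binary handling.
std::set<std::string> ExtractTtfMatches(const std::string& data) {
  std::set<std::string> matches;  // sorted and de-duplicated, like sort -u
  std::istringstream in(data);
  std::string line;
  while (std::getline(in, line)) {
    const auto pos = line.rfind("ttf");
    if (pos != std::string::npos) {
      matches.insert(line.substr(0, pos + 3));
    }
  }
  return matches;
}

// Concatenates the raw bytes of all files in order, as `cat` does.
// A file that cannot be opened contributes nothing.
std::string ReadFilesConcatenated(const std::vector<std::string>& paths) {
  std::string data;
  for (const std::string& path : paths) {
    std::ifstream file(path, std::ios::binary);
    data.append(std::istreambuf_iterator<char>(file),
                std::istreambuf_iterator<char>());
  }
  return data;
}
```

To reproduce the list, pass the paths that `tessdata/eng.*` expands to into `ReadFilesConcatenated` and print each element of the set returned by `ExtractTtfMatches`. The set is ordered byte-wise, so its order can differ from `sort`, which follows the current locale.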

Now I want to recognize text whose font I already know, so I want to restrict recognition to that font. I tried:

api.SetVariable("classify_font_name", "Arial_Bold.ttf");

but I don't see any improvement. Can someone tell me how to do this, or whether it is possible at all?

c++ fonts ocr tesseract true-type-fonts

Kenyakorn ketsombut May 2, '14 at 5:29

1 answer

nguyenq · Answer 1 · 2014-05-03T00:47:18+0000

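Look at the `WordFontAttributes` method of `LTRResultIterator` in the Tesseract API. It reports the font attributes (font name, bold, italic, point size, and so on) of each recognized word.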
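If the goal is to find out which font a page uses, one option is to call `WordFontAttributes` for every word and tally the names it returns. Below is a minimal sketch of just the tallying step, assuming the names have already been collected into a vector. The recognition loop is not shown, and `DominantFont` is a hypothetical helper, not a Tesseract function. `WordFontAttributes` can return NULL when no font information is available, so null entries are skipped.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: given the font name reported for each recognized word
// (NULL when no font information was available), return the most frequent
// name, or an empty string if no word reported a font.
std::string DominantFont(const std::vector<const char*>& word_fonts) {
  std::map<std::string, int> counts;
  for (const char* name : word_fonts) {
    if (name != nullptr) ++counts[name];
  }
  std::string best;
  int best_count = 0;
  // std::map iterates alphabetically, so ties go to the first name.
  for (const auto& entry : counts) {
    if (entry.second > best_count) {
      best = entry.first;
      best_count = entry.second;
    }
  }
  return best;
}
```

This only reports which font the recognizer associated with most words. It does not restrict recognition to that font.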
