Tesseract (v3.03) PDF output

Question


Why is this error returned?

root@amd-3700-2gb ~/ocr_test # tesseract -l dan pdf.png out pdf
Tesseract Open Source OCR Engine v3.03 with Leptonica
Error opening data file /usr/local/share/tessdata/osd.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load

List of languages

root@amd-3700-2gb ~/ocr_test # tesseract --list-langs
List of available languages (3):
eng
dan
dan-frak

Txt output

This works fine and writes the recognized text to out.txt:

tesseract -l dan pdf.png out

PDF output

This creates out.pdf but also returns the error shown above, and the searchable text in the PDF does not make sense:

tesseract -l dan pdf.png out pdf


linux ocr tesseract

clarkk Mar 2 '14 at 18:33


1 answer

nguyenq · Accepted Answer · 2014-03-02T22:20:57+0000

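The error message is clear: Tesseract needs the file osd.traineddata, which it looks for in /usr/local/share/tessdata/. You can install or download the Orientation and Script Detection (OSD) data for Tesseract from https://github.com/tesseract-ocr/tessdata . Put the file in that tessdata directory, next to the language files you already have.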
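
To confirm whether the file is in place before re-running the PDF command, a quick check along these lines may help. This is only a sketch: the function name has_traineddata and the variable TESSDATA_DIR are made up for illustration, and the default directory is taken from the path in the error message above, so adjust it if your installation uses a different location.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): succeeds if
# <dir>/<lang>.traineddata exists as a regular file.
#   $1 = language code, e.g. osd
#   $2 = tessdata directory
has_traineddata() {
    [ -f "$2/$1.traineddata" ]
}

# TESSDATA_DIR is an illustrative variable, not one Tesseract reads.
# The default is the directory named in the error message.
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"

if has_traineddata osd "$TESSDATA_DIR"; then
    echo "osd.traineddata found in $TESSDATA_DIR"
else
    echo "osd.traineddata is missing from $TESSDATA_DIR"
    echo "download it from the tessdata repository and copy it there"
fi
```

Once the check reports that the file is found, `tesseract -l dan pdf.png out pdf` should no longer fail to load the osd language.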
