I use PDF::API2 in my Perl application to embed the OCR output as text alongside the corresponding image. This makes the resulting PDF file searchable, since the OCR text can then be extracted using pdftotext.
Currently, as soon as the application sees a non-ASCII character in the OCR output, it switches from the basic PDF core fonts to a TTF font. However, this is really hacky, because the core fonts already include most Western European characters. A TTF font is only required for Greek, Russian, Japanese, etc.
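A less hacky trigger than "any non-ASCII character" could be "any character the core font's single-byte encoding cannot represent". The sketch below assumes the core font is used with a cp1252 (WinAnsi-style) encoding; that assumption is not from the original setup, and the helper name `fits_core_font` is hypothetical. It relies only on Perl's core `Encode` module: `FB_CROAK` makes `encode` die on the first unmappable character, and `LEAVE_SRC` keeps the input string unmodified.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if every character of $text can be encoded
# in cp1252, i.e. (assumption) the single-byte encoding the core font uses.
# Returns 0 as soon as one character has no cp1252 mapping.
sub fits_core_font {
    my ($text) = @_;
    # FB_CROAK: die on the first unmappable character.
    # LEAVE_SRC: do not modify $text in place while encoding.
    my $ok = eval { encode('cp1252', $text, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 1 : 0;
}
```

With this, the application would fall back to the TTF font only when `fits_core_font` returns 0 for a piece of OCR text, so accented Latin text such as "café" stays in the core font while Greek or Cyrillic switches to TTF. This checks the encoding only; it says nothing about which glyphs a given TTF file actually contains.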
How can I determine whether a particular font contains a specific character (including a CMap entry for it, so that extraction using pdftotext works)?