How to determine if a particular font contains a specific character in PDF::API2

I use PDF::API2 in my Perl application to embed OCR output alongside the corresponding image. This makes the resulting PDF file searchable, since the OCR text can then be extracted with pdftotext.

Currently, as soon as the application sees a non-ASCII character in the OCR output, it switches from the core PDF fonts to a TTF font. This is really hacky, because the core fonts cover most Western European characters. A TTF font is only required for Greek, Russian, Japanese, etc.
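To make the difference concrete, here is a minimal sketch comparing the current "any non-ASCII" heuristic with a finer per-character check. It assumes that the core fonts are used with WinAnsiEncoding, which is treated here as equivalent to Windows code page 1252 (`cp1252`). The function names are hypothetical, and Python is used only for illustration. In Perl, the same test would presumably use the Encode module's `encode()` with the `FB_CROAK` check flag inside an `eval`. This only checks whether a character fits the encoding. It does not prove that a given font actually contains the glyph, and it does not cover Symbol/ZapfDingbats, which use their own encodings.

```python
def needs_ttf_naive(text: str) -> bool:
    """Current heuristic: any non-ASCII character triggers the TrueType font."""
    return any(ord(ch) > 127 for ch in text)


def needs_ttf(text: str, encoding: str = "cp1252") -> bool:
    """Finer check (assumption): a core font is enough when every character
    can be encoded in the core font's single-byte encoding, assumed here to
    be WinAnsiEncoding, i.e. Windows code page 1252."""
    try:
        text.encode(encoding)  # raises if any character has no code in cp1252
    except UnicodeEncodeError:
        return True  # e.g. Greek, Cyrillic, CJK: fall back to the TTF font
    return False  # Latin-1 accents, the euro sign, etc. fit a core font
```

With this check, "café" stays on a core font, while "Привет" or "Ελληνικά" still switch to the TTF font.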

How can I determine whether a particular font contains a specific character (including a CMap entry, so that extraction with pdftotext works)?

1 answer

Have you tried the glyph-related methods?

http://search.cpan.org/dist/PDF-API2/lib/PDF/API2/Resource/BaseFont.pm#GLYPH_RELATED_METHODS

Otherwise, could you render the glyph (into a separate document) and measure it?


Source: https://habr.com/ru/post/1397616/

