Access font files in a PDF file

We are currently working with several publishers to create online books from their PDF files. Our legacy application uses Flex, so we convert the PDFs to SWF files using pdf2swf from SWFTools.

The problem we are facing is that our Flash reader does not highlight the text inside the SWF document when the user searches. After a quick investigation, we found that the conversion needs access to the fonts used in the PDF so that they can be embedded:

http://wiki.swftools.org/wiki/How_do_I_highlight_text_in_the_SWF%3F

pdf2swf -F $YOUR_FONTS_DIR$ -f input.pdf -o output.swf 

As the command above shows, we need to pass the path to a directory containing the fonts used in the PDF file.
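For reference, here is a minimal sketch of how we might launch this command from Java. The class name `Pdf2SwfCommand` and its `build` method are hypothetical, and the only flags used are the `-F`, `-f` and `-o` flags from the command above; it assumes `pdf2swf` is on the PATH.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles the pdf2swf invocation shown above.
final class Pdf2SwfCommand {

    // Builds the argument list; nothing is executed here.
    static List<String> build(String fontsDir, String inputPdf, String outputSwf) {
        return Arrays.asList("pdf2swf", "-F", fontsDir, "-f", inputPdf, "-o", outputSwf);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: Pdf2SwfCommand <fontsDir> <input.pdf> <output.swf>");
            return;
        }
        // Assumption: pdf2swf is installed and reachable through the PATH.
        Process process = new ProcessBuilder(build(args[0], args[1], args[2]))
                .inheritIO()   // forward the tool's output to our console
                .start();
        System.exit(process.waitFor());
    }
}
```

Passing the arguments as a list rather than a single string avoids quoting problems when the font directory or file names contain spaces.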

Since we will be converting a large number of PDF files, is it possible to access the font files directly from each PDF, rather than storing a large collection of fonts in our application?

Additional information

Our application is written in Java.

We already use PDFBox and Ghostscript in the application, so a solution based on these libraries would be preferred, but we are open to all ideas.

1 answer

PDF files do not contain font files as such; they may not even contain embedded fonts at all, although this is rare. When font data is embedded, it can be in any of several formats:

  • Type 1 PostScript Fonts
  • Type 3 PostScript Fonts
  • TrueType Fonts
  • PostScript CFF Fonts
  • CIDFonts with PostScript Type 1 Outlines
  • CIDFonts with PostScript Type 3 Outlines
  • CIDFonts with TrueType Outlines
  • CIDFonts with CFF Outlines
  • CIDFonts with Bitmaps

Can your application read all of these font formats? If you want to use fonts at all, you must use the ones embedded in the PDF file. Very often they are subsets of the original font and are specially encoded, so even if you have the original font you cannot use it, because the encoding will not match.
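One quick way to see how many of your fonts are subsets is to look at the font names: in a PDF, the name of a subset font starts with a tag of six uppercase letters followed by a plus sign. A minimal sketch of that check in Java, using only the standard library, might look like this. The class and method names are hypothetical, and the sample names in `main` are made up for illustration; getting the names out of the PDF is left to whatever library you already use.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for spotting subset fonts by name.
final class SubsetFontNames {

    // Subset tag: exactly six uppercase ASCII letters, then '+', then the real name.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+(.+)$");

    // True if the font name carries a subset tag.
    static boolean isSubset(String fontName) {
        return SUBSET_TAG.matcher(fontName).matches();
    }

    // Returns the name without the subset tag, or the name unchanged if there is none.
    static String stripSubsetTag(String fontName) {
        Matcher m = SUBSET_TAG.matcher(fontName);
        return m.matches() ? m.group(1) : fontName;
    }

    public static void main(String[] args) {
        // Sample names, invented for this sketch.
        for (String name : new String[] {"ABCDEF+Helvetica", "TimesNewRomanPSMT"}) {
            System.out.println(name + " subset=" + isSubset(name)
                    + " base=" + stripSubsetTag(name));
        }
    }
}
```

If most names carry a tag, a directory of original fonts will not help, for the encoding reasons given above.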

Of course, it is possible that all of these PDF files were created in a consistent way and do not use embedded fonts, but I doubt it.


Source: https://habr.com/ru/post/905316/

