Following up on my previous questions: I am trying to extract text from a PDF file using the CGPDF* functions. Given a:
CGPDFStringRef pdfString
I realized that it can be converted to an array of character codes as follows:
const unsigned char *characterCodes = CGPDFStringGetBytePtr(pdfString);
The text I'm trying to extract is set in one of the 14 standard Type 1 fonts, which is not embedded in the PDF file itself. I therefore parsed the corresponding AFM file for this font, which gives me a mapping from each character code to its glyph name and metrics, as follows:
C 61 ; WX 600 ; N equal ; B 80 138 520 376 ;
C 63 ; WX 600 ; N question ; B 129 -15 492 572 ;
C 64 ; WX 600 ; N at ; B 77 -15 533 622 ;
C 65 ; WX 600 ; N A ; B 3 0 597 562 ;
C 66 ; WX 600 ; N B ; B 43 0 559 562 ;
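For illustration, here is a minimal C sketch of how one such record could be read into a struct. The names `AFMCharMetric` and `parse_afm_char_metric` are hypothetical, and the sketch assumes every line follows exactly the `C ... ; WX ... ; N ... ; B ... ;` layout shown above (decimal `C` codes, no `CH` hex form), which may not hold for every AFM file.

```c
#include <stdio.h>
#include <string.h>

/* One character-metric record from an AFM file (hypothetical struct). */
typedef struct {
    int  code;      /* C:  character code in the font's built-in encoding */
    int  width;     /* WX: advance width, in 1/1000 of the font size */
    char name[64];  /* N:  PostScript glyph name, e.g. "equal" */
    int  bbox[4];   /* B:  bounding box as llx, lly, urx, ury */
} AFMCharMetric;

/* Parses a line such as "C 61 ; WX 600 ; N equal ; B 80 138 520 376 ;".
   Returns 1 if all seven fields were read, 0 otherwise. */
int parse_afm_char_metric(const char *line, AFMCharMetric *out)
{
    int fields = sscanf(line, " C %d ; WX %d ; N %63s ; B %d %d %d %d ;",
                        &out->code, &out->width, out->name,
                        &out->bbox[0], &out->bbox[1],
                        &out->bbox[2], &out->bbox[3]);
    return fields == 7;
}
```

Calling `parse_afm_char_metric` on the first line above fills in `code` 61, `width` 600 and `name` "equal"; lines that are not character-metric records are rejected with a return value of 0.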
My question is: knowing the character code, say 61, how do I get from its glyph name, "equal", to the NSString @"="? In particular, how do I handle the case where the PDF font's encoding remaps this character code to a different glyph name, for example "question"?
Previous questions: iOS PDF parsing Type 1 Metric fonts and iOS PDF for simple text analyzer