Fix missing ToUnicode map in PDF

I have a PDF file from which I want to extract text, but because its fonts lack a ToUnicode map, I cannot do this.

./pdffonts /Users/subhashlengare/Downloads/pqr39_abc.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
ATRTHG+TT1CABt00                     TrueType          yes yes no      23  0
VFQVYH+TT1CAEt00                     TrueType          yes yes no      19  0
ODNMDG+TT1CAFt00                     TrueType          yes yes no      31  0
DXGYRQ+TT1CB0t00                     TrueType          yes yes no      27  0
VFQVYH+TT1CB1t00                     TrueType          yes yes no       7  0
ArialMT                              TrueType          no  no  no     295  0
NXBBUP+TT1CC0t00                     TrueType          yes yes no      53  0
NXBBUP+TT1CC1t00                     TrueType          yes yes no      65  0
KDGXKF+TT1CC4t00                     TrueType          yes yes no     104  0
VRCBAT+TT1CC5t00                     TrueType          yes yes no     100  0
QTNBCJ+TT1CC2t00                     TrueType          yes yes no      88  0
NXBBUP+TT1CC6t00                     TrueType          yes yes no      96  0
NXBBUP+TT1CC7t00                     TrueType          yes yes no     116  0
NXBBUP+TT1CC8t00                     TrueType          yes yes no     128  0
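In pdffonts output the "uni" column reports whether a font carries an explicit ToUnicode map, and every row above says "no". The following is a minimal sketch for listing those fonts programmatically. It assumes the table layout shown above and that font names contain no spaces (true for every row here); the function names are hypothetical helpers, and SAMPLE is a few rows copied from the question.

```python
# A few rows copied from the pdffonts output in the question.
SAMPLE = """\
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
ATRTHG+TT1CABt00                     TrueType          yes yes no      23  0
ArialMT                              TrueType          no  no  no     295  0
NXBBUP+TT1CC8t00                     TrueType          yes yes no     128  0
"""


def parse_pdffonts(report: str) -> list[dict[str, str]]:
    """Turn the pdffonts table into one dict per font row."""
    lines = report.splitlines()
    # Data rows start right after the dashed separator line.
    sep = next(i for i, line in enumerate(lines) if line.startswith("---"))
    rows = []
    for line in lines[sep + 1:]:
        if not line.strip():
            continue
        # emb, sub, uni, object number and generation never contain spaces,
        # so split those five fields off from the right.
        head, emb, sub, uni, num, gen = line.rsplit(None, 5)
        # Assumption: the font name has no spaces; the rest is the font type.
        name, _, font_type = head.partition(" ")
        rows.append({"name": name, "type": font_type.strip(), "emb": emb,
                     "sub": sub, "uni": uni, "object": f"{num} {gen}"})
    return rows


def fonts_without_tounicode(report: str) -> list[str]:
    """Names of fonts whose 'uni' column says there is no ToUnicode map."""
    return [row["name"] for row in parse_pdffonts(report) if row["uni"] == "no"]


print(fonts_without_tounicode(SAMPLE))
```

To run it on the real file, the report string could come from subprocess.run(["pdffonts", path], capture_output=True, text=True).stdout.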

How can I add the missing ToUnicode maps so that text extraction works correctly?
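For reference, a ToUnicode map is a CMap stream, referenced from the font dictionary's /ToUnicode entry, that maps the character codes used in the page content to Unicode text. The sketch below only shows what such a stream looks like once a mapping is known; it does not solve the harder part of recovering that mapping for these subset fonts. It assumes simple (non-CID) TrueType fonts with one-byte codes, as pdffonts lists them, and the input dictionary and function name are hypothetical.

```python
def build_tounicode_cmap(mapping: dict[int, str]) -> str:
    """Build ToUnicode CMap text for one-byte character codes.

    mapping (hypothetical input): character code 0-255 -> Unicode text.
    """
    for code in mapping:
        if not 0 <= code <= 0xFF:
            raise ValueError(f"code {code} does not fit in one byte")
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # The PDF specification allows at most 100 entries per bfchar block.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # Destinations are UTF-16BE hex; non-BMP characters become surrogate pairs.
            lines.append(f"<{code:02X}> <{text.encode('utf-16-be').hex().upper()}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical mapping: code 0x01 should extract as "A", code 0x02 as "b".
print(build_tounicode_cmap({0x01: "A", 0x02: "b"}))
```

The resulting text would then have to be written into the PDF as a stream and linked from each affected font dictionary under /ToUnicode, using whatever PDF library is at hand.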


Source: https://habr.com/ru/post/1678100/

