Converting PDF to JPG using ImageMagick in PHP gives odd spacing between letters

I am trying to convert a PDF to a JPG with a PHP exec() call, which looks like this:

 convert page.pdf -resize 716x716 page.jpg 

For some reason, the JPG comes out with badly rendered text, even though the PDF looks fine in Acrobat and Mac Preview. Here is the original PDF:

http://whit.info/dev/conversion/page.pdf

and here is the bad output:

http://whit.info/dev/conversion/page.jpg

The server is a LAMP stack with PHP 5 and ImageMagick 6.2.8.

Can you help this awkward Geek?

Thanks in advance,

Iota

+4
2 answers

ImageMagick just calls out to Ghostscript to convert this PDF to an image. If you run gs on the PDF directly, you will get the same badly spaced output.
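One way to check this is to bypass ImageMagick and rasterize the PDF with Ghostscript alone. The following is a minimal sketch in Python, assuming a `gs` executable is on the PATH. The helper names, the output name `page-gs.jpg`, and the 72 dpi resolution are illustrative choices, not taken from the question.

```python
import shutil
import subprocess


def build_gs_command(pdf_path, jpg_path, dpi=72):
    """Build a Ghostscript command line that renders a PDF to JPEG."""
    return [
        "gs",
        "-dSAFER",                   # restrict file access while interpreting
        "-dBATCH",                   # exit after the last input file
        "-dNOPAUSE",                 # do not prompt between pages
        "-sDEVICE=jpeg",             # JPEG output device
        f"-r{dpi}",                  # rendering resolution in dpi
        f"-sOutputFile={jpg_path}",  # assumes a single-page PDF
        pdf_path,
    ]


def render_with_ghostscript(pdf_path, jpg_path, dpi=72):
    """Run Ghostscript directly; raises if gs is missing or exits non-zero."""
    if shutil.which("gs") is None:
        raise RuntimeError("gs not found on PATH")
    subprocess.run(build_gs_command(pdf_path, jpg_path, dpi), check=True)


# Show the command that would be run, without actually running it.
print(" ".join(build_gs_command("page.pdf", "page-gs.jpg")))
```

Calling `render_with_ghostscript("page.pdf", "page-gs.jpg")` and comparing the result with the ImageMagick output should show whether the spacing problem already appears at the Ghostscript stage.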

I suspect that Ghostscript is not handling the TrueType fonts embedded in the PDF very well. If you can change whatever generates the PDF to either embed Type 1 fonts or use a "core" PostScript font, you will get better results.

+4

I suspect this is an encoding/widths issue. Both are a bit off, though I cannot tell exactly why.

Here are some suspects:

First

The text stream is defined in UTF-16 LE (char NULL char NULL), using the standard syntax for a string-drawing command:

(some text) Tj

There is a way to escape any arbitrary character value inside a () string. You can also define strings in hexadecimal:

<203245> Tj

Neither method is used here, just questionable inline NULLs. This can cause a problem in GS if it tries to work with pointers to char without associated lengths.
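To make the concern concrete, here is a small Python sketch of what UTF-16 LE text looks like at the byte level, how code that treats it as a NUL-terminated string would truncate it, and how the two safer PDF forms mentioned above (escaped literal string and hex string) would represent the same bytes. The helper names are hypothetical, and this illustrates the general idea rather than how Ghostscript actually behaves internally.

```python
def pdf_literal_string(data: bytes) -> str:
    """Write bytes as a PDF literal string, escaping non-printable bytes as \\ddd octal."""
    out = []
    for b in data:
        ch = chr(b)
        if ch in "()\\":
            out.append("\\" + ch)      # delimiters and backslash need a backslash
        elif 32 <= b < 127:
            out.append(ch)             # printable ASCII passes through
        else:
            out.append("\\%03o" % b)   # anything else becomes an octal escape
    return "(" + "".join(out) + ")"


def pdf_hex_string(data: bytes) -> str:
    """Write bytes as a PDF hex string: hex digits between < and >."""
    return "<" + data.hex().upper() + ">"


utf16 = "some text".encode("utf-16-le")
print(utf16[:6])                    # b's\x00o\x00m\x00'

# A NUL-terminated view of the same buffer stops after the first character.
print(utf16.split(b"\x00", 1)[0])   # b's'

print(pdf_literal_string(utf16[:4]))  # (s\000o\000)
print(pdf_hex_string(utf16[:4]))      # <73006F00>

# The hex string from the example above is just three bytes.
print(bytes.fromhex("203245"))      # b' 2E'
```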

Second

The widths array is dumb. You can define widths in groups like this:

[ 32 [450 525 500] 37 [600 250] 40 [0] ]

This defines the following widths:

32: 450
33: 525
34: 500
37: 600
38: 250
40: 0

These fonts define their consecutive widths in individual arrays. Not illegal, but definitely wasteful/stupid, and if GS were coded to EXPECT gaps between arrays, this could trigger a bug.
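The grouping rule can be sketched as a small Python function that expands a widths array into a per-character table. The function name `expand_widths` and the text-based parsing are illustrative assumptions. It handles the `c [w1 w2 ...]` form used above and also the `c_first c_last w` range form that the PDF format allows for this array.

```python
import re


def expand_widths(w_array_text):
    """Expand a PDF widths array such as '[ 32 [450 525] 40 [0] ]' into {cid: width}."""
    tokens = re.findall(r"\[|\]|-?\d+(?:\.\d+)?", w_array_text)
    if len(tokens) < 2 or tokens[0] != "[" or tokens[-1] != "]":
        raise ValueError("expected an array enclosed in [ ]")
    tokens = tokens[1:-1]  # drop the outer brackets

    def number(tok):
        return float(tok) if "." in tok else int(tok)

    widths = {}
    i = 0
    while i < len(tokens):
        first = int(tokens[i])
        if tokens[i + 1] == "[":
            # Form 1: c [w1 w2 ... wn] assigns widths to c, c+1, ..., c+n-1
            end = tokens.index("]", i + 2)
            for offset, tok in enumerate(tokens[i + 2:end]):
                widths[first + offset] = number(tok)
            i = end + 1
        else:
            # Form 2: c_first c_last w assigns one width to the whole range
            last = int(tokens[i + 1])
            width = number(tokens[i + 2])
            for cid in range(first, last + 1):
                widths[cid] = width
            i += 3
    return widths


grouped = expand_widths("[ 32 [450 525 500] 37 [600 250] 40 [0] ]")
print(grouped)
# {32: 450, 33: 525, 34: 500, 37: 600, 38: 250, 40: 0}

# One array per character, as in the fonts discussed here, expands to the same table.
separate = expand_widths("[ 32 [450] 33 [525] 34 [500] ]")
print(separate == expand_widths("[ 32 [450 525 500] ]"))  # True
```

Both spellings describe the same table, which is why the one-array-per-character style is legal but wasteful.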

The array also has some extremely suspicious values. 32 through 126 are defined sequentially, but then it starts jumping all over the place: ... 126 [600] 8364 [500] 8216 [222] 402 [500] 8222 [389] 8230 [1000] 8224 [444] ... and then it goes back to being sequential from 160 to 255.

Just weird.

Third

I'm not at all sure about this one, but the CIDToGIDMap stream appears to contain a lot of errors.

Bottom line

These fonts are fishy. And I have never heard of "Bellflower Books" or "UFPDF 0.1".

This version number makes me cringe. It should also make you cringe.

Googling for "UFPDF", I found this note from the author:

Note: I wrote UFPDF as an experiment, not as a finished product. If you have problems using it, do not expect support from me. Patches are welcome, though I don't have much time to maintain this.

UFPDF is a PHP library that sits on top of FPDF. Version 0.1. Just run away.

+2

Source: https://habr.com/ru/post/1343279/

