Convert a searchable PDF to a non-searchable PDF

I have a searchable PDF and need to convert it to a non-searchable one.

I tried using Ghostscript to convert it to JPEG and then back to PDF, which does the trick, but the resulting file is unacceptably large.

I tried using Ghostscript to convert PDF to PS first and then to PDF, which also does the trick, but the quality is not good enough.

  gswin32.exe -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pswrite -r1000 -sOutputFile=out.ps in.pdf
  gswin32.exe -q -dNOPAUSE -dBATCH -dSAFER -dDEVICEWIDTHPOINTS=596 -dDEVICEHEIGHTPOINTS=834 -dPDFSETTINGS=/ebook -sDEVICE=pdfwrite -sOutputFile=out.pdf out.ps

Is there any way to get a good-quality PDF?

Alternatively, is there an easier way to convert a searchable PDF to a non-searchable one?

+6
3 answers

You can use Ghostscript to achieve this. You need to follow 2 steps:

  • Convert the PDF to a PostScript file in which all the fonts used are converted to outline shapes. The key here is the -dNOCACHE parameter:

      gs -o somepdf.ps -dNOCACHE -sDEVICE=pswrite somepdf.pdf

  • Convert the PS file back to PDF (and optionally delete the intermediate PS file afterwards):

      gs -o somepdf-with-outlines.pdf -sDEVICE=pdfwrite somepdf.ps
      rm somepdf.ps

Please note that the resulting PDF is likely to be larger than the original. (Also, unless you add further command-line options, all images in the original PDF will most likely be re-processed according to Ghostscript's default settings. But the quality should still be better than that of your own Ghostscript attempt ...)
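
If the default image handling in the second step is a concern, one option is to pass pdfwrite's image parameters explicitly. The following is only a sketch under assumptions: the function name ps_to_pdf_keep_images is hypothetical, and the switches chosen here (no downsampling, lossless Flate compression for color and gray images) are one possible choice that will usually increase the file size further.

```shell
# Hypothetical helper (not from the original answer):
# convert PS to PDF without downsampling or lossy re-encoding of images.
ps_to_pdf_keep_images() {
    # $1 = input PostScript file, $2 = output PDF file
    gs -o "$2" -sDEVICE=pdfwrite \
        -dDownsampleColorImages=false \
        -dDownsampleGrayImages=false \
        -dDownsampleMonoImages=false \
        -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode \
        -dAutoFilterGrayImages=false -dGrayImageFilter=/FlateEncode \
        "$1"
}
```

It would be called as: ps_to_pdf_keep_images somepdf.ps somepdf-with-outlines.pdf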


Update

Apparently, from version 9.15 (to be released in September/October 2014), Ghostscript will support a new command-line parameter:

  -dNoOutputFonts 

which will cause the output devices pdfwrite, ps2write and eps2write to “flatten” glyphs into “basic” marking operations (instead of writing fonts to the output).

This means that the above two steps can be avoided, and the desired result will be achieved with a single command:

  gs -o somepdf-with-outlines.pdf -dNoOutputFonts -sDEVICE=pdfwrite somepdf.pdf 

Caveats: I tested this with multiple input files using self-compiled Ghostscript based on current Git sources. In each case, it worked flawlessly.
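
One possible way to check the result is with the poppler utilities pdftotext and pdffonts. This is an assumption on my part, not necessarily how the tests above were done, and the function name has_extractable_text is hypothetical. The sketch reports whether any text can still be extracted from a PDF:

```shell
# Hypothetical helper: exit status 0 if the PDF still contains extractable text.
has_extractable_text() {
    # With "-" as the output file, pdftotext writes to stdout.
    # Strip all whitespace (including page-separating form feeds)
    # and check whether anything is left.
    [ -n "$(pdftotext "$1" - 2>/dev/null | tr -d '[:space:]')" ]
}

# Example use:
#   has_extractable_text somepdf-with-outlines.pdf && echo "still searchable"
#   pdffonts somepdf-with-outlines.pdf   # should list no fonts below the header
```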

+11

A possible way to create a non-searchable vector PDF file from a searchable PDF file:

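1. Split the PDF into single-page PDF files (for example with pdftk, which is also used in the final step).

2. Convert each single-page PDF to SVG with pdftocairo (contained in poppler utils):

 for f in *.pdf; do pdftocairo -svg "$f"; done 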

3. Delete ALL PDF files in the folder.

4. Then use the Batik rasterizer to convert ALL SVG files to PDF (this time the resulting PDF files are saved in vector format, but cannot be searched):

 java -jar ./batik-rasterizer.jar -m application/pdf *.svg 

Final step: merge all the resulting single-page PDF files into a single multi-page PDF file:

 pdftk *.pdf cat output out.pdf 
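
The steps above could be wrapped in a single function that works in a temporary directory, so the original PDF never has to be deleted. This is a rough sketch with several assumptions: the function name vectorize_pdf and the BATIK_JAR variable are hypothetical, the splitting step uses pdftk's burst operation, and Batik is assumed to write each PDF next to its SVG, as the steps above imply.

```shell
# Hypothetical wrapper around the steps above (assumptions noted inline).
vectorize_pdf() {
    src=$1    # searchable input PDF
    dst=$2    # non-searchable output PDF
    work=$(mktemp -d) || return 1

    # Assumed split step: one file per page (page_0001.pdf, page_0002.pdf, ...).
    pdftk "$src" burst output "$work/page_%04d.pdf" || return 1

    # Convert every single-page PDF to SVG.
    for f in "$work"/page_*.pdf; do
        pdftocairo -svg "$f" "${f%.pdf}.svg" || return 1
    done

    # Remove the intermediate single-page PDFs (not the original).
    rm -f "$work"/page_*.pdf

    # Convert every SVG back to PDF; BATIK_JAR is an assumed variable.
    java -jar "${BATIK_JAR:-./batik-rasterizer.jar}" \
        -m application/pdf "$work"/*.svg || return 1

    # Merge the pages in file-name order into one multi-page PDF.
    pdftk "$work"/page_*.pdf cat output "$dst" || return 1
    rm -rf "$work"
}
```

It would be called as: vectorize_pdf in.pdf out.pdf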
+3

I think converting to an image format such as JPEG is the way to go. Maybe it is worth converting the pages to images, optimizing or reducing the size of the images, and then creating a PDF from them?

0

Source: https://habr.com/ru/post/907455/

