Comparing PDF Content with Ruby

Right now I am writing a Ruby script/app that helps me compile LaTeX to (at least) PDF. One of the features I want is that it runs pdflatex iteratively until the PDF converges (as it should, I assume).

The idea is to compare the PDF generated in one iteration with the one from the previous iteration using their fingerprints. In particular, I am currently using Digest::MD5.file(.).
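The loop described above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the method name compile_until_stable, the max_runs limit, and the pluggable fingerprint lambda are illustrative and not part of any library; the compile step is passed in as a block so that any command can be used.

```ruby
require "digest"

# Runs the given block (one compile pass) until the fingerprint of pdf_path
# stops changing between two consecutive passes. Returns the number of passes
# used, or nil if max_runs passes were not enough.
# (compile_until_stable, max_runs and fingerprint are hypothetical names.)
def compile_until_stable(pdf_path, max_runs: 5,
                         fingerprint: ->(path) { Digest::MD5.file(path).hexdigest })
  previous = nil
  1.upto(max_runs) do |run|
    yield run                               # caller performs one compile pass
    current = fingerprint.call(pdf_path)
    return run if current == previous       # same fingerprint twice in a row
    previous = current
  end
  nil
end

# Hypothetical usage; "main.tex" and "main.pdf" are placeholder names:
# compile_until_stable("main.pdf") do
#   system("pdflatex", "-interaction=nonstopmode", "main.tex") or raise "pdflatex failed"
# end
```

Keeping the fingerprint as a parameter means the comparison itself can be swapped out without touching the loop.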

The problem is that it never converges. The culprit is (hopefully) a PDF timestamp that has a resolution of seconds, at least with pdflatex. Since pdflatex runs usually take more than one second, the result keeps changing. That is, I expect the PDFs to be equal up to the timestamp after some point. This assumption may be wrong; hints are appreciated.

What can I do about this? So far my main ideas:

  • Use a library that can do the job.
  • Strip the metadata and hash only the rest of the PDF file
  • Replace timestamps with a fixed value before comparison
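As a rough illustration of the last idea, the sketch below overwrites the date entries and the /ID array with fixed values before hashing. It assumes these entries appear uncompressed in the file and in the usual `(D:...)` and `[<hex> <hex>]` forms; the names FIXED_DATE, normalize_pdf_bytes and normalized_pdf_digest are made up for this example.

```ruby
require "digest"

# Arbitrary placeholder date used instead of the real timestamps.
FIXED_DATE = "D:19700101000000Z"

# Returns a binary copy of the PDF bytes with /CreationDate, /ModDate and /ID
# replaced by fixed values. Assumes the entries are stored uncompressed.
def normalize_pdf_bytes(bytes)
  bytes.b
       .gsub(%r{/(CreationDate|ModDate)\s*\(D:[^)]*\)}) { "/#{$1} (#{FIXED_DATE})" }
       .gsub(%r{/ID\s*\[\s*<\h*>\s*<\h*>\s*\]}, "/ID [<00> <00>]")
end

# Fingerprint of the file after normalization.
def normalized_pdf_digest(path)
  Digest::MD5.hexdigest(normalize_pdf_bytes(File.binread(path)))
end
```

Two files that differ only in these entries should then produce the same digest, while any other difference still changes it.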

Do you have more ideas or even solutions? Solutions should use only free software that runs on Linux. Those that use only Ruby are preferred, but using external software is acceptable.

By the way, I don’t know exactly how PDF is encoded, but I suspect that simply comparing the contained text will not work for me, since only graphics or links may change between iterations.

Perhaps related:

4 answers

A quick approach is to filter out the lines that carry the dates and the ID before hashing:

cat file.pdf | grep -a -v "/CreationDate\|/ModDate\|/ID" | md5sum

In Ruby:

`cat file.pdf | grep -a -v "/CreationDate\\|/ModDate\\|/ID" | md5sum`.strip

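This drops the lines of the PDF that hold the creation date, the modification date, and the ID, which change on every run even when the rest of the PDF stays the same.

YMMV, depending on the PDF. Run diff -a file1.pdf file2.pdf to see which other lines, if any, still differ.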
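For a version that does not shell out, the same filtering can be sketched in pure Ruby. This is an approximation of the pipeline above, not a byte-for-byte equivalent of its output; the names VOLATILE_LINE and pdf_digest_without_volatile_lines are invented for this example.

```ruby
require "digest"

# Lines containing any of these keys are skipped, as in the grep call above.
VOLATILE_LINE = %r{/CreationDate|/ModDate|/ID}

# Hashes the file line by line in binary mode, leaving out volatile lines.
def pdf_digest_without_volatile_lines(path)
  md5 = Digest::MD5.new
  File.open(path, "rb") do |file|
    file.each_line { |line| md5 << line unless line.match?(VOLATILE_LINE) }
  end
  md5.hexdigest
end
```

Like the grep version, this also skips any other line that happens to contain one of the keys, so it is a heuristic rather than a real PDF comparison.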


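Have you looked at latexmk? It is a Perl script that automates this kind of rerun logic. As far as I know, it inspects the .log and .aux files to decide whether tex has to run again, and it also takes care of the helper tools (tex is not the only program involved: there is makeindex or xindy, for example).

You could borrow from it (it is 3546 lines long, though) for your Ruby script/app.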
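One simple way to use the .log file for this decision, loosely in the spirit of that approach and in no way a port of latexmk, is to look for LaTeX's own rerun hints. The sketch assumes the log contains messages such as "Label(s) may have changed. Rerun to get cross-references right."; the method name rerun_requested? is made up, and messages broken across wrapped log lines would be missed.

```ruby
# Returns true if the LaTeX log at log_path contains a "Rerun to get ..." hint.
# Reads in binary mode because log files are not guaranteed to be valid UTF-8.
def rerun_requested?(log_path)
  File.open(log_path, "rb") do |file|
    file.each_line.any? { |line| line.include?("Rerun to get") }
  end
end
```

A driver script could keep calling pdflatex while this returns true, with an upper bound on the number of runs.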


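In my opinion you should compare the inputs of a run (tex, aux, bib, ...), not the pdf (and such).

If the inputs (tex, aux, bib, ...) no longer change, neither does the pdf.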

  • A, .
  • B, , .
  • A
  • B A, . , (aux, bib,...) B, tex, , . . 3.
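A minimal sketch of fingerprinting the files a run depends on (tex, aux, bib, ...) instead of the PDF, assuming the caller knows which files matter. The method name inputs_digest and the way paths and per-file digests are combined are assumptions made for illustration only.

```ruby
require "digest"

# Combines the MD5 digests of the given files into one fingerprint.
# Missing files are skipped, so the list may name files that do not exist yet.
def inputs_digest(paths)
  md5 = Digest::MD5.new
  paths.sort.each do |path|
    next unless File.file?(path)
    md5 << path << "\0" << Digest::MD5.file(path).hexdigest << "\n"
  end
  md5.hexdigest
end
```

Comparing this value before and after a tex run tells whether any of the listed files changed during the run.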

[ : Identikal]

For a project, we had a requirement to compare two PDFs in pure Ruby. We ended up writing a gem called identikal. The gem compares two unencrypted PDF files and returns true if they are identical, false otherwise.

After installing the gem, you can compare two PDF files, as shown below:

$ identikal file_a.pdf file_b.pdf
true

Source: https://habr.com/ru/post/1787765/

