What pdftk reports as PdfID0 and PdfID1 in its metadata output comes from the /ID entry of the PDF trailer at the end of the corresponding PDF file (example):
trailer
<<
/Size 32
/Root 24 0 R
/Info 19 0 R
/ID [ <28c71a8d7790a4d3e85ce879a90dec0> <4c5865d36c7a381e6166d5e362d0aafc> ]
>>
startxref
81799
%%EOF
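To look at these two identifiers without pdftk, a rough Python sketch could read them straight from the file bytes. This is only an illustration under assumptions: the helper names `read_file_ids` and `compare_file_ids` are made up here, the regular expression expects a plain (uncompressed) trailer with both identifiers written as hex strings, and the comparison follows the matching rule for file identifiers that the specification gives in its section 14.4.

```python
import re

# Matches "/ID [ <hex> <hex> ]" with flexible whitespace.
# Assumption: both identifiers are hex strings in an uncompressed trailer.
ID_PATTERN = re.compile(
    rb"/ID\s*\[\s*<([0-9A-Fa-f\s]*)>\s*<([0-9A-Fa-f\s]*)>\s*\]"
)


def _decode_hex_string(raw):
    """Decode a PDF hex string; an odd digit count is padded with a final 0."""
    digits = b"".join(raw.split())
    if len(digits) % 2:
        digits += b"0"
    return bytes.fromhex(digits.decode("ascii"))


def read_file_ids(pdf_bytes):
    """Return (first_id, second_id) from the last /ID array found, or None."""
    matches = ID_PATTERN.findall(pdf_bytes)
    if not matches:
        return None
    # Incrementally updated files carry several trailers; the last one wins.
    first, second = matches[-1]
    return _decode_hex_string(first), _decode_hex_string(second)


def compare_file_ids(expected, found):
    """Classify a candidate file against the identifiers that were expected."""
    if expected[0] != found[0]:
        return "different file"
    if expected[1] != found[1]:
        return "different version of the same file"
    return "same file, unchanged"
```

Calling `read_file_ids(open("your.pdf", "rb").read())` should give the same two values that pdftk prints as PdfID0 and PdfID1, as raw bytes rather than hex text.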
An /ID entry in the trailer dictionary is required only if an /Encrypt entry is present; otherwise it is an optional key.
The PDF specification describes it as:
"An array of two byte-strings constituting a file identifier (see 14.4, “File Identifiers”) for the file. If there is an Encrypt entry this array and the two byte-strings shall be direct objects and shall be unencrypted."
and furthermore:
"The first byte string shall be a permanent identifier based on the contents of the file at the time it was originally created and shall not change when the file is incrementally updated. The second byte string shall be a changing identifier based on the file's contents at the time it was last updated. When a file is first written, both identifiers shall be set to the same value. If both identifiers match when a file reference is resolved, it is very likely that the correct and unchanged file has been found. If only the first identifier matches, a different version of the correct file has been found."
And it is NOT necessarily a hash. Here is what the ISO PDF specification suggests (not "prescribes"):
"To help ensure the uniqueness of file identifiers, they should be computed by means of a message digest algorithm such as MD5 (described in Internet RFC 1321, The MD5 Message-Digest Algorithm; see the Bibliography), using the following information:
- The current time
- A string representation of the file's location, usually a pathname
- The size of the file in bytes
- The values of all entries in the file's document information dictionary (see 14.3.3, “Document Information Dictionary”)"
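As an illustration of that suggestion, here is a minimal Python sketch that feeds the four listed inputs into MD5. The function name `suggest_file_id`, its parameters, and the way the inputs are serialized and separated are assumptions made for this example; the specification does not define any of them, and real PDF producers are free to do something else entirely.

```python
import hashlib
import time


def suggest_file_id(location, size_in_bytes, info, now=None):
    """Return a 32-digit hex identifier built from the four suggested inputs.

    location      -- string representation of the file location (a pathname)
    size_in_bytes -- size of the file in bytes
    info          -- dict with the document information dictionary entries
    now           -- timestamp; defaults to the current time
    """
    if now is None:
        now = time.time()
    parts = [repr(now), location, str(size_in_bytes)]
    # Sort the keys so the result does not depend on dict ordering.
    parts += [str(info[key]) for key in sorted(info)]
    digest = hashlib.md5()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator (an arbitrary choice for this sketch)
    return digest.hexdigest()
```

Because the current time is one of the inputs, two runs that produce byte-for-byte identical page content would still end up with different identifiers.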
There are a few spots in generated PDF files that may change with each new run. Two keys in the document information dictionary (the /Info object referenced from the trailer), /CreationDate and /ModDate, may be updated every time you create or modify a PDF.
Therefore, computing your own MD5 checksum over the resulting PDFs in order to detect new or changed files will not work unless you "normalize" at least /CreationDate and /ModDate, as well as /ID, before creating the MD5 hash.
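One way such a normalization could look is sketched below in Python. It is a simplified illustration, not a robust tool: `normalized_md5` is a hypothetical helper, and the patterns assume an unencrypted file in which the dates are written as literal strings in an uncompressed /Info dictionary and the /ID array uses hex strings.

```python
import hashlib
import re

# (pattern, replacement) pairs that blank out the volatile values.
_VOLATILE = [
    # /CreationDate (D:...) and /ModDate (D:...) written as literal strings
    (re.compile(rb"/(CreationDate|ModDate)\s*\([^)]*\)"), rb"/\1 ()"),
    # /ID [ <...> <...> ] in the trailer
    (
        re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f\s]*>\s*<[0-9A-Fa-f\s]*>\s*\]"),
        rb"/ID [ <> <> ]",
    ),
]


def normalized_md5(pdf_bytes):
    """MD5 over the file bytes with dates and file identifiers blanked out."""
    for pattern, replacement in _VOLATILE:
        pdf_bytes = pattern.sub(replacement, pdf_bytes)
    return hashlib.md5(pdf_bytes).hexdigest()
```

Two outputs that differ only in those three values then hash to the same digest. Dates stored elsewhere, such as in embedded XML metadata or inside compressed object streams, are not touched by this sketch.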
Update: As user mkl correctly pointed out in a comment on this answer, the /CreationDate and /ModDate values of the /Info dictionary (as well as the /ID information) usually have equivalent counterparts in the XML metadata embedded in the PDF. You can display the full XML metadata with the pdfinfo utility, for example:
pdfinfo -meta your.pdf
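If pdfinfo is not at hand, the date fields in the embedded XML metadata can often be spotted with a few lines of Python. This is a rough sketch under assumptions: `xmp_dates` is a made-up helper, the metadata packet must be stored uncompressed, and the properties must use the conventional `xmp:` prefix, either as elements or as attributes.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
_PACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

# Date properties in element form (<xmp:CreateDate>...) or attribute form
# (xmp:CreateDate="..."); the value is expected to start with a digit.
_DATE = re.compile(
    rb"xmp:(CreateDate|ModifyDate|MetadataDate)"
    rb"(?:>|=[\"'])"
    rb"(\d[^<\"']*)"
)


def xmp_dates(pdf_bytes):
    """Return a dict of the XMP date properties found in the file bytes."""
    found = {}
    for packet in _PACKET.findall(pdf_bytes):
        for name, value in _DATE.findall(packet):
            found[name.decode("ascii")] = value.decode("utf-8", "replace").strip()
    return found
```

Any dates this returns would need the same normalization as /CreationDate and /ModDate before two files can be compared by checksum.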