Pdftk error: Failed to open PDF file

I am using pdftk to extract form fields from PDF files. Everything works fine except for one PDF file (the USCIS I-9 form), which causes the error below:

 Error: Failed to open PDF file:
    http://www.uscis.gov/sites/default/files/files/form/i-9.pdf
 Done. Input errors, so no output created.

The command I run is:

 root@ri8-MS-7788:/home/ri-8# pdftk http://192.168.1.43/form/i-9.pdf dump_data_fields

The same command works for all other forms.

Attempt 1

I tried to decrypt the PDF into an unsecured copy, but that causes the same error. Here is the command:

 pdftk http://192.168.1.43/forms/i-9.pdf input_pw foopass output /var/www/forms/un-i-9.pdf 

Update

This is my complete function for handling this:

 public function Formanalysis($pdfname)
 {
     $pdffile = Yii::app()->getBaseUrl(true) . '/uploads/forms/' . $pdfname;
     // Quote the argument so file names with spaces or quotes cannot break the command.
     exec('pdftk ' . escapeshellarg($pdffile) . ' dump_data_fields 2>&1', $output, $retval);

     // Got an error for some PDFs; these are secured.
     if (isset($output[0]) && strpos($output[0], 'Error') !== false) {
         $unsafepdf = Yii::getPathOfAlias('webroot') . '/uploads/forms/un-' . $pdfname;
         // Write an unsecured copy, then read the fields from that copy instead.
         exec('pdftk ' . escapeshellarg($pdffile) . ' input_pw foopass output ' . escapeshellarg($unsafepdf) . ' 2>&1');
         exec('pdftk ' . escapeshellarg($unsafepdf) . ' dump_data_fields 2>&1', $outputunsafe, $retval);
         return $outputunsafe;
     }

     return $output;
 }
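Once dump_data_fields succeeds, the lines collected in $output still have to be turned into something usable. The following is a minimal sketch, not part of pdftk or Yii: parseDumpDataFields() is a hypothetical helper, and it assumes pdftk's usual output layout, where each field starts with a "---" line followed by "Key: Value" lines.

```php
<?php

/**
 * Hypothetical helper: turns the lines that exec() collects from
 * `pdftk file.pdf dump_data_fields` into one associative array per field.
 * Assumes records are separated by a "---" line and each attribute is on
 * its own "Key: Value" line.
 */
function parseDumpDataFields(array $lines): array
{
    $fields = [];
    $current = null;

    foreach ($lines as $line) {
        if ($line === '---') {
            // A separator closes the previous record and opens a new one.
            if ($current !== null) {
                $fields[] = $current;
            }
            $current = [];
            continue;
        }
        // Skip anything before the first separator (e.g. an "Error: ..." line)
        // and any line that is not a "Key: Value" pair.
        if ($current === null || strpos($line, ':') === false) {
            continue;
        }
        // Split on the first colon only; values may contain colons themselves.
        // exec() strips trailing whitespace, so "FieldValue: " arrives as "FieldValue:".
        [$key, $value] = explode(':', $line, 2);
        $value = ltrim($value, ' ');

        // Some keys (e.g. FieldStateOption on check boxes) repeat; collect them in a list.
        if (array_key_exists($key, $current)) {
            $current[$key] = array_merge((array) $current[$key], [$value]);
        } else {
            $current[$key] = $value;
        }
    }
    if ($current !== null) {
        $fields[] = $current;
    }
    return $fields;
}
```

In the function above, it would be called as parseDumpDataFields($output) instead of returning the raw lines.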
2 answers

This may be a small solution, but it should work for you. As @bruno said, it is an encrypted file, and you must decrypt it before using it with pdftk. To do this, I found qpdf, a free, open-source tool that can decrypt PDF files, remove owner and user passwords, and much more. You can find it here: Qpdf. Install it on your system and run this command:

 qpdf --decrypt input.pdf output.pdf 

Then use the output file with pdftk. It should work.
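For the PHP function in the question, the two steps could be chained as sketched below. This rests on assumptions: the helper names are hypothetical, qpdf and pdftk must be on the PATH, and the input must be a local file path (qpdf opens files, not URLs). qpdf --decrypt can only do this without a password when the file has no user (open) password.

```php
<?php

// Hypothetical helpers that build the two shell commands from the answer.
// escapeshellarg() keeps paths with spaces or quotes from breaking the command.
function qpdfDecryptCommand(string $in, string $out): string
{
    return 'qpdf --decrypt ' . escapeshellarg($in) . ' ' . escapeshellarg($out);
}

function pdftkDumpFieldsCommand(string $pdf): string
{
    return 'pdftk ' . escapeshellarg($pdf) . ' dump_data_fields';
}

/**
 * Decrypts $inputPdf with qpdf into a temporary file, then asks pdftk for
 * the form fields of that copy. Returns pdftk's output lines.
 */
function dumpFieldsAfterQpdf(string $inputPdf): array
{
    $tmp = tempnam(sys_get_temp_dir(), 'qpdf_');
    if ($tmp === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    try {
        exec(qpdfDecryptCommand($inputPdf, $tmp) . ' 2>&1', $qpdfOut, $qpdfStatus);
        // qpdf exits with 0 on success and 3 when it succeeded with warnings.
        if ($qpdfStatus !== 0 && $qpdfStatus !== 3) {
            throw new RuntimeException("qpdf failed:\n" . implode("\n", $qpdfOut));
        }
        exec(pdftkDumpFieldsCommand($tmp) . ' 2>&1', $fields, $pdftkStatus);
        // Assumption: a non-zero exit status from pdftk means it failed.
        if ($pdftkStatus !== 0) {
            throw new RuntimeException("pdftk failed:\n" . implode("\n", $fields));
        }
        return $fields;
    } finally {
        // Remove the decrypted copy whether or not the commands succeeded.
        if (is_file($tmp)) {
            unlink($tmp);
        }
    }
}
```

Keeping the command builders separate from exec() makes it easy to log or inspect the exact commands before running them.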


PdfTk is a tool that was created by compiling an obsolete version of iText into an executable using the GNU Compiler for Java (GCJ). (PdfTk is not supported by iText Group NV.)

I examined your PDF, and it uses two technologies that iText did not support at the time PdfTk was created: XFA and compressed cross-reference tables.

The latter is the cause of your problem. PdfTk expects your file to end like this:

 xref
 0 7
 0000000000 65535 f
 0000000258 00000 n
 0000000015 00000 n
 0000000346 00000 n
 0000000146 00000 n
 0000000397 00000 n
 0000000442 00000 n
 trailer
 <</ID [<c8bf0ac531b0fc7b5b9ec5daf0296834><ec4dde54d00305ebbec62f3f6bbca974>]/Root 5 0 R/Size 7/Info 6 0 R>>
 %iText-5.4.3
 startxref
 595
 %%EOF

The number after startxref is the byte offset of the xref keyword, i.e. the position where the cross-reference table begins. This table contains the byte offsets of all objects in the PDF.
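To make the mechanism concrete, here is a minimal PHP sketch (hypothetical function names, whole file already loaded into a string) that follows startxref to a classic cross-reference table like the one above and reads its entries. It deliberately handles only the uncompressed table format that PdfTk expects.

```php
<?php

/**
 * Returns the byte offset written after the last "startxref" keyword.
 */
function findStartXref(string $pdf): int
{
    $pos = strrpos($pdf, 'startxref');
    if ($pos === false || !preg_match('/startxref\s+(\d+)/', $pdf, $m, 0, $pos)) {
        throw new RuntimeException('No startxref found');
    }
    return (int) $m[1];
}

/**
 * Parses a classic (uncompressed) cross-reference table starting at $offset.
 * Returns [objectNumber => ['offset' => int, 'generation' => int, 'inUse' => bool]].
 * For free ("f") entries the first number is the next free object, not an offset.
 */
function parseXrefTable(string $pdf, int $offset): array
{
    if (substr($pdf, $offset, 4) !== 'xref') {
        throw new RuntimeException("No 'xref' keyword at offset $offset");
    }
    // Entries are whitespace-separated tokens, so tokenize everything after the keyword.
    $tokens = preg_split('/\s+/', substr($pdf, $offset + 4), -1, PREG_SPLIT_NO_EMPTY);
    $entries = [];
    $i = 0;
    // Each subsection starts with "firstObject count", followed by count three-token entries.
    while (isset($tokens[$i + 1]) && $tokens[$i] !== 'trailer') {
        $first = (int) $tokens[$i];
        $count = (int) $tokens[$i + 1];
        $i += 2;
        for ($n = 0; $n < $count; $n++, $i += 3) {
            $entries[$first + $n] = [
                'offset' => (int) $tokens[$i],
                'generation' => (int) $tokens[$i + 1],
                'inUse' => $tokens[$i + 2] === 'n',
            ];
        }
    }
    return $entries;
}
```

Usage: parseXrefTable($pdf, findStartXref($pdf)). On the file from the question this throws, because the startxref offset does not point at an xref keyword.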

When you look at the PDF you linked to, you see that it ends like this:

 64 0 obj
 <</DecodeParms<</Columns 5/Predictor 12>>/Encrypt 972 0 R/Filter/FlateDecode/ID[<85C47EA3EFE49E4CB0F087350055FDDC><C3F1748360D0464FBA02D711DE864630>]/Info 970 0 R/Length 283/Root 973 0 R/Size 971/Type/XRef/W[1 3 1]>>stream
 hÞìÒ±JQЙ·»7J¢©ÕØ(Xþ„ù »h%¤É¤¶"€mZ+;ÁN,,ÁÆ6 XÁ&‚("î½YŒI'Bî‡áμ]ö1Áð÷³cfþ‹ûÐÚLî`z„Ýôœùw÷N×X?ÙkNv`hÁÒj¦G[œiÀå»›œ?b½Än…ÉëàÍþ gY—i7WW‡òj®îͰu¸Ò‡Ñ:óÆÛ™ñÎë&'×݈§ü†ù!ÿñ€ù%,\ácçÙ9˜ì±Þ€S¼Ãd—‰Áy~×.ø¶Åìþßn_˜$9Ôüw£X9#åxzçgRüüóÙwÝ¡œÄNJ©½'Ú+©½'R{%µWR{%ÿ·á";`_ z6Ø
 endstream
 endobj
 startxref
 116
 %%EOF

In this case, startxref still points to where the first cross-reference table begins (this is a linearized PDF), but the cross-reference table is stored inside a stream object, and that stream is compressed (see the gibberish between stream and endstream).
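A rough way to tell the two cases apart in PHP is to look at what sits at the startxref offset: the xref keyword of a classic table, or an object whose dictionary says /Type/XRef. This is a heuristic sketch with a hypothetical function name; it only inspects the first 4 KB at the offset and never decodes the stream.

```php
<?php

/**
 * Reports what sits at a startxref offset: "table" for a classic
 * cross-reference table (what PdfTk expects), "stream" for a
 * cross-reference stream (PDF 1.5+), or "unknown".
 */
function xrefSectionKind(string $pdf, int $offset): string
{
    // Bounded window; a stream's dictionary comes before the "stream" keyword.
    $window = (string) substr($pdf, $offset, 4096);
    if (preg_match('/^\s*xref\b/', $window)) {
        return 'table';
    }
    // An indirect object header such as "64 0 obj".
    if (preg_match('/^\s*\d+\s+\d+\s+obj\b/', $window)) {
        $end = strpos($window, 'stream');
        $dict = $end === false ? $window : substr($window, 0, $end);
        if (preg_match('#/Type\s*/XRef\b#', $dict)) {
            return 'stream';
        }
    }
    return 'unknown';
}
```

For the file in the question, the object at offset 116 would be reported as "stream", which is exactly the case PdfTk cannot handle.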

Compressed cross-reference tables and compressed objects were introduced in PDF 1.5 (2003), but they are not supported by PdfTk. You will need a tool that can handle such streams (for example, a recent version of iText, which is the real thing compared to PdfTk), or you need to save the PDF as PDF 1.4 before processing it with PdfTk (but you will lose the XFA form, because XFA was also introduced in PDF 1.5).
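If you want to route such files to another tool or a conversion step before calling PdfTk, a cheap first hint is the version in the file header. The sketch below is only a heuristic with hypothetical names: a header of 1.5 or higher does not prove that compressed cross-reference streams are used, and a PDF 1.4+ file can also raise its version through a /Version entry in the catalog, which this check ignores.

```php
<?php

/**
 * Reads the version from the "%PDF-x.y" header, or returns null if none is found.
 */
function pdfHeaderVersion(string $pdf): ?string
{
    // Readers tolerate some bytes before the header, so search the first 1 KB.
    if (preg_match('/%PDF-(\d+\.\d+)/', (string) substr($pdf, 0, 1024), $m)) {
        return $m[1];
    }
    return null;
}

/**
 * True when the header version allows PDF 1.5 features such as
 * cross-reference streams and object streams.
 */
function headerAllowsPdf15Features(string $pdf): bool
{
    $version = pdfHeaderVersion($pdf);
    return $version !== null && version_compare($version, '1.5', '>=');
}
```

A positive result only means "inspect further"; the startxref check is the more direct test.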

Update:

Since you are asking about form fields, I am adding the following screenshots:

[Screenshot: the I-9 form opened in iText RUPS]

This screenshot was taken using iText RUPS (which proves that iText can open the document). On the right you see that the same form is defined twice:

[Screenshot: the form definition in RUPS, with entries for Fields and XFA]

If you walk down the tree under Fields, you will find all the fields that are stored in the PDF using AcroForm technology. On the left you can see the description of such a field:

[Screenshot: the description of one AcroForm field in RUPS]

If you look under XFA, you will notice that the same form is also defined using the XML Forms Architecture. If you click datasets, you will see an XML description of the dataset in the bottom panel:

[Screenshot: the XML of the XFA datasets in the bottom panel of RUPS]

All of this information can be accessed programmatically using iText (Java) or iTextSharp (C#). PdfTk is just a tool based on a very old version of this technology.
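Without iText, a crude PHP check can at least tell whether a file carries an XFA form next to its AcroForm fields. This is a heuristic sketch, not a PDF parser: the function name is hypothetical, it needs PHP's zlib extension, and it should be run on a decrypted copy (for example the qpdf output from the other answer), because the streams of an encrypted file like this one cannot be inflated directly.

```php
<?php

/**
 * Rough check for an /XFA entry. In files like this one the AcroForm
 * dictionary may sit inside a compressed object stream, so a plain byte
 * search is not enough: each stream is also inflated with gzuncompress()
 * (zlib format, as used by FlateDecode) and searched.
 */
function pdfHasXfa(string $pdf): bool
{
    if (strpos($pdf, '/XFA') !== false) {
        return true;
    }
    // Capture the raw bytes between "stream" + EOL and EOL + "endstream".
    preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $pdf, $matches);
    foreach ($matches[1] as $data) {
        // Streams that are not plain zlib data fail to inflate and are ignored.
        $inflated = @gzuncompress($data);
        if ($inflated !== false && strpos($inflated, '/XFA') !== false) {
            return true;
        }
    }
    return false;
}
```

A true result means dump_data_fields only shows you the AcroForm half of the form; the XFA datasets need a tool that understands XFA.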


Source: https://habr.com/ru/post/987277/

