Searching a PDF for a specific string in a JavaScript action with iText

My goal is to find JavaScript that matches a given template in the annotations of a PDF. For this, I came up with the following code:

    public static void main(String[] args) {
        try {
            // Reads and parses a PDF document
            PdfReader reader = new PdfReader("Test.pdf");

            // For each PDF page
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                // Get a page a PDF page
                PdfDictionary page = reader.getPageN(i);
                // Get all the annotations of page i
                PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

                // If page does not have annotations
                if (page.getAsArray(PdfName.ANNOTS) == null) {
                    continue;
                }

                // For each annotation
                for (int j = 0; j < annotsArray.size(); ++j) {
                    // For current annotation
                    PdfDictionary curAnnot = annotsArray.getAsDict(j);

                    // check if has JS as described below
                    PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
                    // test if it is a JavaScript action
                    if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
                        // what here?
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

As far as I know, string comparison is done with StringCompare. The problem is that it compares two whole strings, whereas I want to know whether the JavaScript action in an annotation starts with (or contains) this string: if (this.hostContainer) { try {

So, how do I check whether the JavaScript in the annotations contains the string above?

EDIT An example page with JS is located at: pdf with JS

1 answer

JavaScript actions in ISO 32000-1 are defined as follows:

12.6.4.16 JavaScript Actions

Upon invocation of a JavaScript action, a conforming processor shall execute a script that is written in the JavaScript programming language. Depending on the nature of the script, various interactive form fields in the document may update their values or change their visual appearances. Mozilla Development Center's Client-Side JavaScript Reference and the Adobe JavaScript for Acrobat API Reference (see the Bibliography) give details on the contents and effects of JavaScript scripts. Table 217 shows the action dictionary entries specific to this type of action.

Table 217 - Additional entries specific to a JavaScript action

Key | Type | Value
S | name | (Required) The type of action that this dictionary describes; shall be JavaScript for a JavaScript action.
JS | text string or text stream | (Required) A text string or text stream containing the JavaScript script to be executed. PDFDocEncoding or Unicode encoding (the latter identified by the Unicode prefix U+FEFF) shall be used to encode the contents of the string or stream.

To support the use of parameterized function calls in JavaScript scripts, the JavaScript entry in a PDF document's name dictionary (see 7.7.4, "Name Dictionary") may contain a name tree that maps name strings to document-level JavaScript actions. When the document is opened, all of the actions in this name tree shall be executed, defining JavaScript functions for use by other scripts in the document.

Thus, if you want to know whether the JavaScript action in the annotation starts with (or contains) the string if (this.hostContainer) { try { at this point in your code

    if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)) {
        // what here?
    }

you probably want to first check whether AnnotationAction.Get(PdfName.JS) is a PdfString or a PdfStream, in either case retrieve its content as a string, and then check with the usual string comparison methods whether it, or any function it calls (such a function may be defined in the JavaScript name tree), contains the string you are looking for.
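Once the script is available as a plain string, the "starts with (or contains)" check needs nothing from iText. The following is only a minimal sketch: JsSnippetMatcher and its methods are hypothetical names, and the whitespace normalization is an assumption about how tolerant the match should be (it treats line breaks and repeated spaces as a single space, but it does not handle scripts that omit whitespace altogether).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText: whitespace-tolerant matching
// of an extracted script against a template string.
class JsSnippetMatcher {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Collapse every run of whitespace into one space and trim both ends.
    static String normalize(String text) {
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // True if the normalized script begins with the normalized snippet.
    static boolean startsWithSnippet(String script, String snippet) {
        return normalize(script).startsWith(normalize(snippet));
    }

    // True if the normalized script contains the normalized snippet anywhere.
    static boolean containsSnippet(String script, String snippet) {
        return normalize(script).contains(normalize(snippet));
    }

    public static void main(String[] args) {
        String snippet = "if (this.hostContainer) { try {";
        String script = "if (this.hostContainer) {\n  try {\n"
                + "    this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);\n"
                + "  } catch(e) { console.println(e); }};";
        System.out.println(startsWithSnippet(script, snippet));
        System.out.println(containsSnippet("var x = 1;", snippet));
    }
}
```

If an exact match is wanted instead, plain String.startsWith and String.contains on the unmodified script are sufficient.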

Code example

I took your code, cleaned it up a bit (in particular, it was a mixture of C# and Java), and added code as described above to check the immediate JavaScript code in the annotation's action element:

Java version

    // Needs java.io.ByteArrayOutputStream and, assuming iText 5, these classes from
    // com.itextpdf.text.pdf: PdfReader, PdfDictionary, PdfArray, PdfName, PdfObject,
    // PdfString, PdfStream
    System.out.println("file.pdf - Looking for special JavaScript actions.");
    // Reads and parses a PDF document
    PdfReader reader = new PdfReader(resource);

    // For each PDF page
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        System.out.printf("\nPage %d\n", i);
        // Get a PDF page
        PdfDictionary page = reader.getPageN(i);
        // Get all the annotations of page i
        PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

        // If page does not have annotations
        if (annotsArray == null) {
            System.out.println("No annotations.");
            continue;
        }

        // For each annotation
        for (int j = 0; j < annotsArray.size(); ++j) {
            System.out.printf("Annotation %d - ", j);

            // For current annotation
            PdfDictionary curAnnot = annotsArray.getAsDict(j);

            // Check whether it has an action
            PdfDictionary annotationAction = curAnnot.getAsDict(PdfName.A);
            if (annotationAction == null) {
                System.out.print("no action");
            }
            // Test whether it is a JavaScript action
            else if (PdfName.JAVASCRIPT.equals(annotationAction.get(PdfName.S))) {
                PdfObject scriptObject = annotationAction.getDirectObject(PdfName.JS);
                if (scriptObject == null) {
                    System.out.println("missing JS entry");
                    continue;
                }

                // JS may be a text string or a text stream
                final String script;
                if (scriptObject.isString())
                    script = ((PdfString) scriptObject).toUnicodeString();
                else if (scriptObject.isStream()) {
                    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
                        ((PdfStream) scriptObject).writeContent(baos);
                        script = baos.toString("ISO-8859-1");
                    }
                } else {
                    System.out.println("malformed JS entry");
                    continue;
                }

                if (script.contains("if (this.hostContainer) { try {"))
                    System.out.print("contains test string - ");

                System.out.printf("\n---\n%s\n---", script);
                // what here?
            } else {
                System.out.print("no JavaScript action");
            }
            System.out.println();
        }
    }

(Test class SearchActionJavaScript, method testSearchJsActionInFile)
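The stream branch above decodes the bytes as ISO-8859-1, while the specification quoted earlier also allows Unicode text marked by the U+FEFF prefix. A possible refinement, shown here only as a sketch, is to look at the first two bytes before choosing a charset. JsTextDecoder is a hypothetical name, and using ISO-8859-1 as a stand-in for PDFDocEncoding is an assumption: the two agree on printable ASCII, which covers typical script text, but not on every byte value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of iText: turns the raw bytes of a JS
// text stream into a Java string.
class JsTextDecoder {
    static String decode(byte[] raw) {
        // A leading FE FF (the U+FEFF prefix) marks big-endian UTF-16 text.
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        // Assumption: ISO-8859-1 approximates PDFDocEncoding (same for printable ASCII).
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] unicode = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x69, 0x00, 0x66};
        byte[] plain = "if (this.hostContainer) { try {".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decode(unicode));
        System.out.println(decode(plain));
    }
}
```

With such a helper, the bytes collected in the ByteArrayOutputStream could be passed through decode instead of calling toString with a fixed charset.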

C# version

    // Needs System, System.IO, System.Text and iTextSharp.text.pdf
    using (PdfReader reader = new PdfReader(sourcePath))
    {
        Console.WriteLine("file.pdf - Looking for special JavaScript actions.");

        // For each PDF page
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            Console.Write("\nPage {0}\n", i);
            // Get a PDF page
            PdfDictionary page = reader.GetPageN(i);
            // Get all the annotations of page i
            PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);

            // If page does not have annotations
            if (annotsArray == null)
            {
                Console.WriteLine("No annotations.");
                continue;
            }

            // For each annotation
            for (int j = 0; j < annotsArray.Size; ++j)
            {
                Console.Write("Annotation {0} - ", j);

                // For current annotation
                PdfDictionary curAnnot = annotsArray.GetAsDict(j);

                // Check whether it has an action
                PdfDictionary annotationAction = curAnnot.GetAsDict(PdfName.A);
                if (annotationAction == null)
                {
                    Console.Write("no action");
                }
                // Test whether it is a JavaScript action
                else if (PdfName.JAVASCRIPT.Equals(annotationAction.Get(PdfName.S)))
                {
                    PdfObject scriptObject = annotationAction.GetDirectObject(PdfName.JS);
                    if (scriptObject == null)
                    {
                        Console.WriteLine("missing JS entry");
                        continue;
                    }

                    // JS may be a text string or a text stream
                    String script;
                    if (scriptObject.IsString())
                        script = ((PdfString)scriptObject).ToUnicodeString();
                    else if (scriptObject.IsStream())
                    {
                        using (MemoryStream stream = new MemoryStream())
                        {
                            ((PdfStream)scriptObject).WriteContent(stream);
                            // MemoryStream.ToString() only returns the type name,
                            // so decode the collected bytes explicitly
                            script = Encoding.GetEncoding("ISO-8859-1").GetString(stream.ToArray());
                        }
                    }
                    else
                    {
                        Console.WriteLine("malformed JS entry");
                        continue;
                    }

                    if (script.Contains("if (this.hostContainer) { try {"))
                        Console.Write("contains test string - ");

                    Console.Write("\n---\n{0}\n---", script);
                    // what here?
                }
                else
                {
                    Console.Write("no JavaScript action");
                }
                Console.WriteLine();
            }
        }
    }

Output

Running either version against your sample file produces:

    file.pdf - Looking for special JavaScript actions.

    Page 1
    Annotation 0 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_vii', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 1 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_ix', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 2 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_xi', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 3 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_3', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 4 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_15', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 5 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_37', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 6 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_57', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 7 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_81', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 8 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_111', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 9 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_136', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 10 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_160', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 11 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_197', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 12 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_179', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 13 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_201', 0]); } catch(e) { console.println(e); }};
    ---
    Annotation 14 - contains test string - 
    ---
    if (this.hostContainer) { try { this.hostContainer.postMessage(['newPage', 'pp_223', 0]); } catch(e) { console.println(e); }};
    ---

    Page 2
    No annotations.

    Page 3
    No annotations.
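All matching scripts in this output differ only in the second element of the array passed to postMessage. If that value is of interest at the "what here?" step, one option is a regular expression over the extracted script. This is only a sketch based on the script shape shown in the output above; NewPageArgumentExtractor and secondArgument are hypothetical names, and scripts written differently would need a different pattern.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText: pulls the second array element
// out of scripts shaped like postMessage(['newPage', '<value>', 0]).
class NewPageArgumentExtractor {
    private static final Pattern NEW_PAGE_CALL = Pattern.compile(
            "postMessage\\(\\s*\\[\\s*'newPage'\\s*,\\s*'([^']*)'");

    // Returns the captured value, or an empty Optional if the script has another shape.
    static Optional<String> secondArgument(String script) {
        Matcher matcher = NEW_PAGE_CALL.matcher(script);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String script = "if (this.hostContainer) { try { "
                + "this.hostContainer.postMessage(['newPage', 'pp_vii', 0]); "
                + "} catch(e) { console.println(e); }};";
        System.out.println(secondArgument(script).orElse("none"));
        System.out.println(secondArgument("var x = 1;").orElse("none"));
    }
}
```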

Source: https://habr.com/ru/post/1013077/

