How to check checkboxes that share the same field name in a PDF with iText and Java

I use the iText library for Java to populate a PDF document automatically. The first thing I do is map each field: once I have listed every field, I store its name in a String so that it is easy to reach.

So far so good. The problem is that I have a group of six checkboxes with the same field name. For example, they are all called topmostSubform[0].Page2[0].p2_cb01[0] .

Through some testing, I found that if I check the first box, then topmostSubform[0].Page2[0].p2_cb01[0] = 1

If I check the second (which automatically cancels the first), then topmostSubform[0].Page2[0].p2_cb01[0] = 2

For the third box, topmostSubform[0].Page2[0].p2_cb01[0] = 3 , and so on sequentially up to 6 for the last box.

I am using form.setField("topmostSubform[0].Page2[0].p2_cb01[0]", "1"); to fill in the fields. When I set the value 1 , the first checkbox is checked, but when I set 2 , which should check the second checkbox, it does not work. It doesn't matter whether I choose 2, 3, 4, 5 or 6 : the checkboxes stay empty and I cannot check them.

Here is the code snippet:

    String _5_1 = "topmostSubform[0].Page2[0].p2_cb01[0]";
    AcroFields form = stamper.getAcroFields();
    form.setField(_5_1, "3");

Please, I need some suggestions.

1 answer

Let me quote from ISO 32000-1, section 12.7.3.2 "Field names":

It is possible for different field dictionaries to have the same fully qualified field name if they are descendants of a common ancestor with that name and have no partial field names (T entries) of their own. Such field dictionaries are different representations of the same underlying field; they should differ only in properties that specify their visual appearance. In particular, field dictionaries with the same fully qualified field name shall have the same field type (FT), value (V), and default value (DV).

If we apply this to your question: it is possible for different field dictionaries to have the same name topmostSubform[0].Page2[0].p2_cb01[0] . Such field dictionaries are different representations of the same field , and they must have the same value.

There are two options:

  • If you have a PDF with field dictionaries that share the name ( topmostSubform[0].Page2[0].p2_cb01[0] ) but have different values, then you do not have a valid PDF: it violates ISO 32000-1, the official PDF specification.
  • You may think that you have checkboxes with the same field name and different values, but maybe these checkboxes are really a single radio field with several radio buttons. Maybe you are not using the correct values. Maybe something else is at play. For SO readers to help you, they need to see the PDF.

If option 1 applies, abandon hope: you have a bad PDF. Fix it or throw it away. If option 2 applies, please share the PDF.

Update after checking the PDF file:

Option 2 applies. You have a hybrid form, which means that the form is described twice inside the PDF: once using AcroForm technology and once using XFA. Please start by reading my answer to the following question: PDFTK and removing the XFA format

When you open the PDF in Adobe Reader, you will notice that the fields behave as if they were radio buttons: when you click one, it is selected, and when you click another, that one is selected and the first is deselected.

What you see is the form as described in XFA, and there are some important differences between the XFA description and the AcroForm description. This is not a bug; it is inherent to hybrid forms.

When you fill out the form using:

 form.setField("topmostSubform[0].Page2[0].p2_cb01[0]", "1"); 

iText fills in the AcroForm correctly, but it does not reliably fill in the XFA form, because iText can only make an educated guess (not always an accurate one) as to where the corresponding value belongs in the XFA stream (which is expressed in XML). This is explained in more detail in Chapter 8 of iText in Action - Second Edition .

What I usually do in such cases is exactly what the person asking that question wanted to know he could safely do: I throw away the XFA part:

    AcroFields form = stamper.getAcroFields();
    form.removeXfa();

This greatly simplifies the work, but it still does not solve your problem. To solve your problem, we need to look inside the PDF:

[Screenshot: the form description inside the PDF, as shown in iText RUPS]

As you can see in the screenshot (taken from iText RUPS ), the form has two different descriptions: the /Fields entry (the AcroForm description) and the /XFA entry, which consists of several streams that, when joined, form one large XML file.

We also see that where you think there is one field topmostSubform[0].Page2[0].p2_cb01[0] , in fact there are 6 fields:

    topmostSubform[0].Page2[0].p2_cb01[0]
    topmostSubform[0].Page2[0].p2_cb01[1]
    topmostSubform[0].Page2[0].p2_cb01[2]
    topmostSubform[0].Page2[0].p2_cb01[3]
    topmostSubform[0].Page2[0].p2_cb01[4]
    topmostSubform[0].Page2[0].p2_cb01[5]

Now let's look at these fields.

This is the topmostSubform[0].Page2[0].p2_cb01[0] field:

[Screenshot of this field]

This is the topmostSubform[0].Page2[0].p2_cb01[1] field:

[Screenshot of this field]

These are AcroForm checkboxes, but there is an instruction for humans that says: check only one. That instruction can only be understood by humans, not by machines or software.

My first attempt at writing the FillHybridForm example failed because I made a mistake similar to yours. I did not look carefully enough at the different appearance states. I assumed that the On value of topmostSubform[0].Page2[0].p2_cb01[0] was 0 , that of topmostSubform[0].Page2[0].p2_cb01[1] was 1 , and so on. It was not: the On value of topmostSubform[0].Page2[0].p2_cb01[0] is 1 , that of topmostSubform[0].Page2[0].p2_cb01[1] is 2 , etc.
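Rather than guessing the On value, it can be read from a field's appearance states. The following is only a sketch: the class OnValueFinder and its lookup parameter are hypothetical names, and the lookup is stubbed so that the example runs without iText. In iText 5, AcroFields.getAppearanceStates(String) should fit the same shape (passed as form::getAppearanceStates), but check that against the version you use. It also assumes the unchecked state is named "Off", which is the usual convention for checkboxes.

```java
import java.util.function.Function;

public class OnValueFinder {

    /**
     * Returns the first appearance state that is not "Off", or null if
     * the field has no such state. The statesOf function is a stand-in
     * for whatever returns a field's appearance state names.
     */
    static String onValue(String fieldName, Function<String, String[]> statesOf) {
        for (String state : statesOf.apply(fieldName)) {
            if (!"Off".equals(state)) {
                return state;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stubbed lookup: pretends the second checkbox has states Off and 2,
        // matching what the answer found in this particular form.
        Function<String, String[]> stub = name -> new String[] {"Off", "2"};
        String field = "topmostSubform[0].Page2[0].p2_cb01[1]";
        System.out.println("On value: " + onValue(field, stub));
    }
}
```

The value returned for a field is the one to pass to setField for that same field.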

Here's how you can fill in all the checkboxes:

    public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
        PdfReader reader = new PdfReader(src);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        AcroFields form = stamper.getAcroFields();
        form.removeXfa();
        form.setField("topmostSubform[0].Page2[0].p2_cb01[0]", "1");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[1]", "2");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[2]", "3");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[3]", "4");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[4]", "5");
        form.setField("topmostSubform[0].Page2[0].p2_cb01[5]", "6");
        stamper.close();
        reader.close();
    }

Now all the checkboxes are checked. See f8966_filled.pdf :

[Screenshot: f8966_filled.pdf with all six checkboxes checked]

Of course, as humans we know that we should not do this, because we are supposed to treat the fields as if they were radio buttons, but nothing in the AcroForm description technically prevents it. The logic that prevents it is present only in the XFA description.
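Since the AcroForm description does not enforce "check only one", that rule has to live in your own code. Here is a minimal sketch of one way to do it: compute a value for every box in the group, so that the chosen box gets its On value and the rest are unchecked. The class ExclusiveSelection and the method valuesFor are hypothetical names; the On values (box index plus one) are the ones found in this form, and "Off" is assumed to be the unchecked state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExclusiveSelection {

    // Name shared by the six checkboxes, without the trailing index.
    static final String PREFIX = "topmostSubform[0].Page2[0].p2_cb01";
    static final int BOX_COUNT = 6;

    /**
     * Builds field name -> value for the whole group so that only the
     * chosen box (1..6) is checked and all the others are unchecked.
     */
    static Map<String, String> valuesFor(int choice) {
        if (choice < 1 || choice > BOX_COUNT) {
            throw new IllegalArgumentException("choice must be 1.." + BOX_COUNT);
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < BOX_COUNT; i++) {
            String name = PREFIX + "[" + i + "]";
            // In this form the On value of box i is i + 1; "Off" unchecks.
            values.put(name, (i + 1 == choice) ? String.valueOf(i + 1) : "Off");
        }
        return values;
    }

    public static void main(String[] args) {
        // Shows the values that would check only the second box.
        valuesFor(2).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

Each entry of the returned map would then be passed to form.setField(name, value) after the XFA part has been removed.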

This solves your problem if you are allowed to throw away the XFA part. It also solves your problem if it is OK to flatten the form, in which case you should add:

 stamper.setFormFlattening(true); 

If neither of these options is acceptable, you must not throw away the XFA part. Instead, fill in the AcroForm part as described above, use iText to extract the XML dataset (see datasets in the first screenshot), update it the way the US government expects it to be updated, and use iText to write the updated dataset back to the datasets object.
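The "update the XML dataset" step is ordinary XML manipulation. The sketch below shows it with the JDK's DOM API only; extracting the dataset from the PDF and writing it back with iText are left out. The class DatasetUpdate, the method setElementText, and above all the sample XML layout are assumptions made for illustration: the real dataset in your PDF may be structured differently, so inspect it (for example in RUPS) first.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DatasetUpdate {

    /** Parses the XML and sets the text of every element with the given tag. */
    static Document setElementText(String xml, String tag, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                nodes.item(i).setTextContent(value);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not update dataset XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical dataset fragment; not taken from the actual form.
        String xml = "<topmostSubform><Page2><p2_cb01>1</p2_cb01></Page2></topmostSubform>";
        Document doc = setElementText(xml, "p2_cb01", "2");
        System.out.println(doc.getElementsByTagName("p2_cb01").item(0).getTextContent());
    }
}
```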

Phew ... This is one of the longest answers I've ever written on StackOverflow.


Source: https://habr.com/ru/post/985360/

