How to create an accessible PDF file with the Java library PDFBox 2.0.8, which can also be checked using the PAC 2 tool?

Background

I have a small project on GitHub in which I am trying to create a PDF file with a section 508 (section508.gov) that has form elements in a complex table structure. The recommended tool for checking these PDFs is at http://www.access-for-all.ch/en/pdf-lab/pdf-accessibility-checker-pac.html , and my releases of the PDF programs really go through most of these checks. I will also know what each field means at runtime, so adding tags to structure elements should not be a problem.

Problem

The PAC 2 tool has a problem with two separate elements in the output PDF file. In particular, the annotations of the widget of my radio buttons are not nested inside the form structure element, and my marked content is not marked (text and table cells). PAC 2 checks structure element <marked content ...

However, PAC 2 identifies the marked content as an error (i.e., the Text / Path object is not marked). In addition, radio button icons have been detected , but there seems to be no APIs to add them to the form structure element.

What i tried

- , Tagged PDF PDFBox, , PDF /UA ( ). , , , PDF , https://taggedpdf.com/508-pdf-help-center/object-not-tagged/.

PAC 2 PDF Apache PDFBox, -? , API- PDFBox ( )?

Side Note: StackExchange ( ), , ! , . , GitHub, PDF- https://github.com/chris271/UAPDFBox.

1: PDF

* EDIT 2: API- PDFBox PDF PDFDebugger PDF PDF... , , , , ... !

3: PDF.

4: PDF

generated pdf

PDF

compatible pdf

5: PAC 2 /, Tilman Hausherr! , , , .

+4
1

PDF PDFBox , PAC 2. PDF ( ), github. . ( !)

1 ( )

,

//Setup new document
    pdf = new PDDocument();
    acroForm = new PDAcroForm(pdf);
    pdf.getDocumentInformation().setTitle(title);
    //Adjust other document metadata
    PDDocumentCatalog documentCatalog = pdf.getDocumentCatalog();
    documentCatalog.setLanguage("English");
    documentCatalog.setViewerPreferences(new PDViewerPreferences(new COSDictionary()));
    documentCatalog.getViewerPreferences().setDisplayDocTitle(true);
    documentCatalog.setAcroForm(acroForm);
    documentCatalog.setStructureTreeRoot(structureTreeRoot);
    PDMarkInfo markInfo = new PDMarkInfo();
    markInfo.setMarked(true);
    documentCatalog.setMarkInfo(markInfo);

.

//Set AcroForm Appearance Characteristics
    PDResources resources = new PDResources();
    defaultFont = PDType0Font.load(pdf,
            new PDTrueTypeFont(PDType1Font.HELVETICA.getCOSObject()).getTrueTypeFont(), true);
    resources.put(COSName.getPDFName("Helv"), defaultFont);
    acroForm.setNeedAppearances(true);
    acroForm.setXFA(null);
    acroForm.setDefaultResources(resources);
    acroForm.setDefaultAppearance(DEFAULT_APPEARANCE);

XMP PDF/UA.

//Add UA XMP metadata based on specs at https://taggedpdf.com/508-pdf-help-center/pdfua-identifier-missing/
    XMPMetadata xmp = XMPMetadata.createXMPMetadata();
    xmp.createAndAddDublinCoreSchema();
    xmp.getDublinCoreSchema().setTitle(title);
    xmp.getDublinCoreSchema().setDescription(title);
    xmp.createAndAddPDFAExtensionSchemaWithDefaultNS();
    xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/schema#", "pdfaSchema");
    xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/property#", "pdfaProperty");
    xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfua/ns/id/", "pdfuaid");
    XMPSchema uaSchema = new XMPSchema(XMPMetadata.createXMPMetadata(),
            "pdfaSchema", "pdfaSchema", "pdfaSchema");
    uaSchema.setTextPropertyValue("schema", "PDF/UA Universal Accessibility Schema");
    uaSchema.setTextPropertyValue("namespaceURI", "http://www.aiim.org/pdfua/ns/id/");
    uaSchema.setTextPropertyValue("prefix", "pdfuaid");
    XMPSchema uaProp = new XMPSchema(XMPMetadata.createXMPMetadata(),
            "pdfaProperty", "pdfaProperty", "pdfaProperty");
    uaProp.setTextPropertyValue("name", "part");
    uaProp.setTextPropertyValue("valueType", "Integer");
    uaProp.setTextPropertyValue("category", "internal");
    uaProp.setTextPropertyValue("description", "Indicates, which part of ISO 14289 standard is followed");
    uaSchema.addUnqualifiedSequenceValue("property", uaProp);
    xmp.getPDFExtensionSchema().addBagValue("schemas", uaSchema);
    xmp.getPDFExtensionSchema().setPrefix("pdfuaid");
    xmp.getPDFExtensionSchema().setTextPropertyValue("part", "1");
    XmpSerializer serializer = new XmpSerializer();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    serializer.serialize(xmp, baos, true);
    PDMetadata metadata = new PDMetadata(pdf);
    metadata.importXMPMetadata(baos.toByteArray());
    pdf.getDocumentCatalog().setMetadata(metadata);

2 ( )

.

//Adds a DOCUMENT structure element as the structure tree root.
void addRoot() {
    PDStructureElement root = new PDStructureElement(StandardStructureTypes.DOCUMENT, null);
    root.setAlternateDescription("The document root structure element.");
    root.setTitle("PDF Document");
    pdf.getDocumentCatalog().getStructureTreeRoot().appendKid(root);
    currentElem = root;
    rootElem = root;
}

( ) MCID , 3.

//Assign an id for the next marked content element.
private void setNextMarkedContentDictionary(String tag) {
    currentMarkedContentDictionary = new COSDictionary();
    currentMarkedContentDictionary.setName("Tag", tag);
    currentMarkedContentDictionary.setInt(COSName.MCID, currentMCID);
    currentMCID++;
}

( ) . , P.

            //Set up the next marked content element with an MCID and create the containing TD structure element.
            PDPageContentStream contents = new PDPageContentStream(
                    pdf, pages.get(pageIndex), PDPageContentStream.AppendMode.APPEND, false);
            currentElem = addContentToParent(null, StandardStructureTypes.TD, pages.get(pageIndex), currentRow);

            //Make the actual cell rectangle and set as artifact to avoid detection.
            setNextMarkedContentDictionary(COSName.ARTIFACT.getName());
            contents.beginMarkedContent(COSName.ARTIFACT, PDPropertyList.create(currentMarkedContentDictionary));

            //Draws the cell itself with the given colors and location.
            drawDataCell(table.getCell(i, j).getCellColor(), table.getCell(i, j).getBorderColor(),
                    x + table.getRows().get(i).getCellPosition(j),
                    y + table.getRowPosition(i),
                    table.getCell(i, j).getWidth(), table.getRows().get(i).getHeight(), contents);
            contents.endMarkedContent();
            currentElem = addContentToParent(COSName.ARTIFACT, StandardStructureTypes.P, pages.get(pageIndex), currentElem);
            contents.close();
            //Draw the cell text as a P structure element
            contents = new PDPageContentStream(
                    pdf, pages.get(pageIndex), PDPageContentStream.AppendMode.APPEND, false);
            setNextMarkedContentDictionary(COSName.P.getName());
            contents.beginMarkedContent(COSName.P, PDPropertyList.create(currentMarkedContentDictionary));
            //... Code to draw actual text...//
            //End the marked content and append it P structure element to the containing TD structure element.
            contents.endMarkedContent();
            addContentToParent(COSName.P, null, pages.get(pageIndex), currentElem);
            contents.close();

( ) .

//Add a radio button widget.
            if (!table.getCell(i, j).getRbVal().isEmpty()) {
                PDStructureElement fieldElem = new PDStructureElement(StandardStructureTypes.FORM, currentElem);
                radioWidgets.add(addRadioButton(
                        x + table.getRows().get(i).getCellPosition(j) -
                                radioWidgets.size() * 10 + table.getCell(i, j).getWidth() / 4,
                        y + table.getRowPosition(i),
                        table.getCell(i, j).getWidth() * 1.5f, 20,
                        radioValues, pageIndex, radioWidgets.size()));
                fieldElem.setPage(pages.get(pageIndex));
                COSArray kArray = new COSArray();
                kArray.add(COSInteger.get(currentMCID));
                fieldElem.getCOSObject().setItem(COSName.K, kArray);
                addWidgetContent(annotationRefs.get(annotationRefs.size() - 1), fieldElem, StandardStructureTypes.FORM, pageIndex);
            }

//Add a text field in the current cell.
            if (!table.getCell(i, j).getTextVal().isEmpty()) {
                PDStructureElement fieldElem = new PDStructureElement(StandardStructureTypes.FORM, currentElem);
                addTextField(x + table.getRows().get(i).getCellPosition(j),
                        y + table.getRowPosition(i),
                        table.getCell(i, j).getWidth(), table.getRows().get(i).getHeight(),
                        table.getCell(i, j).getTextVal(), pageIndex);
                fieldElem.setPage(pages.get(pageIndex));
                COSArray kArray = new COSArray();
                kArray.add(COSInteger.get(currentMCID));
                fieldElem.getCOSObject().setItem(COSName.K, kArray);
                addWidgetContent(annotationRefs.get(annotationRefs.size() - 1), fieldElem, StandardStructureTypes.FORM, pageIndex);
            }

3

, , , . . (addWidgetContent() addContentToParent()) COSDictionary.

//Adds the parent tree to root struct element to identify tagged content
void addParentTree() {
    COSDictionary dict = new COSDictionary();
    nums.add(numDictionaries);
    for (int i = 1; i < currentStructParent; i++) {
        nums.add(COSInteger.get(i));
        nums.add(annotDicts.get(i - 1));
    }
    dict.setItem(COSName.NUMS, nums);
    PDNumberTreeNode numberTreeNode = new PDNumberTreeNode(dict, dict.getClass());
    pdf.getDocumentCatalog().getStructureTreeRoot().setParentTreeNextKey(currentStructParent);
    pdf.getDocumentCatalog().getStructureTreeRoot().setParentTree(numberTreeNode);
}

, - PAC 2 PDFDebugger.

Verified PDF

Debugger

, , ! , , , .

+1

Source: https://habr.com/ru/post/1695835/


All Articles