Extract page number from PDF file

I have a PDF document that was created by extracting multiple pages from another PDF document. I am wondering how I can get the page number, because the start page number is 572, whereas for a complete PDF document it should have been 1.

Do you think converting the PDF to XML would solve this problem?

2 answers

Most likely the document contains a /PageLabels entry in the Document Catalog. This entry specifies the numbering style of the page labels and the start number.

You may need to update the start number or delete the entry completely. The following document contains additional information about the /PageLabels entry:

Example 2 in that document can be useful if you decide to update the entry.
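To make the /PageLabels mechanics concrete, here is a minimal plain-Java sketch (no PDF library involved) of how a viewer could derive a page label from a labeling range. The names PageLabelSketch, Range and label are hypothetical and exist only for illustration; the map stands in for the /Nums number tree, only the decimal style (/S /D) is handled, and the fallback for a missing range is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of any PDF library.
class PageLabelSketch {

    // One labeling range: /S (style), /P (prefix), /St (start number).
    static final class Range {
        final String style;   // "D" for decimal; anything else means no numeric part here
        final String prefix;  // label prefix, may be empty
        final int start;      // value of /St (the PDF default is 1)

        Range(String style, String prefix, int start) {
            this.style = style;
            this.prefix = prefix;
            this.start = start;
        }
    }

    // nums models /Nums: zero-based index of the first page in a range -> range.
    static String label(TreeMap<Integer, Range> nums, int pageIndex) {
        Map.Entry<Integer, Range> entry = nums.floorEntry(pageIndex);
        if (entry == null) {
            // Assumed fallback: plain 1-based physical page number.
            return Integer.toString(pageIndex + 1);
        }
        Range r = entry.getValue();
        int number = r.start + (pageIndex - entry.getKey());
        String numeric = "D".equals(r.style) ? Integer.toString(number) : "";
        return r.prefix + numeric;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Range> nums = new TreeMap<>();
        nums.put(0, new Range("D", "", 572));   // extracted file: first page labeled 572
        System.out.println(label(nums, 0));     // 572
        System.out.println(label(nums, 2));     // 574

        nums.put(0, new Range("D", "", 1));     // after resetting /St to 1
        System.out.println(label(nums, 0));     // 1
    }
}
```

This shows why resetting the start number (or removing the entry) makes the first page show as 1 again: the label is just the start number plus the offset of the page within its range.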


I finally implemented this using iText. It would not have been possible without Bovrovsky's hint; many thanks to him. Posting the code:

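    import java.util.ListIterator;

    import com.itextpdf.text.pdf.PRIndirectReference;
    import com.itextpdf.text.pdf.PdfArray;
    import com.itextpdf.text.pdf.PdfDictionary;
    import com.itextpdf.text.pdf.PdfIndirectReference;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfNumber;
    import com.itextpdf.text.pdf.PdfObject;
    import com.itextpdf.text.pdf.PdfReader;

    // The class name is only a wrapper so the snippet compiles.
    public class PageLabelStart {

        public void process(PdfReader reader) {
            // Assumption: `dict` in the original snippet is the document catalog.
            PdfDictionary dict = reader.getCatalog();
            // This cast assumes /PageLabels is stored as an indirect reference.
            PRIndirectReference obj = (PRIndirectReference) dict.get(PdfName.PAGELABELS);
            System.out.println(obj.getNumber());
            PdfObject ref = reader.getPdfObject(obj.getNumber());
            PdfArray array = (PdfArray) ((PdfDictionary) ref).get(PdfName.NUMS);
            System.out.println("Start Page: " + resolvePdfIndirectReference(array, reader));
        }

        // Follows arrays and indirect references until a label dictionary is found,
        // then returns its /St (start number) value.
        private static int resolvePdfIndirectReference(PdfObject obj, PdfReader reader) {
            if (obj instanceof PdfArray) {
                PdfDictionary subDict = null;
                PdfIndirectReference indRef = null;
                ListIterator<PdfObject> itr = ((PdfArray) obj).listIterator();
                while (itr.hasNext()) {
                    PdfObject pdfObj = itr.next();
                    if (pdfObj instanceof PdfIndirectReference) {
                        indRef = (PdfIndirectReference) pdfObj;
                    }
                    if (pdfObj instanceof PdfDictionary) {
                        subDict = (PdfDictionary) pdfObj;
                        break;
                    }
                }
                if (subDict != null) {
                    return resolvePdfIndirectReference(subDict, reader);
                } else if (indRef != null) {
                    return resolvePdfIndirectReference(indRef, reader);
                }
            } else if (obj instanceof PdfIndirectReference) {
                PdfObject ref = reader.getPdfObject(((PdfIndirectReference) obj).getNumber());
                return resolvePdfIndirectReference(ref, reader);
            } else if (obj instanceof PdfDictionary) {
                PdfNumber num = (PdfNumber) ((PdfDictionary) obj).get(PdfName.ST);
                // /St is optional; when it is absent the start number defaults to 1.
                return num != null ? num.intValue() : 1;
            }
            return 0;
        }
    }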

Source: https://habr.com/ru/post/1275960/

