I work with very large PDF files, more than 7 GB each. The files have up to 20,000 pages and many full-color images. I would like to use PDFBox to work with them, but because of their size I get an OutOfMemoryError when I try to open them.
I am using pdfbox-app-1.6.0 on Windows 7 with IntelliJ and Java 6.
First I wrote a simple program that just opens a PDF file as a PDDocument and moves each page to another PDDocument: http://ideone.com/arKhB
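For readers who can't open the ideone link, here is a minimal sketch of that kind of page-copy program. It is not the original code. It assumes the PDFBox 1.x API (`PDDocument.load(File)`, `getDocumentCatalog().getAllPages()`, `addPage(PDPage)`), which matches the 1.6.0 version above; `getAllPages()` no longer exists in PDFBox 2.x. The class name `CopyPages` and the method `copyPages` are hypothetical. It uses `try`/`finally` instead of try-with-resources to stay Java 6 compatible.

```java
import java.io.File;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical helper, written against the PDFBox 1.x API (not the original ideone code).
public class CopyPages {

    // Copies every page of srcPath into a new document saved at dstPath
    // and returns the number of pages in the saved document.
    public static int copyPages(String srcPath, String dstPath) throws Exception {
        PDDocument src = null;
        PDDocument dst = null;
        try {
            // In 1.x, load() parses the whole file before returning, which is
            // the likely point where a multi-gigabyte input exhausts the heap.
            src = PDDocument.load(new File(srcPath));
            dst = new PDDocument();

            // getAllPages() returns a raw List of PDPage objects in 1.x.
            List<?> pages = src.getDocumentCatalog().getAllPages();
            for (Object page : pages) {
                // addPage() references the existing page; its resources still
                // belong to src, so src must stay open until dst is saved.
                dst.addPage((PDPage) page);
            }
            dst.save(dstPath);
            return dst.getNumberOfPages();
        } finally {
            // Close the destination first, then the source it borrowed from.
            if (dst != null) {
                dst.close();
            }
            if (src != null) {
                src.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int n = copyPages(args[0], args[1]);
        System.out.println("Copied " + n + " pages");
    }
}
```

Run it as `java CopyPages input.pdf output.pdf`. On a small file this works, but the `load()` call still materializes the whole source document, so it fails the same way on very large inputs.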
Next, I tried using the PDFBox CopyDoc example.
Both examples run out of memory.
I assume this is because PDFBox tries to read the entire document into memory. Is there a way to open it only one page at a time? I know this will be slower, but at the moment I can't process anything.
Pengo