PDFBox: working with very large PDF files

I work with very large PDF files (larger than 7 GB). The files have up to 20,000 pages and many full-color images. I would like to use PDFBox to process them, but because of their size I get an OutOfMemoryError when I try to open them.

I am using pdfbox-app-1.6.0 on Windows 7 with IntelliJ and Java 6.

First, I wrote a simple program that just opens a PDF file as a PDDocument and moves each page to another PDDocument: http://ideone.com/arKhB

Next, I tried using the PDFBox CopyDoc example.

Both attempts end with an OutOfMemoryError.

I assume this is because PDFBox tries to read the entire document into memory. Is there a way to open it only one page at a time? I know this will be slower, but at the moment I can't process anything.

1 answer

In PDFBox 2.0.x, open the PDF as follows:

PDDocument doc = PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly()); 

This configures PDFBox to buffer only in temporary files (no main memory), with no limit on their size.
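For context, here is a minimal sketch of that call inside a complete program, assuming PDFBox 2.0.x is on the classpath and a Java 7+ runtime (for try-with-resources). The class name `LargePdfOpen`, the helper `countPages`, and the choice of temp directory are illustrative assumptions, not part of the original answer.

```java
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;

// Hypothetical helper class, written against the PDFBox 2.0.x API.
public class LargePdfOpen {

    // Opens the PDF with buffering done in temporary files only (no main
    // memory) and returns the number of pages.
    static int countPages(File pdf, File tempDir) throws IOException {
        // setTempDir is optional; without it PDFBox uses the system temp directory.
        MemoryUsageSetting settings =
                MemoryUsageSetting.setupTempFileOnly().setTempDir(tempDir);

        // Closing the document also releases the temporary files it used.
        try (PDDocument doc = PDDocument.load(pdf, settings)) {
            return doc.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        File pdf = new File(args[0]);
        // Assumption: the directory holding the PDF has room for the temp files.
        File tempDir = pdf.getAbsoluteFile().getParentFile();
        System.out.println("Pages: " + countPages(pdf, tempDir));
    }
}
```

If some heap can be spared, `MemoryUsageSetting.setupMixed(maxMainMemoryBytes)` is an alternative that buffers in main memory up to the given limit and uses temporary files beyond it, which may be faster than going to disk for everything.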


Source: https://habr.com/ru/post/919533/

