Extract text between two bookmarks using Apache PDFBox

I am using Apache PDFBox to read a PDF document that has a bookmark-defined hierarchy. The hierarchy is a tree with content only at the leaf level.

I am trying to extract the text between two leaf-level bookmarks using the following calls:

Stripper.setStartBookmark(), Stripper.setEndBookmark(), Stripper.writeText()

Instead, this returns the text of the entire page. In short, my problem is similar to the one mentioned in this thread.

Is there a way to extract content between two bookmarks?

If so, what should be the change in my code?

1 answer

I suspect that your bookmarks do not contain the data needed for this.

It appears that the bookmark you are using only points to the page where your content begins, and not the location on the page.

Here is an example of a bookmark containing location data:

 <Title Action="GoTo" Style="bold" Page="2 FitH 518"> Title Name </Title> 
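To check whether your own bookmarks carry a position, you could dump them in this form and inspect the Page attribute. Below is a minimal sketch, assuming the value follows the layout seen in the example above (page number, then a fit type, then coordinates), with the vertical position as the first coordinate for FitH/FitBH and the second for XYZ. The class BookmarkDestination and its methods are hypothetical helpers written for illustration; they are not part of PDFBox.

```java
import java.util.OptionalDouble;

/** Parsed form of a bookmark "Page" attribute such as "2 FitH 518" (hypothetical helper). */
final class BookmarkDestination {
    final int page;            // page number as written in the attribute
    final String fitType;      // e.g. "FitH", "XYZ", "Fit"; empty if absent
    final OptionalDouble top;  // vertical position on the page, if one is given

    private BookmarkDestination(int page, String fitType, OptionalDouble top) {
        this.page = page;
        this.fitType = fitType;
        this.top = top;
    }

    /** Splits the attribute on whitespace and picks out the vertical coordinate. */
    static BookmarkDestination parse(String pageAttribute) {
        String[] parts = pageAttribute.trim().split("\\s+");
        int page = Integer.parseInt(parts[0]);
        String fit = parts.length > 1 ? parts[1] : "";
        OptionalDouble top = OptionalDouble.empty();
        // Assumed layouts: "<page> FitH <top>", "<page> FitBH <top>",
        // and "<page> XYZ <left> <top> <zoom>".
        if ((fit.equals("FitH") || fit.equals("FitBH")) && parts.length > 2) {
            top = parseCoordinate(parts[2]);
        } else if (fit.equals("XYZ") && parts.length > 3) {
            top = parseCoordinate(parts[3]);
        }
        return new BookmarkDestination(page, fit, top);
    }

    /** A literal "null" coordinate means "unspecified" and is treated as absent. */
    private static OptionalDouble parseCoordinate(String s) {
        if (s.equals("null")) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(s));
    }

    /** True if the bookmark points at a place on the page, not just at the page. */
    boolean hasVerticalPosition() {
        return top.isPresent();
    }

    public static void main(String[] args) {
        for (String attr : new String[] {"2 FitH 518", "2 Fit"}) {
            BookmarkDestination d = parse(attr);
            System.out.println(attr + " -> page " + d.page
                    + ", has position: " + d.hasVerticalPosition());
        }
    }
}
```

A bookmark for which hasVerticalPosition() is false only identifies a page, which would match the behaviour you are seeing.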

Source: https://habr.com/ru/post/910054/

