Extract text from a PDF starting at a bookmark

I need to extract text from a PDF starting exactly where a bookmark points.

PDFBox retrieves the entire page the bookmark is on, as described here.

But I need only the text that starts at the bookmark.


I believe iText can handle this.

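import com.itextpdf.awt.geom.Rectangle2D; // iText 5 geometry type
import com.itextpdf.text.pdf.parser.*;

// getRectFromBookmark is a helper you have to write yourself (hypothetical name)
Rectangle2D bookmarkRect = getRectFromBookmark(someBookmarkThingy);

// Pass only the text that falls inside bookmarkRect on to the extraction strategy
FilteredTextRenderListener filter =
  new FilteredTextRenderListener(new LocationTextExtractionStrategy(),
                                 new RegionTextRenderFilter(bookmarkRect));

// reader is an open PdfReader; pageNum is the page the bookmark points to
String bookmarkText = PdfTextExtractor.getTextFromPage(reader, pageNum, filter);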

someBookmarkThingy there will most likely be the PdfDictionary of the bookmark in question.
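To make the idea behind RegionTextRenderFilter plus LocationTextExtractionStrategy concrete, here is a simplified, stdlib-only model of it. It is not iText's implementation: TextChunk and RegionFilterSketch are hypothetical names, and the sketch assumes each piece of text is reduced to one anchor point in PDF user space (origin at the bottom-left, y growing upward).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a piece of text and its position on the page. */
final class TextChunk {
    final String text;
    final double x; // left edge of the chunk, PDF user space
    final double y; // baseline, PDF user space (origin at bottom-left)

    TextChunk(String text, double x, double y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

/** Simplified model of "keep chunks inside a rectangle, then read them in page order". */
final class RegionFilterSketch {
    static String extract(List<TextChunk> chunks,
                          double left, double bottom, double right, double top) {
        List<TextChunk> kept = new ArrayList<>();
        for (TextChunk c : chunks) {
            // Region filter: discard anything whose anchor point lies outside the box
            if (c.x >= left && c.x <= right && c.y >= bottom && c.y <= top) {
                kept.add(c);
            }
        }
        // Location-style ordering: highest line first (larger y), then left to right
        kept.sort(Comparator.comparingDouble((TextChunk c) -> -c.y)
                            .thenComparingDouble(c -> c.x));
        StringBuilder sb = new StringBuilder();
        Double currentY = null;
        for (TextChunk c : kept) {
            if (currentY != null) {
                // Chunks on exactly the same baseline are joined into one line
                sb.append(c.y == currentY ? " " : "\n");
            }
            sb.append(c.text);
            currentY = c.y;
        }
        return sb.toString();
    }
}
```

For "text starting at the bookmark", the rectangle would span the full page width and run from the bottom of the page up to the destination's top coordinate, so anything above the bookmark target is dropped.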

WARNING: bookmarks can actually contain almost any action. Usually, though, they contain one of several GoTo* variants.

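The variants are GoTo (a destination in the current PDF), GoToR (a destination in a different, "remote" PDF file) and GoToE (a destination in a PDF embedded in this one). If you only care about text in the current document, you will most likely only need to handle plain GoTo.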
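A minimal sketch of turning a bookmark into an explicit destination array, assuming the bookmark dictionary has already been read into a plain Map (keys without the leading slash) and that named destinations have already been collected into another Map. BookmarkDestSketch and both Map layouts are hypothetical stand-ins for iText's PdfDictionary/PdfArray objects. It also assumes the outline item may carry its target either directly in Dest or in an A action, and that PDF names and strings are both modelled as Java Strings.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical resolver: bookmark (modelled as a Map) -> explicit destination array. */
final class BookmarkDestSketch {
    /**
     * @param bookmark   outline item entries, e.g. "Dest" or "A"
     * @param namedDests named destinations already collected from the document
     * @return the explicit destination array, or null if it does not target this file
     */
    static List<?> resolve(Map<String, ?> bookmark, Map<String, List<?>> namedDests) {
        Object dest = bookmark.get("Dest");
        if (dest == null) {
            Object action = bookmark.get("A");
            if (!(action instanceof Map)) {
                return null; // neither a destination nor an action
            }
            Map<?, ?> a = (Map<?, ?>) action;
            // Only plain GoTo targets the current file; GoToR and GoToE point elsewhere
            if (!"GoTo".equals(a.get("S"))) {
                return null;
            }
            dest = a.get("D");
        }
        if (dest instanceof List) {
            return (List<?>) dest; // already an explicit destination array
        }
        if (dest instanceof String) {
            return namedDests.get(dest); // named destination: look it up first
        }
        return null;
    }
}
```

In the test data below, "page3" and "page5" are placeholders for real page references.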

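See the PDF specification, section 12.6.4.2 "Go-To Actions". The destination can be a name, a byte string, or an array. A name or string is a named destination that you first have to look up (section 12.3.2, "Destinations"). Explicit destination arrays take one of these forms: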

  • [pageRef /XYZ left top zoom]
  • [pageRef /Fit]
  • [pageRef /FitH top]
  • [pageRef /FitV left]
  • [pageRef /FitR left bottom right top]
  • [pageRef /FitB]
  • [pageRef /FitBH top]
  • [pageRef /FitBV left]
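The forms above are what a helper like getRectFromBookmark would have to interpret. A minimal sketch, assuming "text starting at the bookmark" means the full page width from the destination's top coordinate down to the bottom of the page; DestRegionSketch is a hypothetical name, and the destination array is modelled as a Java List with the type name (without slash) at index 1. Forms without a vertical position (Fit, FitB, FitV, FitBV) fall back to the whole page, and FitR uses its own rectangle.

```java
import java.util.List;

/** Hypothetical helper: the page region at or below a destination's top coordinate. */
final class DestRegionSketch {
    /**
     * @param dest explicit destination [pageRef, type, params...]; null params allowed
     * @return {left, bottom, right, top} in PDF user space (origin bottom-left)
     */
    static double[] regionBelow(List<?> dest, double pageWidth, double pageHeight) {
        String type = (String) dest.get(1);
        double top = pageHeight; // default: the whole page
        switch (type) {
            case "XYZ":   // [page /XYZ left top zoom]; left is ignored so whole lines are kept
                top = num(dest, 3, pageHeight);
                break;
            case "FitH":  // [page /FitH top]
            case "FitBH": // [page /FitBH top]
                top = num(dest, 2, pageHeight);
                break;
            case "FitR":  // [page /FitR left bottom right top]: use the given box as-is
                return new double[] {num(dest, 2, 0), num(dest, 3, 0),
                                     num(dest, 4, pageWidth), num(dest, 5, pageHeight)};
            default:      // Fit, FitB, FitV, FitBV carry no top, so keep the whole page
                break;
        }
        return new double[] {0, 0, pageWidth, top};
    }

    /** Reads a numeric parameter; a missing or null value falls back to the default. */
    private static double num(List<?> dest, int index, double fallback) {
        if (index >= dest.size() || dest.get(index) == null) {
            return fallback;
        }
        return ((Number) dest.get(index)).doubleValue();
    }
}
```

The four numbers would then be turned into the rectangle handed to RegionTextRenderFilter. If the bookmarked section continues onto later pages, those pages would still need ordinary whole-page extraction.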



Source: https://habr.com/ru/post/1766211/