Pdfminer3k does not have a method called create_pages in PDFPage

Since I want to switch from Python 2 to 3, I tried working with pdfminer3k in Python 3.4. The API seems to have been changed throughout. The change logs do not reflect the changes that were made, and I have not managed to analyze a PDF using pdfminer3k. For instance:

PDFDocument has been moved into the pdfparser module (sorry if I am getting the name wrong). PDFPage used to provide the create_pages method, which is now missing; all I see inside PDFPage are internal methods. Does anyone have a working pdfminer3k example? There seems to be no updated documentation reflecting the changes.

2 answers

If you are interested in reading text from a PDF file, the following code works with pdfminer3k on Python 3.4.

from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox, LTTextLine

fp = open('file.pdf', 'rb')
parser = PDFParser(fp)
doc = PDFDocument()
parser.set_document(doc)
doc.set_parser(parser)
doc.initialize('')
rsrcmgr = PDFResourceManager()
laparams = LAParams()
device = PDFPageAggregator(rsrcmgr, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)
# Process each page contained in the document.
for page in doc.get_pages():
    interpreter.process_page(page)
    layout = device.get_result()
    for lt_obj in layout:
        if isinstance(lt_obj, LTTextBox) or isinstance(lt_obj, LTTextLine):
            print(lt_obj.get_text())
fp.close()
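If the same script has to run against both the older PDFMiner API and pdfminer3k, a small helper can hide where the page iterator lives. This is only a sketch: iter_pdf_pages is a hypothetical name, and it assumes the older API exposes PDFPage.create_pages(document) (as the question describes) while pdfminer3k exposes document.get_pages() (as in the code above).

```python
def iter_pdf_pages(doc, page_cls=None):
    """Return an iterator over the pages of `doc`.

    Hypothetical helper. `page_cls` is an optional PDFPage-like class;
    it is used only if it actually provides a create_pages() method.
    """
    create_pages = getattr(page_cls, "create_pages", None)
    if callable(create_pages):
        # Assumed older PDFMiner style: PDFPage.create_pages(document)
        return iter(create_pages(doc))
    # pdfminer3k style: the document object yields its own pages
    return iter(doc.get_pages())
```

With pdfminer3k the loop then becomes "for page in iter_pdf_pages(doc):", and with a library that still has PDFPage.create_pages you would pass that PDFPage class as the second argument.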

Perhaps you could use pdfminer.six. Its description reads:

A fork of PDFMiner using six for compatibility with Python 2 and 3

Install it with pip:

pip install pdfminer.six

After that, usage is similar to pdfminer, at least in my code.
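For plain text extraction, a minimal sketch could look like the following. It assumes a reasonably recent pdfminer.six release, which ships pdfminer.high_level.extract_text; read_pdf_text is a hypothetical wrapper name, and the import is done inside the function so the snippet still loads when the package is not installed.

```python
def read_pdf_text(path):
    """Return all text of the PDF at `path` as a single string.

    Hypothetical wrapper; assumes a recent pdfminer.six is installed.
    """
    # Imported lazily so a missing package only fails when the function is called.
    from pdfminer.high_level import extract_text
    return extract_text(path)


if __name__ == "__main__":
    # 'file.pdf' is a placeholder path, as in the first answer.
    print(read_pdf_text("file.pdf"))
```

If you need per-page layout objects rather than one string, the lower-level classes are still available in pdfminer.six.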

Hope this can save your day :)


Source: https://habr.com/ru/post/1204865/