Extract page sizes from PDF in Python

I want to read a PDF file and get a list of its pages and the size of each page. I do not need to manipulate it in any way, just read it.

I am currently trying to use pyPdf, and it does everything I need, except that I cannot find a way to get the page sizes. I understand that I will probably have to iterate over the pages, since page sizes can vary within a PDF document. Is there any other library / method that I can use?

I tried using PIL; some online recipes even used d = Image(imagefilename), but it NEVER reads any of my PDF files. It reads everything else that I throw at it, even some things that I did not know PIL could handle.

Any guidance is appreciated. I'm on Windows 7 64-bit with Python 2.5 (because I'm doing GAE stuff too), but I'm happy to do this on Linux or with a more modern Python.

5 answers

This can be done using PyPDF2:

    >>> from PyPDF2 import PdfFileReader
    >>> input1 = PdfFileReader(open('example.pdf', 'rb'))
    >>> input1.getPage(0).mediaBox
    RectangleObject([0, 0, 612, 792])

(PyPDF2 was formerly known as pyPdf, and still refers to the pyPdf documentation.)
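
Since page sizes can vary within one document, the same lookup can be repeated for every page. The following is only a minimal sketch, assuming the legacy PyPDF2 interface shown above (PdfFileReader, getNumPages, getPage, mediaBox); the helper names box_size, page_sizes and read_page_sizes are made up for illustration and are not part of PyPDF2.

```python
def box_size(box):
    """Return (width, height) in points for a box given as [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = (float(v) for v in box)
    return (x1 - x0, y1 - y0)


def page_sizes(reader):
    """Collect (width, height) for every page of an already opened reader.

    Assumption: `reader` offers the legacy PyPDF2 methods getNumPages() and
    getPage(i), and each page has a mediaBox attribute.
    """
    return [box_size(reader.getPage(i).mediaBox)
            for i in range(reader.getNumPages())]


def read_page_sizes(path):
    """Open the PDF at `path` and return one (width, height) tuple per page."""
    # Imported here so that the two helpers above can be used without PyPDF2.
    from PyPDF2 import PdfFileReader
    with open(path, 'rb') as f:
        return page_sizes(PdfFileReader(f))
```

Calling read_page_sizes('example.pdf') should then give one tuple per page. Newer PyPDF2 releases renamed these calls, so the names may need adjusting there.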


With pdfminer for Python 3.x (pdfminer.six); not tried on Python 2.7:

    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage

    parser = PDFParser(open(pdfPath, 'rb'))  # pdfPath: path to the PDF file
    doc = PDFDocument(parser)
    pageSizesList = []
    for page in PDFPage.create_pages(doc):
        # the media box is the page size as a list of 4 numbers: x0, y0, x1, y1
        print(page.mediabox)
        pageSizesList.append(page.mediabox)
    # pageSizesList now holds one media box (a list) per page

With pdfrw:

    >>> from pdfrw import PdfReader
    >>> pdf = PdfReader('example.pdf')
    >>> pdf.pages[0].MediaBox
    ['0', '0', '595.2756', '841.8898']

Lengths are given in points (1 pt = 1/72 of an inch). Format: ['0', '0', width, height] (thanks, Astrophe!).
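
Because the box entries come back as strings, they have to be converted before doing any arithmetic. Here is a small sketch of that conversion, using only the 1 pt = 1/72 inch relation above and 25.4 mm per inch. The function names are hypothetical, and the size is computed as x1 - x0 and y1 - y0 in case a box does not start at the origin.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def media_box_size_pt(media_box):
    """Return (width, height) in points from [x0, y0, x1, y1] given as strings or numbers."""
    x0, y0, x1, y1 = (float(v) for v in media_box)
    return (x1 - x0, y1 - y0)


def pt_to_inch(pt):
    """Convert points to inches."""
    return pt / POINTS_PER_INCH


def pt_to_mm(pt):
    """Convert points to millimetres."""
    return pt / POINTS_PER_INCH * MM_PER_INCH


# The box from the pdfrw example above
width_pt, height_pt = media_box_size_pt(['0', '0', '595.2756', '841.8898'])
print(round(pt_to_mm(width_pt)), round(pt_to_mm(height_pt)))  # 210 297, i.e. A4
```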


Another way is to use popplerqt4:

    import popplerqt4

    doc = popplerqt4.Poppler.Document.load('/path/to/my.pdf')
    qsizedoc = doc.page(0).pageSize()
    h = qsizedoc.height()  # given in pt, 1 pt = 1/72 in
    w = qsizedoc.width()

With PyMuPDF:

    >>> import fitz
    >>> doc = fitz.open("example.pdf")
    >>> page = doc.loadPage(0)
    >>> print(page.MediaBox)
    Rect(0.0, 0.0, 595.0, 842.0)  # format is (0.0, 0.0, width, height) if page is not rotated
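
The remark about rotation can be made concrete: a page rotated by 90 or 270 degrees is displayed with width and height swapped. Below is a sketch in plain Python, assuming the box is available as four numbers and the rotation is known in degrees (PDF page rotations are multiples of 90). displayed_size is a hypothetical helper, not a PyMuPDF function.

```python
def displayed_size(media_box, rotation=0):
    """Return (width, height) in points as the page is displayed.

    media_box: four numbers (x0, y0, x1, y1)
    rotation:  page rotation in degrees, assumed to be a multiple of 90
    """
    x0, y0, x1, y1 = media_box
    width, height = x1 - x0, y1 - y0
    if rotation % 360 in (90, 270):
        # A quarter turn swaps the two dimensions
        return (height, width)
    return (width, height)


print(displayed_size((0.0, 0.0, 595.0, 842.0)))      # (595.0, 842.0)
print(displayed_size((0.0, 0.0, 595.0, 842.0), 90))  # (842.0, 595.0)
```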

Source: https://habr.com/ru/post/889722/

