Extract page sizes from PDF in Python

Question

I want to read a PDF file and get a list of its pages and the size of each page. I do not need to manipulate it in any way, just read it.

I am currently trying to use pyPdf, and it does everything I need except give me the page sizes. I understand that I will probably have to iterate over the pages, since page sizes can vary within a PDF document. Is there any other library or method I can use?

I tried using PIL; some recipes online even use d = Image(imagefilename), but it NEVER reads any of my PDF files, even though it reads everything else I throw at it, including some things I did not know PIL could handle.

Any guidance is appreciated. I'm on Windows 7 64-bit with Python 2.5 (because I'm also doing GAE stuff), but I'm happy to do this on Linux or with a more modern Python.


Bif Jun 03 '11 at 17:33


5 answers

Josh lee · Answer 1 · 2011-06-03T18:08:29+0000

This can be done using PyPDF2:

    >>> from PyPDF2 import PdfFileReader
    >>> input1 = PdfFileReader(open('example.pdf', 'rb'))
    >>> input1.getPage(0).mediaBox
    RectangleObject([0, 0, 612, 792])

(PyPDF2 was formerly known as pyPdf; the documentation still refers to pyPdf.)
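The question points out that page sizes can vary, so you probably want every page rather than just page 0. Below is a minimal sketch of that loop, assuming a pypdf-style reader: in current pypdf (the successor to PyPDF2) the names above are spelled PdfReader, reader.pages and page.mediabox, and the box exposes .width and .height. The function name page_sizes is hypothetical.

```python
def page_sizes(reader):
    """Return a list of (width, height) tuples in PDF points, one per page.

    Assumes a pypdf-style reader: ``reader.pages`` is iterable and each page
    has a ``mediabox`` with ``width`` and ``height`` attributes.
    """
    sizes = []
    for page in reader.pages:
        box = page.mediabox
        # Convert to plain floats so the result holds no library objects.
        sizes.append((float(box.width), float(box.height)))
    return sizes
```

With pypdf installed, usage would be something like page_sizes(PdfReader("example.pdf")) after from pypdf import PdfReader. Taking the reader as a parameter keeps the loop independent of how the file was opened.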

Myonaiz · Answer 2 · 2018-02-20T13:34:21+0000

For pdfminer on Python 3.x (pdfminer.six; not tested on Python 2.7):

    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage

    pdfPath = 'example.pdf'  # path to your PDF
    pageSizesList = []
    # Keep the file open while iterating: pages are parsed lazily.
    with open(pdfPath, 'rb') as f:
        parser = PDFParser(f)
        doc = PDFDocument(parser)
        for page in PDFPage.create_pages(doc):
            # page.mediabox is the page size as a list of 4 numbers: x0, y0, x1, y1
            print(page.mediabox)
            pageSizesList.append(page.mediabox)
    # pageSizesList now holds one [x0, y0, x1, y1] list per page
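The mediabox printed above is [x0, y0, x1, y1], and the lower-left corner is not guaranteed to be (0, 0), so the width is x1 - x0 rather than just x1. A small helper (the name box_size is hypothetical) that also accepts the string values pdfrw returns:

```python
def box_size(box):
    """Return (width, height) in PDF points for a [x0, y0, x1, y1] box.

    Accepts numbers or numeric strings (pdfrw returns strings).
    """
    x0, y0, x1, y1 = (float(v) for v in box)
    # abs() in case the corners are given in the opposite order.
    return abs(x1 - x0), abs(y1 - y0)
```

For example, box_size(page.mediabox) for each page in the loop above gives plain width/height pairs.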

Jamy mahabier · Answer 3 · 2018-05-17T09:17:33+0000

With pdfrw:

    >>> from pdfrw import PdfReader
    >>> pdf = PdfReader('example.pdf')
    >>> pdf.pages[0].MediaBox
    ['0', '0', '595.2756', '841.8898']

Lengths are in points (1 pt = 1/72 inch). The format is ['0', '0', width, height] (thanks, Astrophe!).
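Since 1 pt = 1/72 inch and 1 inch = 25.4 mm, converting the sizes is a one-liner each. A minimal sketch (the function names are hypothetical):

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def points_to_inches(pt):
    """Convert a PDF length in points to inches."""
    return float(pt) / POINTS_PER_INCH


def points_to_mm(pt):
    """Convert a PDF length in points to millimetres."""
    return float(pt) / POINTS_PER_INCH * MM_PER_INCH
```

For example, the 595.2756 x 841.8898 pt page above comes out at about 210 x 297 mm (A4), and the 612 x 792 pt page from the PyPDF2 answer is exactly 8.5 x 11 inches (US Letter).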

Alexander Marin · Answer 4 · 2016-08-15T19:16:45+0000

Another way is to use popplerqt4:

    import popplerqt4

    doc = popplerqt4.Poppler.Document.load('/path/to/my.pdf')
    qsizedoc = doc.page(0).pageSize()
    h = qsizedoc.height()  # given in pt, 1 pt = 1/72 in
    w = qsizedoc.width()

cges30901 · Answer 5 · 2019-10-08T14:34:21+0000

With PyMuPDF:

    >>> import fitz
    >>> doc = fitz.open("example.pdf")
    >>> page = doc.loadPage(0)
    >>> print(page.MediaBox)
    Rect(0.0, 0.0, 595.0, 842.0)

The format is (0.0, 0.0, width, height) if the page is not rotated. Newer PyMuPDF releases spell these names doc.load_page(0) and page.mediabox.
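The "if the page is not rotated" caveat matters: a page's /Rotate value (exposed as page.rotation in both PyMuPDF and pypdf) turns the page for display, and a quarter turn swaps width and height. A minimal sketch, assuming you already have the unrotated MediaBox width/height and the rotation in degrees (the name displayed_size is hypothetical):

```python
def displayed_size(width, height, rotate=0):
    """Return (width, height) as the page appears once /Rotate is applied.

    ``width`` and ``height`` come from the unrotated MediaBox; ``rotate`` is
    the /Rotate value in degrees, which PDF requires to be a multiple of 90.
    Negative values such as -90 are also accepted here.
    """
    if rotate % 90 != 0:
        raise ValueError(f"/Rotate must be a multiple of 90, got {rotate}")
    # A quarter or three-quarter turn swaps the axes; a half turn does not.
    if rotate % 180 == 90:
        return height, width
    return width, height
```

For example, an A4 page stored as 595 x 842 with a rotation of 90 is displayed as 842 x 595 (landscape).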
