I am trying to extract only the first page from each of several PDF files and merge those pages into one file. (I get 150 PDF files per day; the first page is the invoice I need, and the next 3 to 12 pages are just backup that I don't need.) So the input is 150 PDF files of various sizes, and the output I want is 1 PDF file containing only the first page of each of the 150 files.
What I seem to have done instead is merge all the pages EXCEPT the first page (which is the only one I need). Here is my code:
import PyPDF2, os

pdfFiles = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfFiles.append(filename)
pdfFiles.sort(key=str.lower)

pdfWriter = PyPDF2.PdfFileWriter()
for filename in pdfFiles:
    pdfFileObj = open(filename, 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    for pageNum in range(1, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

pdfOutput = open('CombinedFirstPages.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()
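The likely cause is the inner loop: PyPDF2 numbers pages from 0, so `range(1, pdfReader.numPages)` starts at the second page and copies everything after it. A minimal sketch of the intended behaviour follows, assuming the same legacy (pre-3.0) PyPDF2 API used above. The helper names `first_page_indices` and `combine_first_pages` are made up for illustration, and skipping the output file on re-runs is an added assumption, not something the original code does.

```python
import os


def first_page_indices(num_pages):
    """Return the page indices to keep: only index 0, the first page."""
    return [0] if num_pages > 0 else []


def combine_first_pages(folder='.', output_name='CombinedFirstPages.pdf'):
    """Copy page 0 of every PDF in `folder` into one output PDF."""
    import PyPDF2  # legacy (pre-3.0) API, same calls as in the question

    # Skip the output file itself so a second run does not re-merge it
    # (an assumption added here for safety).
    names = sorted(
        (f for f in os.listdir(folder)
         if f.endswith('.pdf') and f != output_name),
        key=str.lower,
    )

    writer = PyPDF2.PdfFileWriter()
    open_files = []  # sources stay open until the output has been written
    try:
        for name in names:
            file_obj = open(os.path.join(folder, name), 'rb')
            open_files.append(file_obj)
            reader = PyPDF2.PdfFileReader(file_obj)
            # Index 0 is the first page; range(1, numPages) would skip it.
            for page_num in first_page_indices(reader.numPages):
                writer.addPage(reader.getPage(page_num))
        with open(os.path.join(folder, output_name), 'wb') as output:
            writer.write(output)
    finally:
        for file_obj in open_files:
            file_obj.close()
    return names
```

Calling `combine_first_pages('.')` from the folder holding the daily PDFs should then write one page per input file. In the original script, the equivalent one-line change would be to drop the inner loop and call `pdfReader.getPage(0)` directly.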