This question is for reference and comparison. The solution is the accepted answer below.
I spent many hours searching for a quick and easy, but above all accurate, way to get the number of pages in a PDF document. I work for a graphics and printing company that handles a lot of PDF files, and the number of pages in a document must be known before it is processed. The PDF documents come from many different clients, so they are not all created in the same application and do not all use the same compression method.
Here are some of the answers that I found insufficient or simply NOT working:
Using Imagick (PHP extension)
Imagick requires a lot of installation work, Apache needs to be restarted, and when I finally got it working, processing took an amazingly long time (2-3 minutes per document) and it always returned 1 page for every document (I have not seen a working copy of Imagick so far), so I threw it away. That was with both the getNumberImages() and identifyImage() methods.
Using FPDI (PHP library)
FPDI is easy to use and install (just extract the files and call a PHP script), BUT many compression methods are not supported by FPDI. In that case it returns an error:
FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.
Opening a stream and searching with a regular expression:
This opens the PDF file as a stream and searches for some kind of string containing the page count or something similar.
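$f = "test1.pdf";
// Open in binary mode and bail out before reading if the file cannot be opened.
$stream = fopen($f, "rb");
if (!$stream) return 0;
$content = fread($stream, filesize($f));
fclose($stream);
if (!$content) return 0;
$count = 0;
// Regular expressions found by Googling (all linked to SO answers):
$regex = "/\/Count\s+(\d+)/";
$regex2 = "/\/Page\W*(\d+)/";
$regex3 = "/\/N\s+(\d+)/";
// $matches[1] holds the captured numbers; take the largest one.
if (preg_match_all($regex, $content, $matches)) $count = max(array_map("intval", $matches[1]));
return $count;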
/\/Count\s+(\d+)/ (searches for /Count <number>) does not work, because only a few documents have the /Count parameter inside, so most of the time it does not return anything.
/\/Page\W*(\d+)/ (searches for /Page<number>) does not return the number of pages; it mostly matches some other data.
/\/N\s+(\d+)/ (searches for /N <number>) does not work either, since documents can contain several /N values, and most, if not all, of them do not hold the page count.
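The failure modes described above can be reproduced on small synthetic fragments. The sketch below is only an illustration, not the accepted solution: the helper largestCapturedNumber() and the three sample strings are hypothetical and made up for this example. It assumes the usual reasons for the misses: since PDF 1.5 the page tree may live inside a compressed object stream, so /Count never appears as plain text, while an object stream's own /N entry gives the number of objects it contains rather than the page count.

```php
<?php
// Hypothetical helper (not from the original post): applies one regex to raw
// PDF bytes and returns the largest captured integer, or 0 if nothing matches.
function largestCapturedNumber(string $pattern, string $content): int
{
    if (!preg_match_all($pattern, $content, $matches)) {
        return 0;
    }
    // $matches[1] holds the digit strings captured by the first group.
    return max(array_map('intval', $matches[1]));
}

// Synthetic fragments, invented for illustration; real files are far larger.
// 1) Uncompressed page tree with a single /Pages node.
$plain = "1 0 obj\n<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>\nendobj\n";
// 2) Nested page tree: the root counts 12 pages, its two children 5 and 7.
$nested = "<< /Type /Pages /Count 12 >>\n"
        . "<< /Type /Pages /Count 5 >>\n"
        . "<< /Type /Pages /Count 7 >>\n";
// 3) Object stream: the page tree is hidden inside the compressed data,
//    and /N is the number of objects in the stream, not a page count.
$compressed = "5 0 obj\n<< /Type /ObjStm /N 40 /First 300 /Filter /FlateDecode >>\n"
            . "stream\n(binary data)\nendstream\n";

$countRegex = '/\/Count\s+(\d+)/';
$nRegex     = '/\/N\s+(\d+)/';

echo largestCapturedNumber($countRegex, $plain), "\n";      // 2
echo largestCapturedNumber($countRegex, $nested), "\n";     // 12
echo largestCapturedNumber($countRegex, $compressed), "\n"; // 0
echo largestCapturedNumber($nRegex, $compressed), "\n";     // 40
```

Taking the maximum handles the nested case, because every intermediate /Pages node carries its own /Count. Nothing in this approach helps with the third fragment: the regex either finds nothing or, with /N, returns a number that has nothing to do with pages.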
So, what works reliably and accurately?
See the answer below.
php pdf
Richard de Wit, Feb 01 '13 at 10:33