Reading PDF files with PHP

I have a large PDF file, which is a floor map for a building. It has layers for all office furniture, including text boxes for the location.

My goal is to read this file using PHP, search the document for text layers, and write their contents and coordinates to a file. That way I can determine the x/y location of each place.

Is there any way to do this through PHP? (Or even Ruby or Python, if necessary)

+41
php pdf
Jun 16 '09 at 23:56
4 answers

Check FPDF (with FPDI):

http://www.fpdf.org/

http://www.setasign.de/products/pdf-php-solutions/fpdi/

This will allow you to open the PDF file and add content to it in PHP. I assume that you can also use their functionality to search through existing content for the required values.

Another possible library is TCPDF: http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf

Update to add a more modern library: PDF Parser

+27
Oct 17 '09 at 17:49

There is a PHP library (pdfparser) that does exactly what you want.

project website

http://www.pdfparser.org/

GitHub

https://github.com/smalot/pdfparser

Demo page / api

http://www.pdfparser.org/demo

After including pdfparser in your project, you can get all the text from mypdf.pdf as follows:

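    <?php
    // Assumes pdfparser was installed with Composer; adjust the path otherwise.
    require 'vendor/autoload.php';

    // The library's classes live in the Smalot\PdfParser namespace.
    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile('mypdf.pdf');

    // All text from mypdf.pdf
    $text = $pdf->getText();
    echo $text;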

Similarly, you can get the PDF's metadata, and you can retrieve PDF objects (such as images) in the same way.

+18
Jan 23 '14 at 10:42

Hmm... not exactly PHP, but you can call an external program from PHP to convert the PDF to a temporary HTML file and then parse the resulting file with PHP. I did something similar for my project, and this is the program I used:

Pdftohtml

What's cool about the program is that it will spit out text elements in a <div> with absolute position coordinates. This seems to be exactly what you are trying to do.
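
To make the parsing step concrete, here is a minimal sketch of pulling text and coordinates out of such a file. The sample markup is only an assumed stand-in for the converter's output (the real markup depends on the pdftohtml version and flags you use), and extractPositionedText() is a hypothetical helper name, not part of any library. It relies on PHP's bundled DOM extension.

```php
<?php
// Sample markup standing in for the converter's output. This shape is an
// assumption for illustration; check what your pdftohtml build really emits.
$html = <<<HTML
<html><body>
<div style="position:absolute;top:120;left:340"><nobr>Room 101</nobr></div>
<div style="position:absolute;top:260px;left:85px"><nobr>Kitchen</nobr></div>
<div class="note">no coordinates here</div>
</body></html>
HTML;

/**
 * Hypothetical helper: returns one ['text' => string, 'x' => int, 'y' => int]
 * entry for every element whose inline style has both "left" and "top".
 */
function extractPositionedText(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings about non-standard markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $items = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style]') as $node) {
        $style = $node->getAttribute('style');
        // Read the integer part of the left/top values, with or without "px".
        if (preg_match('/(?:^|;)\s*left\s*:\s*(-?\d+)/i', $style, $left)
            && preg_match('/(?:^|;)\s*top\s*:\s*(-?\d+)/i', $style, $top)) {
            $items[] = [
                'text' => trim($node->textContent),
                'x' => (int) $left[1],
                'y' => (int) $top[1],
            ];
        }
    }
    return $items;
}

$items = extractPositionedText($html);
foreach ($items as $item) {
    printf("%s -> x=%d, y=%d\n", $item['text'], $item['x'], $item['y']);
}
```

For a real file you would replace the sample string with file_get_contents() on the generated HTML. Note that these are page coordinates in the converter's units, so you may still need to scale them to match your floor map.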

0
Jun 17 '09 at 0:39

You might also want to try this application: http://pdfbox.apache.org/. A working example can be found at https://www.jinises.com

0
Oct 11 '13 at 8:58


