Get sizes and coordinates of text fields in PDF

Is it possible to get the X/Y coordinates and the height/width of all text fields in a PDF document using PHP or a Linux library? I use PDFTK to extract all text fields in a PDF, but it does not give me coordinates or size information. If not, is it possible to traverse the PDF document and calculate the x, y data and height/width of the text fields?

2 answers

It is possible, but hardly feasible.

You can open PDF documents in PHP using FPDI. It generates an abstract tree of PDF objects in memory. TCPDF and FPDF can save it back.

However, traversing that tree and finding the right attributes is very tedious.

That said, the PDF format is actually human-readable, and it will undoubtedly contain the coordinates in a readable form (mostly in specific clauses, IIRC). So you could probably detect them with a simple regular expression, if only you knew what to look for. Some sections have to be gzuncompress()ed first, but you are not trying to modify the document or save it back anyway. So try FPDI and print_r() to work out a strategy.
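
To make the regular-expression idea a bit more concrete, here is a rough sketch in Python rather than PHP. It relies on two things from the PDF specification that the answer does not spell out: a form field's widget stores its position as /Rect [llx lly urx ury], and text fields are marked with /FT /Tx. It also assumes that the field and its widget share one uncompressed object, which is common but not guaranteed. Objects packed into compressed streams would need zlib.decompress() first (the Python counterpart of gzuncompress()) and are not handled here. The function name and the sample bytes are made up for illustration.

```python
import re

# One PDF number: optional sign, then digits with an optional fraction.
NUM = rb"([-+]?(?:\d+\.?\d*|\.\d+))"
# "/Rect [llx lly urx ury]" inside a dictionary.
RECT_RE = re.compile(rb"/Rect\s*\[\s*" + rb"\s+".join([NUM] * 4) + rb"\s*\]")
# Body of an indirect object: "12 0 obj ... endobj".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.S)
# Field type "text".
TEXT_FIELD_RE = re.compile(rb"/FT\s*/Tx\b")
# Field name as a literal string; escaped parentheses are not handled.
NAME_RE = re.compile(rb"/T\s*\(([^)]*)\)")


def find_text_field_rects(pdf_bytes):
    """Return name, position and size of text fields found in raw PDF bytes."""
    fields = []
    for obj in OBJ_RE.finditer(pdf_bytes):
        body = obj.group(1)
        if not TEXT_FIELD_RE.search(body):
            continue  # not a text field (or /FT lives on a parent object)
        rect = RECT_RE.search(body)
        if rect is None:
            continue  # no rectangle in this object
        x1, y1, x2, y2 = (float(v.decode("ascii")) for v in rect.groups())
        name = NAME_RE.search(body)
        fields.append({
            "name": name.group(1).decode("latin-1") if name else None,
            "x": min(x1, x2),           # lower-left corner, PDF user space
            "y": min(y1, y2),
            "width": abs(x2 - x1),
            "height": abs(y2 - y1),
        })
    return fields


# Hand-written sample: one text field and one button.
SAMPLE = (
    b"1 0 obj\n<< /Type /Annot /Subtype /Widget /FT /Tx /T (name) "
    b"/Rect [100 700 300 720] >>\nendobj\n"
    b"2 0 obj\n<< /Type /Annot /Subtype /Widget /FT /Btn /T (ok) "
    b"/Rect [100 650 150 670] >>\nendobj\n"
)

for field in find_text_field_rects(SAMPLE):
    print(field)
```

For a real file you would pass open(path, "rb").read() instead of SAMPLE. Coordinates are in PDF units with the origin at the bottom-left of the page, so the y value may need flipping if you want a top-left origin.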


Yes, and it is not too difficult. The best tool I know for the job is pdfminer. It is written in Python, but if you do not want to use Python, you can simply dump the PDF information to XML and parse that with your weapon of choice :) Reply if you run into problems :)
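
As an illustration of the "dump to XML and parse it" route, the sketch below reads bounding boxes out of an XML dump using only the Python standard library. The layout of SAMPLE_XML is an assumption, modeled on what pdfminer's pdf2txt.py produces with -t xml (elements carrying a bbox="x0,y0,x1,y1" attribute). Check it against your own dump. That output describes laid-out text boxes, so whether your form fields show up in it is something to verify, not something this answer confirms. The helper name boxes_from_xml is made up for the example.

```python
import xml.etree.ElementTree as ET


def boxes_from_xml(xml_text, tag="textbox"):
    """Collect position and size of every <tag> element that has a bbox."""
    root = ET.fromstring(xml_text)
    boxes = []
    for page in root.iter("page"):
        for element in page.iter(tag):
            bbox = element.get("bbox")
            if bbox is None:
                continue  # element without coordinates
            x0, y0, x1, y1 = (float(v) for v in bbox.split(","))
            boxes.append({
                "page": page.get("id"),
                "x": x0,                # lower-left corner, PDF units
                "y": y0,
                "width": x1 - x0,
                "height": y1 - y0,
            })
    return boxes


# Hand-written sample in the assumed layout.
SAMPLE_XML = """
<pages>
  <page id="1" bbox="0.000,0.000,612.000,792.000" rotate="0">
    <textbox id="0" bbox="72.000,700.000,200.000,712.000">
      <textline bbox="72.000,700.000,200.000,712.000">
        <text font="Helvetica" bbox="72.000,700.000,78.000,712.000" size="12.000">N</text>
      </textline>
    </textbox>
  </page>
</pages>
"""

for box in boxes_from_xml(SAMPLE_XML):
    print(box)
```

For a dump saved to disk, ET.parse(path).getroot() can replace ET.fromstring(). The same loop ports easily to PHP's SimpleXML if you would rather stay in PHP.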


Source: https://habr.com/ru/post/1338592/

