Extract text from PDF: PDFLib vs PDF extract vs pdf2xml

I am looking for a library (preferably in Java or PHP) to extract text from a PDF. There are many programs available, including PDFLib, PDF extract, and pdf2xml.

What tools would you choose? What do you think of them?

Thanks so much for your kind help!

+3
2 answers

iText (java), PDF , PDF , .

+3

itext irs i1040.pdf :

< 1 > article.gmane.org/gmane.comp.java.lib.itext.general/65680

, , . :

< 2 > www.verypdf.com/wordpress/201109/pdf-to-text-converter-cant-extract-text-which-render-by-embedded-fonts-2452.html

< 3 > 9.10.1: www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf

< 3 > :

... Unicode .

, " Unicode" .
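
To illustrate why the Unicode mapping matters, here is a minimal sketch in plain Java. It assumes the point of the references above is that text extraction depends on being able to map a font's character codes to Unicode, for example through a ToUnicode CMap. The class `ToUnicodeSketch`, the method `decode`, and the code values are hypothetical and made up for illustration; this is not the iText API and not a real PDF parser.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not iText's API and not a real PDF parser.
class ToUnicodeSketch {

    // Decodes character codes with a code-to-Unicode table,
    // which stands in for a font's ToUnicode CMap.
    static String decode(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder out = new StringBuilder();
        for (int code : codes) {
            // A code with no mapping cannot be recovered as text;
            // mark it with U+FFFD (the replacement character).
            out.append(toUnicode.getOrDefault(code, "\uFFFD"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up codes, as an embedded subset font might assign them.
        int[] shown = {1, 2};

        Map<Integer, String> cmap = new HashMap<>();
        cmap.put(1, "H");
        cmap.put(2, "i");

        // With a mapping, the codes decode to readable text.
        System.out.println(decode(shown, cmap));
        // Without a mapping, only replacement characters come out.
        System.out.println(decode(shown, new HashMap<>()));
    }
}
```

The page still renders correctly in both cases because the glyph outlines are embedded in the font; only the route from character code back to Unicode text is missing, and an extraction library cannot make up for that.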

-1

Source: https://habr.com/ru/post/1765794/

