How to extract text from PDF files using xpdf?

I have many PDF files in a folder and want to extract the text from each of them using xpdf. For instance:

  • example1.pdf should be extracted to example1.txt
  • example2.pdf should be extracted to example2.txt
  • etc.

Here is my code:

    <?php
    $path = 'C:/AppServ/www/pdfs/';
    $dir = opendir($path);
    $f = readdir($dir);
    while ($f = readdir($dir)) {
        if (eregi("\.pdf", $f)) {
            $content = shell_exec('C:/AppServ/www/pdfs/pdftotext ' . $f . ' ');
            $read = strtok($f, ".");
            $testfile = "$read.txt";
            $file = fopen($testfile, "r");
            if (filesize($testfile) == 0) {
            } else {
                $text = fread($file, filesize($testfile));
                fclose($file);
                echo "</br>";
                echo "</br>";
            }
        }
    }

I get an empty result. What is wrong with my code?

3 answers

Try using this:

    $dir = opendir($path);
    $filename = array();
    // No readdir() call before the loop, so the first directory entry is not skipped.
    while ($filename = readdir($dir)) {
        if (eregi("\.pdf", $filename)) {
            $content = shell_exec('C:/AppServ/www/pdfs/pdftotext ' . $filename . ' ');
            $read = strtok($filename, ".");
            $testfile = "$read.txt";
            $file = fopen($testfile, "r");
            if (filesize($testfile) == 0) {
            } else {
                $text = fread($file, filesize($testfile));
                fclose($file);
                echo "</br>";
                echo "</br>";
            }
        }
    }
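
A possible variation on this loop, offered as a sketch rather than a tested fix and assuming PHP 7 or later. It builds full paths from the folder name, so neither pdftotext nor the code that reads the result depends on the script's working directory, and it derives the .txt name with pathinfo() instead of strtok(), which would cut a name such as report.v2.pdf at the first dot. The function names txt_path_for and convert_folder are made up for this example, and the pdftotext path is assumed to contain no spaces.

```php
<?php
// Hypothetical helper: map "dir/example1.pdf" to "dir/example1.txt".
function txt_path_for(string $pdfPath): string
{
    $info = pathinfo($pdfPath);
    return $info['dirname'] . '/' . $info['filename'] . '.txt';
}

// Hypothetical helper: convert every *.pdf in $dir and return the
// extracted text keyed by PDF file name.
function convert_folder(string $pdftotext, string $dir): array
{
    $texts = array();
    // glob() returns false on error, so fall back to an empty list.
    // Note: the *.pdf pattern may be case-sensitive, depending on the platform.
    $pdfs = glob(rtrim($dir, '/') . '/*.pdf');
    foreach ($pdfs ?: array() as $pdf) {
        $txt = txt_path_for($pdf);
        // Pass the input and output paths explicitly, quoted for the shell.
        $cmd = $pdftotext . ' ' . escapeshellarg($pdf) . ' ' . escapeshellarg($txt);
        exec($cmd, $unused, $status);
        // Only read the result if the command succeeded and the file exists.
        if ($status === 0 && is_file($txt)) {
            $texts[basename($pdf)] = file_get_contents($txt);
        }
    }
    return $texts;
}
```

With the paths from the question, this could be called as convert_folder('C:/AppServ/www/pdfs/pdftotext', 'C:/AppServ/www/pdfs'), and each returned string could then be echoed.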

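You do not need to create a temporary .txt file. If you pass - as the output file name, pdftotext writes the text to standard output, which exec() can capture: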

    $command = '/AppServ/www/pdfs/pdftotext ' . escapeshellarg($filename) . ' -';
    // exec() appends each line of output to the array $text
    $a = exec($command, $text, $retval);
    echo implode("\n", $text);

If it does not work, check the server error logs.
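
To make the stdout approach a little more concrete, here is a minimal sketch, assuming PHP 7.1 or later. The function names stdout_command and pdf_text_via_stdout are hypothetical, and the pdftotext path is assumed to contain no spaces. exec() fills its second argument with an array of output lines, so the lines are joined before being returned, and the exit status is checked so that a failed call is not mistaken for an empty PDF.

```php
<?php
// Hypothetical helper: build the command line that sends the text to stdout.
function stdout_command(string $pdftotext, string $pdfPath): string
{
    // "-" as the output file tells pdftotext to write to standard output.
    return $pdftotext . ' ' . escapeshellarg($pdfPath) . ' -';
}

// Hypothetical helper: return the extracted text, or null on failure.
function pdf_text_via_stdout(string $pdftotext, string $pdfPath): ?string
{
    $lines = array();
    exec(stdout_command($pdftotext, $pdfPath), $lines, $status);
    // A non-zero exit status means the command failed or was not found.
    return $status === 0 ? implode("\n", $lines) : null;
}
```

Inside the directory loop this might be used as $text = pdf_text_via_stdout('C:/AppServ/www/pdfs/pdftotext', $path . $filename); a null result is then the cue to look at the error logs.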


The lines

 echo "</br>"; echo "</br>"; 

never print the extracted text. They should be

 echo "</br>"; echo $text."</br>"; 

Hope this helps


Source: https://habr.com/ru/post/1272944/
