Does grep work properly on PDF files?

Is it possible to search multiple PDF files using the grep command? It doesn't seem to work. How do people search for content in multiple PDF files?

+3
7 answers

Use something like Solr or CLucene; I think they can do what you want.

+2

Well, PDF is a binary format, and grep can search binary files as if they were text with the -a option:

grep -a

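or you can just use pdftotext (which comes with xpdf) like this. The trailing - sends the extracted text to standard output; without it, pdftotext writes whee.txt and nothing reaches grep:

pdftotext whee.pdf - | grep pattern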
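
To cover the multiple-files part of the question, the pdftotext approach can be wrapped in a loop. This is only a minimal sketch, assuming pdftotext is installed; the function name pdf_text_grep is made up for illustration and is not part of any package.

```shell
# Hypothetical helper: grep the extracted text of several PDFs.
# Usage: pdf_text_grep PATTERN file1.pdf [file2.pdf ...]
pdf_text_grep() {
    pattern=$1
    shift
    for f in "$@"; do
        # "-" makes pdftotext write to stdout instead of a .txt file.
        # Each matching line is printed with the file name in front.
        pdftotext "$f" - 2>/dev/null | grep -e "$pattern" |
            while IFS= read -r line; do
                printf '%s:%s\n' "$f" "$line"
            done
    done
}
```

Called as pdf_text_grep "pattern" dir_with_pdfs/*.pdf, it prints matches in the same file:line shape that grep uses for multiple files. Scanned PDFs that contain only images will produce no matches, since there is no text to extract.
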
+4

On Mac OS X, you can use mdfind:

mdfind -onlyin search/directory/path "kind:pdf search text"
+3

PDF is a binary format, so grep does not work on it directly. You can extract the strings from the PDFs and grep those:

ls dir_with_pdfs/*.pdf|xargs strings|grep "keyword"

Alternatively, convert the PDF to text with pdf2text and then grep the result.

+2

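PDF is a binary format, not plain text. If you run grep on it directly, you are searching PostScript-like page description data rather than the text you see on the page, and the text in a PDF is often compressed or split into fragments.

This article describes in more detail how text is stored in PDF files: http://pdf.jpedal.org/java-pdf-blog/bid/27187/Understanding-the-PDF-file-format-text-streams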

+1

The pdfgrep tool will do the job. Its syntax is similar to grep's. Searching multiple files takes only a simple shell command. For instance:

$> ls Documents/*.pdf | xargs pdfgrep -n -H "system"
Documents/2005-DoddGutierrezRO-MAN1.pdf:1: designed episodic memory system
Documents/2005-DoddGutierrezRO-MAN1.pdf:1: how ISAC episodic memory system is
Documents/2005-DoddGutierrezRO-MAN1.pdf:1: cognitive system employs a combination
....
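
The ls | xargs pipeline above splits file names on whitespace, so it can fail for PDFs with spaces in their names. A possible variant, sketched here on the assumption that pdfgrep is installed and accepts the same -n and -H options shown above, lets find hand the file names over directly. The function name pdfgrep_tree is hypothetical.

```shell
# Hypothetical helper: run pdfgrep on every PDF under a directory tree.
# Usage: pdfgrep_tree PATTERN [DIR]
pdfgrep_tree() {
    pattern=$1
    dir=${2:-.}
    # "-exec ... {} +" passes many file names per pdfgrep call without
    # going through the shell, so names with spaces stay intact.
    find "$dir" -type f -name '*.pdf' -exec pdfgrep -n -H "$pattern" {} +
}
```

For example, pdfgrep_tree "system" Documents would search Documents and all of its subdirectories.
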
+1

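If you have pdftotext installed via the poppler package, try this Perl script:

#!/usr/bin/perl
# Usage: script.pl PATTERN file1.pdf [file2.pdf ...]
use strict;
use warnings;

my $p = shift;    # first argument is the regex to search for
foreach my $fn (@ARGV) {
    # List-form open bypasses the shell, so file names with spaces are safe.
    open(my $fh, '-|', 'pdftotext', $fn, '-') or do { warn "$fn: $!\n"; next };
    while (<$fh>) { print "$fn:$_" if /$p/; }
    close($fh);
}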
0

Source: https://habr.com/ru/post/1742228/

