Does grep work properly in PDF files?

Question

Is it possible to search multiple PDF files using the grep command? It doesn't seem to work. How do people search for content in multiple PDF files?

+3

grep pdf full-text-search

Sam Apr 22 '10 at 5:44

7 answers

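Well, PDF is a binary format, and grep can search binary files as if they were text with the -a option:

grep -a pattern whee.pdf

Or you can just use pdftotext (which comes with xpdf) like this:

pdftotext whee.pdf - | grep pattern

The trailing - makes pdftotext write to standard output instead of whee.txt.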
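Since the question is about many files, the single-file pipeline can be wrapped in a loop. The following is only a sketch: it assumes pdftotext is on the PATH, and the function name pdf_search and the PDFTOTEXT override variable are made up for this example.

```shell
#!/bin/sh
# Sketch: search several PDFs for a pattern, prefixing each hit with its file name.
# Assumes pdftotext (xpdf or poppler) is installed. PDFTOTEXT is a hypothetical
# override variable introduced here, not something pdftotext itself reads.
pdf_search() {
    pattern=$1
    shift
    for f in "$@"; do
        # "-" sends the extracted text to stdout instead of a .txt file.
        "${PDFTOTEXT:-pdftotext}" "$f" - 2>/dev/null |
            grep -n -- "$pattern" |
            while IFS= read -r line; do
                printf '%s:%s\n' "$f" "$line"
            done
    done
}
```

Called as pdf_search system Documents/*.pdf, it prints hits as file:line:text. The line number counts lines of the extracted text, not PDF pages.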

+4

Will hayworth Apr 22 '10 at 5:49

Alternatively, on Mac OS X you can use mdfind:

mdfind -onlyin search/directory/path "kind:pdf search text"

+3

Coxy Apr 22 '10 at 7:00

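PDF is a binary format, so grep on its own is not much use here. You can extract the strings from the PDF files and grep those instead:

strings dir_with_pdfs/*.pdf | grep "keyword"

You can also use pdftotext to convert a PDF to text and then grep the result.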

+2

Bozhidar Batsov Apr 22 '10 at 5:48

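PDF is a binary format, and its text is not stored as plain text. The content is usually compressed, so grep cannot match it, and it is drawn with PostScript-like commands. Text may also be split into fragments. Finally, a PDF may contain no text at all, only images.

This article describes text in PDF files in more detail: http://pdf.jpedal.org/java-pdf-blog/bid/27187/Understanding-the-PDF-file-format-text-streams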

+1

mark stephens Apr 22 '10 at 8:08

The pdfgrep tool will do the job. It has syntax similar to grep, and searching multiple files takes only a simple shell one-liner. For instance:

$> ls Documents/*.pdf | xargs pdfgrep -n -H "system"
Documents/2005-DoddGutierrezRO-MAN1.pdf:1: designed episodic memory system
Documents/2005-DoddGutierrezRO-MAN1.pdf:1: how ISAC episodic memory system is
Documents/2005-DoddGutierrezRO-MAN1.pdf:1: cognitive system employs a combination
....
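One caveat with piping ls into xargs is that file names containing spaces get split into several arguments. A possible workaround is to let the shell expand the glob and quote each name, as in this sketch. The function name search_pdfs_in_dir and the PDFGREP override variable are invented for illustration; only the -n and -H flags shown above are used.

```shell
#!/bin/sh
# Sketch: run pdfgrep over every PDF in one directory, one file at a time.
# PDFGREP is a hypothetical override variable for this example only.
search_pdfs_in_dir() {
    dir=$1
    pattern=$2
    for f in "$dir"/*.pdf; do
        # Skip the literal pattern when the directory has no PDFs.
        [ -e "$f" ] || continue
        # Quoting "$f" keeps names with spaces intact.
        "${PDFGREP:-pdfgrep}" -n -H "$pattern" "$f"
    done
}
```

For example, search_pdfs_in_dir Documents "system" would cover the same files as the command above.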

+1

dlobato Mar 01 '12 at 11:54

If you have pdftotext installed via the poppler package, try this Perl script:

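#!/usr/bin/perl
# Usage: script.pl PATTERN file1.pdf [file2.pdf ...]
use strict;
use warnings;

my $p = shift;
foreach my $fn (@ARGV) {
    # List-form open bypasses the shell, so file names with spaces are safe.
    open(my $fh, '-|', 'pdftotext', $fn, '-') or die "pdftotext $fn: $!";
    while (<$fh>) { print "$fn:$_" if /$p/; }
    close($fh);
}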

0

Jeff burdges Nov 06 '11 at 4:19

xenoterracide · Accepted Answer · 2010-04-22T05:48:13+0000

Use something like Solr or CLucene; I think they can do what you want.
