PDF bullets appear as question marks when parsing with Apache Tika in Java

Question

I am parsing PDF files with Apache Tika (tika-app-1.3) using the following code:

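import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

InputStream input = new FileInputStream("Introduction.pdf");
AutoDetectParser parser = new AutoDetectParser();
BodyContentHandler handler = new BodyContentHandler(100 * 1024 * 1024); // write limit, in characters
Metadata metadata = new Metadata();
parser.parse(input, handler, metadata);
System.out.println(handler.toString());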

handler.toString() returns the text of the PDF, but the bullets in that text show up as the ? character. I want the bullets to be preserved as they are in the document. Is there a way to get the original content with Apache Tika, or is additional code needed in the parsing step?

java

Puneet srivastava Jul 9 '13 at 14:27

1 answer

dsh · Accepted Answer · 2013-07-09T15:16:53+0000

, , , , , , , , . . .

- . , ? (U + 003F) .

, PDF .
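One possible cause of this symptom, offered here as an assumption and not something the text above confirms, is that the bullet is extracted correctly and only gets lost when System.out encodes it with a platform charset that cannot represent it. In that case Java substitutes ? (U+003F). The sketch below shows that substitution without Tika, along with a PrintStream that writes UTF-8 instead. The class name, the helper names, and the sample string are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class BulletEncodingDemo {

    // Encodes the text as US-ASCII and decodes it again. Characters that
    // US-ASCII cannot represent (such as the bullet U+2022) are replaced
    // by the charset's replacement byte, which is '?'.
    static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Prints the text through a PrintStream that encodes as UTF-8 and
    // returns the bytes that were written.
    static byte[] printAsUtf8(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, "UTF-8");
            out.print(text);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for handler.toString(); a real run would use the Tika output.
        String extracted = "\u2022 First item";

        // Lossy path: the bullet becomes a question mark.
        System.out.println(roundTripAscii(extracted));

        // UTF-8 path: the bullet survives if the console itself displays UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(extracted);
    }
}
```

If the terminal is not set to UTF-8, the second line may still look wrong on screen. Writing the text to a file with an explicit encoding and opening it in an editor is a simple way to check whether the extracted string itself contains the bullet.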
