I have thousands of PDF documents, each 11-15 MB in size. My program reports that a document contains more than 100,000 characters.
Error output:
Exception in thread "main" org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit.
How can I increase the limit to 10-15 MB?
I found a solution that uses the Tika facade class, but I could not find a way to integrate it with my own code:
Tika tika = new Tika();
tika.setMaxStringLength(10*1024*1024);
Here is my code:
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
String location = "C:\\Users\\Laptop\\Dropbox\\MainTextbookTrappe2ndEd.pdf";
FileInputStream inputstream = new FileInputStream(location);
ParseContext pcontext = new ParseContext();
PDFParser pdfparser = new PDFParser();
pdfparser.parse(inputstream, handler, metadata, pcontext);
Output:
System.out.println("Content of the PDF :" + handler.toString());
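For context, here is a minimal sketch of how the limit might be raised without switching to the facade. It relies on the `BodyContentHandler(int writeLimit)` constructor, whose argument is a number of characters (not bytes), with -1 disabling the limit. The class name `PdfTextExtractor`, the constant `WRITE_LIMIT`, and the helper `extractText` are hypothetical names chosen for illustration, and the sketch assumes the Tika core and PDF parser jars are on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

class PdfTextExtractor {

    // Hypothetical constant: the limit is counted in characters, not bytes.
    // 10 * 1024 * 1024 mirrors the value used in the facade snippet.
    static final int WRITE_LIMIT = 10 * 1024 * 1024;

    // Hypothetical helper: parses one PDF and returns its body text.
    // Passing -1 as writeLimit disables the limit entirely.
    static String extractText(Path pdf, int writeLimit)
            throws IOException, SAXException, TikaException {
        BodyContentHandler handler = new BodyContentHandler(writeLimit);
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        // try-with-resources closes the stream even if parsing fails
        try (InputStream in = Files.newInputStream(pdf)) {
            new PDFParser().parse(in, handler, metadata, context);
        }
        // The extracted text lives in the handler, not in the ParseContext
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        Path pdf = Paths.get(args[0]); // path to a PDF, supplied by the caller
        System.out.println("Content of the PDF: " + extractText(pdf, WRITE_LIMIT));
    }
}
```

With thousands of large files, an unlimited handler (-1) keeps each document's full text in memory, so an explicit upper bound like the one above may be the safer choice.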