I am analyzing a single document containing RTF content using Apache Tika, but it throws an exception and does not return the contents of the document.
Here is the code snippet:
    public String contentEx(File f) throws IOException, SAXException, TikaException {
        System.out.println(f.getName());
        InputStream is = new FileInputStream(f);
        Parser ps = new AutoDetectParser();
        BodyContentHandler bch = new BodyContentHandler();
        Metadata metadata = new Metadata();
        ps.parse(is, bch, metadata, new ParseContext());
        return bch.toString();
    }
But when I call this method as follows:
    public static void main(String[] args) throws IOException, SAXException, TikaException {
        StanfrdEntityExtr see = new StanfrdEntityExtr();
        File Resum_F = new File("/home/rahul/Documents/resumes/212/swetank.docx");
        String s1 = see.contentEx(Resum_F);
    }
it throws the following exception:
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@39614c6
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at stranfordParse.StanfrdEntityExtr.contentEx(StanfrdEntityExtr.java:92)
    at stranfordParse.StanfrdEntityExtr.main(StanfrdEntityExtr.java:50)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 9
    at org.apache.tika.parser.rtf.TextExtractor.processControlWord(TextExtractor.java:872)
    at org.apache.tika.parser.rtf.TextExtractor.parseControlWord(TextExtractor.java:566)
    at org.apache.tika.parser.rtf.TextExtractor.parseControlToken(TextExtractor.java:492)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:459)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:448)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:56)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 4 more
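The trace shows RTFParser being selected for a file named .docx, which suggests the content was detected as RTF regardless of the extension. One way to confirm that independently of Tika is to check whether the file begins with the RTF signature `{\rtf`. The following is only a minimal sketch using the JDK alone; the class name RtfSniffer and the method looksLikeRtf are hypothetical names chosen for illustration and are not part of Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Apache Tika.
public class RtfSniffer {

    // RTF files start with the five ASCII bytes "{\rtf".
    private static final byte[] RTF_MAGIC =
            "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the first bytes of the file match the RTF signature.
    public static boolean looksLikeRtf(Path file) throws IOException {
        byte[] head = new byte[RTF_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int filled = 0;
            // read() may return fewer bytes than requested, so loop until full.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                filled += n;
            }
        }
        return Arrays.equals(head, RTF_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RtfSniffer <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file.getFileName() + " looks like RTF: " + looksLikeRtf(file));
    }
}
```

If this reports true for swetank.docx, the file is an RTF document saved with a .docx extension, and the failure lies in the RTF parsing step rather than in the calling code.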
How can I resolve this exception and correctly extract the contents of this document using Apache Tika? I have found some solutions, but they do not work.
Any ideas or help would be greatly appreciated!