I am looking for a C/C++ alternative to Apache Tika, which is a Java-based framework. In particular, I am looking for file type detection and structured text extraction within a single framework. After some searching online, the closest things I have found are GNU libextractor and several separate file filters that parse documents to extract text (pdftotext, xls2csv, etc.).
Can anyone recommend a good library comparable to Apache Tika?
Thanks
Tika has a network server mode, so you can start Tika in that mode and then send documents to it from your C++ code.
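A minimal sketch of the server approach. It assumes a Tika server is already running on its default port 9998 and uses the `PUT /tika` endpoint, which returns the extracted plain text when the request carries `Accept: text/plain`. To stay dependency-free it uses raw POSIX sockets; the helper names `build_tika_request`, `status_code`, `response_body` and `tika_extract` are hypothetical. The request is sent as HTTP/1.0 so the server ends the reply by closing the connection rather than using chunked encoding, which keeps the reading loop trivial.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdlib>
#include <stdexcept>
#include <string>

// Build a raw HTTP request for the Tika server's /tika endpoint (assumed).
// HTTP/1.0 means the server marks the end of the body by closing the
// connection, so the reply is never chunked and can be read until EOF.
std::string build_tika_request(const std::string& host, const std::string& body) {
    std::string req = "PUT /tika HTTP/1.0\r\n";
    req += "Host: " + host + "\r\n";
    req += "Accept: text/plain\r\n";
    req += "Content-Length: " + std::to_string(body.size()) + "\r\n\r\n";
    req += body;
    return req;
}

// Parse the numeric status from a status line such as "HTTP/1.1 200 OK".
int status_code(const std::string& response) {
    auto sp = response.find(' ');
    if (sp == std::string::npos || response.size() < sp + 4) return -1;
    return std::atoi(response.substr(sp + 1, 3).c_str());
}

// Everything after the blank line that terminates the response headers.
std::string response_body(const std::string& response) {
    auto pos = response.find("\r\n\r\n");
    return pos == std::string::npos ? std::string() : response.substr(pos + 4);
}

// Send the raw document bytes to a running Tika server and return the text.
std::string tika_extract(const std::string& host, const std::string& port,
                         const std::string& document) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), port.c_str(), &hints, &res) != 0)
        throw std::runtime_error("getaddrinfo failed");

    // Try each resolved address until one accepts the connection.
    int fd = -1;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0) break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0) throw std::runtime_error("could not connect to Tika server");

    // send() may write only part of the buffer, so loop until done.
    const std::string req = build_tika_request(host, document);
    std::size_t sent = 0;
    while (sent < req.size()) {
        ssize_t n = send(fd, req.data() + sent, req.size() - sent, 0);
        if (n <= 0) { close(fd); throw std::runtime_error("send failed"); }
        sent += static_cast<std::size_t>(n);
    }

    // Read until the server closes the connection.
    std::string response;
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
        response.append(buf, static_cast<std::size_t>(n));
    close(fd);

    if (status_code(response) != 200)
        throw std::runtime_error("Tika server returned status " +
                                 std::to_string(status_code(response)));
    return response_body(response);
}
```

Read the document into a `std::string` and call `tika_extract("localhost", "9998", contents)`. In production code an HTTP library such as libcurl would handle redirects, timeouts and chunked replies more robustly than this hand-rolled client.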
Alternatively, Tika has a CLI mode, so you can start a new Tika process for each file and read its output from a pipe.
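A sketch of the CLI approach, assuming Java is on the `PATH` and the `tika-app` jar is available locally (the file name `tika-app.jar` is a placeholder; released jars carry a version number). tika-app's `--text` option prints the extracted plain text to stdout, which is read here through a `popen` pipe. The helper names `shell_quote`, `tika_cli_command` and `run_and_capture` are hypothetical.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Wrap an argument in single quotes for /bin/sh, escaping embedded quotes,
// so file names containing spaces or quotes survive popen().
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";  // close quote, escaped quote, reopen
        else out += c;
    }
    out += '\'';
    return out;
}

// Command line for tika-app: --text prints the extracted plain text to stdout.
std::string tika_cli_command(const std::string& jar, const std::string& file) {
    return "java -jar " + shell_quote(jar) + " --text " + shell_quote(file);
}

// Run a shell command and collect everything it writes to stdout.
std::string run_and_capture(const std::string& cmd) {
    std::FILE* pipe = popen(cmd.c_str(), "r");
    if (pipe == nullptr) throw std::runtime_error("popen failed");
    std::string output;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, pipe)) > 0) output.append(buf, n);
    // A non-zero status from pclose means the command failed.
    if (pclose(pipe) != 0) throw std::runtime_error("command failed: " + cmd);
    return output;
}
```

Usage: `std::string text = run_and_capture(tika_cli_command("tika-app.jar", "report.pdf"));`. Starting a JVM for every file is slow, so for large batches the server mode above is usually the better fit; the pipe approach is simplest for occasional one-off extractions.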
KDE provides a library called KFileMetaData, which it uses internally for its file indexer.
It is written in C++ with Qt5 and supports most common formats, such as MS Office 2007 documents, ODF, PDF, images, video, audio and ebooks.
Source: https://habr.com/ru/post/889742/