Solr and .Net Filters

I am relatively new to the wonderful world of Solr and have the following question: what is the best way to process documents, that is, to extract a document's structure and text and pass it to Solr for indexing?

I would like to be able to extract text from Word documents, PDFs, spreadsheets, HTML pages, and so on. In fact, almost any document that contains text.

I took a look at Windows Filters, and at first glance they seem to provide the required functionality.

How do you do this?

sime

+3
2 answers

Take a look at Solr Cell. It is not C#, though: like the rest of Solr, it is Java-based.

Solr Cell is built on Apache Tika, which extracts text (and metadata) from many formats, such as Word and PDF.

+2

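Yes, SolrCell is the way to go. However, it is not yet implemented in SolrNet, so you have two options:

  • implement it in SolrNet yourself, or
  • work around it by sending the HTTP request to Solr yourself, bypassing SolrNet.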
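
For illustration, here is a minimal sketch of sending a file to Solr Cell over plain HTTP, without a client library. It assumes Solr Cell's extracting handler is mounted at its usual `/update/extract` path; the base URL, the core name `mycore`, and the helper names `build_extract_url` and `post_file` are hypothetical and only there for the example.

```python
import urllib.parse
import urllib.request


def build_extract_url(solr_base, doc_id, commit=True):
    """Build the URL for Solr Cell's extracting handler.

    literal.id sets the document's id field; commit=true makes the
    document searchable right away. The handler path is an assumption
    about how the Solr instance is configured.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    query = urllib.parse.urlencode(params)
    return solr_base.rstrip("/") + "/update/extract?" + query


def post_file(solr_base, doc_id, path,
              content_type="application/octet-stream"):
    """POST the raw file bytes; text extraction happens on the Solr side."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        build_extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical local core; only prints the URL, no request is sent.
    print(build_extract_url("http://localhost:8983/solr/mycore", "doc1"))
```

The same request can be issued from .NET with any HTTP client, since nothing here depends on Python beyond building the URL and posting the bytes.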

Alternatively, you could extract the text yourself with iTextSharp/Aspose instead of SolrCell.
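
If the text is extracted on the client side, what gets sent to Solr is an ordinary document rather than a binary file. A rough sketch of that step, assuming a Solr version that accepts a JSON array of documents on `/update` and a schema with a field called `text` (both assumptions; the helper name `build_update_request` and the core name are hypothetical):

```python
import json
import urllib.request


def build_update_request(solr_base, doc_id, text, text_field="text"):
    """Build a POST request that indexes already-extracted text.

    The field name is schema-dependent; "text" is only a placeholder.
    """
    payload = json.dumps([{"id": doc_id, text_field: text}]).encode("utf-8")
    return urllib.request.Request(
        solr_base.rstrip("/") + "/update?commit=true",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Builds the request for a hypothetical local core without sending it.
    req = build_update_request(
        "http://localhost:8983/solr/mycore", "doc1", "extracted text here"
    )
    print(req.full_url)
```

With this approach Solr never sees the original file, so any metadata that should be searchable has to be added to the document as extra fields.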

+2

Source: https://habr.com/ru/post/1765999/

