Full-text search with multiple indexes and complex requirements

We are building an application that needs to index data for each of our users so that we can offer full-text search over that data. Here are some notes about the app:

A) Each user's data is completely unrelated to every other user's data. This gives us several advantages:

  • we can keep each index small.
  • merging / optimizing a fragmented index will take less time.
  • if some indexes become unavailable for some reason (corruption?), only those users are affected. Other users are not affected, and the service is available to them.

B) Each user can have several different types of data. We want to save each type in separate folders for the same reasons as above.

So, our index hierarchy will look something like this:

/user1/type1/<index files>
/user1/type2/<index files>
/user2/type1/<index files>
/user3/type3/<index files>
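The layout above can be derived mechanically from a user name and a type name. Here is a minimal sketch, assuming a hypothetical root directory `/var/indexes` and a hypothetical helper named `index_dir`; neither comes from the question itself.

```python
from pathlib import PurePosixPath

# Hypothetical root; the question only specifies the /<user>/<type>/ layout.
INDEX_ROOT = PurePosixPath("/var/indexes")


def index_dir(user: str, data_type: str,
              root: PurePosixPath = INDEX_ROOT) -> PurePosixPath:
    """Return the directory holding the index for one user and one data type."""
    for part in (user, data_type):
        # Reject empty names and anything that could escape the per-user subtree.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return root / user / data_type


print(index_dir("user1", "type1"))  # /var/indexes/user1/type1
```

Validating the components matters here because the user and type names become part of a filesystem path, so one user must never be able to address another user's directory.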

C) Frequently, perhaps with every iteration, we will add new "types" of data that can be indexed.
Therefore, we want an efficient, programmatic way of adding schemas for different "types". We would like to avoid a fixed indexing schema. I like Lucene's schemaless approach to indexing in this regard.

D) Users can run search queries that will search:

  • within a specific type for this user;
  • across all types for this user. In this case we want to run the queries in parallel, as Lucene does with ParallelMultiSearcher.
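The "all types" case is a fan-out-and-merge: send the same query to every per-type index of one user, then combine the hits. The sketch below illustrates only that idea with toy in-memory "indexes"; it is not Lucene's ParallelMultiSearcher, and the names `make_toy_searcher` and `search_all_types` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]                 # (score, document id)
Searcher = Callable[[str], List[Hit]]   # one searcher per type index


def make_toy_searcher(docs: Dict[str, str]) -> Searcher:
    """Stand-in for a real per-type index: scores a document by term frequency."""
    def search(query: str) -> List[Hit]:
        term = query.lower()
        hits = []
        for doc_id, text in docs.items():
            count = text.lower().split().count(term)
            if count:
                hits.append((float(count), doc_id))
        return hits
    return search


def search_all_types(searchers: Dict[str, Searcher], query: str,
                     limit: int = 10) -> List[Tuple[float, str, str]]:
    """Query every per-type index concurrently and merge hits by descending score."""
    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in searchers.items()}
        merged = [
            (score, name, doc_id)
            for name, future in futures.items()
            for score, doc_id in future.result()
        ]
    # Highest score first; type name and id break ties deterministically.
    merged.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return merged[:limit]


user1 = {
    "type1": make_toy_searcher({"a": "lucene index lucene", "b": "solr core"}),
    "type2": make_toy_searcher({"c": "sphinx and lucene"}),
}
results = search_all_types(user1, "lucene")
print(results)  # [(2.0, 'type1', 'a'), (1.0, 'type2', 'c')]
```

One caveat with any such merge: scores computed by independent indexes are not necessarily comparable, because each index has its own term statistics. A real implementation may need to normalize scores or share statistics before ranking across types.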

E) . .

F) . :
  , , . .

Lucene, Sphinx Solr . :

  • Sphinx: A, B, C, F. ?
  • Lucene: , . F .
  • Solr: , A, B, C . ?

, - ? Solr, Lucene, .

+3
2

, Solr A B, Solr ( shard). Solr C, . Solr , , Lucene ( Embedded Solr, ). , Lucene .

+2

, Solr .

Solr, , . . http://wiki.apache.org/solr/CoreAdmin
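The CoreAdmin page linked above documents an HTTP handler for managing Solr cores at runtime, including a CREATE action. As a rough illustration, the sketch below builds such a request URL for one user/type pair without sending it. The base URL, the `user_type` core naming convention, and the helper name `create_core_url` are assumptions made for this example; consult the linked page for the parameters your Solr version accepts.

```python
from urllib.parse import urlencode


def create_core_url(solr_base: str, user: str, data_type: str) -> str:
    """Build a CoreAdmin CREATE request URL for one user/type pair.

    Nothing is sent over the network; the caller decides how to issue it.
    """
    core_name = f"{user}_{data_type}"  # hypothetical naming convention
    params = {
        "action": "CREATE",
        "name": core_name,
        # Mirrors the /<user>/<type>/ directory layout from the question.
        "instanceDir": f"{user}/{data_type}",
    }
    return f"{solr_base.rstrip('/')}/admin/cores?{urlencode(params)}"


print(create_core_url("http://localhost:8983/solr", "user1", "type1"))
```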

, / Solr. (A) (B). , (, , Solr), , . (D) (F). ​​ "", .

(C), Solr . . http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
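Dynamic fields, described on the page linked above, let a schema declare glob-style field name patterns (for example `*_i`) instead of listing every field in advance, which speaks to the wish in (C) to avoid a fixed schema. The sketch below imitates that name-to-type lookup in plain Python. The rule list and type names are hypothetical, and the first-match order is a simplification that may differ from Solr's own precedence rules.

```python
from typing import List, Optional, Tuple

# Hypothetical rules in the spirit of <dynamicField name="*_i" type="int"/>.
DYNAMIC_FIELDS: List[Tuple[str, str]] = [
    ("*_i", "int"),
    ("*_s", "string"),
    ("*_t", "text"),
    ("attr_*", "text"),
]


def resolve_type(field_name: str) -> Optional[str]:
    """Return the type of the first rule whose pattern matches, or None."""
    for pattern, field_type in DYNAMIC_FIELDS:
        # A pattern carries a single "*" at its start or at its end.
        if pattern.startswith("*") and field_name.endswith(pattern[1:]):
            return field_type
        if pattern.endswith("*") and field_name.startswith(pattern[:-1]):
            return field_type
    return None


for name in ("price_i", "title_t", "attr_color", "unknown"):
    print(name, "->", resolve_type(name))
```

With rules like these, a new data "type" can introduce fields such as `price_i` or `title_t` without any schema change, as long as it follows the naming convention.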

(E), Solr "" . , Solr .

+1

Source: https://habr.com/ru/post/1796229/

