Solr Index and Multilingual Data Search

In my Solr schema, Solr detects the language of each document at index time and applies different analysis rules according to the detected language. Each value is stored in the field for its language, for example:

  • English titles are stored in the title_en field.
  • Spanish titles are stored in the title_es field.


 <field name="title_en" type="text_en" indexed="true" stored="true"/>
 <field name="title_es" type="text_es" indexed="true" stored="true"/>

All search queries are performed against a single catch-all field, "text":

 <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/> 

All fields related to a particular language are copied to the text field to be available for a search query:

 <copyField source="title_en" dest="text"/>
 <copyField source="title_es" dest="text"/>

My concern is that the "text" field applies its own analysis. I assume the copied values are indexed with the "text_general" rules, so the language-specific analysis of the language fields (title_en, title_es) is lost.

If so, how can I search all the data in one query while preserving the language-specific analysis?

1 answer

Yes, the content of text (defined as text_general) is processed only according to the rules for that field type; the analysis configured for title_en or title_es has no effect on it. copyField copies the original value before any analysis takes place, since you usually (as in this case) want to apply different tokenization and analysis in the destination field.

A simple solution is to query the title_en and title_es fields directly. With the dismax or edismax query parser, list both in the query fields parameter, separated by spaces: qf=title_en title_es. Each field then analyzes the query text with its own language-specific rules, so a single query searches both the English and the Spanish versions of your content.
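
One way to avoid passing these parameters on every request is to set them as handler defaults in solrconfig.xml. The fragment below is only a sketch: it assumes searches go through a handler named /select and that edismax is an acceptable default parser for that handler, neither of which is stated in the question.

```xml
<!-- Sketch for solrconfig.xml; the handler name "/select" is an assumption -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- qf is only honored by the dismax/edismax query parsers -->
    <str name="defType">edismax</str>
    <!-- Space-separated list of fields to search; each keeps its own analyzer -->
    <str name="qf">title_en title_es</str>
  </lst>
</requestHandler>
```

With defaults like these, a plain request such as q=casa (an illustrative term) would be matched against both language fields. The same two parameters can instead be sent per request as defType=edismax and qf=title_en title_es, URL-encoded.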


Source: https://habr.com/ru/post/1273507/

