Solr: facet values cannot be strings?

When creating a list of facet values, I have this in my schema:

<field name="contract_facet_sector_ids" type="text" indexed="true" stored="true" multiValued="true" required="false" /> 

The facets that I want to save are strings like "1_1", "2_43", "2_99", etc. However, when I look at the returned facet data, the underscore seems to have been removed:

 [facet_fields] => Array ( [contract_facet_sector_ids] => Array ( [11] => 0 [243] => 0 [299] => 0 

Can someone please point out where I'm going wrong? The text field type is defined as follows:

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory" />
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory" />
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
   </analyzer>
 </fieldType>

Thank you very much in advance!

Seb

2 answers

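The WordDelimiterFilterFactory in your text field type is what removes the underscores: it splits "1_1" on the underscore, and with catenateNumbers="1" in the index analyzer the numeric parts are also joined back together as "11". From its documentation: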

Splits words into subwords and performs optional transformations on subword groups. By default, words are split into subwords with the following rules:

Split on intra-word delimiters (all non-alphanumeric characters). "Wi-Fi" → "Wi", "Fi"

...

Based on your description of how you use this field ("The facets that I want to save are strings ..."), I would suggest using the string fieldType, defined below (from the Solr example schema), unless you really need the additional analyzers.

 <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
 <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
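
As a minimal sketch of that suggestion, the field from the question could be redeclared with the string type. This assumes the "string" fieldType is defined in the same schema.xml; the other attributes are copied unchanged from the question. Documents that are already indexed would need to be reindexed before the facet values come back with their underscores intact.

```xml
<!-- Sketch only: assumes a "string" fieldType (solr.StrField) exists in this schema.xml. -->
<!-- Values such as "1_1" are then indexed verbatim instead of being split by the analyzer. -->
<field name="contract_facet_sector_ids" type="string" indexed="true" stored="true" multiValued="true" required="false" />
```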

The Solr Analysis page will show you how your text is analyzed at different stages.


Source: https://habr.com/ru/post/1390260/

