How to sort Solr results without stop words

I'm trying to sort Solr results by a field while ignoring stop words, but I can't find a way to do this. For example, I want the results to be sorted as follows:

  • Charley
  • Fox
  • Helicopter

Is it possible? Right now, the field type is defined as follows:

<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
  </analyzer>
</fieldType>

And the field is added as:

<field name="title" type="alphaOnlySort" indexed="true" stored="false"/>

Surely someone else has done this before? Or is sorting without stop words a no-no?

+3
3 answers

You need to actually add the stop word filter to the analyzer chain. Paste your indexing text into the field analyzer in Solr Admin and you will see that the "A" in "A Fox" is not discarded!

+1

, , , - , , . , "THE", . , "", .

, ? , . ( ).

+1

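KeywordTokenizerFactory emits the entire field value as a single token, so StopFilterFactory (which works on whole tokens) finds nothing to remove. If you switch to, say, WhitespaceTokenizerFactory, you can no longer sort, because the field then contains multiple tokens. So, as I see it, your only option is to:

  • keep KeywordTokenizerFactory,
  • remove StopFilterFactory,
  • and remove the stop words from the contents with a regular expression using PatternReplaceFilterFactory (which the field type currently uses to strip non-letter characters such as digits).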

Generally, the only stop words you want to ignore when sorting (as opposed to searching) are "A", "AN", and "THE". I am not very good with regular expressions, but I am sure that for many this is trivial.
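The approach above could be sketched roughly as follows. This is an untested illustration, not a verified configuration: the type name `titleSort` is hypothetical, the leading-article pattern is my own assumption about what "A", "AN", "THE" removal might look like, and `mapping-ISOLatin1Accent.txt` is simply carried over from the question.

```xml
<!-- Sketch only: "titleSort" is a hypothetical name; adapt to your schema -->
<fieldType name="titleSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Char filters run on the raw text before tokenization -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- Keep the whole value as one token so the field stays sortable -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- Assumed pattern: drop one leading article followed by whitespace -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(a|an|the)\s+" replacement="" replace="first"/>
    <!-- Then strip everything that is not a-z, as the original type does -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

The article filter is placed after lowercasing and trimming so the pattern only has to match lowercase text at the very start of the token, and before the a-z filter so the whitespace after the article is still there to match. Under these assumptions a title such as "A Fox" should end up with the sort key `fox`. The analysis screen in Solr Admin is a good place to check the output, and existing documents need to be reindexed after the schema change.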

+1

Source: https://habr.com/ru/post/1757753/

