Solr Snowball stemmer does not work for Spanish

I have this field type:

<fieldtype name="textes" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-es.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish" protected="protwords-es.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish" protected="protwords-es.txt"/>
  </analyzer>
</fieldtype>

I expect a search for alquileres (rentals) to match alquiler (rental). But when I go to "Field Analysis" in the Solr admin interface and enter alquiler as the index value and alquileres as the query value, the following happens:

  • At index time, alquiler is stemmed to alquil.
  • At query time, alquileres is stemmed to alquiler.

So the simple case of searching for the plural form of a word (alquileres) does not match its singular form (alquiler).

Shouldn't both the indexed term and the query term be reduced to the same stem (either alquiler or alquil)? Is this a limitation of the algorithm, or a misunderstanding or misconfiguration on my part?

+4
3 answers

Snowball is very limited. You will get better results with a dictionary-based stemmer (Hunspell): http://wiki.apache.org/solr/Hunspell
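As a rough sketch of what that could look like in schema.xml, the Snowball filter is swapped for solr.HunspellStemFilterFactory. The field type name textes_hunspell and the dictionary file names es_ES.dic and es_ES.aff are assumptions for illustration; use whichever Spanish Hunspell dictionary you actually place in the conf directory. A single analyzer is declared here so that index and query go through the same chain.

```xml
<!-- Hypothetical field type name; dictionary and affix file names are assumptions -->
<fieldtype name="textes_hunspell" class="solr.TextField">
  <!-- One analyzer without a type attribute is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-es.txt"/>
    <!-- Dictionary-based stemming instead of SnowballPorterFilterFactory -->
    <filter class="solr.HunspellStemFilterFactory" dictionary="es_ES.dic" affix="es_ES.aff" ignoreCase="true"/>
  </analyzer>
</fieldtype>
```

After reindexing, the "Field Analysis" page should show whether alquiler and alquileres now end up as the same term; the result depends on the dictionary you use.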

+1

This lemmatizer handles alquileres correctly:

http://www.molinolabs.com/lematizador.html#alquileres

+2

I am using the Hunspell dictionaries from OpenOffice and they do a great job.

My example:

 URL-Elastic/_analyze?analyzer=es_AR&text=alquileres 

It returns:

 {
   tokens: [
     {
       token: "alquiler",
       start_offset: 0,
       end_offset: 10,
       type: "<ALPHANUM>",
       position: 1
     }
   ]
 }

Link: https://www.openoffice.org/download/index.html

0

Source: https://habr.com/ru/post/1384713/
