Removing accents with Apache Solr

Question

I am trying to search with Apache Solr using this schema: http://pastie.org/5114389. When I type "josé" the document is found, but when I type "jose" I get no results.

I searched the Internet for an answer and found that I was supposed to use a filter class, but when I add it, it makes no difference.

+4

search apache schema solr

user1127871 Oct 25 '12 at 13:15

1 answer

Paige Cook · Accepted Answer · 2012-10-25T13:46:01+0000

I see from your schema that you are already using ASCIIFoldingFilterFactory in the text fieldType that is assigned to your default field. However, it is only applied when indexing that field. I would suggest also applying it at query time, so that your query terms are folded the same way as the terms in the index. As a general rule, when you add a filter factory to the index analyzer, you should also add it to the query analyzer, so that query terms and index terms are transformed consistently and can match.

So, I would modify your schema as follows:

    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <!-- Applied to field values when documents are indexed -->
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Folds accented characters to ASCII, e.g. "josé" becomes "jose".
             The words attribute is kept from the original schema; ASCIIFoldingFilterFactory
             does not read it, and it may need to be removed on Solr versions that
             reject unknown attributes. -->
        <filter class="solr.ASCIIFoldingFilterFactory" words="mapping-FoldToASCII.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <!-- Applied to the search terms; same chain, so query and index terms match -->
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" words="mapping-FoldToASCII.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
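Since the index and query chains in this fieldType are identical, one possible simplification is to declare a single analyzer with no type attribute, which Solr uses for both indexing and querying. This is only a sketch under that assumption, not part of the original schema; it also leaves out the words attribute on ASCIIFoldingFilterFactory, which as far as I know the filter does not use.

```xml
<!-- Sketch: one analyzer without a type attribute is shared by index and query time -->
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumption: no extra attributes are needed for plain ASCII folding -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping a single chain makes it harder for the two sides to drift apart later. Note that any change to index-time analysis only affects documents indexed afterwards, so existing documents would need to be reindexed.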
