Solr StandardTokenizer: How are underscores handled with wildcards?

So, I have a Solr instance that processes both indexed input and queries using the StandardTokenizer (as well as ClassicFilterFactory, LowerCaseFilterFactory and StopFilterFactory).

There are several files in my index whose names contain underscores (e.g. some_indexed_file.jpg).

I noticed that if I search for some_indexed_file.jpg, the file I'm looking for is returned correctly.

However, if I instead search for some_indexed_file.jp* (where I assume the asterisk acts as a wildcard), which I would expect to give similar results, I get no results.

Any idea what is going on? I assume I'm misunderstanding something about how Solr handles queries.

Edit: as requested, here are the relevant XML configuration entries:

  <fieldType name="default" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory" />
      <filter class="solr.ClassicFilterFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.StopFilterFactory" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory" />
      <filter class="solr.ClassicFilterFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.StopFilterFactory" />
    </analyzer>
  </fieldType>
  <field name="filename" type="default" multiValued="true" omitNorms="false" termVectors="false"/>
1 answer

Well, a bit more research solved it: the main problem is that Solr does not apply text analysis to wildcard queries.

This meant that Solr was looking for a single indexed term matching the literal pattern some_indexed_file.jp*. However, when the file name was indexed, it was tokenized into "some", "indexed" and "file.jpg", none of which matches that pattern.
The search for some_indexed_file.jpg went through the same analysis, so it was tokenized the same way and returned the correct results.
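
To make the mismatch concrete, here is a minimal Python sketch. It does not call Solr. The term list is copied from the tokens reported above, and `wildcard_matches` is a hypothetical helper that uses `fnmatch` as a stand-in for Solr's wildcard matching against individual indexed terms.

```python
from fnmatch import fnmatchcase

# Terms reported above for the indexed file name (copied from this answer,
# not produced by a real Solr analyzer).
indexed_terms = ["some", "indexed", "file.jpg"]


def wildcard_matches(pattern, terms):
    """Return the indexed terms that the raw, unanalyzed pattern matches.

    Hypothetical helper: the pattern is compared term by term, with no
    tokenization applied to it, mirroring how the wildcard query was treated.
    """
    return [term for term in terms if fnmatchcase(term, pattern)]


# The whole pattern is tested against each term, so nothing matches.
print(wildcard_matches("some_indexed_file.jp*", indexed_terms))

# A pattern that fits inside a single indexed term does match.
print(wildcard_matches("file.jp*", indexed_terms))
```

Under these assumptions the first call finds no terms, while the second finds "file.jpg". This suggests that a wildcard limited to one token, such as file.jp*, may be a workable query against this field.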


Source: https://habr.com/ru/post/1446501/