How do I configure SOLR so that users can perform a prefix search by default?

Question

I am using SOLR 3.2. My application issues search queries against a SOLR instance on a field of type text. How can I get SOLR to return results such as book, bookshelf, bookasd, etc., when the user issues a query like book? Do I have to add "*" characters to the query string manually, or is there a parameter in SOLR that makes it perform prefix searches on the field?

This is the schema.xml section for the text field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

wildcard search solr prefix

dude 21 sept '11 at 7:59

4 answers

Okke klein · Answer 1 · 2011-09-21T11:20:49+0000

There are several ways to do this, but for performance reasons you may want to use EdgeNGramFilterFactory.
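As a rough sketch of what that could look like in schema.xml: the field type below indexes every leading substring of each token, so a plain query for book can match bookshelf without any wildcard. The name text_prefix, the gram sizes, and the choice of tokenizer are assumptions made for illustration, not something taken from the question.

```xml
<!-- Hypothetical field type; the name and gram sizes are illustrative assumptions -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index the leading substrings of each token: b, bo, boo, book, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams at query time: the typed term is matched as-is against the indexed prefixes -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The sketch deliberately leaves out the stemmer from the question's text type, since a stemmed query term would not necessarily line up with the indexed prefixes. The trade-off is a larger index, and the field has to be reindexed after the type is changed.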

Dorin · Answer 2 · 2011-09-21T09:04:20+0000

I had the same requirement in a project: I had to implement suggestions. What I did was define this fieldType:

<fieldType class="solr.TextField" name="suggester">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I used ShingleFilterFactory because I needed suggestions consisting of one or more words.

Then I used facet queries to get the suggestions.

facet.limit = 10
facet.prefix = "book"
facet.field = "suggester" // this is the field with fieldType = "suggester" in which I saved the data
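If it helps, the fixed facet parameters can be set once as request handler defaults in solrconfig.xml, so the client only has to send the prefix. This is a sketch under assumptions: the handler name /suggest is made up, and suggester is assumed to be the field that holds the shingled data.

```xml
<!-- Hypothetical handler name; the field name "suggester" is assumed from the answer above -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Only the facet counts are needed, not the matching documents -->
    <str name="rows">0</str>
    <str name="q">*:*</str>
    <str name="facet">true</str>
    <str name="facet.field">suggester</str>
    <str name="facet.limit">10</str>
  </lst>
</requestHandler>
```

A request would then only add facet.prefix=book. Because facet.prefix is compared against the indexed terms without analysis, the client should lowercase the typed text to match the LowerCaseFilterFactory used at index time.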

I know that this relies on facet results, but maybe it solves your problem.

If neither my answer nor Jayendra Patil's gives you a solution, you can also take a look at EdgeNGramFilterFactory.

Jayendra · Answer 3 · 2011-09-21T08:13:09+0000

One option is to handle this on the client side by appending a wildcard to the end of each search query.

Impact:

Wildcard queries have a performance cost.
Wildcard queries are not analyzed, so query-time analysis (lowercasing, stemming, and so on) will not be applied to your search terms.

Another option is to implement a custom query parser with the processing you need.

tedders · Answer 4 · 2012-02-22T19:20:48+0000

I'm sure you have figured this out by now, but here is an answer anyway:

I handled this by taking the last term and ORing it with the same term plus a wildcard. For example, "my favorite book" becomes "my+favorite+(book OR book*)" and also returns "my favorite bookshelf". You will probably want to do some input processing first (escaping, etc.).

If you specifically want to match the typed text against the beginning of the result, then edge n-grams are the way to go, but reading your question, that does not seem to be what you asked for.
