Solr stopwords - documents do not match

I am using solr-3.4, and the relevant part of my schema looks like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

stopwords_en.txt contains


and
are
as

etc..

Now, when I search for "buy house", Solr does not return documents containing the text "buy a house".
Also, when I search for "buy a house", Solr does not return documents containing the text "buy house".

The relevant debugQuery output:

<str name="rawquerystring">cContent:"buy a house"</str>
<str name="querystring">cContent:"buy a house"</str>
<str name="parsedquery">PhraseQuery(cContent:"bui ? hous")</str>
<str name="parsedquery_toString">cContent:"bui ? hous"</str>

I found a similar (but not identical) question here,
but it has no satisfactory answer that solves this problem.

Any idea how I can solve this problem, or what is wrong?

+4
2 answers

You are searching with a PhraseQuery, so in the first case "buy house" will not match "buy a house". If you add slop to the PhraseQuery (cContent:"buy house"~2), you will get matches as well.

In the second case, although the stopword is filtered out, the query still expects some token at that position, so "buy a house" will match "buy one house" but not "buy house". Maybe slop can fix this too, but I'm not sure.
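To make the position argument concrete, here is a toy model in Python, not Solr code. It assumes that "a" is in stopwords_en.txt (the "?" in the parsed query suggests so), that a removed stopword still consumes a position (enablePositionIncrements="true"), and that a phrase matches when the spread of position shifts between query and document terms is at most the slop. The names `analyze` and `phrase_matches` are made up for this sketch, and stemming, lowercasing filters, and synonyms are left out.

```python
# Toy model of stopword position gaps and phrase slop. Not Solr/Lucene code.
# Assumed stopword list: the question shows only part of stopwords_en.txt.
STOPWORDS = {"a", "an", "and", "are", "as"}


def analyze(text):
    """Split on whitespace; drop stopwords but keep the positions they occupied."""
    return [
        (token, position)
        for position, token in enumerate(text.lower().split())
        if token not in STOPWORDS
    ]


def phrase_matches(query, doc, slop=0):
    """Return True if some choice of document positions keeps the spread of
    (document position - query position) within `slop`."""
    query_terms = analyze(query)
    doc_positions = {}
    for term, position in analyze(doc):
        doc_positions.setdefault(term, []).append(position)

    def search(index, low, high):
        if index == len(query_terms):
            return True
        term, query_position = query_terms[index]
        for doc_position in doc_positions.get(term, []):
            shift = doc_position - query_position
            new_low = shift if low is None else min(low, shift)
            new_high = shift if high is None else max(high, shift)
            if new_high - new_low <= slop and search(index + 1, new_low, new_high):
                return True
        return False

    return bool(query_terms) and search(0, None, None)


print(analyze("buy a house"))                              # [('buy', 0), ('house', 2)]
print(phrase_matches("buy house", "buy a house"))          # False: the gap breaks adjacency
print(phrase_matches("buy house", "buy a house", slop=2))  # True
print(phrase_matches("buy a house", "buy one house"))      # True: any token fills the gap
print(phrase_matches("buy a house", "buy house"))          # False
print(phrase_matches("buy a house", "buy house", slop=1))  # True in this toy model
```

In this model slop also covers the second case, but that only follows from the simplified matching rule above; it is worth confirming against the real index with debugQuery before relying on it.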

+3

Actually, I think your problem is with the PorterStemmer: "house" is being transformed into "hous". If you don't really think you need it, I would turn off the Porter stemmer. In my experience, it usually does more harm than good.

0

Source: https://habr.com/ru/post/1390812/

