Autocompletion with the shingle and terms components

One way to get Google-like autocompletion is to combine the shingle filter and the terms component in Solr 1.4.

First we generate all the n-grams with the shingle filter, and then use the terms component to get the closest prediction for the sequence of terms the user has typed (based on document frequency).
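
To make the idea concrete, here is a minimal Python sketch of the same two steps outside Solr: build shingles of 2 to 5 tokens per document, count how many documents each shingle occurs in, and return the most frequent shingles that start with what the user typed. The function names and sample documents are made up for illustration, tokenization is a plain whitespace split, and no stop-word handling is done here.

```python
from collections import Counter


def shingles(tokens, max_size=5):
    """Yield every run of 2..max_size consecutive tokens, joined by spaces."""
    for size in range(2, max_size + 1):
        for start in range(len(tokens) - size + 1):
            yield " ".join(tokens[start:start + size])


def build_doc_freq(docs, max_size=5):
    """Count, for each shingle, the number of documents that contain it."""
    df = Counter()
    for text in docs:
        tokens = text.lower().split()
        # set() so that a shingle repeated inside one document counts once
        df.update(set(shingles(tokens, max_size)))
    return df


def suggest(df, prefix, limit=3):
    """Return up to `limit` shingles starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [(term, n) for term, n in df.items() if term.startswith(prefix)]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in matches[:limit]]


# Hypothetical sample documents, for illustration only.
docs = [
    "India and China signed a deal",
    "India and China trade talks",
    "India and Pakistan",
]
doc_freq = build_doc_freq(docs)
print(suggest(doc_freq, "India and"))
```

With these sample documents the two most frequent completions of "India and" are "india and" (three documents) and "india and china" (two documents).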

Schema:

<fieldType name="shingle_text_fivegram" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" outputUnigrams="false"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

Solr config:

<searchComponent name="termsComponent" class="org.apache.solr.handler.component.TermsComponent"/>
<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
        <bool name="terms">true</bool>
        <str name="terms.fl">shingleContent_fivegram</str>
    </lst>
    <arr name="components">
        <str>termsComponent</str>
    </arr>
</requestHandler>
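
For reference, a handler like this is typically queried with `terms.prefix` set to what the user has typed so far. Below is a small Python sketch that only builds such a URL. The base URL `http://localhost:8983/solr` and the helper name are assumptions for illustration; `terms.prefix`, `terms.limit` and `wt` are standard parameters, while `terms=true` and `terms.fl` come from the handler defaults above.

```python
from urllib.parse import urlencode


def terms_url(prefix, base="http://localhost:8983/solr", limit=10):
    """Build a /terms request URL for the text typed so far.

    `base` is a hypothetical Solr location; adjust it to your deployment.
    """
    params = {
        # Lowercased here because the index analyzer lowercases tokens.
        "terms.prefix": prefix.lower(),
        "terms.limit": limit,
        "wt": "json",
    }
    return f"{base}/terms?{urlencode(params)}"


print(terms_url("India and"))
```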

With the settings above, I need to drop stop words wherever they fall at the edge of an n-gram, but keep them when they are inside an n-gram.

Say, from the sequence "India and China" I need the following terms:

india
china
india and china

and skip the rest.
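
The desired behaviour can be stated precisely with a short Python sketch. This is not a Solr filter, only an illustration of the rule: generate every n-gram of 1 to 5 tokens and discard any that begins or ends with a stop word. The function name and the one-word stop list are hypothetical, and unigrams are included here because the example above lists them (the field type in the question sets `outputUnigrams="false"`).

```python
def edge_clean_ngrams(text, stopwords, max_size=5):
    """Return n-grams (1..max_size tokens) that neither start nor end with a stop word."""
    tokens = text.lower().split()
    kept = []
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            gram = tokens[start:start + size]
            # Stop words are allowed inside the n-gram, not on its edges.
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            phrase = " ".join(gram)
            if phrase not in kept:  # drop duplicates, keep first-seen order
                kept.append(phrase)
    return kept


print(edge_clean_ngrams("India and China", {"and"}))
# → ['india', 'china', 'india and china']
```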

Is this possible in combination with other Solr components / filters?

UPD: Lucene 4 ( SOLR):

" , - ( ()) ( )? /state keep (captureState/restoreState), ?" -

from: http://blog.mikemccandless.com/2013/08/suggeststopfilter-carefully-removes.html

Solr 1.4 - EdgeNGramFilterFactory, , . "i", "in" "ind" .., .

KeywordTokenizerFactory, ( ):

        <analyzer type="index">
            <tokenizer class="solr.LowerCaseTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
            <filter class="solr.ShingleFilterFactory" maxShingleSize="5" outputUnigrams="false"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>

Source: https://habr.com/ru/post/1791154/

