In Dutch and German, existing words can be combined into new words: compound words.
For example, "accountmanager" is considered a single word, composed of the words "account" and "manager". Our users will use both "accountmanager" and "account manager" in documents and queries, and they expect the same results for both queries.
To be able to decompound (split) these words, Solr has a dictionary filter, which I configured in the schema:
<filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="../../compound-word-dictionary.txt" minWordSize="8" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
The compound-word-dictionary.txt file contains a list of words that are used to decompose compound words. This list contains, for example, the words "account" and "manager".
The decompounding result looks correct when analyzed in the Solr debugger for the query "accountmanager" (term text):
- accountmanager
- account
- manager
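To make the decompounding step concrete, here is a minimal Python sketch of dictionary-based splitting. It is a simplified model and not Lucene's actual implementation: the function name `decompound` and its parameter names are made up for illustration and only mirror the filter attributes from the configuration above.

```python
def decompound(token, dictionary, min_word_size=8, min_subword_size=4,
               max_subword_size=15, only_longest_match=True):
    """Return the original token followed by the dictionary subwords found in it.

    Simplified model of a dictionary compound-word filter (an assumption,
    not the real Lucene code).
    """
    tokens = [token]
    if len(token) < min_word_size:
        return tokens  # too short to be treated as a compound

    # Try every start position that still leaves room for a minimal subword.
    for start in range(len(token) - min_subword_size + 1):
        longest = None
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(token):
                break
            candidate = token[start:start + size]
            if candidate in dictionary:
                if only_longest_match:
                    longest = candidate  # sizes grow, so the last hit is the longest
                else:
                    tokens.append(candidate)
        if longest is not None:
            tokens.append(longest)
    return tokens


words = {"account", "manager"}
print(decompound("accountmanager", words))
# → ['accountmanager', 'account', 'manager']
```

The original token is kept and the subwords are added next to it, which matches the three terms shown in the debugger output.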
These terms, however, are combined with an OR operator, so the query finds all documents that contain at least one of them. I want it to behave like an AND operator (so I want only those results that have both the terms “account” and “manager” in the document).
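The difference between the current and the desired behavior can be illustrated with a small Python model. This is only a sketch of the matching semantics, not of how Solr executes queries; the `matches` helper and the sample documents are hypothetical, and the undecomposed term "accountmanager" is left out for brevity.

```python
def matches(doc_text, terms, operator="OR"):
    """Check whether a document contains any (OR) or all (AND) of the terms."""
    doc_terms = set(doc_text.lower().split())
    hits = [term in doc_terms for term in terms]
    return all(hits) if operator == "AND" else any(hits)


# Hypothetical sample documents.
docs = [
    "senior account manager wanted",
    "open a bank account",
    "the manager resigned",
]
subwords = ["account", "manager"]

or_hits = [d for d in docs if matches(d, subwords, "OR")]
and_hits = [d for d in docs if matches(d, subwords, "AND")]

print(len(or_hits))   # → 3 (current behavior: any subword is enough)
print(and_hits)       # → ['senior account manager wanted'] (desired behavior)
```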
I tried setting the defaultOperator parameter in the schema to "AND", but this is ignored when using edismax. So I set the suggested minimum-should-match parameter to 100% (mm=100%), again without the desired result. Changing the attributes of the dictionary filter in the schema does not change the behavior to AND either.
Has anyone encountered this behavior when using the DictionaryCompoundWordTokenFilterFactory, and does anyone know a solution that makes it behave like an AND operator?