I'm trying to get Solr to extract only the second, 7-digit part of a ticket number formatted as n-nnnnnnn.
Initially, I was hoping to keep the full ticket together. According to the documentation, digits joined to digits should be kept together as one token, but after working on this problem for some time and looking at the code, I don't think that is the case: Solr always generates two terms. So rather than getting a large number of matches on the first digit (n-), I think I can get better query results from the second part alone. Substituting A for the dash:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b\d[A](\d\d\d\d\d\d\d)\b" replacement="$1" replace="all" maxBlockChars="20000"/>
will parse 1A1234567 fine. But
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b\d[-](\d\d\d\d\d\d\d)\b" replacement="$1" replace="all" maxBlockChars="20000"/>
will not parse 1-1234567.
So this seems to be a hyphen problem. I tried \- (escaped), [-], \u002D, \x{45} and \x045, with no success.
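As a sanity check outside Solr, here is a minimal sketch of the same replacement in plain Python. It assumes Python's `re` behaves like `java.util.regex` for this simple pattern, and the helper name `strip_prefix` is made up for illustration. If every spelling of the hyphen matches here, the regex itself is probably not the culprit. One detail about the list above, as far as I can tell: 45 is the decimal code of the hyphen, and the hex code is 2D, so in Java regex \x{45} would denote E and \x045 would read as \x04 followed by 5.

```python
import re

# Candidate spellings of the hyphen inside the pattern. Python's `re` stands in
# for java.util.regex here, which is an assumption of this sketch.
HYPHEN_FORMS = [r"-", r"\-", r"[-]", r"\u002D", r"\x2D"]


def strip_prefix(text, hyphen=r"[-]"):
    """Replace n-nnnnnnn with just the seven-digit part (hypothetical helper)."""
    pattern = r"\b\d" + hyphen + r"(\d{7})\b"
    # Python writes the first capture group as \1 where Solr/Java uses $1.
    return re.sub(pattern, r"\1", text)


# Run the same input through every spelling of the hyphen.
results = {form: strip_prefix("ticket 1-1234567 opened", form) for form in HYPHEN_FORMS}
for form, out in results.items():
    print(form, "->", out)
```

If the pattern behaves here but not in Solr, that would point at the analysis chain (where the char filter runs, or what text it actually receives) rather than at the way the hyphen is written.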
I tried putting char filters around it:
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b\d[-](\d\d\d\d\d\d\d)\b" replacement="$1" replace="all" maxBlockChars="20000"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping2.txt"/>
with mappings:
"-" => "z"
and then
"z" => "-"
It looks to me as if the hyphen is consumed by the Flex-generated tokenizer and is never even available to the char filter.
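One thing worth checking in the mapping chain above: as far as I understand, char filters run in the order they are declared, so by the time the pattern filter runs, the first mapping has already turned every - into z and a pattern containing [-] has nothing left to match. The sketch below mimics the three filters with plain string operations in Python. It is a simulation under that assumption, not Solr itself, and `apply_chain` is a made-up name.

```python
import re


def apply_chain(text, middle_pattern):
    """Mimic three char filters applied in declaration order (simulation only)."""
    text = text.replace("-", "z")               # mapping.txt:  "-" => "z"
    text = re.sub(middle_pattern, r"\1", text)  # pattern replace, \1 plays the role of $1
    return text.replace("z", "-")               # mapping2.txt: "z" => "-"


# Pattern as posted: it looks for a hyphen that the first mapping already removed.
as_posted = apply_chain("1-1234567", r"\b\d[-](\d{7})\b")

# Pattern that looks for the placeholder character instead.
with_placeholder = apply_chain("1-1234567", r"\b\d[z](\d{7})\b")

print(as_posted)         # 1-1234567
print(with_placeholder)  # 1234567
```

Note that this round trip would also turn every genuine z in the text into a hyphen, so a placeholder that never occurs in the data would be a safer choice than z.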
Has anyone had more success with a hyphen/dash in Solr/Lucene? Thanks.