Perhaps you could try the regular-expression tokenizer (PatternTokenizer) and build a suitable pattern for your article numbers. Lucene (on which Solr is built) is heavily geared toward tokenizing prose, not codes like these.
What you probably want here is an n-gram split that goes all the way down to 1-grams, with dashes replaced by spaces, something like:
DV-5PBRP → {DV, 5PBRP, 5P, PB, BR, RP, D, V, 5, P, B, R}
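The split above can be sketched in a few lines of Python. This is only an illustration of the tokenization, not Solr's actual implementation; it mimics what a dash-to-space replacement followed by a 1..2-gram filter (roughly `NGramFilterFactory` with `minGramSize=1`, `maxGramSize=2`, plus the original tokens) would emit:

```python
def char_ngrams(text, min_n=1, max_n=2):
    """Split on dashes, keep the whole parts, and add all character n-grams."""
    tokens = text.replace("-", " ").split()
    grams = set(tokens)  # keep the full parts as tokens too
    for tok in tokens:
        for n in range(min_n, max_n + 1):
            for i in range(len(tok) - n + 1):
                grams.add(tok[i:i + n])
    return grams

print(sorted(char_ngrams("DV-5PBRP")))
```

Running this on `DV-5PBRP` yields exactly the twelve tokens shown above, which makes the index-size cost easy to see.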
As you can see, the index gets quite large even for very small fields. Make sure result ranking weights matches on long n-grams much more heavily than matches on short ones.
I really think you should remove the stop-word list for the article-number field.
The minimum n-gram size should be 1 or 2.
Just make sure your analyzers do not:
- swallow the dash
- delete single characters (they often appear in stop-word lists)
- delete numbers
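Putting all of the above together, a field type along these lines might work. This is only a sketch, the field-type name is illustrative, and attributes should be checked against your Solr version; it replaces dashes with spaces, uses no stop-word filter, and expands 1..2-grams at index time only (queries are matched against the grams as-is):

```xml
<!-- Sketch of an article-number field type (name is illustrative) -->
<fieldType name="article_number" class="solr.TextField">
  <analyzer type="index">
    <!-- turn dashes into token boundaries instead of swallowing them -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand to 1- and 2-grams; note plain NGramFilterFactory drops the
         original token unless your Solr version supports keeping it -->
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

If you also want the whole parts (`DV`, `5PBRP`) indexed as-is, copy the field into a second, un-grammed field and boost matches on it, which gives you the heavy weighting for long matches mentioned above.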