text_en uses a stemmer, so a search for fakes can match fake, fake's, faking, etc. On a field without stemming, fakes will match only fakes.
Each field type uses a different "chain" of analysis steps (a tokenizer followed by filters). text_en uses a filter chain that is better suited to indexing English. See the tokenizer and filter entries in the excerpts below.
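To make the effect of stemming concrete, here is a minimal Python sketch of the idea. It is not Solr code and does not implement the Porter algorithm: toy_stem is a hypothetical, deliberately crude suffix stripper, and tokenize, analyze, and matches are illustrative helper names. The only point is that when the same reduction is applied at index time and at query time, different surface forms collapse to one term.

```python
import re

def tokenize(text):
    # Lowercase and split into word tokens, keeping a trailing possessive ('s).
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def toy_stem(token):
    # ASSUMPTION: a crude stand-in for possessive removal plus stemming.
    # This is NOT the Porter stemmer; it only strips a few common suffixes.
    if token.endswith("'s"):
        token = token[:-2]
    for suffix in ("ing", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token

def analyze(text, stem):
    # The same chain is applied to documents and to queries.
    tokens = tokenize(text)
    return [toy_stem(t) for t in tokens] if stem else tokens

def matches(query, document, stem):
    # A document matches when every analyzed query term appears in it.
    doc_terms = set(analyze(document, stem))
    return all(term in doc_terms for term in analyze(query, stem))

docs = ["fake", "fake's", "faking", "fakes"]
stemmed_hits = [d for d in docs if matches("fakes", d, stem=True)]
plain_hits = [d for d in docs if matches("fakes", d, stem=False)]
print(stemmed_hits)  # all four documents
print(plain_hits)    # only "fakes"
```

With the stemming chain, the query fakes finds all four documents; without it, only the exact term fakes matches.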
Schema excerpt for text_general:
<!-- A general text field that has reasonable, generic cross-language defaults: it tokenizes with StandardTokenizer, removes stop words from case-insensitive "stopwords.txt" -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
Schema excerpt for text_en:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>