I have read the Solr highlighting wiki page several times and searched everywhere, but I can't get even basic highlighting to work with my Solr installation. I am running Solr 3.5 on the demo Jetty 6.1 server.
I have indexed 250K documents and I can search them just fine. Apart from my customized document field definitions, most of the Solr configuration is stock, although I temporarily commented out the highlighting defaults in solrconfig.xml to make sure they are not causing this problem.
My URL is very simple. I tried many variations, but here is the latest one, which returns the basic query results:
hl=on&hl.fl=title&indent=on&version=2.2&q=toyota&fq=&start=0&rows=1&fl=*%2Cscore
Here is the XML result:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">32</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.fl">title</str>
      <str name="wt"/>
      <str name="hl">true</str>
      <str name="version">2.2</str>
      <str name="rows">1</str>
      <str name="fl">*,score</str>
      <str name="start">0</str>
      <str name="q">toyota</str>
      <str name="qt"/>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="9549" start="0" maxScore="0.9960097">
    <doc>
      <float name="score">0.9960097</float>
      <str name="id">2-33-200</str>
      <str name="title">1992 Toyota Camry 2.2L CV Boots</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="2-33-200"/>
  </lst>
</response>
How can I debug this problem further? Thanks!
Edit: Here is the <highlighting> section from solrconfig.xml. As I said, it is stock. This may be the problem, but I'm new to Solr and not familiar with highlighting yet (obviously).
<highlighting>
  <fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>
  <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">70</int>
      <float name="hl.regex.slop">0.5</float>
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
  </fragmenter>
  <formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<em>]]></str>
      <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
  </formatter>
  <encoder name="html" class="solr.highlight.HtmlEncoder" />
  <fragListBuilder name="simple" default="true" class="solr.highlight.SimpleFragListBuilder"/>
  <fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder"/>
  <fragmentsBuilder name="default" default="true" class="solr.highlight.ScoreOrderFragmentsBuilder">
  </fragmentsBuilder>
  <fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
    <lst name="defaults">
      <str name="hl.tag.pre"><![CDATA[
        <b style="background:yellow">,<b style="background:lawgreen">,
        <b style="background:aquamarine">,<b style="background:magenta">,
        <b style="background:palegreen">,<b style="background:coral">,
        <b style="background:wheat">,<b style="background:khaki">,
        <b style="background:lime">,<b style="background:deepskyblue">]]></str>
      <str name="hl.tag.post"><![CDATA[</b>]]></str>
    </lst>
  </fragmentsBuilder>
  <boundaryScanner name="default" default="true" class="solr.highlight.SimpleBoundaryScanner">
    <lst name="defaults">
      <str name="hl.bs.maxScan">10</str>
      <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
    </lst>
  </boundaryScanner>
  <boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
    <lst name="defaults">
      <str name="hl.bs.type">WORD</str>
      <str name="hl.bs.language">en</str>
      <str name="hl.bs.country">US</str>
    </lst>
  </boundaryScanner>
</highlighting>
Edit: Although my "title" field was originally set to indexed="false", I have since tested it with indexed="true" (no change, no highlighting) and also with termVectors="true" termPositions="true" termOffsets="true", still with no effect. (I tried these after reading this SO post.)
And here is my definition of the "title" field at the moment:
<field name="title" type="string" indexed="true" stored="true" required="true" termVectors="true" termPositions="true" termOffsets="true" />
I originally started with:
<field name="title" type="string" indexed="false" stored="true" required="true" />
Edit: I also tried this definition:
<field name="title" type="text_general" indexed="true" stored="true" required="true" termVectors="true" termPositions="true" termOffsets="true" />
and there was no change: highlighting still does not work. My text_general definition is the standard one that ships with the Solr demo:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Edit: I also tried re-indexing the title with type text_en_splitting, which uses WhitespaceTokenizerFactory instead of StandardTokenizerFactory, and it still does not highlight. For what it's worth, I use the standard query parser, which according to debugQuery=on is LuceneQParser.
FINALLY! Thanks @javanna for the help. I experimented a lot, and these were the key gotchas:
- You must use a tokenized field type. The string field type will not work. The field does not need indexed="true" or termVectors="true", but its type must be a tokenized one.
- You must be careful to reference your fields with the proper case. In addition to getting the tokenization wrong, I had also changed the case of my field names during development and forgot to change the case in the hl.fl (highlight field) setting, which prevented highlighting from working.
- Make sure you reindex after each configuration change. To be safe, I deleted all documents from the index and rebuilt it from scratch, but this may not be necessary.
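For anyone hitting the same wall, the field-name case gotcha can be spotted mechanically. The following is only a rough sketch in Python, not something from Solr itself: the function name diagnose_highlighting and the trimmed SAMPLE response are made up for illustration, and it assumes the XML response format with the request parameters echoed under responseHeader, as in the response shown earlier in this post.

```python
import re
import xml.etree.ElementTree as ET


def diagnose_highlighting(response_xml):
    """Return a list of warnings about highlighting in a Solr XML response.

    Hypothetical helper: assumes wt=xml and that the request parameters are
    echoed back under responseHeader/params.
    """
    root = ET.fromstring(response_xml)
    warnings = []

    # hl.fl as echoed back by Solr (comma- or space-separated field names).
    node = root.find(
        "./lst[@name='responseHeader']/lst[@name='params']/str[@name='hl.fl']"
    )
    raw = node.text if node is not None and node.text else ""
    requested = [f for f in re.split(r"[,\s]+", raw) if f]

    # Field names actually returned in the documents; these are case-sensitive.
    returned = {el.get("name") for el in root.findall("./result/doc/*")}
    lowered = {name.lower(): name for name in returned if name}

    for field in requested:
        if field not in returned and field.lower() in lowered:
            warnings.append(
                f"hl.fl has '{field}' but the documents use '{lowered[field.lower()]}'"
            )

    # An empty <lst name="docId"/> under "highlighting" means no snippets.
    for entry in root.findall("./lst[@name='highlighting']/lst"):
        if len(entry) == 0:
            warnings.append(f"no snippets for document '{entry.get('name')}'")

    return warnings


# Trimmed, made-up response in which the field is "Title" but hl.fl says "title".
SAMPLE = """<response>
  <lst name="responseHeader">
    <lst name="params"><str name="hl.fl">title</str></lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">2-33-200</str>
      <str name="Title">1992 Toyota Camry 2.2L CV Boots</str>
    </doc>
  </result>
  <lst name="highlighting"><lst name="2-33-200"/></lst>
</response>"""

for warning in diagnose_highlighting(SAMPLE):
    print(warning)
```

It only catches a case mismatch and empty highlight entries. It cannot tell you that a field uses a non-tokenized type such as string, so that still has to be checked in schema.xml by hand.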
My definition now looks like:
<field name="Title" type="text_general" indexed="false" stored="true" required="true" />
And my solrconfig.xml has this set:
<str name="hl">on</str>
<str name="hl.fl">Title</str>
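The same two parameters can also be passed on each request instead of being set in solrconfig.xml. Here is a small Python sketch that builds such a query string. The function name build_highlight_query is made up for illustration, and the parameter set simply mirrors the URL from the question with the corrected field-name case.

```python
from urllib.parse import urlencode


def build_highlight_query(q, hl_field, rows=1):
    """Build a Solr query string with highlighting enabled.

    Hypothetical helper: hl_field must match the schema field name exactly,
    including case ("Title", not "title").
    """
    params = [
        ("q", q),
        ("hl", "on"),
        ("hl.fl", hl_field),
        ("rows", rows),
        ("fl", "*,score"),
    ]
    # urlencode percent-escapes values such as "*,score" for use in a URL.
    return urlencode(params)


print(build_highlight_query("toyota", "Title"))
```

Append the result to the select URL of your own Solr instance. The host, port, and path depend on your installation, so they are left out here.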