Why does this simple attempt at Solr highlighting fail?

Question

I have read the Solr highlighting wiki document several times and searched everywhere, but I cannot get even basic highlighting to work with my Solr installation. I am running Solr 3.5 on the demo Jetty 6.1 server.

I have indexed 250K documents and I can search them just fine. Apart from customizing my document field definitions, most of the Solr configuration is "stock", although I have temporarily commented out the "Highlighting defaults" in solrconfig.xml to make sure they are not causing this problem:

<!-- Highlighting defaults
  <str name="hl">on</str>
  <str name="hl.fl">title snippet</str>
  <str name="f.name.hl.fragsize">0</str>
  <str name="f.name.hl.alternateField">name</str>
-->

My URL is very simple. I have tried many variations, but here is my latest one, for a basic query:

 hl=on&hl.fl=title&indent=on&version=2.2&q=toyota&fq=&start=0&rows=1&fl=*%2Cscore

Here is the XML result:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">32</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.fl">title</str>
      <str name="wt"/>
      <str name="hl">true</str>
      <str name="version">2.2</str>
      <str name="rows">1</str>
      <str name="fl">*,score</str>
      <str name="start">0</str>
      <str name="q">toyota</str>
      <str name="qt"/>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="9549" start="0" maxScore="0.9960097">
    <doc>
      <float name="score">0.9960097</float>
      <str name="id">2-33-200</str>
      <str name="title">1992 Toyota Camry 2.2L CV Boots</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="2-33-200"/>
  </lst>
</response>

How can I debug this problem further? Thanks!

Edit: Here is the <highlighting> section from solrconfig.xml. As I said, it is stock. It may be the problem, but I'm new to Solr and not yet familiar with highlighting (obviously).

<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the "default" case -->
  <fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>
  <!-- A regular-expression-based fragmenter (for sentence extraction) -->
  <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
    </lst>
  </fragmenter>
  <!-- Configure the standard formatter -->
  <formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<em>]]></str>
      <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
  </formatter>
  <!-- Configure the standard encoder -->
  <encoder name="html" class="solr.highlight.HtmlEncoder" />
  <!-- Configure the standard fragListBuilder -->
  <fragListBuilder name="simple" default="true" class="solr.highlight.SimpleFragListBuilder"/>
  <!-- Configure the single fragListBuilder -->
  <fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder"/>
  <!-- default tag FragmentsBuilder -->
  <fragmentsBuilder name="default" default="true" class="solr.highlight.ScoreOrderFragmentsBuilder">
    <!--
    <lst name="defaults">
      <str name="hl.multiValuedSeparatorChar">/</str>
    </lst>
    -->
  </fragmentsBuilder>
  <!-- multi-colored tag FragmentsBuilder -->
  <fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
    <lst name="defaults">
      <str name="hl.tag.pre"><![CDATA[
        <b style="background:yellow">,<b style="background:lawgreen">,
        <b style="background:aquamarine">,<b style="background:magenta">,
        <b style="background:palegreen">,<b style="background:coral">,
        <b style="background:wheat">,<b style="background:khaki">,
        <b style="background:lime">,<b style="background:deepskyblue">]]></str>
      <str name="hl.tag.post"><![CDATA[</b>]]></str>
    </lst>
  </fragmentsBuilder>
  <boundaryScanner name="default" default="true" class="solr.highlight.SimpleBoundaryScanner">
    <lst name="defaults">
      <str name="hl.bs.maxScan">10</str>
      <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
    </lst>
  </boundaryScanner>
  <boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
    <lst name="defaults">
      <!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
      <str name="hl.bs.type">WORD</str>
      <!-- language and country are used when constructing Locale object. -->
      <!-- And the Locale object will be used when getting instance of BreakIterator -->
      <str name="hl.bs.language">en</str>
      <str name="hl.bs.country">US</str>
    </lst>
  </boundaryScanner>
</highlighting>

Edit: Although my "title" field was originally set to indexed="false", I have since tested it with indexed="true" (no change, no highlighting), and also with termVectors="true" termPositions="true" termOffsets="true"... still no effect. (I tried these based on reading this post on SO.)

And here is my definition of the "title" field at the moment:

 <field name="title" type="string" indexed="true" stored="true" required="true" termVectors="true" termPositions="true" termOffsets="true" />

First I started with:

 <field name="title" type="string" indexed="false" stored="true" required="true" />

Edit: I also tried this definition:

 <field name="title" type="text_general" indexed="true" stored="true" required="true" termVectors="true" termPositions="true" termOffsets="true" />

and there was no change: highlighting still does not work. My definition of text_general is the stock one that ships with the Solr demo:

<!-- A general text field that has reasonable, generic cross-language defaults:
     it tokenizes with StandardTokenizer, removes stop words from
     case-insensitive "stopwords.txt" (empty by default), and down cases.
     At query time only, it also applies synonyms. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Edit: I also tried re-indexing with the title field as type text_en_splitting, which uses WhitespaceTokenizerFactory instead of StandardTokenizerFactory, and highlighting still does not work. For what it's worth, I am using the standard query parser, which according to debugQuery=on is LuceneQParser.

FINALLY! Thanks @javanna for the help. I experimented a lot, and here are the key lessons:

You must use a tokenized field type. The string field type will not work. There is no need for indexed="true" or termVectors="true", but the field type must be tokenized.
You must be careful to reference your fields with the correct case. Besides getting the tokenization wrong, I had also changed the case of my field names during development and forgot to change the case in the hl.fl (highlighted field) setting, which kept highlighting from working.
Make sure you reindex between each configuration change. To be safe, I deleted all documents from the index and rebuilt it from scratch, but this may not be necessary.
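
The case mismatch between hl.fl and the schema is easy to miss, so a small check can help. The following is only a sketch in Python, not part of Solr: `unknown_highlight_fields` is a hypothetical helper, and the set of schema field names is assumed to have been copied by hand from schema.xml. It splits an hl.fl value on commas and spaces and reports any name that does not match a schema field exactly, including case.

```python
def unknown_highlight_fields(hl_fl, schema_fields):
    """Return hl.fl entries that do not exactly match a schema field name.

    hl_fl is a comma- or space-delimited list of field names.
    The comparison is case-sensitive on purpose.
    """
    requested = hl_fl.replace(",", " ").split()
    return [name for name in requested if name not in schema_fields]


# Assumption: field names copied by hand from schema.xml.
schema_fields = {"id", "Title"}

print(unknown_highlight_fields("title", schema_fields))  # lowercase name is reported
print(unknown_highlight_fields("Title", schema_fields))  # exact match, nothing reported
```

The sketch does not handle wildcard field patterns; it only illustrates that "title" and "Title" are different field names.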

My definition now looks like:

 <field name="Title" type="text_general" indexed="false" stored="true" required="true" />

And my solrconfig.xml has this set:

 <str name="hl">on</str> <str name="hl.fl">Title</str>

solr

Mason G. Zhwiti Mar 23 '12 at 16:27

1 answer

javanna · Accepted Answer · 2012-03-24T08:41:03+0000

The way you are requesting highlighting looks fine, but your solrconfig.xml looks a bit cluttered. The example configuration you started from uses basically every available option, and I don't think you need them. Unless you need something other than the defaults, I would start by commenting out the whole highlighting configuration, as well as your highlighting defaults. Then I would play with just the couple of URL parameters you need: hl=on and hl.fl=title. Once you have found the options you need, you can configure them as defaults.

However, looking at the fieldType of your title field, I suspect it is not tokenized, unless you changed the default definition of the string type. In that case your query will not match the title field, so nothing in it gets highlighted. Are you perhaps using edismax (or dismax)? If so, what is your qf parameter? Is it possible that the term toyota is in another field that matches your query? If you are using edismax, you can try searching for q=title:toyota and see whether you get results.
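
To illustrate why an untokenized field cannot produce a highlight, here is a toy model in Python. It is not Solr code: `string_tokens`, `text_tokens`, and `highlight` are hypothetical helpers that only mimic, very roughly, how a string field keeps the whole value as a single term while a tokenized, lowercased text field splits it into words.

```python
import re


def string_tokens(value):
    """Toy model of a `string` field: the whole value is one term."""
    return [(value, 0, len(value))]


def text_tokens(value):
    """Toy model of a tokenized field: split on whitespace and lowercase.

    Real Solr analyzers do considerably more than this.
    """
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\S+", value)]


def highlight(value, query_term, tokenize, pre="<em>", post="</em>"):
    """Wrap every term equal to query_term; return None if nothing matches."""
    pieces, last, matched = [], 0, False
    for term, start, end in tokenize(value):
        if term == query_term:
            pieces.append(value[last:start] + pre + value[start:end] + post)
            last = end
            matched = True
    if not matched:
        return None
    pieces.append(value[last:])
    return "".join(pieces)


title = "1992 Toyota Camry 2.2L CV Boots"
print(highlight(title, "toyota", string_tokens))  # no term equals "toyota"
print(highlight(title, "toyota", text_tokens))    # "Toyota" is wrapped in <em>
```

In the string model the only term is the full title, so the query term never matches and there is nothing to highlight, which is consistent with the empty highlighting entry in the response above.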

You can also see where the match comes from by enabling debugQuery=on and checking the debug output.
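
A request combining the fielded query with debugQuery=on could be assembled as in the sketch below. The base URL is an assumption (the usual address of a local single-core Solr demo) and `build_debug_url` is a hypothetical helper; adjust host, port, and path to your installation.

```python
from urllib.parse import urlencode

# Assumption: a local single-core Solr demo; change this for your setup.
BASE_URL = "http://localhost:8983/solr/select"


def build_debug_url(term, field="title"):
    """Build a select URL that searches one field, highlights it, and enables debug output."""
    params = [
        ("q", f"{field}:{term}"),   # restrict the match to the field being highlighted
        ("hl", "on"),
        ("hl.fl", field),
        ("rows", "1"),
        ("fl", "*,score"),
        ("debugQuery", "on"),       # adds the debug section to the response
    ]
    return BASE_URL + "?" + urlencode(params)


url = build_debug_url("toyota")
print(url)
```

If this fielded query returns no documents while plain q=toyota does, the match is coming from some field other than title.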

UPDATE
I saw that you changed the title fieldType to text_general, but that does not change anything, because that type is not tokenized on whitespace. You still haven't said which query parser you are using; anyway, if I'm right, you should use WhitespaceTokenizerFactory instead of StandardTokenizerFactory:

 <tokenizer class="solr.WhitespaceTokenizerFactory"/>

After that, remember to reindex all your data, otherwise you will not see any change. Basically, if you index something like toyota whatever without tokenizing on whitespace, a search for toyota will not match that field, and toyota will not be highlighted in it either, because there is no match. My guess is that you are using the dismax or edismax query parser and searching on more than one field; some of them match your search but title does not, so you get results, but not from title, which is the only field you have chosen to highlight. Can you post the results of the toyota search? Is the term toyota present in fields other than title?
