Hierarchical Faceted Search Example with Solr

Question

Where can I find a complete example showing how a hierarchical faceted search works from indexing documents to getting search results?

My research so far

Stack Overflow has several posts on this, but each covers only certain aspects of hierarchical faceted search, so I do not consider this question a duplicate. I am looking for a complete example so that I can understand the whole flow. The piece I keep missing is the final query in which the aggregation works.

There is documentation on the Solr website, but I did not understand the example given there.

Example (conceptually)

I would like to build a complete walk-through example here and hope you can provide the missing final part.

Test data

Input

Say we have 3 documents with each document being a person.

 Alice (document 1)
  - Blond
  - Europe
 Jane (document 2)
  - Brown
  - Europe/Norway
 Bob (document 3)
  - Black
  - Europe/Norway
  - Europe/Sweden

Output

The expected output for this (currently incorrect) request

 http://server:8983/solr/my_core/select?q=*%3A*&wt=json&indent=true&facet=true&facet.field=tags_ss 

should be

 Hair_color (3)
  - blond (1)
  - brown (1)
  - black (1)
 Location (3)
  - Europe (4) // This should be 4 not 3, i.e. the sum of the leaves, because Alice is tagged with "Europe" only, without a country
    - Norway (2)
    - Sweden (1)

because all documents are found.

Example (software)

I need help here. How can I implement the conceptual example above?

That's how far I got.

1. Creating XML test data

This is the content of the documents.xml file in the subfolder solr-5.1.0/testdata:

 <add>
   <doc>
     <field name="id">Alice</field>
     <field name="tags_ss">hair_color/blond</field>
     <field name="tags_ss">location/Europe</field>
   </doc>
   <doc>
     <field name="id">Jane</field>
     <field name="tags_ss">hair_color/brown</field>
     <field name="tags_ss">location/Europe/Norway</field>
   </doc>
   <doc>
     <field name="id">Bob</field>
     <field name="tags_ss">hair_color/black</field>
     <field name="tags_ss">location/Europe/Norway</field>
     <field name="tags_ss">location/Europe/Sweden</field>
   </doc>
 </add>

The _ss suffix is defined in schema.xml as

 <dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/> 

Please note that all tags (for example, hair_color and location), as well as any tags added in the future, are stored in the same tags_ss field.

2. Index the test data with Solr

 c:\solr-5.1.0>java -classpath dist/solr-core-5.1.0.jar -Dauto=yes -Dc=gettingstarted -Ddata=files -Drecursive=yes -Durl=http://server:8983/solr/my_core/update org.apache.solr.util.SimplePostTool .\testdata 

[Screenshot: Solr statistics page]

3. Retrieve all data with a Solr query (without faceting)

Request

 http://server:8983/solr/my_core/select?q=*%3A*&wt=json&indent=true 

Result

 {
   "responseHeader": {
     "status": 0,
     "QTime": 0,
     "params": {
       "indent": "true",
       "q": "*:*",
       "_": "1430830360536",
       "wt": "json"
     }
   },
   "response": {
     "numFound": 3,
     "start": 0,
     "docs": [
       {
         "id": "Alice",
         "tags_ss": ["hair_color/blond", "location/europe"],
         "_version_": 1500334369469890600
       },
       {
         "id": "Jane",
         "tags_ss": ["hair_color/brown", "location/europe/Norway"],
         "_version_": 1500334369469890600
       },
       {
         "id": "Bob",
         "tags_ss": ["hair_color/black", "location/europe/Norway", "location/europe/Sweden"],
         "_version_": 1500334369469890600
       }
     ]
   }
 }

4. Retrieve all data with a Solr query (with faceting)

Request

 http://server:8983/solr/my_core/select?q=*%3A*&wt=json&indent=true&facet=true&facet.field=tags_ss 

Result

 {
   "responseHeader": {
     "status": 0,
     "QTime": 0,
     "params": {
       "facet": "true",
       "indent": "true",
       "q": "*:*",
       "_": "1430830432389",
       "facet.field": "tags_ss",
       "wt": "json"
     }
   },
   "response": {
     "numFound": 3,
     "start": 0,
     "docs": [
       {
         "id": "Alice",
         "tags_ss": ["hair_color/blond", "location/europe"],
         "_version_": 1500334369469890600
       },
       {
         "id": "Jane",
         "tags_ss": ["hair_color/brown", "location/europe/Norway"],
         "_version_": 1500334369469890600
       },
       {
         "id": "Bob",
         "tags_ss": ["hair_color/black", "location/europe/Norway", "location/europe/Sweden"],
         "_version_": 1500334369469890600
       }
     ]
   },
   "facet_counts": {
     "facet_queries": {},
     "facet_fields": {
       "tags_ss": [
         "location/europe/Norway", 2,
         "hair_color/black", 1,
         "hair_color/blond", 1,
         "hair_color/brown", 1,
         "location/europe", 1,
         "location/europe/Sweden", 1
       ]
     },
     "facet_dates": {},
     "facet_ranges": {},
     "facet_intervals": {},
     "facet_heatmaps": {}
   }
 }

Pay attention to this section at the bottom of the result:

 "facet_fields": { "tags_ss": [ "location/europe/Norway", 2, "hair_color/black", 1, "hair_color/blond", 1, "hair_color/brown", 1, "location/europe", 1, "location/europe/Sweden", 1 ] }, 

It shows all tags as a flat list (not hierarchical).
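For reference, Solr returns each facet field as one list that alternates values and counts. A minimal Python sketch for pairing them up could look like the following; the helper name facet_list_to_dict is made up for illustration, and the input is simply the tags_ss list copied from the response above.

```python
from typing import Dict, List, Union


def facet_list_to_dict(flat: List[Union[str, int]]) -> Dict[str, int]:
    """Turn an alternating [value, count, value, count, ...] list into a dict."""
    # Even positions hold the facet values, odd positions hold their counts.
    return dict(zip(flat[0::2], flat[1::2]))


# The tags_ss list as it appears in the facet_fields section above.
tags_ss = [
    "location/europe/Norway", 2,
    "hair_color/black", 1,
    "hair_color/blond", 1,
    "hair_color/brown", 1,
    "location/europe", 1,
    "location/europe/Sweden", 1,
]

counts = facet_list_to_dict(tags_ss)
print(counts["location/europe/Norway"])  # 2
```

The resulting dict is still flat; it only makes the counts easier to look up by tag.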

5. Retrieve all data with a Solr query (with hierarchical faceting)

Request

Here is my problem. I do not know how to build a query that returns the following result (the result is already shown in the conceptual example above).

Result (dummy, manually created for illustration)

 {
   "responseHeader": {
     "status": 0,
     "QTime": 0,
     "params": {
       "facet": "true",
       "indent": "true",
       "q": "*:*",
       "facet.field": "tags_ss",
       "wt": "json",
       "rows": "0"
     }
   },
   "response": {"numFound": 3, "start": 0, "docs": []},
   "facet_counts": {
     "facet_queries": {},
     "facet_fields": {
       "tags_ss": [
         "hair_color", 3,              // This aggregation is missing
         "hair_color/black", 1,
         "hair_color/blond", 1,
         "hair_color/brown", 1,
         "location/europe", 4,         // This aggregation should be 4 but is 1
         "location/europe/Norway", 2,
         "location/europe/Sweden", 1
       ]
     },
     "facet_dates": {},
     "facet_ranges": {},
     "facet_intervals": {},
     "facet_heatmaps": {}
   }
 }

This tag list is still flat, but at least location/europe = 4 would be aggregated correctly, which it currently is not. I keep getting location/europe = 1 because that exact value is set only for Alice; Jane's Norway and Bob's Norway and Sweden are not aggregated into the count for Europe.

Ideas

  • I may need to use facet.pivot, but I don't know how.
  • I may need to use facet.prefix, but I don't know how.

Version

  • Solr 5.1.0
  • Windows 7
1 answer

You can get all of your aggregates populated if you index every level of the hierarchy as a separate value. If Bob is from Norway, you would add up to three values to the facet field:

 location
 location/Europe
 location/Europe/Norway

(As an alternative design, you might keep hair color in a field separate from location, in which case the bare "location" value would never need to be added to the field itself.)
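A minimal sketch of this expansion in Python, assuming the tags are expanded on the client before the documents are sent to Solr. The helper names expand_path and expand_tags are hypothetical and not part of Solr.

```python
from typing import Iterable, List


def expand_path(path: str, sep: str = "/") -> List[str]:
    """Return the path together with all of its ancestors, shortest first."""
    parts = path.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


def expand_tags(paths: Iterable[str], sep: str = "/") -> List[str]:
    """Expand several paths and drop duplicates while keeping first-seen order."""
    seen: List[str] = []
    for path in paths:
        for prefix in expand_path(path, sep):
            if prefix not in seen:
                seen.append(prefix)
    return seen


# Bob is tagged with two countries; the shared ancestors are emitted only once.
bob_tags = expand_tags(["location/Europe/Norway", "location/Europe/Sweden"])
for tag in bob_tags:
    print(tag)
```

Each returned string would become one value of the multi-valued facet field for that document.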

Your results are then still flat, but the aggregated totals are present. At this point you will need to do some programmatic work on the result set to build a nested data structure by splitting every value on the separator character (/ in this case). Once you have a nested data structure, displaying it hierarchically should be manageable. It is hard to describe this part of the implementation in detail, because the nested data structure and its display will depend heavily on your development environment.
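As one possible illustration of that post-processing step, here is a hedged Python sketch. The function name nest_facets and the {"count", "children"} node layout are assumptions chosen for this example; the input counts are what one would expect Solr to report for the three sample people once every level is indexed (facet counts are per document, so location/Europe is 3 here).

```python
from typing import Any, Dict


def nest_facets(counts: Dict[str, int], sep: str = "/") -> Dict[str, Any]:
    """Build a nested tree from flat 'a/b/c' -> count pairs."""
    tree: Dict[str, Any] = {}
    for path, count in counts.items():
        children = tree
        node = None
        for part in path.split(sep):
            # Create intermediate nodes on demand with a placeholder count.
            node = children.setdefault(part, {"count": 0, "children": {}})
            children = node["children"]
        # The last node on the path receives the count reported for that value.
        node["count"] = count
    return tree


flat_counts = {
    "location": 3,
    "location/Europe": 3,
    "location/Europe/Norway": 2,
    "location/Europe/Sweden": 1,
}
tree = nest_facets(flat_counts)
europe = tree["location"]["children"]["Europe"]
print(europe["count"], sorted(europe["children"]))
```

A renderer can then walk the tree recursively and indent each level.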

Another, somewhat riskier option, which avoids adding redundant entries to the Solr facet field, is to index only the values you are using now (for example, location/Europe/Norway) and to sum the leaf counts yourself as you iterate over the facet list and build your nested data structure. The risk is that if a person really is associated with several countries in Europe, you may get an inflated total at the higher location/Europe level. In my own projects I chose to index the individual values as described above. Although they seem redundant, the aggregate totals are more accurate.

(As usual with Solr, this is just one of several ways to do it. This model is best suited to systems with a manageable total number of leaves, where it makes sense to fetch all the facet values up front and you do not need additional drill-down queries.)
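The inflation risk can be seen in a small Python sketch. The roll_up helper is hypothetical, and the input counts are the location values from the question's sample data (with Europe capitalised as in the XML).

```python
from collections import defaultdict
from typing import Dict


def roll_up(leaf_counts: Dict[str, int], sep: str = "/") -> Dict[str, int]:
    """Add each value's count to itself and to every ancestor path."""
    totals: Dict[str, int] = defaultdict(int)
    for path, count in leaf_counts.items():
        parts = path.split(sep)
        for i in range(1, len(parts) + 1):
            totals[sep.join(parts[:i])] += count
    return dict(totals)


# Only the values that were actually indexed, with their per-document counts.
leaf_counts = {
    "location/Europe": 1,         # Alice
    "location/Europe/Norway": 2,  # Jane and Bob
    "location/Europe/Sweden": 1,  # Bob again
}
totals = roll_up(leaf_counts)
# Europe sums to 4 although there are only 3 people,
# because Bob is counted once under Norway and once under Sweden.
print(totals["location/Europe"])  # 4
```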

Pivot option

Solr's pivot faceting can return a hierarchically structured result directly from Solr, but it runs the risk of creating false connections between values in certain situations.

So, say you upload your documents as follows:

 <add>
   <doc>
     <field name="id">Alice</field>
     <field name="continent">Europe</field>
   </doc>
   <doc>
     <field name="id">Jane</field>
     <field name="continent">Europe</field>
     <field name="country">Norway</field>
   </doc>
   <doc>
     <field name="id">Bob</field>
     <field name="continent">Europe</field>
     <field name="country">Norway</field>
     <field name="country">Sweden</field>
   </doc>
 </add>

Now you run a pivot facet query with facet.pivot.mincount=1&facet.pivot=continent,country. The results so far may look great:

 "facet_pivot":{ "continent,country":[{ "field":"continent", "value":"Europe", "count":3, "pivot":[{ "field":"country", "value":"Norway", "count":2,}, { "field":"country", "value":"Sweden", "count":1,}]}]} 

So far so good. The problem arises when you add a new person to the data:

 <add>
   <doc>
     <field name="id">Susan</field>
     <field name="continent">Europe</field>
     <field name="country">Norway</field>
     <field name="continent">South America</field>
     <field name="country">Brazil</field>
   </doc>
 </add>

Solr does not actually know that Norway is in Europe and Brazil is in South America, so you will start getting facet counts for "Europe > Brazil" and "South America > Norway".

The problem can be worked around if you prefix all of your country values with their continent:

 <add>
   <doc>
     <field name="id">Susan</field>
     <field name="continent">Europe</field>
     <field name="country">Europe/Norway</field>
     <field name="continent">South America</field>
     <field name="country">South America/Brazil</field>
   </doc>
 </add>

This way you will still get the mismatched values in your pivot results, but you can suppress any country-level facet values whose prefix does not match their continent. For this to be a problem at all, a multi-valued field in a record must have values that are tied to specific values appearing lower in the same pivot. If you do not expect multiple values for these fields in a single record, or if your values do not have a strong association (that is, a specific parentage), then pivot facets may be an ideal solution. In some cases, though, the dissociation within the pivot facet between the values of the included fields can create an irredeemable mess.
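A hedged Python sketch of that client-side suppression step follows. The function name suppress_mismatched is hypothetical, and the sample entries are hand-written in the shape of a facet_pivot list for a single record tagged with both continents; they are not real Solr output.

```python
from typing import Any, Dict, List


def suppress_mismatched(pivot_entries: List[Dict[str, Any]], sep: str = "/") -> List[Dict[str, Any]]:
    """Keep only country entries whose value starts with '<continent>/'."""
    cleaned = []
    for continent in pivot_entries:
        prefix = str(continent["value"]) + sep
        children = [
            child for child in continent.get("pivot", [])
            if str(child["value"]).startswith(prefix)
        ]
        # Copy the continent entry so the input list is left untouched.
        cleaned.append({**continent, "pivot": children})
    return cleaned


# Illustrative input: every continent is paired with every country of the record.
sample = [
    {"field": "continent", "value": "Europe", "count": 1, "pivot": [
        {"field": "country", "value": "Europe/Norway", "count": 1},
        {"field": "country", "value": "South America/Brazil", "count": 1},
    ]},
    {"field": "continent", "value": "South America", "count": 1, "pivot": [
        {"field": "country", "value": "Europe/Norway", "count": 1},
        {"field": "country", "value": "South America/Brazil", "count": 1},
    ]},
]

cleaned = suppress_mismatched(sample)
for entry in cleaned:
    print(entry["value"], [child["value"] for child in entry["pivot"]])
```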


Source: https://habr.com/ru/post/986573/

