Grouping Lucene Search Results and Calculating Frequency by Category

I am working on a store search API using Lucene.

For each City, State combination in the store search results, I need to show its frequency in parentheses, for example:

Los Angeles, CA (450)
Atlanta, GA (212)
Boston, MA (78)
.
.
.

At the moment, when the user asks to "Show me all the stores", my search returns about 7,000 Lucene documents on average. In that case I show about 800 unique City, State entries like the ones above.

I override the Collect method of the HitCollector class and extract the term frequency vectors as follows:

var vectors = _reader.GetTermFreqVectors(doc);

Then I iterate over this collection and count the frequency of each unique City, State combination.
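The counting step described above is essentially a group-by. Below is a minimal sketch in Java (the question's code is Lucene.NET/C#), assuming the City and State values have already been read for each hit. The Store record and the sample data are hypothetical, and nothing here calls Lucene. It keeps one hash-map counter per "City, State" key and then sorts by count, highest first, to produce lines like the list above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CityStateCounts {
    // Hypothetical stand-in for the City and State values read from one hit.
    record Store(String city, String state) {}

    // One pass over the hits: increment the counter for each "City, State" key.
    static Map<String, Integer> countByCityState(List<Store> hits) {
        Map<String, Integer> counts = new HashMap<>();
        for (Store s : hits) {
            counts.merge(s.city() + ", " + s.state(), 1, Integer::sum);
        }
        return counts;
    }

    // Formats each entry as "City, State (n)", highest count first, ties alphabetical.
    static List<String> format(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .map(e -> e.getKey() + " (" + e.getValue() + ")")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Store> hits = List.of(
                new Store("Atlanta", "GA"), new Store("Boston", "MA"),
                new Store("Los Angeles", "CA"), new Store("Atlanta", "GA"),
                new Store("Los Angeles", "CA"), new Store("Los Angeles", "CA"));
        format(countByCityState(hits)).forEach(System.out::println);
        // Los Angeles, CA (3)
        // Atlanta, GA (2)
        // Boston, MA (1)
    }
}
```

The map updates themselves are cheap (one per hit, so roughly 7,000 per query here); the part more likely to dominate is reading per-document data such as term vectors for every hit.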

Is there a better way to do this in Lucene?

Thanks!


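Unfortunately, out of the box (OOTB) Lucene doesn't support this; what you're describing is called faceted search. See the Lucene Jira issue for it.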

Solr, however, supports it OOTB; there it's called faceting. For example, this query:

http://localhost:8983/solr/select?q=ipod&rows=0&facet=true&facet.limit=-1&facet.field=cat&facet.field=inStock

returns:

<response>
<responseHeader><status>0</status><QTime>2</QTime></responseHeader>
<result numFound="4" start="0"/>
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="cat">
        <int name="search">0</int>
        <int name="memory">0</int>
        <int name="graphics">0</int>
        <int name="card">0</int>
        <int name="music">1</int>
        <int name="software">0</int>
        <int name="electronics">3</int>
        <int name="copier">0</int>
        <int name="multifunction">0</int>
        <int name="camera">0</int>
        <int name="connector">2</int>
        <int name="hard">0</int>
        <int name="scanner">0</int>
        <int name="monitor">0</int>
        <int name="drive">0</int>
        <int name="printer">0</int>
  </lst>
  <lst name="inStock">
        <int name="false">3</int>
        <int name="true">1</int>
  </lst>
 </lst>
</lst>
</response>
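If you call Solr over plain HTTP rather than through a client library, the counts can be read out of that XML with the JDK's DOM parser. A minimal sketch, assuming the response layout shown above (a facet_fields list containing one lst per field, each with int children). The class name is hypothetical, and zero counts are skipped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class FacetResponseParser {
    // Returns field name -> (facet value -> count) from <lst name="facet_fields">,
    // keeping document order and skipping zero counts.
    static Map<String, Map<String, Integer>> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse Solr response", e);
        }
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        NodeList lists = doc.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element lst = (Element) lists.item(i);
            if (!"facet_fields".equals(lst.getAttribute("name"))) continue;
            // Each element child of facet_fields is one faceted field, e.g. "cat" or "inStock".
            NodeList children = lst.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
                Element field = (Element) child;
                Map<String, Integer> counts = new LinkedHashMap<>();
                NodeList ints = field.getElementsByTagName("int");
                for (int k = 0; k < ints.getLength(); k++) {
                    Element entry = (Element) ints.item(k);
                    int n = Integer.parseInt(entry.getTextContent().trim());
                    if (n > 0) counts.put(entry.getAttribute("name"), n);
                }
                result.put(field.getAttribute("name"), counts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Trimmed-down version of the response above.
        String xml = "<response><lst name=\"facet_counts\"><lst name=\"facet_fields\">"
                + "<lst name=\"cat\"><int name=\"music\">1</int><int name=\"hard\">0</int>"
                + "<int name=\"electronics\">3</int><int name=\"connector\">2</int></lst>"
                + "<lst name=\"inStock\"><int name=\"false\">3</int><int name=\"true\">1</int></lst>"
                + "</lst></lst></response>";
        System.out.println(parse(xml));
        // {cat={music=1, electronics=3, connector=2}, inStock={false=3, true=1}}
    }
}
```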

For details, see the Solr wiki page on the faceting parameters:

http://wiki.apache.org/solr/SimpleFacetParameters
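Adapting the example query to the stores case might look like the sketch below. It assumes a hypothetical untokenized (string) field named city_state holding values like "Boston, MA"; faceting on a tokenized text field would count individual tokens ("Boston", "MA") instead. The query *:* matches all documents, which corresponds to "Show me all the stores". facet.limit=-1 asks for every value (the default limit is 100, which would truncate ~800 cities), and facet.mincount=1 drops zero counts like the ones in the example response above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FacetQueryUrl {
    // Builds a Solr select URL that asks only for facet counts on one field.
    // "city_state" in main() is a hypothetical untokenized field, e.g. "Boston, MA".
    static String build(String selectUrl, String query, String facetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("rows", "0");           // no documents, just the counts
        params.put("facet", "true");
        params.put("facet.field", facetField);
        params.put("facet.limit", "-1");   // all values, not only the top 100
        params.put("facet.mincount", "1"); // leave out values with zero hits
        StringJoiner qs = new StringJoiner("&");
        params.forEach((k, v) -> qs.add(k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return selectUrl + "?" + qs;
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/select", "*:*", "city_state"));
        // http://localhost:8983/solr/select?q=*%3A*&rows=0&facet=true&facet.field=city_state&facet.limit=-1&facet.mincount=1
    }
}
```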

EDIT: If the Solr approach isn't an option for you, here is a way to do it with plain Lucene:

http://sujitpal.blogspot.com/2007/01/faceted-searching-with-lucene.html

Note that it was written against Lucene 2.0.
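A common technique for faceting with plain Lucene is bitset intersection: keep one bitset per facet value marking the documents that have that value, take the bitset of documents matched by the query, and count the set bits in each intersection. The sketch below shows only that counting step with java.util.BitSet and made-up data; it is not the linked post's code and does not touch the Lucene API. In a real index the per-value bitsets would be derived from the index and cached, since building them is the expensive part; with about 800 City, State values, each query then costs about 800 AND-and-count operations.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class BitSetFacets {
    // One BitSet per facet value: bit d is set when document d has that value.
    // Built from a toy array here; a real implementation would build it from the index once.
    static Map<String, BitSet> buildFacetBits(String[] valueByDoc) {
        Map<String, BitSet> bits = new TreeMap<>();
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            bits.computeIfAbsent(valueByDoc[doc], v -> new BitSet()).set(doc);
        }
        return bits;
    }

    // Count for a facet value = number of documents in (query hits AND value's bits).
    static Map<String, Integer> count(BitSet hits, Map<String, BitSet> facetBits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, BitSet> e : facetBits.entrySet()) {
            BitSet both = (BitSet) e.getValue().clone(); // don't modify the cached bitset
            both.and(hits);
            int n = both.cardinality();
            if (n > 0) counts.put(e.getKey(), n);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] cityStateByDoc = {"Atlanta, GA", "Boston, MA", "Atlanta, GA", "Los Angeles, CA", "Boston, MA"};
        BitSet hits = new BitSet();
        hits.set(0, 4); // pretend the query matched documents 0..3
        System.out.println(count(hits, buildFacetBits(cityStateByDoc)));
        // {Atlanta, GA=2, Boston, MA=1, Los Angeles, CA=1}
    }
}
```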



Steve, I think you want faceted search. It isn't built into Lucene. I suggest you try Solr, which has faceting as one of its main and most convenient features.

Source: https://habr.com/ru/post/1706254/