How to get results that follow a given distribution?

Is there a way in Solr to get results that follow a given distribution over one of the indexed fields?

For example, imagine I have a catalog of books with the fields synopsis, publication_year and genre.

I would like to create a query that returns the results most relevant to the synopsis, boosted in favor of the most recently published books. However, in the final results (say 1000), I would like the genres to be distributed as close as possible to some given distribution. For example: 50% science fiction, 25% scientific literature, 10% politics and so on.

I know that I could fetch a large set of results and do some weighted sampling from it outside of Solr to get the final 1000 books, but I'm looking for a Solr-only solution.

Is this possible, and if so, how?

1 answer

While you can't plug in your distribution directly, you can use Collapse and Expand or Result Grouping to get up to n results for each genre. You then ignore any results returned above your threshold for that group.
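As a rough sketch of the Result Grouping variant, the request could be built like this. The host, the core name books, the use of edismax with qf=synopsis, and the helper name build_grouped_query are assumptions for illustration only; the recency boost is left out because it is independent of the grouping.

```python
from urllib.parse import urlencode


def build_grouped_query(text, per_genre_limit, max_genres=20):
    """Build Solr Result Grouping parameters (hypothetical helper).

    Asks for up to `per_genre_limit` documents in each genre group.
    """
    return {
        "q": text,
        "defType": "edismax",   # assumption: edismax parser searching the synopsis
        "qf": "synopsis",
        "group": "true",        # turn on Result Grouping
        "group.field": "genre",
        "group.limit": per_genre_limit,  # documents returned per group
        "rows": max_genres,     # with grouping, rows counts groups, not documents
        "wt": "json",
    }


params = build_grouped_query("space exploration", per_genre_limit=500)
# Assumed local Solr instance and core name.
url = "http://localhost:8983/solr/books/select?" + urlencode(params)
```

The resulting URL can be fetched with any HTTP client; the documents come back under the grouped section of the response, one entry per genre.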

You will need to set the number of documents per group to the largest bucket size in your distribution, measured against the total number of results you want, i.e. 500 in the example above. This can give you a very large set of documents to work with, so I would try to keep the number returned per genre fairly small, at least in the beginning.
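The trimming step could look roughly like the following. It assumes the JSON layout Solr uses for grouped responses (grouped, then the field name, then groups, each with a groupValue and a doclist); the function names and the sample data are made up for the example.

```python
def genre_quotas(distribution, total):
    """Turn genre fractions into per-genre document quotas."""
    return {genre: round(share * total) for genre, share in distribution.items()}


def trim_to_quotas(grouped_response, quotas, field="genre"):
    """Keep at most quotas[genre] documents from each group, drop the rest."""
    selected = []
    for group in grouped_response["grouped"][field]["groups"]:
        quota = quotas.get(group["groupValue"], 0)  # unknown genres get nothing
        selected.extend(group["doclist"]["docs"][:quota])
    return selected


quotas = genre_quotas({"science fiction": 0.5, "politics": 0.25}, total=8)
group_limit = max(quotas.values())  # value to send as group.limit

# Made-up response fragment in the shape of a grouped Solr result.
sample = {"grouped": {"genre": {"groups": [
    {"groupValue": "science fiction",
     "doclist": {"docs": [{"id": str(i)} for i in range(6)]}},
    {"groupValue": "politics",
     "doclist": {"docs": [{"id": "p1"}]}},
]}}}
books = trim_to_quotas(sample, quotas)
```

Note that a genre with fewer matches than its quota (politics here) simply contributes fewer documents, so the final mix only approximates the target distribution. The merged list is also ordered genre by genre; if overall relevance order matters, the score field would have to be requested and the list re-sorted afterwards.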

There might be a way to make the group sizes more dynamic by extending one of the two features above and adding your own code to limit the number of documents collected for each genre.


Source: https://habr.com/ru/post/1273467/

