Optimizing Solr for Sorting

I use Solr for real-time search. My data set is about 60 million documents. Instead of sorting by relevance, I need to sort by time. I currently use the sort parameter in the query to do this. That works well for narrow searches, but when a query matches a large number of documents, Solr has to collect all of the matching documents and sort them by time before returning anything. This is slow, and there should be a better way.

What is the best way?

3 answers

I have found the answer.

If you want to sort by time rather than relevance, use fq= instead of q= for all your filters. That way, Solr does not waste time calculating relevance scores for the documents matching q=. It turned out that Solr was spending most of its time on scoring, not on sorting.
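As a rough illustration, the request could be assembled like this. It is only a sketch: the field name `timestamp`, the example filters, and the use of the match-all query `*:*` as the main query are assumptions, not details from the question.

```python
from urllib.parse import urlencode


def build_sorted_query(filters, sort_field="timestamp", rows=20):
    """Build a Solr query string that filters with fq= and sorts by time.

    `sort_field` is a hypothetical field name; use the time field
    from your own schema.
    """
    # Match everything in q= so no meaningful relevance scoring is needed.
    params = [("q", "*:*")]
    # Every restriction goes into its own fq= parameter (filters are not scored).
    params += [("fq", f) for f in filters]
    # Newest documents first.
    params += [("sort", f"{sort_field} desc"), ("rows", str(rows))]
    return urlencode(params)


if __name__ == "__main__":
    print(build_sorted_query(["text:solr", "lang:en"]))
```

The resulting string can be appended to the select handler URL of your core. Separate fq= clauses are also cached independently in the filter cache, which tends to help when the same filters repeat.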

In addition, you can speed up sorting by pre-warming the sort fields in the newSearcher and firstSearcher event listeners in solrconfig.xml. This ensures that sorts are served from the cache.
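To show what such warming listeners might look like, here is a small sketch that prints the corresponding solrconfig.xml fragments. The field name `timestamp` and the rows value are assumptions; the listener class `solr.QuerySenderListener` is the stock Solr listener that runs warming queries.

```python
import xml.etree.ElementTree as ET


def warming_listener(event, sort_field):
    """Return a <listener> fragment that runs one sorted warming query.

    `sort_field` is a hypothetical field name; replace it with the
    time field you actually sort on.
    """
    listener = ET.Element(
        "listener", {"event": event, "class": "solr.QuerySenderListener"}
    )
    queries = ET.SubElement(listener, "arr", {"name": "queries"})
    query = ET.SubElement(queries, "lst")
    # A match-all query sorted on the time field populates the sort cache.
    for name, value in (("q", "*:*"), ("sort", f"{sort_field} desc"), ("rows", "10")):
        ET.SubElement(query, "str", {"name": name}).text = value
    return ET.tostring(listener, encoding="unicode")


if __name__ == "__main__":
    for event in ("firstSearcher", "newSearcher"):
        print(warming_listener(event, "timestamp"))
```

The printed fragments would go inside the query section of solrconfig.xml, one for each event.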


The obvious first question: what type is the time field? If it is a string, sorting will obviously be very slow. tdate is even faster than date.

Another point: do you have enough memory for Solr? If it starts swapping, performance becomes terrible right away.

And third: if you are on an older Lucene version, then date is just a string, which is very slow.


Warning: a wild suggestion, not based on prior experience or known facts. :)

  • Run the query without sorting and with rows=0 to get the number of matches. Disable faceting and so on: for performance, we only need the total match count.
  • Depending on the number of matches from step 1, the distribution of your data, and the number / offset of the desired results, run another query that sorts by date and also adds a date filter, for example fq=date:[NOW-xDAYS TO *], where x is the estimated number of days within which we expect to find the required number of matching documents.
  • If the number of results from step 2 is less than required, loosen the filter a little and run another query.
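Steps 2 and 3 above could be sketched roughly as follows. The `search` callable is a hypothetical stand-in for whatever Solr client you use (it takes a parameter dict and returns a list of documents), and the doubling factor and the upper bound on the window are assumptions chosen for illustration.

```python
import math
from typing import Callable, List


def fetch_recent(
    search: Callable[[dict], List[dict]],
    query: str,
    needed: int,
    start_days: float,
    growth: float = 2.0,
    max_days: float = 3650.0,
) -> List[dict]:
    """Query a date-limited window and widen it until enough results arrive."""
    days = start_days
    while True:
        params = {
            "q": query,
            # Restrict the sort to documents from the last `days` days.
            "fq": f"date:[NOW-{math.ceil(days)}DAYS TO *]",
            "sort": "date desc",
            "rows": needed,
        }
        docs = search(params)
        # Stop when we have enough results or the window cannot grow further.
        if len(docs) >= needed or days >= max_days:
            return docs
        # Step 3: loosen the filter and try again.
        days = min(days * growth, max_days)
```

Each retry costs another round trip, so a good initial value for `start_days` matters more than the growth factor.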

To start, you can use the following to estimate x:

If documents are added uniformly at a rate of n per day to an index of N documents, and the query matched d documents in step 1, then the query gains roughly d*n/N new matches per day, so to get the top r results you can use x = (N*r*1.2)/(d*n), where 1.2 is a safety margin. If you have to loosen the filter too often in step 3, gradually increase the 1.2 factor in the formula as needed.
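A small worked example of the estimate, as a sketch. The index size of 60 million comes from the question; the daily ingest rate, match count, and number of wanted results are made-up values for illustration.

```python
def estimate_window_days(index_size, docs_per_day, matched, wanted, safety=1.2):
    """Estimate x, the number of days the date filter should cover.

    index_size   -- N, total documents in the index
    docs_per_day -- n, documents added per day (assumed uniform)
    matched      -- d, documents matched by the unsorted count query
    wanted       -- r, number of top results needed
    safety       -- margin factor, 1.2 in the formula above
    """
    if matched <= 0 or docs_per_day <= 0:
        raise ValueError("matched and docs_per_day must be positive")
    return (index_size * wanted * safety) / (matched * docs_per_day)


if __name__ == "__main__":
    # Illustrative numbers: N = 60M, n = 100k/day, d = 30k matches, r = 100.
    x = estimate_window_days(60_000_000, 100_000, 30_000, 100)
    print(f"window of about {x:.1f} days")
```

With these numbers the query gains about 50 matches per day, so 100 results need about 2 days, or about 2.4 days with the 1.2 margin.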


Source: https://habr.com/ru/post/1340748/

