This is an older question with plenty of useful answers and recommendations already, but I will try to summarize them and describe a solution for paging through large result sets with a cursor, because I recently ran into this problem myself.
As Yonik mentioned, the problem with the regular start/rows parameters is that with a large result set, requesting a page far from the beginning carries significant overhead in CPU and memory. Retrieving 20 documents from somewhere around the 500,000th position in sorted order requires Solr to collect and sort everything up to that point (tracking the sort values and unique keys of all preceding documents). If the search is distributed across shards, it is even more expensive, because each shard has to return its top 500,020 rows to the aggregating node, which then merges them just to find the 20 rows that are actually needed.
In other words, Solr cannot work out which matching document is the 999,001st result in sorted order without first determining what the first 999,000 matching sorted results are.
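For example, a deep-paging request like the following (the price field is just an illustration) forces each shard to collect and sort its top 999,020 results only to hand back 20 of them:

?q=*:*&sort=price desc,id asc&start=999000&rows=20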
The solution here is to use Solr's cursorMark.
In the first query, you pass &cursorMark=*. This means the following:
You might think that this is like start=0 as a way to tell Solr to “start at the beginning of my sorted results”, except that it also tells Solr that you want to use the cursor.
Note that your sort clause must include the uniqueKey field (usually id) as a tie-breaker; otherwise Solr will reject the cursor request.
Part of the first request will look like this:
?sort=price desc,id asc&start=0&cursorMark=* ...
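For context, a complete first request might look like this (the products collection name, the q value, and the fl fields are just placeholders); start is either left at 0 or omitted, because the cursor itself determines where the next page begins:

/solr/products/select?q=*:*&rows=20&fl=id,price&sort=price desc,id asc&cursorMark=*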
As a result, you get a response with the following structure:
{ "response":{"numFound":20,"start":0,"docs":[ /* docs here */ ]}, "nextCursorMark":"AoIIRPoAAFBX" // Here is cursor mark for next "page" }
To get the next page, the query would look like this:
?sort=price desc,id asc&start=0&cursorMark=AoIIRPoAAFBX ...
Note that the cursorMark value comes from the previous response. The result has the same structure as the first response, but with a different value for nextCursorMark. And so on.
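Putting it all together, here is a minimal sketch of walking through an entire result set with a cursor. It assumes a local Solr instance, a hypothetical products collection, and price/id sort fields (adjust the URL, collection name, query, and fields to your setup). Solr signals the end of the results by returning a nextCursorMark equal to the cursorMark you sent, which is what the loop checks for:

```python
import requests

# Placeholder URL and collection name -- adjust to your setup.
SOLR_URL = "http://localhost:8983/solr/products/select"

params = {
    "q": "*:*",
    "rows": 20,
    "sort": "price desc,id asc",  # must include the uniqueKey field as a tie-breaker
    "cursorMark": "*",            # "*" means "start at the beginning"
    "wt": "json",
}

while True:
    resp = requests.get(SOLR_URL, params=params).json()

    for doc in resp["response"]["docs"]:
        # Do something with each document; printing the id is just a placeholder.
        print(doc.get("id"))

    next_cursor = resp["nextCursorMark"]
    # When Solr returns the same mark that was sent, there are no more results.
    if next_cursor == params["cursorMark"]:
        break
    params["cursorMark"] = next_cursor
```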
This approach is ideal for infinite scrolling, but there are a few things to think about before using it for classic page-number pagination, since a cursor only moves forward and cannot jump straight to an arbitrary page :).
Here are some reference materials I found while solving this problem; I hope they help someone.