Sorting by most recent access in Lucene / Solr

In my Solr queries, I want to sort the most recently accessed documents to the top ("accessed" meaning opened by a user action). I have no other ranking criteria: of the documents whose text matches the query, I want them in order of recent use. I can only think of two ways to do this:

1) Include a last-access date field in each document for Solr to sort on. I am told that Trie date fields can be sorted very quickly. The problem, of course, is updating the field, which would require storing the full text of every document so that I can delete and re-add any document with an updated "last access" field. Mutable fields would avoid this, but Lucene / Solr does not yet offer mutable fields.

2) Alternatively, store the changing last-access dates in a separate database. This would require Solr to return the complete list of matching documents, which could run to hundreds of thousands of documents. That huge list of document IDs would then be matched against the dates in the database and sorted. It would work fine for narrow search queries, but not for broad, general ones.

So the trade-off is between 1) a larger index plus a processing cost every time a document is accessed, and 2) a large per-query overhead, especially for unfocused search queries.
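To make the cost of option 2 concrete, here is a minimal sketch of the client-side join and sort. The dictionary stands in for the external database and the ID list for Solr's full result set; both, along with the function name, are hypothetical. The point is that the sort runs over every matching ID on every query, which is what makes broad queries expensive.

```python
from datetime import datetime


def sort_by_last_access(matching_ids, last_access_by_id):
    """Order the IDs returned by Solr by last-access time, newest first.

    IDs with no recorded access are placed last.
    """
    never = datetime.min
    return sorted(
        matching_ids,
        key=lambda doc_id: last_access_by_id.get(doc_id, never),
        reverse=True,
    )


# Stand-in for the external database: document ID -> last access time.
last_access = {
    "doc-a": datetime(2013, 1, 5, 9, 30),
    "doc-b": datetime(2013, 2, 1, 14, 0),
}
# Stand-in for the complete ID list Solr returns for one query.
ids_from_solr = ["doc-a", "doc-b", "doc-c"]

ordered = sort_by_last_access(ids_from_solr, last_access)
```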

Do I have any alternatives?

3 answers

You can do this with the atomic update feature.

http://wiki.apache.org/solr/Atomic_Updates

This functionality is available as of Solr 4.0. It lets you update a single field in a document without having to resend the entire document for reindexing. I only know about it from the documentation. I have not used it myself, so I can't say how well it works or whether there are any pitfalls.
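As a rough sketch of what such a request could look like: an atomic update in JSON form wraps the new value in a `set` modifier. The field name `lastAccessed`, the unique key `id`, and the helper name are assumptions for illustration, not something taken from your schema. The resulting body would be POSTed to the collection's `/update` handler with a JSON content type.

```python
import json
from datetime import datetime, timezone


def build_last_access_update(doc_id, accessed_at, field="lastAccessed"):
    """Return the JSON body for an atomic update that sets one date field."""
    # Solr date fields take UTC timestamps such as 2013-02-01T14:00:00Z.
    stamp = accessed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # The {"set": value} modifier replaces only this field's value.
    return json.dumps([{"id": doc_id, field: {"set": stamp}}])


body = build_last_access_update(
    "doc-b", datetime(2013, 2, 1, 14, 0, tzinfo=timezone.utc)
)
```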


Definitely go with option 1: query Solr as usual and update the lastAccessed field whenever a document is accessed.

Since Solr 4.0, partial document updates are supported in several flavors: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

For your application, it seems that a simple atomic update is enough.

In terms of performance, this should work well for large collections and frequent document updates.
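Once the field is kept up to date, the query side is just a sort parameter. A minimal sketch of building such a request URL follows; the base URL, the core name `docs`, and the helper name are placeholders I made up, while `q`, `sort`, `rows` and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_recent_first_query(base_url, text, rows=10, field="lastAccessed"):
    """Build a Solr select URL that returns matches newest-access first."""
    params = {
        "q": text,                # the user's text query
        "sort": f"{field} desc",  # most recently accessed documents first
        "rows": rows,             # only fetch one page, not every match
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local core named "docs".
url = build_recent_first_query("http://localhost:8983/solr/docs", "text:lucene")
```

Because Solr does the sorting, only the requested page of results crosses the wire, which avoids the full ID list that option 2 needs.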


Source: https://habr.com/ru/post/908285/

