Sorting CouchDB view results by value using couchdb-lucene

I have some summary data that is very easy to generate with a few relatively simple map/reduce views. But we want to sort the data by the reduced values of a grouped view (not by the keys). It has been suggested that we could use couchdb-lucene for this. But how? I don't understand how to use a full-text index to quickly sort this kind of data.

What we already have

A simplified example looks something like this:

by_sender: { map: "function(doc) { emit(doc.sender, 1); }", reduce: "function(keys, values, rereduce) { return sum(values); }" } 

It returns results similar to the following (when queried with group=true):

  {"rows":[
    {"key":"a@example.com","value":2},
    {"key":"aaa@example.com","value":1},
    {"key":"aaap@example.com","value":34},
    {"key":"aabb@example.com","value":1},
    ... thousands or tens of thousands of rows ...
  ]}
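To make the behaviour of the view concrete, here is a minimal in-memory sketch of what by_sender computes with group=true. The sample documents are made up, and the plain JavaScript sort() is only a stand-in for CouchDB's own key collation, so treat it as an illustration rather than a description of the server's internals.

```javascript
// Made-up sample documents; only the "sender" field matters here.
var docs = [
  { sender: "aaa@example.com" },
  { sender: "a@example.com" },
  { sender: "a@example.com" }
];

// Same logic as the view's map function, with emit passed in explicitly.
function mapBySender(doc, emit) {
  emit(doc.sender, 1);
}

// Same logic as the view's reduce function: add up the emitted values.
function reduceSum(keys, values, rereduce) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// group=true reduces the values of each distinct key separately and
// returns the rows ordered by key (never by value).
function groupedView(allDocs) {
  var valuesByKey = {};
  allDocs.forEach(function (doc) {
    mapBySender(doc, function (key, value) {
      (valuesByKey[key] = valuesByKey[key] || []).push(value);
    });
  });
  return Object.keys(valuesByKey).sort().map(function (key) {
    return { key: key, value: reduceSum(null, valuesByKey[key], false) };
  });
}

var rows = groupedView(docs);
console.log(JSON.stringify({ rows: rows }));
```

The rows come back in key order because the view index is ordered by key, which is why sorting by the reduced value needs an extra step.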

What we want

These rows are sorted by key, but I need them sorted by value, for example:

  {"rows":[
    {"key":"xyzzy@example.com","value":847},
    {"key":"adam@example.com","value":345},
    {"key":"karl@example.com","value":99},
    {"key":"aaap@example.com","value":34},
    ... thousands or tens of thousands of rows ...
  ]}

And I need the sorted results to be available as quickly as possible (for example, if it takes only 100 ms to update the indexes, it should not take 1 minute before new data is reflected in queries).

More context: what we have tried

The top answer to "Sorting CouchDB Views By Value" lists four viable options, which we tried in increasing order of complexity:

  • At first we sorted the results on the client side, but it was too slow.
  • Then we wrote a list function that sorts the data. A little faster, but still too slow.
  • Chained map/reduce views should handle this problem easily.
    • Someone pointed out Cloudant's chained map/reduce views. They are not in BigCouch, but they are part of Cloudant's hosted service, which unfortunately is not in our budget yet.
    • I started implementing this at the application layer using the _bulk_docs API. It gets tricky if you want to update incrementally while avoiding race conditions, etc. I can continue with this approach, but it is not relaxing. :(
  • That answer also suggests using couchdb-lucene. But I'm not familiar enough with full-text search to figure out how to get it to do anything more complex than indexing a document and returning search results. I don't even know where to start.
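For reference, the list-function attempt (the second option above) can be sketched roughly as follows. getRow() and send() are the globals CouchDB provides to list functions; the function name listSortedByValue is only for illustration, since in a design document the function is stored as an anonymous function string. The sketch buffers every row before sorting, which is presumably why this approach stays slow with tens of thousands of rows.

```javascript
// Sketch of a CouchDB list function that re-sorts grouped view rows by
// their reduced value, largest first. getRow() and send() are supplied
// by the CouchDB query server when the function runs as a list function.
function listSortedByValue(head, req) {
  var rows = [];
  var row;
  // Buffer every row: sorting by value needs the whole result set.
  while ((row = getRow())) {
    rows.push(row);
  }
  rows.sort(function (a, b) {
    return b.value - a.value;
  });
  send(JSON.stringify({ rows: rows }));
}
```

Assuming a design document named summary and a list named sorted (both hypothetical), it would be queried as /db/_design/summary/_list/sorted/by_sender?group=true.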
1 answer

I had a similar problem: I needed to count the votes for each article and sort the articles by their number of votes. I decided to use a separate document to track each vote and another document that stores the number of votes for each article. Let me call the document types article, vote and score.

I wrote a cron script that updates the score of each article by counting its "unregistered" votes. The script calls a view with the built-in _count reduce, in which only unregistered votes are emitted (registered == false). I use the group_results parameter to get the number of unregistered votes per article, then update the score of each article and mark those votes as registered.

Finally, I have a view that emits the score of each article as the key and the article id as the value, so the articles can be ordered by score. Conflicts can be avoided with this method.
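A minimal in-memory sketch of that scheme is shown below. The field names (type, article_id, registered, score) and document ids are assumptions, since the answer does not give them, and the arrays stand in for what would really be a grouped _count view query followed by writes back to CouchDB.

```javascript
// Hypothetical document shapes; the answer does not specify field names.
var votes = [
  { _id: "v1", type: "vote", article_id: "art-1", registered: false },
  { _id: "v2", type: "vote", article_id: "art-1", registered: false },
  { _id: "v3", type: "vote", article_id: "art-2", registered: false },
  { _id: "v4", type: "vote", article_id: "art-2", registered: true }
];
var scores = {
  "art-1": { _id: "score-art-1", type: "score", article_id: "art-1", score: 0 },
  "art-2": { _id: "score-art-2", type: "score", article_id: "art-2", score: 5 }
};

// Cron step: count unregistered votes per article (the role of the
// grouped _count view), add them to the score, mark them registered.
function applyPendingVotes(voteDocs, scoreDocs) {
  var pending = {};
  voteDocs.forEach(function (vote) {
    if (vote.registered === false) {
      pending[vote.article_id] = (pending[vote.article_id] || 0) + 1;
      vote.registered = true;
    }
  });
  Object.keys(pending).forEach(function (articleId) {
    scoreDocs[articleId].score += pending[articleId];
  });
  return pending;
}

// View step: score as key, article id as value, read highest first
// (what querying the view with descending=true would give).
function rowsByScoreDescending(scoreDocs) {
  return Object.keys(scoreDocs)
    .map(function (id) {
      return { key: scoreDocs[id].score, value: scoreDocs[id].article_id };
    })
    .sort(function (a, b) { return b.key - a.key; });
}

applyPendingVotes(votes, scores);
var ranked = rowsByScoreDescending(scores);
console.log(JSON.stringify(ranked));
```

Because each vote is counted once and then flagged, running the cron step again without new votes leaves the scores unchanged.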


Source: https://habr.com/ru/post/1403707/

