I have some summarization data that is very easy to generate using some relatively simple map/reduce views. But we want to sort the data by the group-reduced view values (not the keys). It has been suggested that we could use couchdb-lucene for this. But how? I don't understand how to use a full-text index to rank this kind of data quickly.
What we already have
A simplified example looks something like this:
by_sender: { map: "function(doc) { emit(doc.sender, 1); }", reduce: "function(keys, values, rereduce) { return sum(values); }" }
It returns results similar to the following (when queried with group=true):
{"rows":[ {"key":"a@example.com","value":2}, {"key":"aaa@example.com","value":1}, {"key":"aaap@example.com","value":34}, {"key":"aabb@example.com","value":1}, ... thousands or tens of thousands of rows ... ]}
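To make the grouping behaviour concrete, here is a minimal plain-JavaScript sketch that imitates what the view does when queried with group=true. It is only a simulation for illustration, not CouchDB itself: the names `mapDoc`, `reduceValues`, `groupedView`, and `sampleDocs` are made up, and the key ordering uses plain string sorting rather than CouchDB's collation.

```javascript
// Same logic as the by_sender map function, but returning the emitted
// pair instead of calling CouchDB's emit().
function mapDoc(doc) {
  return { key: doc.sender, value: 1 };
}

// Same logic as the reduce function: add up the values.
function reduceValues(values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// Imitates ?group=true: collect the values per key, reduce each group,
// and return the rows ordered by key (plain string order in this sketch).
function groupedView(docs) {
  var groups = Object.create(null);
  docs.forEach(function (doc) {
    var pair = mapDoc(doc);
    if (!(pair.key in groups)) {
      groups[pair.key] = [];
    }
    groups[pair.key].push(pair.value);
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: reduceValues(groups[key]) };
  });
}

// Hypothetical sample documents, only for illustration.
var sampleDocs = [
  { sender: "b@example.com" },
  { sender: "a@example.com" },
  { sender: "b@example.com" }
];
console.log(JSON.stringify(groupedView(sampleDocs)));
```

The point of the sketch is that the rows always come back ordered by key, regardless of the reduced value.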
What we want
Those are sorted by key, but I need them sorted by value, for example:
{"rows":[ {"key":"xyzzy@example.com","value":847}, {"key":"adam@example.com","value":345}, {"key":"karl@example.com","value":99}, {"key":"aaap@example.com","value":34}, ... thousands or tens of thousands of rows ... ]}
And I need the sorted results to stay as fresh as possible (for example, if it only takes 100 ms to update the indexes, it should not take 1 minute before new data is reflected in queries).
More context: what we have tried
The top answer to "Sorting CouchDB Views By Value" gives four viable options, which we have tried in increasing order of complexity:
- At first we simply sorted the results client-side, but that was too slow.
- Then we created a list function that sorts the data. Slightly faster, but still too slow.
- Chained map/reduce views should handle this problem easily.
  - Someone pointed out Cloudant's chained map-reduce views. They are not in BigCouch, but they are part of Cloudant's service, which unfortunately is not in our budget yet.
  - I started implementing an application layer using the _bulk_docs API. This is tricky if you want to push updates continuously while avoiding race conditions, etc. I can continue with this approach, but it is not relaxing. :(
- One answer suggested using couchdb-lucene. But I am not familiar enough with full-text search to figure out how to get it to do anything more complex than indexing a document and returning a search result. I don't even know where to start.
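For reference, the list-function approach from the second option can be sketched roughly as follows. This is a minimal sketch under assumptions, not our exact code: the name `sortByValueList` is made up for illustration (in a design document the function would be stored as an anonymous function string under `lists`), while `getRow()` and `send()` are the globals CouchDB provides to list functions.

```javascript
// Sketch of a CouchDB list function that sorts view rows by value.
// getRow() and send() are supplied by CouchDB when this runs as a _list
// function; they are not defined in this snippet.
function sortByValueList(head, req) {
  var rows = [];
  var row;

  // Buffer every row of the view result in memory.
  while ((row = getRow())) {
    rows.push(row);
  }

  // Sort by the reduced value, largest first.
  rows.sort(function (a, b) {
    return b.value - a.value;
  });

  // Send the whole sorted result as one JSON body.
  send(JSON.stringify({ rows: rows }));
}
```

Assuming a database named `db`, a design document named `ddoc`, and the function stored under the name `sort_by_value` (all hypothetical names), it would be queried as `/db/_design/ddoc/_list/sort_by_value/by_sender?group=true`. Because it buffers and re-sorts every row on each request, the cost grows with the number of senders, which matches the slowness described above.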