How to sort / order data?

I already have experience with MongoDB, CouchDB, Redis, Tokyo Cabinet and other NoSQL databases. I recently came across Riak, and it looks very interesting to me. To get started with it, I decided to write a small Twitter clone, the "hello world" of the NoSQL world. For a fully working clone, tweets have to be ordered chronologically. After reading the Riak docs, I concluded that MapReduce is the right tool for this job. In my development environment this works pretty well, but how does it perform in production, with hundreds of concurrent requests? Are there other, maybe faster, ways to sort data, or is it possible to store data in sorted order (as in Cassandra, for example)?

I think I found another solution to this problem: a simple linked list. One possible implementation is that each user gets their own "timeline" bucket, which stores links to the tweet data itself (the tweets are stored separately in a "tweets" bucket). This timeline bucket must contain a well-known key named "first", which points to the latest timeline object and is the starting point of the list. To insert a new tweet into the timeline, insert a new item into the timeline bucket, set the "next" link of this new item to the current "first" item, then make the new item "first".

In short: insert an element as you would in a linked list ...
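A minimal sketch of that insert, using plain Python dicts as stand-ins for the two buckets. This is not Riak client code: the names `tweets`, `timeline`, `push_tweet` and the `item-` key prefix are made up for illustration, and "first" is modelled as a small pointer object whose "next" field holds the key of the newest item.

```python
# In-memory stand-ins for two Riak buckets; a real client would issue
# get/put requests instead of touching dicts.
tweets = {}      # "tweets" bucket: tweet_id -> tweet text
timeline = {}    # one user's timeline bucket: key -> {"tweet": id, "next": key}

FIRST = "first"  # well-known key that marks the head of the list


def push_tweet(tweet_id, text):
    """Store the tweet, then link a new timeline item in front of the old head."""
    tweets[tweet_id] = text
    old_head = timeline.get(FIRST)            # pointer object, or None if empty
    item_key = f"item-{tweet_id}"
    timeline[item_key] = {
        "tweet": tweet_id,
        # The new item points at whatever used to be the newest item.
        "next": old_head["next"] if old_head else None,
    }
    timeline[FIRST] = {"next": item_key}      # "first" now points at the new item
    return item_key


push_tweet("t1", "hello")
push_tweet("t2", "world")
```

Note that the insert takes two writes that are not atomic, so concurrent inserts into the same timeline would need some form of conflict handling, which this sketch leaves out.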

As on Twitter, a personal timeline shows only 20 tweets to the user. To get the latest 20 tweets, you need only 2 requests. To speed things up, the first request uses Riak's link-walking feature to fetch the latest 20 objects linked with the "next" tag. The second and last request then uses the keys collected by the first request to fetch the tweets themselves (using map/reduce).
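The two-step read can be sketched the same way. Again this is an in-memory model, not Riak client code: the loop in `latest_tweet_ids` stands in for the link-walking request, `fetch_tweets` stands in for the second request, and all names and the sample data are hypothetical.

```python
PAGE_SIZE = 20

# Hypothetical pre-populated data: 25 tweets, of which t25 is the newest.
tweets = {f"t{i}": f"tweet number {i}" for i in range(1, 26)}
timeline = {
    f"item-t{i}": {"tweet": f"t{i}", "next": f"item-t{i - 1}" if i > 1 else None}
    for i in range(1, 26)
}
timeline["first"] = {"next": "item-t25"}


def latest_tweet_ids(limit=PAGE_SIZE):
    """Step 1: follow "next" links from "first", collecting up to `limit` ids."""
    ids = []
    key = timeline["first"]["next"]
    while key is not None and len(ids) < limit:
        item = timeline[key]
        ids.append(item["tweet"])
        key = item["next"]
    return ids


def fetch_tweets(ids):
    """Step 2: resolve the collected ids against the tweets bucket."""
    return [tweets[tweet_id] for tweet_id in ids]


page = fetch_tweets(latest_tweet_ids())
```

Because the list is already linked newest-first, no sorting is needed at read time. Whether following 20 links in a single request is fast enough on a real cluster is something to measure.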

To remove the tweets of a user you have just unfollowed, I would use the secondary index feature of Riak 1.0 to retrieve the related timeline objects.
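A rough model of that cleanup, assuming each timeline item is indexed by its author. The `author_index` dict stands in for a secondary index and every name here is hypothetical. The index only tells you which keys to drop; since the list is singly linked, relinking still requires a walk over the timeline.

```python
from collections import defaultdict

timeline = {"first": {"next": None}}
author_index = defaultdict(set)   # stand-in for a secondary index: author -> item keys


def push(item_key, tweet_id, author):
    """Link a new item at the head of the list and record it in the index."""
    timeline[item_key] = {"tweet": tweet_id, "author": author,
                          "next": timeline["first"]["next"]}
    timeline["first"]["next"] = item_key
    author_index[author].add(item_key)


def remove_author(author):
    """Look up the author's items via the index, then unlink them from the list."""
    doomed = author_index.pop(author, set())
    prev_key, key = "first", timeline["first"]["next"]
    while key is not None:
        nxt = timeline[key]["next"]
        if key in doomed:
            timeline[prev_key]["next"] = nxt   # bypass the removed item
            del timeline[key]
        else:
            prev_key = key
        key = nxt


for n, who in enumerate(["alice", "bob", "alice", "carol"], start=1):
    push(f"item-{n}", f"t{n}", who)
remove_author("alice")
```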

2 answers

It isn't possible to store data in sorted order in Riak without rewriting parts of the Riak core. Data is stored, roughly speaking, in bucket + key order. The actual order depends on the storage backend you use for Riak.

Riak 1.0 has some features that may help you. It supports secondary indexes, and MapReduce operations have been improved; in particular, they behave much better under highly concurrent workloads.

Alexander Sicular wrote an article, "Pagination with Riak", that describes the problem very well. Yammer also makes extensive use of Riak, and two of their engineers put together a presentation, "Riak at Yammer". It doesn't go into many implementation details, but you can learn a lot about how they designed their solution.

Combining secondary index queries with MapReduce should make it easy to solve your problem.


As Jeremiah says, it is not possible to store data in sorted order, but you can still return sorted results using secondary indexes and map/reduce. The problem, as described, is that you cannot efficiently limit a query in sorted order.

Here is an example that uses a range query to list all keys and then sorts them with one of the built-in functions in riak_kv_mapreduce:

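    %% Connect to the local node over Protocol Buffers.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    %% Range query on the special $key index, sorted in the reduce phase.
    riakc_pb_socket:mapred(Pid,
        {index, colonel_riak:bucket(context), <<"$key">>, <<0>>, <<255>>},
        [{reduce, {modfun, riak_kv_mapreduce, reduce_sort}, none, true}]).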

You can use the functions in Erlang's lists module or write your own JavaScript sort function. Reverse order can be obtained with lists:reverse/1 in Erlang.
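To illustrate why limiting is the expensive part, here is a small Python model of the same pipeline: sort everything, reverse, and only then cut the list down. The `reduce_sort` function below is a stand-in written for this sketch, not the Riak built-in, and the sample keys (zero-padded timestamps, so that lexical order equals chronological order) are an assumption.

```python
# Hypothetical keys that embed a zero-padded timestamp so that
# lexical order equals chronological order.
keys = ["0000000300-c", "0000000100-a", "0000000200-b", "0000000400-d"]


def reduce_sort(values):
    """Model of a sorting reduce phase: returns all inputs in ascending order."""
    return sorted(values)


def newest_first(values, limit):
    """Sort everything, reverse it, and only then truncate to `limit` entries."""
    return list(reversed(reduce_sort(values)))[:limit]


top_two = newest_first(keys, 2)
```

Every key in the range passes through the sort before the limit is applied, so the cost grows with the size of the range rather than with the page size.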


Source: https://habr.com/ru/post/898310/

