How can I store frequently modified lists in a database in their natural form so that they are ready to read?

For a social networking site, I need to store frequently modified lists for each object (and there are millions of such objects). The lists are:

  • often added to
  • often read
  • sometimes shrunk
  • looked up by primary key

I already store some other types of data in an RDBMS. I know I could store these lists in the RDBMS as a many-to-many relationship: create a listItems table with two columns, listId and listItem, and to build any specific list, simply run a SELECT query for all rows WHERE listId = x. But storing lists this way in an RDBMS is not ideal when high scalability is needed. Instead, I would like to keep the lists prepared in their natural form so that read performance is maximized, because I need to fetch about a hundred of these lists for a user whenever that user logs in and views a page.
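For reference, the two-column layout described above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module as a stand-in for the RDBMS; the index name, helper function names, and sample values are assumptions and not part of the original question.

```python
import sqlite3

# In-memory database standing in for the RDBMS (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listItems (listId INTEGER NOT NULL, listItem TEXT NOT NULL)")
# Hypothetical index so that reading one list does not scan the whole table.
conn.execute("CREATE INDEX idx_listItems_listId ON listItems (listId)")

def add_item(list_id, item):
    """Append one item to a list."""
    conn.execute("INSERT INTO listItems (listId, listItem) VALUES (?, ?)", (list_id, item))

def remove_item(list_id, item):
    """Remove one item from a list."""
    conn.execute("DELETE FROM listItems WHERE listId = ? AND listItem = ?", (list_id, item))

def read_list(list_id):
    """Build a list by selecting every row WHERE listId = x."""
    rows = conn.execute(
        "SELECT listItem FROM listItems WHERE listId = ? ORDER BY rowid", (list_id,)
    )
    return [row[0] for row in rows]

add_item(1, "alice")
add_item(1, "bob")
add_item(2, "carol")
remove_item(1, "alice")
print(read_list(1))
```

Every read here is a query that assembles the list from separate rows, which is the cost the question is trying to avoid.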

So how can I solve this? Which database should I use for this data, possibly one that allows a variable number of columns to be stored under a primary key, such as Cassandra?

+4
7 answers

I used the same approach of storing a two-column row for each entry. We later moved to a text file with formatted HTML, then to JSON, and finally to MongoDB.

But since you have frequent operations, I suggest Cassandra, HBase, or a Google Bigtable-style store such as Cloudata or Hypertable.

Cloudata may be right for you.

+5

As you pointed out, the solution should be realistic and scalable. I suggest Redis with its LIST data structure, which gives O(1) inserts and O(N) fetches (N being the number of elements to fetch, assuming you fetch the latest ones from the list), scaled horizontally using some hashing algorithm. I do not know how much data you are going to store or how many machines are available, but it will definitely be the best choice for performance, since nothing compares with the speed of memory access.
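A minimal sketch of the "scale horizontally with hashing" idea, assuming a fixed number of shards. The shards are modelled as in-memory dicts of deques so the example runs without a Redis server; in a real deployment each shard would be a Redis connection and the two operations would map to LPUSH and LRANGE. The names NUM_SHARDS, shard_for, push, and latest are hypothetical.

```python
import hashlib
from collections import deque
from itertools import islice

NUM_SHARDS = 4  # hypothetical number of Redis instances

# Each shard is a dict of deques here; a real setup would hold one
# Redis connection per shard instead.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(list_id):
    """Pick a shard index from a stable hash of the list id."""
    digest = hashlib.md5(str(list_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def push(list_id, item):
    # Plays the role of LPUSH: O(1) insert at the head of the list.
    shards[shard_for(list_id)].setdefault(list_id, deque()).appendleft(item)

def latest(list_id, n):
    # Plays the role of LRANGE key 0 n-1: the n most recent items.
    items = shards[shard_for(list_id)].get(list_id, deque())
    return list(islice(items, n))

push("user:42:friends", "a")
push("user:42:friends", "b")
push("user:42:friends", "c")
print(latest("user:42:friends", 2))  # ['c', 'b']
```

With plain modulo hashing, changing the number of shards relocates most keys; consistent hashing is a common way to limit that.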

If the amount of data is huge and you cannot keep it all in RAM, then Cassandra can do the job. It is well suited to storing time-ordered lists using a partitioning strategy, as Zanson mentioned.

Another thought: you said that read performance should be maximal, and that as soon as a user logs in you will need a hundred lists for that user. Why not prepare one list for each user? There will be more entries, but reads will be optimized, since you will only need to fetch the latest entries from a single list. I'm not sure whether this fits your task, it's just a thought. :)
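One possible reading of the "one prepared list per user" idea is to copy each new entry into the list of every user who will later need it, so that a read touches a single list. The sketch below is an interpretation of that idea in plain Python, not the answerer's implementation; followers, feeds, FEED_LIMIT, and the function names are all hypothetical.

```python
from collections import defaultdict, deque

FEED_LIMIT = 100  # hypothetical cap on entries kept per user

followers = defaultdict(set)  # object id -> ids of users who need its entries
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))  # user id -> prepared list

def follow(user_id, object_id):
    followers[object_id].add(user_id)

def publish(object_id, entry):
    # Write path: copy the entry into every interested user's list.
    for user_id in followers[object_id]:
        feeds[user_id].appendleft((object_id, entry))

def read_feed(user_id, n=20):
    # Read path: one lookup, newest entries first.
    return list(feeds[user_id])[:n]

follow("u1", "obj1")
follow("u1", "obj2")
publish("obj1", "x")
publish("obj2", "y")
print(read_feed("u1"))  # [('obj2', 'y'), ('obj1', 'x')]
```

The trade-off is the one the answer names: each entry is stored once per interested user, in exchange for a single read at login.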

+3

I would recommend SSDB ( https://github.com/ideawu/ssdb ), a network server wrapper around Google's LevelDB. SSDB is designed to store collection data such as list, map, and zset (sorted set). You can use it as follows:

 ssdb->hset(listId, listItem1);
 ssdb->hset(listId, listItem2);
 ssdb->hset(listId, listItem3);
 ...
 list = ssdb->hscan(listId, 100);
 // now list = [listItem1, listItem2, listItem3, ...]

The number of items in one map is limited only by the size of the hard drive. Another option is Redis, but Redis keeps all data in memory (in practice, say, no more than 30 GB), so it probably won't suit your project.

SSDB has clients for C++, PHP, Python, Java, Lua, and other languages.

+2

Cassandra has built-in support for storing sets, maps, and lists. If your queries always fetch the whole collection, this is a very easy way to handle this kind of thing.

http://www.datastax.com/dev/blog/cql3_collections
http://cassandra.apache.org/doc/cql3/CQL.html#collections

If your lists are tied to a user, you can create separate columns in that user's row/partition, and then queries for several lists will be fast, since they will all be in the same partition for that user.

+2

Cassandra is a very good fit for this kind of use case. Create as many columns as you want for the datasets your queries return. Cassandra works best with denormalized data or sets, such as 1:m and m:m relationships.

+1

I know that you did not want to consider relational databases, but I think that for this simple situation there is also a scalable solution with a relational database. The main advantage will be that you do not need to maintain a separate database system.

To scale, all NoSQL solutions distribute your data across multiple nodes. You can do the same in your application code by distributing data across multiple relational databases. To balance the load you may need to move data around periodically, but it may be enough to simply create a new database for every N lists.
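A minimal sketch of the "new database for every N lists" scheme, assuming integer list ids. In-memory sqlite3 connections stand in for separate database servers; LISTS_PER_DB, db_for, and the other names are hypothetical.

```python
import sqlite3

LISTS_PER_DB = 1000  # hypothetical value of N
databases = {}       # database index -> connection

def db_for(list_id):
    """Return the database that holds list_id, creating it on first use."""
    index = list_id // LISTS_PER_DB
    if index not in databases:
        conn = sqlite3.connect(":memory:")  # stand-in for a separate server
        conn.execute("CREATE TABLE listItems (listId INTEGER, listItem TEXT)")
        conn.execute("CREATE INDEX idx_listId ON listItems (listId)")
        databases[index] = conn
    return databases[index]

def add_item(list_id, item):
    db_for(list_id).execute("INSERT INTO listItems VALUES (?, ?)", (list_id, item))

def read_list(list_id):
    rows = db_for(list_id).execute(
        "SELECT listItem FROM listItems WHERE listId = ? ORDER BY rowid", (list_id,)
    )
    return [row[0] for row in rows]

add_item(5, "a")
add_item(5, "b")
add_item(1005, "c")
print(len(databases), read_list(5), read_list(1005))
```

Because the routing rule depends only on the list id, the application never needs a lookup table to find a list, at the cost of uneven load if some id ranges are much busier than others.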

+1

In Cassandra you can have wide rows, up to 2 billion columns per row. If that is enough for the combined items of all of an entity's lists, you can store all of an entity's lists in one row and then fetch them all together. With Cassandra's "composite columns" you can store the items of each list contiguously and sorted; you can delete a single column (list item) whenever you want, and an insert is just a column insert.

Something like this:

         | list_1_Id : item1Id | list_1_Id : item2Id | list_2_Id : item1Id | ... | list_n_Id : item3Id |
  entity | item1Value          | item2Value          | item1Value          | ... | item3Value          |

So in practice you are dealing with columns (= items), not lists, which makes the work much easier. Depending on the size of your lists, you can split the entity row into multiple rows, something like this:

                   | item1Id    | item2Id    | item3Id    | item4Id    | ...
  entiId_list_1_Id | item1Value | item2Value | item3Value | item4Value | ...

                   | item1Id    | item2Id    | item3Id    | item4Id    | ...
  entiId_list_2_Id | item1Value | item2Value | item3Value | item4Value | ...
  ...

You can also put the item value in the column name and leave the column value empty to reduce size. For example, you can insert a new element by simply doing:

  // columns are sorted by their identifier (if they have one)
  insert into entityList[entityId][listId][itemId] = item value;

or

  // columns are sorted by their value
  insert into entityList[entityId][listId][itemvalue] = nothing;

and delete with:

  delete from entityList where entityId = 'd' and listId = 'o' and itemId = 'n';

Alternatively, you can do this from your application using a rich client such as Hector.
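The column-per-item layout in this answer can be modelled in a few lines of plain Python. This is a toy model of one wide row whose column names are (listId, itemId) pairs kept in sorted order; it is not Cassandra's or Hector's API, and the class and method names are hypothetical.

```python
import bisect

class WideRow:
    """Toy model of one wide row with columns sorted by composite name."""

    def __init__(self):
        self._names = []   # sorted list of (list_id, item_id) column names
        self._values = {}  # column name -> column value

    def insert(self, list_id, item_id, value=""):
        # Inserting a list item is just inserting (or overwriting) a column.
        name = (list_id, item_id)
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def delete(self, list_id, item_id):
        # Deleting a list item removes exactly one column.
        name = (list_id, item_id)
        if name in self._values:
            del self._values[name]
            self._names.pop(bisect.bisect_left(self._names, name))

    def slice(self, list_id):
        """Return the (item_id, value) pairs of one list, in column order."""
        start = bisect.bisect_left(self._names, (list_id,))
        result = []
        for name in self._names[start:]:
            if name[0] != list_id:
                break
            result.append((name[1], self._values[name]))
        return result

row = WideRow()
row.insert("list_1", "item2", "v2")
row.insert("list_1", "item1", "v1")
row.insert("list_2", "item1", "w1")
row.delete("list_1", "item2")
print(row.slice("list_1"), row.slice("list_2"))
```

Because the columns of one list sit next to each other in sort order, reading a list is a contiguous slice of the row rather than a gather across many rows.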

0

Source: https://habr.com/ru/post/1498521/

