Indexing Using Redis Sorted Sets

I would like some feedback and suggestions on two approaches I am considering for implementing searchable indexes using Redis sorted sets.

Situation and purpose

We currently have some key-value tables stored in Cassandra for which we would like to have indexes. For example, one table contains records of people; the Cassandra table has id as its primary key and the serialized object as its value. An object has fields such as first_name, last_name, last_updated, and others.

We want to be able to run queries such as "last_name = 'Smith' AND first_name > 'Joel'", "last_name < 'Aaronson'", "last_name = 'Smith' AND first_name = 'Winston'", and so on. The search results should be the matching ids, so that we can then retrieve the objects from Cassandra. I think the searches above could be served by a single index sorted lexicographically by last_name, first_name, and last_updated. If we need searches with a different ordering (for example, "first_name = 'Zeus'"), we can have a similar index that supports them (for example, first_name, last_updated).

We are considering Redis for this because we need to handle a large number of writes per minute. I read about some common ways of using Redis sorted sets and came up with two possible implementations:

Option 1: one sorted set per index

For our index by last_name, first_name, last_updated, we would have a sorted set in Redis under the key indexes:people:last_name:first_name:last_updated, which would contain strings in the format last_name:first_name:last_updated:id. For instance:

smith:joel:1372761839.444:0azbjZRHTQ6U8enBw6BJBw

(For the separator I might use "::" rather than ":", or something else that works better with lexicographic ordering, but let's ignore that for now.)

All elements would be given a score of 0, so the sorted set would simply be ordered lexicographically by the strings themselves. If I then want to run a query like "last_name = 'smith' AND first_name < 'bob'", I need to get all the elements in the set that have last name smith and come before 'smith:bob'.

As far as I can tell, there are the following disadvantages for this approach:

  • There is no Redis command to select a range based on the string value. Such a command, called ZRANGEBYLEX, was proposed by Salvatore Sanfilippo at https://github.com/antirez/redis/issues/324 , but has not been implemented, so I would have to find the endpoints with binary searches and fetch the range myself (possibly in Lua, or at the application level in Python, which is the language we use to access Redis).
  • If we want index entries to have a time-to-live, the simplest way seems to be a scheduled task that walks through the whole index and removes expired elements.
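The endpoint search from the first bullet can be sketched without a Redis server. This is only an illustration under stated assumptions: a plain sorted Python list stands in for the sorted set whose scores are all 0, and make_member and range_before are hypothetical helper names, not Redis or redis-py functions. Against a real server the same bisection would have to run over ranks. (ZRANGEBYLEX did later ship in Redis 2.8.9, so the manual search is only needed on older servers.)

```python
from bisect import bisect_left

SEP = ":"  # separator from the example above; assumed not to occur inside field values


def make_member(last_name, first_name, last_updated, person_id):
    """Build one index entry in the last_name:first_name:last_updated:id format."""
    return SEP.join([last_name, first_name, f"{last_updated:.3f}", person_id])


def range_before(members, last_name, first_name):
    """Entries with this last_name whose first_name sorts before first_name.

    members must already be sorted lexicographically, the way a sorted set
    with equal scores would be.
    """
    lo = bisect_left(members, last_name + SEP)                     # first "smith:..." entry
    hi = bisect_left(members, last_name + SEP + first_name + SEP)  # first "smith:bob:..." entry
    return members[lo:hi]


# Illustrative data only; the ids other than the one from the question are made up.
index = sorted([
    make_member("smith", "joel", 1372761839.444, "0azbjZRHTQ6U8enBw6BJBw"),
    make_member("smith", "alice", 1372761840.0, "id-alice"),
    make_member("smith", "bob", 1372761841.5, "id-bob"),
    make_member("aaronson", "zoe", 1372761842.25, "id-zoe"),
])

# last_name = 'smith' AND first_name < 'bob'
matches = range_before(index, "smith", "bob")
ids = [m.rsplit(SEP, 1)[1] for m in matches]  # the id is the last field
```

Note that the result depends on how the separator compares with the characters in the names, which is the separator concern raised above.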

Option 2: small sorted sets sorted by last_updated

This approach would be similar, except that we would have many smaller sorted sets, each using a time-like value such as last_updated for the scores. For example, for the same last_name, first_name, last_updated index, we would have one sorted set for each last_name, first_name combination. The key could be index:people:last_name=smith:first_name=joel, and it would have an entry for each person named Joel Smith. Each entry would have the id as its member and the last_updated value as its score. For instance:

value: 0azbjZRHTQ6U8enBw6BJBw; score: 1372761839.444

The main advantages of this are (a) searches where we know all the fields except last_updated would be very easy, and (b) implementing a time-to-live would be very simple using ZREMRANGEBYSCORE.
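A rough sketch of advantage (b), with several assumptions: client is expected to be a redis-py redis.Redis instance (version 3.0 or later, for the mapping form of zadd), nothing is imported from redis here so the functions only rely on the zadd and zremrangebyscore methods, and index_key, add_person and expire_older_than are hypothetical helper names.

```python
import time


def index_key(last_name, first_name):
    """Key of the small sorted set for one (last_name, first_name) pair."""
    return f"index:people:last_name={last_name}:first_name={first_name}"


def add_person(client, last_name, first_name, person_id, last_updated):
    # The member is the id and the score is last_updated, as in the example above.
    client.zadd(index_key(last_name, first_name), {person_id: last_updated})


def expire_older_than(client, last_name, first_name, max_age_seconds, now=None):
    """Remove entries whose last_updated is older than max_age_seconds.

    Returns whatever the client returns, which for redis-py is the number
    of removed members.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    # "-inf" is the open lower bound; the "(" prefix makes the upper bound exclusive.
    return client.zremrangebyscore(
        index_key(last_name, first_name), "-inf", f"({cutoff}"
    )
```

The expiry call only covers one key, so a full sweep still needs to know every key in the index.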

The disadvantage, which seems very large to me:

  • This approach seems much more complicated to manage and to search. For example, we would need the index to keep track of all of its keys (in case, for instance, we want to clean up at some point), and to do so hierarchically. A search such as "last_name < 'smith'" would first have to look through the list of all last names to find those that come before smith, then for each of those look at all the first names it contains, and then for each of those get all the elements from its sorted set. In other words, a lot of components to build and worry about.

Conclusion

So it seems to me that the first option would be better, despite its drawbacks. I would really appreciate any feedback on these two or on other possible solutions (even if they involve something other than Redis).

3 answers
  • I strongly advise against using Redis for this. You will be storing a ton of extra pointer data, and if you ever decide you want more complex queries such as SELECT WHERE first_name LIKE 'jon%', you will run into trouble. You will also need to maintain additional, very large indexes spanning multiple columns if you want to search on two fields at the same time. In effect, you will have to keep hacking on and re-engineering a search framework. You would be much better off using Elasticsearch, Solr, or another framework already built to do what you are trying to do. Redis is awesome and has many good uses. This is not one of them.

  • That said, to answer your actual question: I think you are best off with your first solution. Use one sorted set per index, but convert your letters to numbers. Convert each letter to some decimal value: you can use the ASCII value, or simply assign each letter a value from 1 to 26 in lexicographic order if you are using English. Standardize so that every letter takes up the same number of digits (so if 26 is your largest number, 1 is written "01"). Then concatenate them with a decimal point in front and use the result as the score (for example, "Hat" becomes ".080120"). This gives you a properly ordered 1-to-1 mapping between words and numbers. When you search, convert from letters to numbers, and then you can use all of the sorted set commands Redis provides, such as ZRANGEBYSCORE, without rewriting them. The Redis commands are very well optimized, so you are much better off using them whenever possible rather than writing your own.
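The letter-to-number conversion in the second point can be sketched as follows. The function names word_to_digits and word_to_score are hypothetical, and the sketch assumes lowercase English letters only. One caveat that is not in the answer: Redis stores scores as double-precision floats, which keep only about 15 to 17 significant decimal digits, so at two digits per letter only roughly the first seven letters are guaranteed to stay distinguishable.

```python
def word_to_digits(word):
    """Map each letter a-z to a two-digit code 01-26 and prefix a decimal point."""
    digits = []
    for ch in word.lower():
        if not "a" <= ch <= "z":
            raise ValueError(f"unsupported character: {ch!r}")
        digits.append(f"{ord(ch) - ord('a') + 1:02d}")
    return "." + "".join(digits)


def word_to_score(word):
    """Convert the digit string to the float that would be stored as the score."""
    return float("0" + word_to_digits(word))


hat = word_to_digits("Hat")  # the example from the answer: ".080120"

# Numeric order follows lexicographic order for short words, prefixes included.
in_order = word_to_score("smith") < word_to_score("smithe") < word_to_score("smitz")
```

A range query such as "last_name < 'smith'" would then become a ZRANGEBYSCORE call with word_to_score("smith") as an exclusive upper bound.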


You can use my python-stdnet project for this, it does all the indexing for you. For instance:

    from stdnet import odm

    class Person(odm.StdModel):
        first_name = odm.SymbolField()
        last_name = odm.SymbolField()
        last_update = odm.DateTimeField()

Once the model is registered with a Redis server, you can do this:

 qs = models.person.filter(first_name='john', last_name='smith') 

and

 qs = models.person.filter(first_name=('john','carl'), last_name=('smith','wood')) 

and much more.

Filtering is fast because all the ids are already stored in sets.


You can check out redblade; it can automatically maintain the indexes for you, and it is written for Node.js.

 //define schema redblade.schema('article', { "_id" : "id" , "poster" : "index('user_article')" , "keywords" : "keywords('articlekeys', return +new Date() / 60000 | 0)" , "title" : "" , "content" : "" }) //insert an article redblade.insert('article', { _id : '1234567890' , poster : 'airjd' , keywords : 'ไฟกๆฏๆŠ€ๆœฏ,JavaScript,NoSQL' , title : 'ๆต‹่ฏ•็”จ็š„SLIDE ๆ ‡้ข˜' , content : 'ๆต‹่ฏ•็”จ็š„SLIDE ๅ†…ๅฎน' }, function(err) { }) //select by index field or keywords redblade.select('article', { poster:'airjd' }, function(err, articles) { console.log(articles[0]) }) redblade.select('article', { keywords: 'NoSQL' }, function(err, articles) { console.log(articles[0]) }) 
0
source

Source: https://habr.com/ru/post/948568/

