Composite indexing using Redis in a hierarchical data model

I have a data model:
Fields:

  • counter number (e.g. 00888, 00777, 00123, etc.)
  • counter code (e.g. XA, XD, ZA, SI, etc.)
  • start date (e.g. 2017-12-31 ...)
  • end date (e.g. 2017-12-31 ...)
  • Another counter date (e.g. xxxxx)

The current organization of the data structure looks like this (root and multiple child format):

counter_num + counter_code ---> start_date + end_date --> xxxxxxxx ---> start_date + end_date --> xxxxxxxx ---> start_date + end_date --> xxxxxxxx 

Example:

 00888 + XA ---> Jan 10 + Jan 20 --> xxxxxxxx ---> Jan 21 + Jan 31 --> xxxxxxxx ---> Feb 01 + Dec 31 --> xxxxxxxx 00888 + ZI ---> Jan 09 + Feb 24 --> xxxxxxxx ---> Feb 25 + Dec 31 --> xxxxxxxx 00777 + XA ---> Jan 09 + Feb 24 --> xxxxxxxx ---> Feb 25 + Dec 31 --> xxxxxxxx 

Today, the search takes place in two ways:

 //Fetch unique counter data using all the composite keys counter_number + counter_code + date (start_date <= date <= end_date) //Fetch all the counter codes and corresponding data matching the below conditions counter_number + date (start_date <= date <= end_date) 

What is the best way to simulate this in redis, as I need to cache some of the frequently accessed data. I feel that sorted sets should do it somehow, but are unable to model it.

UPDATE:

Just to eliminate the confusion, the request here is not about the SQL query "BETWEEN". "Coz I don't know what the values ​​of start_date and end_date are. Think these are just column names.

I do not want

 SELECT * FROM redis_db WHERE counter_num AND date_value BETWEEN start_date AND end_date 

I want to

 SELECT * FROM redis_db WHERE counter_num AND start_date <= specifc_date AND end_date >= specific_date 

NOTE. . The requirement is almost close to 2D indexing of what is proposed in the Redis multidimensional indexing document.

https://redis.io/topics/indexes#multi-dimensional-indexes

I understood the concept, but could not digest the implementation details that are given.

+5
source share
2 answers

I’m unlikely to do it on time for generosity, but what the hell ...

It sounds like a job for geohashing. Geohash is what you do when you want to index a two-dimensional (or higher) dataset. For example, if you have a database of cities and want to respond quickly to queries such as “find all cities within 50 km of X”, you use geohash.

For the purposes of this question, you can think of start_date and end_date as x and y coordinates. Usually in geohashing you look for points in a dataset near a specific point in space or in a specific limited area of ​​space. In this case, you only have the lower border of one of the coordinates and the upper border on the other. But I believe that in practice the entire data set is still limited, so this is not a problem.

It would be nice if there was a library for this in Redis. Probably eat if you look heavy enough. Newer versions of Redis have a built-in geohashing feature. See Commands Starting with GEO . But it does not pretend to be very accurate and is intended for the surface of a sphere, not a flat surface.

So, as far as I can see, you have 3 options:

  • Compare your search space with a small part of the sphere, preferably near the equator. Use the Redis GEO Commands. To perform a search, use GEOSPHERE on the circle that covers the triangle that you are trying to search, taking into account the built-in inaccuracy and distortion obtained by matching on the sphere, then filter the results to get those that are inside the triangle.
  • Find a third-party geohashing client for Redis that works in a flat space and is more accurate than GEO.
  • Read the rest of this answer or some other geohashing tutorial and then implement it yourself on top of Redis. This is the most difficult (but most educational) option.

If you have a database that indexes data using numerical ordering, so that you can make queries like "find all rows / records for which z is between a and b " you can create a geohash index on top of it. Suppose the coordinates are (non-negative) integers x and y . Then you add the integer column z and index z . To compute z , write x and y in binary, then take alternate numbers from each. Example:

 x = 969 = 0 1 1 1 1 0 0 1 0 0 1 y = 1130 = 1 0 0 0 1 1 0 1 0 1 0 z = 1750214 = 0110101011010011000110 

Note that the index allows you to find, for example, all records located with z between 0101100000000000000000 and 0101101111111111111111 inclusive. In other words, all entries for which z starts with 010110 . Or, to put it another way, you can find all entries for which x starts at 001 and y starts at 110 . This set of records corresponds to a square in the two-dimensional space we are trying to execute.

Not all squares can be searched this way. We will call these search squares. Suppose a client sends a request for all records for which (x,y) is inside a specific rectangle. (Or a circle or some other reasonable geometric shape.) Then you need to find a set of searchable squares that cover the rectangle. Then, for each of these squares that you have selected, query the database for the records within that square and send the results to the client. (But you will have to filter the results, because not all records in the square are in the original rectangle.)

There will be a balance. If you select a small number of large special squares, you will probably find yourself in a much larger area of ​​the map than you need; a database query will return many additional results that you will have to filter out. Alternatively, if you use many small special squares, you will make many database queries, many of which will not return any results.

I said above that x and y can be start_time and end_time . But in fact, the distribution of your dataset will not be as symmetrical as in most cases of using geohash. Thus, performance can be better (or worse) if you use x = end_time + start_time and y = end_time - start_time .

+3
source

Since your question remains a little vague as to how you want to request your data, it remains unclear how to resolve your question. With that in mind, however, here are my thoughts on how I can model your data:

Updated answer detailing how to use SORTED SET

I edited this answer to be able to store your values ​​so that you can query for dynamic date ranges. This edit assumes that your database values ​​are timestamps, as in the value for one time, and not 2, as in your current setting.

Yes, you are correct that using Sorted Sets can accomplish this. I suggest you always use the Unix timestamp value for the rating component in these sorted sets.

If you are new to redis, explain the limitations of indexing. Redis is a simple key value designed to quickly get values ​​with a key. Because of this design, it does not contain many of the features of your traditional DBMS, such as column indexing.

In redis, you index using a key, and the most nested key structures are available in HASH and SORTED SET, but you only get two key structures. In HASH, you have a key (the same as any data type), and an internal hash key that can take the form of any string.

In a SORTED INSTALL, you have a key (the same as any data type) and a numerical value.

A HASH is nice to use to store organized grouped data.

A SORTED KIT is good if you want to request a range of values. This may work well for your data.

Your SORTED SET will look like this:

 key 00888:XA => score (date value) value 1452427200 (2016-01-10) xxxxxxxx 1452859200 (2016-01-10) yyyyxxxx 1453291200 (2016-01-10) zzzzxxxx 

Let's use a more intuitive example, a 2017 Juventus list:

To create the SORTED KIT in the table below, run this command on your redis client:

 ZADD JUVENTUS 32 "Emil Audero" 1 "Gianluigi Buffon" 42 "Mattia Del Favero" 36 "Leonardo Loria" 25 "Neto" 15 "Andrea Barzagli" 4 "Medhi Benatia" 19 "Leonardo Bonucci" 3 "Giorgio Chiellini" 40 "Luca Coccolo" 29 "Paolo De Ceglie" 26 "Stephan Lichtsteiner" 12 "Alex Sandro" 24 "Daniele Rugani" 43 "Alessandro Semprini" 23 "Dani Alves" 22 "Kwadwo Asamoah" 7 "Juan Cuadrado" 6 "Sami Khedira" 18 "Mario Lemina" 46 "Mehdi Leris" 38 "Rolando Mandragora" 8 "Claudio Marchisio" 14 "Federico Mattiello" 45 "Simone Muratore" 20 "Marko Pjaca" 5 "Miralem Pjanic" 28 "Tomás Rincón" 27 "Stefano Sturaro" 21 "Paulo Dybala" 9 "Gonzalo Higuaín" 34 "Moise Kean" 17 "Mario Mandzukic" Jersey Name Jersey Name 32 Emil Audero 23 Dani Alves 1 Gianluigi Buffon 42 Mattia Del Favero 36 Leonardo Loria 25 Neto 15 Andrea Barzagli 4 Medhi Benatia 19 Leonardo Bonucci 3 Giorgio Chiellini 40 Luca Coccolo 29 Paolo De Ceglie 26 Stephan Lichtsteiner 12 Alex Sandro 24 Daniele Rugani 43 Alessandro Semprini 22 Kwadwo Asamoah 7 Juan Cuadrado 6 Sami Khedira 18 Mario Lemina 46 Mehdi Leris 38 Rolando Mandragora 8 Claudio Marchisio 14 Federico Mattiello 45 Simone Muratore 20 Marko Pjaca 5 Miralem Pjanic 28 Tomás Rincón 27 Stefano Sturaro 21 Paulo Dybala 9 Gonzalo Higuaín 34 Moise Kean 17 Mario Mandzukic 

Request a list using a number of knitwear numbers:

 ZRANGEBYSCORE JUVENTUS 1 5 Output: 1) "Gianluigi Buffon" 2) "Giorgio Chiellini" 3) "Medhi Benatia" 4) "Miralem Pjanic" 

Please note that points are not refunded, however the ZRANGEBYSCORE team orders the results in ASC order for the result. To add points, add the "WITHSCORES" command to the command: ZRANGEBYSCORE JUVENTUS 1 5 WITHSCORES

Using ZRANGEBYSCORE, you can request any key (counter number + counter code) with a date range, producing values ​​in that range.



Original: Below is my original answer recommending HASH

Based on your examples, I recommend using HASH .

With a hash, you will have a master key for finding a hash (example 00888:XA ). Then in the hash you have key pairs → values ​​(example 2017-01-10:2017-01-20xxxxxxxx ). I prefer to delimit or label the components of my keys with a colon colon : but you can use any delimiter.

HASH matches your data structure very well:

 key 00888:XA => hashkey value 2017-01-10:2017-01-20 xxxxxxxx 2017-01-21:2017-01-31 yyyyxxxx 2016-02-01:2016-12-31 zzzzxxxx key 00888:ZI => hashkey value 2017-01-10:2017-01-20 xxxxxxxx 2017-01-21:2017-01-31 xxxxyyyy 2016-02-01:2016-12-31 xxxxzzzz 

When requesting data instead of a GET key you request using the HGET key hashkey . For setting values, use the HSET key hashkey value instead of SET key value HSET key hashkey value .

Command Examples

 HSET 00777:XA 2017-01-10:2017-01-20 xxxxxxxx HSET 00777:XA 2017-01-21:2017-01-31 yyyyyyyy HSET 00777:XA 2016-02-01:2016-12-31 zzzzzzzz 

(Note: there is also an HMSET to simplify this into one command) Then:

 HGET 00777:XA 2017-01-21:2017-01-31 

Would yyyyyyyy

If there is no specific performance consideration or other purpose for your data, I think the hash is perfect for your system.

It is also very convenient if you want to get all hashkeys or all values ​​for a given hash using commands like HKEYS , HVALS or HGETALL .

+3
source

Source: https://habr.com/ru/post/1268183/


All Articles