Order of Results in Cassandra

I have two questions about query results in Cassandra.

  • When I do a “full” table select in Cassandra (i.e. select * from table ), is it guaranteed that the results will be returned in increasing order of partition tokens? For example, given the following table:

     create table users(id int, name text, primary key(id)); 

    Is it guaranteed that the following query will return results with increasing values in the token column?

     select token(id), id from users; 

    If so, does this also hold when the data is distributed across multiple nodes in the cluster?

  • If the answer to the above question is yes, does it still hold if we use a secondary index? For example, if we had the following index:

     create index on users(name); 

    and we query the table using the index:

     select token(id), id from users where name = 'xyz'; 

    Is there any guarantee regarding the order of the results?

The motivation for the above questions is to determine whether the token is the right thing to use in order to implement paging and/or to resume a long-running “data export”.
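
For illustration, token-based paging over the users table above might look like the following sketch. It assumes that results really do come back in increasing token order (which is exactly what is being asked), and both the page size of 1000 and the bind marker ? (standing for the last token value seen on the previous page) are placeholders chosen for the example.

     select token(id), id from users limit 1000;

     select token(id), id from users where token(id) > ? limit 1000;

The first query fetches the first page. The second is repeated with the largest token returned so far until it yields no rows, so an interrupted export could resume from the last token it recorded. Since the comparison is strict, two partition keys hashing to the same token could be split across a page boundary, which a real implementation would need to account for.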

EDIT: There are several resources on the net stating that the order matches the token order (for example, in the description of partitioner results or this Datastax page):

Without the partition key specified in the WHERE clause, the actual result set order becomes dependent on the hashed userid values.

However, the order of the results is not specified in the official Cassandra documentation, for example in the description of the SELECT statement.

1 answer

Is it guaranteed that the following query will return results with increasing values in the token column?

Yes, it is.

If so, does this also hold when the data is distributed across multiple nodes in the cluster?

Data distribution is orthogonal to the ordering of the retrieved data; the two are unrelated.

If the answer to the above question is yes, does it still hold if we use a secondary index?

Yes, even if you query data using a secondary index (whether SASI or the native implementation), the returned results will always be sorted in token order. Why? A technical explanation is given in my blog post here: http://www.doanduyhai.com/blog/?p=13191#cluster_read_path

This is the main reason why SASI is not a good fit if you want a search to return data ordered by the values of some column. Only a real integration with a search engine (for example, Datastax Enterprise Search) can produce that ordering, since it bypasses the cluster read path layer.


Source: https://habr.com/ru/post/1272993/