ORDER BY clause does not work in a Cassandra query

I created a table layer using the following code:

CREATE TABLE layer ( layer_name text, layer_position text, PRIMARY KEY (layer_name, layer_position) ) WITH CLUSTERING ORDER BY (layer_position DESC) 

I use the following query to retrieve data from the layer table in descending order of layer_position:

$select = new Cassandra\SimpleStatement(<<<EOD
select * from layer ORDER BY layer_position DESC
EOD
);
$result = $session->execute($select);

But this query does not work. Can anyone help?

2 answers

Simply put, Cassandra only enforces sort order within a partition key.

 PRIMARY KEY (layer_name, layer_position) ) WITH CLUSTERING ORDER BY (layer_position DESC) 

In this case, layer_name is your partition key. If you specify layer_name in the WHERE clause, the results for that layer_name value will be ordered by layer_position.

 SELECT * FROM layer WHERE layer_name = 'layer1'; 

You do not even need to specify ORDER BY. All that ORDER BY can really do at the query level is apply a different sort direction (ascending or descending).

Cassandra works this way because it is designed to read data in whatever order it is stored on disk. Partition keys are ordered by their hashed token value, so the results of a query with an unbound WHERE clause (one that does not restrict the partition key) appear to be randomly ordered.
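
To make the two levels of ordering concrete, here is a toy in-memory model in Python. It is only a sketch, not Cassandra's storage engine: the ToyLayerTable class and the token() helper are hypothetical names, and MD5 merely stands in for the real partitioner. Rows inside one partition keep the clustering order, while partitions themselves are visited in token order.

```python
import hashlib
from collections import defaultdict


def token(partition_key):
    """Toy stand-in for a partitioner: hash the partition key to an integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class ToyLayerTable:
    """In-memory model of PRIMARY KEY (layer_name, layer_position)
    WITH CLUSTERING ORDER BY (layer_position DESC)."""

    def __init__(self):
        # layer_name (partition key) -> list of layer_position values
        self.partitions = defaultdict(list)

    def insert(self, layer_name, layer_position):
        rows = self.partitions[layer_name]
        if layer_position not in rows:  # same primary key: nothing new to add
            rows.append(layer_position)
        # layer_position is text in the schema, so values compare as strings
        # ("10" < "9"), not as numbers.
        rows.sort(reverse=True)

    def select_where_layer_name(self, layer_name):
        """Like SELECT * FROM layer WHERE layer_name = ?"""
        return [(layer_name, pos) for pos in self.partitions.get(layer_name, [])]

    def select_all(self):
        """Like SELECT * FROM layer: partitions come back in token order."""
        result = []
        for layer_name in sorted(self.partitions, key=token):
            result.extend(self.select_where_layer_name(layer_name))
        return result


table = ToyLayerTable()
for name, pos in [("layer1", "1"), ("layer2", "5"), ("layer1", "3"),
                  ("layer1", "2"), ("layer2", "4")]:
    table.insert(name, pos)

print(table.select_where_layer_name("layer1"))  # sorted within the partition
print(table.select_all())  # sorted per partition, but not globally
```

In this model the single-partition read is always in descending layer_position order, while the full scan is only sorted inside each partition, which is why an unrestricted ORDER BY is rejected.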

EDIT

"I need to fetch the data by the state_id column, and it should be ordered by layer_position."

Cassandra tables are optimized for a specific query. This gives high performance, but the drawback is limited query flexibility. The way to work around it is to duplicate your data into an additional table designed to serve that particular query.

 CREATE TABLE layer_by_state_id ( layer_name text, layer_position text, state_id text, PRIMARY KEY (state_id, layer_position, layer_name) ) WITH CLUSTERING ORDER BY (layer_position DESC, layer_name ASC); 

This table supports queries like the following:

 SELECT * FROM layer_by_state_id WHERE state_id='thx1138'; 

The results will be sorted by layer_position within the requested state_id.
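
As a rough illustration of what "duplicate your data" means for the application, the following Python sketch writes every logical row to two toy tables, one per query. The dictionaries and the save_layer() helper are hypothetical and only model the idea. With a real driver this would be two INSERT statements, one per table.

```python
from collections import defaultdict

# Toy tables: partition key -> rows kept in clustering order.
layer = defaultdict(list)              # PRIMARY KEY (layer_name, layer_position)
layer_by_state_id = defaultdict(list)  # PRIMARY KEY (state_id, layer_position, layer_name)


def save_layer(layer_name, layer_position, state_id):
    """Write the same logical row to both query tables."""
    rows = layer[layer_name]
    rows.append({"layer_name": layer_name, "layer_position": layer_position})
    rows.sort(key=lambda r: r["layer_position"], reverse=True)

    rows = layer_by_state_id[state_id]
    rows.append({"state_id": state_id,
                 "layer_position": layer_position,
                 "layer_name": layer_name})
    # Two stable sorts give (layer_position DESC, layer_name ASC).
    rows.sort(key=lambda r: r["layer_name"])
    rows.sort(key=lambda r: r["layer_position"], reverse=True)


save_layer("roads", "2", "thx1138")
save_layer("rivers", "3", "thx1138")
save_layer("labels", "1", "thx1138")

# Equivalent of: SELECT * FROM layer_by_state_id WHERE state_id='thx1138';
for row in layer_by_state_id["thx1138"]:
    print(row["layer_position"], row["layer_name"])
```

The cost of this design is that every write touches both tables, and keeping them consistent is the application's job.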

Note that I am making a few assumptions here that you will want to verify:

  • I assume state_id makes a good partition key. That is, its cardinality is high enough to give good distribution across the cluster, but low enough that each partition returns enough CQL rows to make the sorting worthwhile.
  • I assume that the combination of state_id and layer_position is not enough to uniquely identify each row. Therefore, I ensure uniqueness by adding layer_name as an additional clustering key. You may or may not need it, but I am guessing that you will.
  • I assume that using state_id as the partition key will not lead to unbounded partition growth that approaches Cassandra's limit of 2 billion cells per partition. If it does, you may need to add an additional partition "bucket".
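
If a bucket does become necessary, reads for one state_id have to visit every bucket and merge the already-sorted results. The Python sketch below shows one possible way to do that under stated assumptions: the bucket count, the CRC32-based bucket_for() rule, and the function names are all hypothetical, and the in-memory dictionary only stands in for a table whose partition key would be (state_id, bucket).

```python
import heapq
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration

# (state_id, bucket) -> rows kept in clustering order
bucketed = defaultdict(list)


def bucket_for(layer_name):
    """Hypothetical bucketing rule: a stable hash of layer_name."""
    return zlib.crc32(layer_name.encode("utf-8")) % NUM_BUCKETS


def insert_bucketed(state_id, layer_position, layer_name):
    rows = bucketed[(state_id, bucket_for(layer_name))]
    rows.append((layer_position, layer_name))
    rows.sort(key=lambda r: r[1])                # layer_name ASC
    rows.sort(key=lambda r: r[0], reverse=True)  # layer_position DESC (stable)


def select_by_state(state_id):
    """Read every bucket for state_id and merge the sorted results."""
    per_bucket = [bucketed.get((state_id, b), []) for b in range(NUM_BUCKETS)]
    # Each input is already in descending order, so a k-way merge is enough.
    return list(heapq.merge(*per_bucket, key=lambda r: r[0], reverse=True))


for pos, name in [("2", "roads"), ("7", "rivers"), ("5", "labels"), ("9", "parks")]:
    insert_bucketed("thx1138", pos, name)

print(select_by_state("thx1138"))
```

The trade-off is that each partition stays smaller, but a single read turns into NUM_BUCKETS queries plus a client-side merge.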

You cannot use ORDER BY on its own in Cassandra.

You can use ORDER BY only on clustering columns, and only when the partition key is restricted by an EQ or IN condition.


Source: https://habr.com/ru/post/1239340/

