Error creating table in Cassandra - Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive

I get the above error when I run the following CQL statement, and I am not sure what is wrong with it.

    CREATE TABLE Stocks (
        id uuid,
        market text,
        symbol text,
        value text,
        time timestamp,
        PRIMARY KEY (id)
    ) WITH CLUSTERING ORDER BY (time DESC);

    Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive

But this works fine. Can I use some column that is not part of the primary key to arrange the rows?

    CREATE TABLE timeseries (
        event_type text,
        insertion_time timestamp,
        event blob,
        PRIMARY KEY (event_type, insertion_time)
    ) WITH CLUSTERING ORDER BY (insertion_time DESC);
2 answers

"can I use any column that is not part of the primary key to arrange the rows?"

No, you can’t. From the DataStax documentation of the SELECT command:

Only one column can be selected in ORDER BY clauses. This column should be the second column in the composite PRIMARY KEY. This also applies to tables with more than two column components in the primary key.

Therefore, for your first CREATE to work, you need to adjust your primary key to:

 PRIMARY KEY(id,time) 

The second column in a composite primary key is known as the "clustering column". It determines the on-disk sort order of data within a partition key. The words "within a partition key" matter here. When you query your Stocks column family (table) by id, all of the "rows" of column values for that id are returned, sorted by time. In Cassandra you can only specify order within a partition key (not for the entire table), and your partition key is the first key listed in the composite primary key.
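As a rough illustration of the point above, here is the question's table with time promoted to a clustering column. This is only a sketch: the uuid and the inserted values are made-up sample data, not anything from the original post.

```cql
-- Sketch: id is the partition key, time is the clustering column.
CREATE TABLE Stocks (
    id uuid,
    market text,
    symbol text,
    value text,
    time timestamp,
    PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time DESC);

-- Two rows stored under the same partition key (hypothetical sample data).
INSERT INTO Stocks (id, market, symbol, value, time)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'NASDAQ', 'AAPL', '109.33', '2015-01-05 09:30:00+0000');

INSERT INTO Stocks (id, market, symbol, value, time)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'NASDAQ', 'AAPL', '106.25', '2015-01-06 09:30:00+0000');

-- Rows for this id come back ordered by time, newest first.
SELECT time, value
FROM Stocks
WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```

The ordering applies only to rows that share the same id. A query without a partition key restriction gives no such guarantee across partitions.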

Of course, the problem with this is that you probably want id to be unique (which means that CQL will only ever return one "row" of column values per partition key). Requiring time to be part of the primary key negates that and makes it possible to store multiple values for the same id. This is the problem with partitioning your data by a unique id. It might be a good idea in the RDBMS world, but it can make querying in Cassandra more difficult.

Essentially, you will need to revisit your data model here. For example, if you want to query prices over time, you could name the table something like "StockPriceEvents" with a primary key of (id,time) or (symbol,time). Querying that table would give you the prices recorded for each id or symbol, sorted by time. That may or may not be of any value for your use case. I am just trying to explain how primary keys and sort order work in Cassandra.
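One possible shape for such a table, assuming the (symbol,time) variant, is sketched below. The table name comes from the suggestion above; the column set is carried over from the question, and the symbol and timestamp literals are hypothetical.

```cql
-- Sketch: one partition per symbol, price events sorted newest first.
CREATE TABLE StockPriceEvents (
    symbol text,
    time timestamp,
    market text,
    value text,
    PRIMARY KEY (symbol, time)
) WITH CLUSTERING ORDER BY (time DESC);

-- Most recent recorded price for one symbol.
SELECT time, value
FROM StockPriceEvents
WHERE symbol = 'AAPL'
LIMIT 1;

-- Prices since a given moment, read oldest first by reversing the stored order.
SELECT time, value
FROM StockPriceEvents
WHERE symbol = 'AAPL' AND time >= '2015-01-01 00:00:00+0000'
ORDER BY time ASC;
```

Both queries restrict the partition key (symbol), which is what allows the clustering order to be used or reversed.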

Note: you should really use more meaningful column names. Names like "id", "time" and "timeseries" are rather vague and say nothing about the context in which they are used.


When creating a table in Cassandra with the CLUSTERING ORDER BY option, make sure the clustering column is part of the primary key.

The table below is created with CLUSTERING ORDER BY on the "Datetime" column, but "Datetime" is not part of the primary key, hence the error that follows.

ERROR_SCRIPT

    CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
        IP text,
        URL text,
        Status text,
        UserAgent text,
        Datetime timestamp,
        PRIMARY KEY (IP)
    ) WITH CLUSTERING ORDER BY (Datetime DESC);

Error: InvalidRequest: Error from server: code=2200 [Invalid query] message="Only clustering key columns can be defined in CLUSTERING ORDER directive"

CORRECTED_SCRIPT (where "Datetime" is added to the primary key columns)

    CREATE TABLE IF NOT EXISTS cpdl3_spark_cassandra.log_data (
        IP text,
        URL text,
        Status text,
        UserAgent text,
        Datetime timestamp,
        PRIMARY KEY (IP, Datetime)
    ) WITH CLUSTERING ORDER BY (Datetime DESC);

Source: https://habr.com/ru/post/1208071/