I've read in several places the advice to design Cassandra tables around the queries we will run against them. This DataScale article says the following:
The truth is that having many similar tables with similar data is a good thing in Cassandra. Limit the primary key to exactly what you will be searching with. If you plan to search the data with similar but different criteria, make it a separate table. There is no drawback to having the same data stored differently. Duplication of data is your friend in Cassandra.
[...]
If you need to store the same piece of data in 14 different tables, write it out 14 times. There is no penalty for multiple writes.
I understand this, and now my question is: given that I have an existing table, say
CREATE TABLE invoices ( id_invoice int PRIMARY KEY, year int, id_client int, type_invoice text );
But I want to query by year and type instead, so I would like to have something like
CREATE TABLE invoices_yr ( id_invoice int, year int, id_client int, type_invoice text, PRIMARY KEY (type_invoice, year) );
With type_invoice as the partition key and year as the clustering key, what is the preferred way to copy the data from one table to the other so that I can run optimized queries later?
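To make the goal concrete, this is roughly the kind of query I would expect the new table to serve. It is only a sketch: the values 'A' and 2016 are made-up placeholders, not real data from my table.

```cql
-- Hypothetical lookup against the new table; 'A' and 2016 are placeholder values.
-- It restricts the partition key (type_invoice) and the clustering key (year),
-- so no ALLOW FILTERING or secondary index should be needed.
SELECT id_invoice, id_client
FROM invoices_yr
WHERE type_invoice = 'A'
  AND year = 2016;
```

The same filter on the original invoices table would not work as a plain query, since neither year nor type_invoice is part of its primary key.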
My version of Cassandra:
user@cqlsh > show version; [cqlsh 5.0.1 | Cassandra 3.5.0 | CQL spec 3.4.0 | Native protocol v4]