I'll give you the bad way to do this first. If you insert these rows:
insert into foo (row,column,txt) values (1,1,'First Insert');
insert into foo (row,column,txt) values (1,2,'Second Insert');
insert into foo (row,column,txt) values (2,1,'First Insert');
insert into foo (row,column,txt) values (2,2,'Second Insert');
Running 'select row from foo;' will give you the following:
 row
-----
   1
   1
   2
   2
Not distinct, since it shows every combination of row and column. To get each row value only once, you can restrict the query to a single column value:
select row from foo where column = 1;
But then you get this warning:
Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
OK. Then with this:
select row from foo where column = 1 ALLOW FILTERING;
 row
Great, that's what I wanted. Let's not ignore that warning, though. If you only have a small number of rows, say 10,000, this will work without a huge performance hit. But what if you have 1 billion? Depending on the number of nodes and the replication factor, performance will take a serious hit. First, the query has to scan every row in the table (read: a full table scan), and then filter the unique values for the result set. In some cases, the query will simply time out. Given that, this is probably not what you were looking for.
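To make the cost concrete, here is a small in-memory Python model of the work a filtered query has to do. It is only an illustration under my own assumptions, not Cassandra's actual read path: the `rows` list stands in for the four rows inserted above, and `filter_scan` is a made-up helper.

```python
# In-memory stand-in for table foo: (row, column, txt) tuples.
rows = [
    (1, 1, "First Insert"),
    (1, 2, "Second Insert"),
    (2, 1, "First Insert"),
    (2, 2, "Second Insert"),
]


def filter_scan(table, wanted_column):
    """Visit every row, keep those whose column matches, and count the visits."""
    examined = 0
    matches = []
    for row, column, _txt in table:
        examined += 1  # every row is read, whether it matches or not
        if column == wanted_column:
            matches.append(row)
    return matches, examined


matches, examined = filter_scan(rows, 1)
print(matches, examined)  # [1, 2] 4
```

The number of rows examined grows with the table, not with the size of the result, which is why this gets painful at a billion rows.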
You mentioned that you were worried about a performance hit from inserting into multiple tables. Inserting into multiple tables is a perfectly valid data modeling technique. Cassandra can handle an enormous number of writes. As for keeping the tables in sync being a pain, I don't know your specific application, but I can give some general advice.
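As a rough sketch of what writing to two tables can look like from the application side, the Python helper below sends one insert to foo and one to a second table. Everything beyond foo is my assumption: `foo_rows` is a hypothetical table (`CREATE TABLE foo_rows (row int PRIMARY KEY);`), `insert_with_index` is a made-up name, and `execute` is assumed to behave like the DataStax Python driver's `session.execute(query, parameters)` with `%s` placeholders.

```python
def insert_with_index(execute, row, column, txt):
    """Write one logical record to the base table and to a query table."""
    # Base table from the example above.
    execute(
        "INSERT INTO foo (row, column, txt) VALUES (%s, %s, %s)",
        (row, column, txt),
    )
    # Hypothetical query table keyed by row. A CQL INSERT is an upsert,
    # so writing the same row value again does not create a duplicate.
    execute("INSERT INTO foo_rows (row) VALUES (%s)", (row,))


# Usage with a fake executor that only records what would be sent.
sent = []
insert_with_index(lambda cql, params: sent.append((cql, params)), 1, 2, "Second Insert")
print(len(sent))  # 2
```

The two writes here are independent; whether they need to be grouped or retried together depends on the application.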
If you need a distinct scan, you need to think in terms of partition columns. This is what we call an index or query table. The important thing to consider in any Cassandra data model is the application's queries. If I were using an IP address as the row, I might create something like this to scan all the IP addresses I have in order.
CREATE TABLE ip_addresses (
    first_quad int,
    last_quads ascii,
    PRIMARY KEY (first_quad, last_quads)
);
Now, to insert some rows into the 192.x.x.x address space:
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000000001');
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000000002');
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000001001');
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000001255');
To get the distinct rows in the 192 space, I do this:
SELECT * FROM ip_addresses WHERE first_quad = 192;
 first_quad | last_quads
To get every single address, you would just need to iterate over every possible row key from 0 to 255. In my example, I would expect the application to ask for specific ranges to keep things performant. Your application may have different needs, but hopefully you can see the pattern here.
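The iteration over row keys could look roughly like this in Python. It is a minimal sketch, not a definitive implementation: `scan_all_addresses` is a made-up name, and `execute` is assumed to behave like the DataStax Python driver's `session.execute(query, parameters)` with `%s` placeholders. A dict-backed fake stands in for a real cluster so the example runs on its own.

```python
def scan_all_addresses(execute, first_quads=range(256)):
    """Yield (first_quad, last_quads) results one partition at a time."""
    query = "SELECT first_quad, last_quads FROM ip_addresses WHERE first_quad = %s"
    for first_quad in first_quads:
        # Each call is restricted to a single partition key value.
        for result in execute(query, (first_quad,)):
            yield result


# Fake executor backed by a dict, standing in for a real session.
fake_table = {192: ["000000001", "000000002", "000001001", "000001255"]}


def fake_execute(cql, params):
    (first_quad,) = params
    return [(first_quad, last) for last in fake_table.get(first_quad, [])]


addresses = list(scan_all_addresses(fake_execute))
print(len(addresses))  # 4
```

Passing a narrower `first_quads` range, for example `range(192, 193)`, matches the case where the application only asks for specific ranges.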