Cassandra: Selecting the first record for each indexed column value

I have an events table and I want to retrieve the first timestamp (the unixtime column) for each user. Is there a way to do this with a single Cassandra query?

The schema is as follows:

    CREATE TABLE events (
        id VARCHAR,
        unixtime bigint,
        u bigint,
        type VARCHAR,
        payload map<text, text>,
        PRIMARY KEY(id)
    );
    CREATE INDEX events_u ON events (u);
    CREATE INDEX events_unixtime ON events (unixtime);
    CREATE INDEX events_type ON events (type);
1 answer

With your schema, each user will have only a single timestamp. If you want one row per event, consider:

    PRIMARY KEY (id, unixtime)

With that schema, a user's entries will be stored in ascending unixtime order. Be careful, though: if events arrive as an unbounded stream and users have many events, the partition for each id will keep growing. The general recommendation is to keep partition sizes in the tens or at most hundreds of megabytes. If you expect more than that, you will need to introduce some form of bucketing.
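To make the clustering-order and bucketing point concrete, here is a minimal in-memory sketch in plain Python (not a Cassandra client). It models a bucketed variant of the suggested key, roughly `PRIMARY KEY ((id, day_bucket), unixtime)`. The `day_bucket` column, the one-day bucket size, and the assumption that `unixtime` is in seconds are all hypothetical choices for illustration. The point it shows is that rows inside one partition stay sorted by the clustering column, so the first row of a partition is its earliest event, while bucketing caps how large any single partition can grow.

```python
from __future__ import annotations

import bisect
from collections import defaultdict

# Assumption: unixtime is in seconds and we bucket by calendar day (hypothetical choice).
SECONDS_PER_DAY = 86_400


def partition_key(user_id: str, unixtime: int) -> tuple[str, int]:
    """Composite partition key (id, day_bucket), mirroring a hypothetical
    PRIMARY KEY ((id, day_bucket), unixtime)."""
    return (user_id, unixtime // SECONDS_PER_DAY)


class BucketedEventStore:
    """In-memory model: each partition keeps its rows sorted by the
    clustering column (unixtime), the way Cassandra orders rows within a partition."""

    def __init__(self) -> None:
        self.partitions: dict[tuple[str, int], list[int]] = defaultdict(list)

    def insert(self, user_id: str, unixtime: int) -> None:
        rows = self.partitions[partition_key(user_id, unixtime)]
        bisect.insort(rows, unixtime)  # keep ascending clustering order

    def first_in_bucket(self, user_id: str, bucket: int) -> int | None:
        # Analogous to: SELECT unixtime FROM ... WHERE id = ? AND day_bucket = ? LIMIT 1
        # (.get on a defaultdict does not create an empty partition)
        rows = self.partitions.get((user_id, bucket))
        return rows[0] if rows else None


if __name__ == "__main__":
    store = BucketedEventStore()
    store.insert("user-1", 172_850)  # day bucket 2
    store.insert("user-1", 172_810)  # day bucket 2, earlier
    store.insert("user-1", 90_000)   # day bucket 1
    print(store.first_in_bucket("user-1", 2))
    print(store.first_in_bucket("user-1", 1))
```

Note that with buckets, "first event ever" means reading the earliest bucket that exists for that id, so the application has to know (or search for) which bucket to start from.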

Now, for your query. In short, no. If you don't hit a single partition (that is, specify the partition key), your query becomes a cluster-wide scan. With small data it will work, but with lots of data you will get timeouts. With your data in its current form, I recommend using Apache Spark with the Spark Cassandra Connector to run your query. An additional advantage of Spark is that if you run Spark workers on your Cassandra nodes, data locality lets you efficiently hit the secondary index without specifying a partition key (which would otherwise cause a cluster-wide query, with timeout problems and so on). You can even use Spark to compute the data you need and save it into another Cassandra table for fast querying.
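The "compute it once and store it in another table" idea comes down to a per-key minimum. The plain-Python sketch below stands in for that batch job: it makes one full pass over `(u, unixtime)` pairs and keeps the smallest `unixtime` per user. In Spark this would correspond to mapping rows to `(u, unixtime)` pairs and reducing by key with `min` (for example with `reduceByKey`), then writing the result back through the connector. The returned dict plays the role of a hypothetical lookup table keyed by `u`, so answering "first timestamp for user X" becomes a single-partition read instead of a scan.

```python
from __future__ import annotations

from typing import Iterable


def first_timestamp_per_user(events: Iterable[tuple[int, int]]) -> dict[int, int]:
    """One full pass over (u, unixtime) pairs, keeping the minimum unixtime
    per user. This is the per-key min that a batch job (e.g. Spark) would compute
    before saving the result into a table keyed by u."""
    first: dict[int, int] = {}
    for u, unixtime in events:
        current = first.get(u)
        if current is None or unixtime < current:
            first[u] = unixtime
    return first


if __name__ == "__main__":
    # Hypothetical sample rows: (u, unixtime)
    sample = [(42, 1_700_000_500), (7, 1_700_000_100), (42, 1_700_000_200)]
    lookup = first_timestamp_per_user(sample)
    # The lookup is now a key read, like querying a table whose primary key is u.
    print(lookup[42])
```

The expensive full scan happens once in the batch job; the online query only ever touches one key.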


Source: https://habr.com/ru/post/1239342/

