Cassandra: Showing the Last 10 Visitors

We are trying to track links clicked on our sites by storing the session ID and URLs in Cassandra. We want to show the last 10 visitors (session identifiers) on the page and chronologically list their journey through our pages.

The schema is as follows:

The session ID is the row key. Each row contains columns whose name is a timestamp and whose value is the URL (we do this because the same URL can be clicked several times, so the URL would not be unique as a column name).

We have another column family that contains a single row with the key "lastseen". There, the column name is the timestamp and the value is the session ID. We did this because we need the session IDs ordered chronologically as they appear on our website.

So when a user clicks a link, we store a timestamp/session ID pair in the lastseen row, and another timestamp/URL entry in the row for that user's session ID.

The idea is that we now request the last 10 entries from the lastseen row and then look up the URLs visited by each session ID in the corresponding session row. However, the lastseen row contains duplicate values: if the same user made the last 10 clicks, we get the same session ID back 10 times.
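To make the problem concrete, here is a minimal in-memory sketch of the two column families in Python. The dictionaries and the helper names `record_click` and `last_entries` are illustrative stand-ins, not Cassandra client calls.

```python
from collections import defaultdict

# In-memory stand-ins for the two column families (illustrative only).
sessions = defaultdict(dict)   # row key: session ID -> {timestamp: url}
lastseen = {}                  # the single "lastseen" row -> {timestamp: session ID}


def record_click(timestamp, session_id, url):
    """Write one click to both column families."""
    sessions[session_id][timestamp] = url
    lastseen[timestamp] = session_id


def last_entries(n):
    """Return the session IDs of the n newest columns in the lastseen row."""
    newest_first = sorted(lastseen, reverse=True)[:n]
    return [lastseen[ts] for ts in newest_first]


record_click(1, "session-a", "/home")
record_click(2, "session-b", "/home")
for ts in range(3, 13):
    # session-a makes the ten most recent clicks
    record_click(ts, "session-a", "/page/%d" % ts)

recent = last_entries(10)
print(recent)        # ten entries, every one of them "session-a"
print(set(recent))   # {'session-a'}
```

Although two sessions visited the site, the ten newest lastseen entries all belong to one of them.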

We tried another schema in which each session ID row has a lastseen column with an index on it, queried with a "smaller than the current time" index expression, but Cassandra does not currently support this.

Any ideas on how to solve this efficiently? We could check for duplicates on insert and the like, but that seems ugly, and it is unclear how it would behave under high load. We could fetch 100 entries and filter out the duplicates manually, but that is ugly too.

Is there something obvious that we are missing?

Tom

1 answer

I think the simplest solution is the one you have already thought of: have a "most recent activity" CF whose column names are the activity time and whose values are the session ID, and scan backwards until you get 10 unique values.
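A rough sketch of that backward scan in Python follows. `fetch_page` is a hypothetical callable standing in for whatever reversed column-slice query your client offers, and the page size of 50 is an arbitrary assumption.

```python
def last_unique_sessions(fetch_page, wanted=10, page_size=50):
    """Scan the activity row newest-first until `wanted` distinct sessions are found.

    fetch_page(before, limit) is a hypothetical callable that returns up to
    `limit` (timestamp, session_id) pairs with timestamp < before (or the
    newest ones when before is None), ordered newest first.
    """
    seen = []          # distinct session IDs, most recently active first
    seen_set = set()
    before = None
    while len(seen) < wanted:
        page = fetch_page(before, page_size)
        if not page:
            break      # ran out of history before finding enough sessions
        for timestamp, session_id in page:
            if session_id not in seen_set:
                seen_set.add(session_id)
                seen.append(session_id)
                if len(seen) == wanted:
                    break
        before = page[-1][0]   # continue below the oldest timestamp read so far
    return seen


def make_fetcher(columns):
    """Build an in-memory fetch_page over (timestamp, session_id) pairs."""
    ordered = sorted(columns, reverse=True)

    def fetch_page(before, limit):
        rows = [c for c in ordered if before is None or c[0] < before]
        return rows[:limit]

    return fetch_page


# Thirty clicks spread over three sessions.
clicks = [(ts, "session-%d" % (ts % 3)) for ts in range(1, 31)]
fetch = make_fetcher(clicks)
print(last_unique_sessions(fetch, wanted=2, page_size=4))
# ['session-0', 'session-2']
```

The scan usually stops after a page or two when traffic is spread across many sessions. It reads further back only when a few sessions dominate the recent clicks.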

If you want Cassandra to enforce the uniqueness, you would have to do the sorting client-side instead, which will not scale to a large number of users.
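One way to read this trade-off, as an assumption about the layout rather than something spelled out above: make the session ID the column name, so that each new click overwrites that session's single column, and rank the columns by time on the client. The names below are illustrative.

```python
import heapq

# Illustrative stand-in for a row whose column names are session IDs and whose
# values are the timestamp of that session's latest click (assumed layout).
latest_click = {}


def record_click(session_id, timestamp):
    # Writing the same column name again overwrites it: one column per session.
    latest_click[session_id] = timestamp


def last_sessions(n):
    # The row is ordered by session ID, not by time, so every column has to be
    # read and ranked on the client.
    newest = heapq.nlargest(n, latest_click.items(), key=lambda item: item[1])
    return [session_id for session_id, _ in newest]


record_click("a", 1)
record_click("b", 2)
record_click("a", 3)   # overwrites the earlier entry for "a"
record_click("c", 4)
print(last_sessions(2))   # ['c', 'a']
```

Every call to `last_sessions` touches one column per known session, which is why this variant stops being practical as the number of users grows.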


Source: https://habr.com/ru/post/1396644/

