Convert MySQL / query data to equivalent Cassandra view

Question


Consider a MySQL table with 500 million rows and the following structure ...

CREATE TABLE foo_objects (
  id int NOT NULL AUTO_INCREMENT,
  foo_string varchar(32),
  metadata_string varchar(128),
  lookup_id int,
  PRIMARY KEY (id),
  UNIQUE KEY (foo_string),
  KEY (lookup_id)
);

... which is queried using only the following two queries ...

# lookup by unique string key, maximum of one row returned
SELECT * FROM foo_objects WHERE foo_string = ?;

# lookup by numeric lookup key, may return multiple rows
SELECT * FROM foo_objects WHERE lookup_id = ?;

Given these queries, how would you represent this dataset using Cassandra?


mysql cassandra nosql

knorv Mar 09 '10 at 23:40


1 answer

jbellis · Accepted Answer · 2010-03-10T23:01:15+0000

You have two options:

(1) is the more traditional approach: have one CF (column family) holding your foo objects, with one row per foo and one column per field. Then create two index CFs, one whose row keys are the foo_string values and one whose row keys are the lookup_id values. The columns in each index row are foo ids. To query, you do a GET on the index CF, then a MULTIGET on the ids it returns.
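As a rough illustration of option (1), here is a minimal Python sketch that uses plain dicts as stand-ins for the column families. It is not Cassandra client code: the names foo_objects, foo_by_string, foo_by_lookup_id, insert_foo and multiget are hypothetical, and the point is only to show the write path (one object row plus two index rows) and the read path (GET on the index, then MULTIGET on the ids).

```python
# In-memory stand-ins for Cassandra column families (hypothetical names).
# Each CF maps row key -> {column name: value}.
foo_objects = {}        # row key: foo id; one column per field
foo_by_string = {}      # index CF, row key: foo_string; column names: foo ids
foo_by_lookup_id = {}   # index CF, row key: lookup_id; column names: foo ids


def insert_foo(foo_id, foo_string, metadata_string, lookup_id):
    """Write the object row and both index rows; the client keeps them in sync."""
    foo_objects[foo_id] = {
        "foo_string": foo_string,
        "metadata_string": metadata_string,
        "lookup_id": lookup_id,
    }
    # Index columns carry the foo id as the column name; the value is unused.
    foo_by_string.setdefault(foo_string, {})[foo_id] = ""
    foo_by_lookup_id.setdefault(lookup_id, {})[foo_id] = ""


def multiget(cf, row_keys):
    """Fetch several rows in one call, mimicking a MULTIGET."""
    return {key: cf[key] for key in row_keys if key in cf}


def find_by_string(foo_string):
    ids = foo_by_string.get(foo_string, {})    # GET on the index CF
    return multiget(foo_objects, ids)          # MULTIGET on the returned ids


def find_by_lookup_id(lookup_id):
    ids = foo_by_lookup_id.get(lookup_id, {})  # GET on the index CF
    return multiget(foo_objects, ids)          # MULTIGET on the returned ids


insert_foo(1, "alpha", "first", 10)
insert_foo(2, "beta", "second", 10)
insert_foo(3, "gamma", "third", 20)
```

With this layout, find_by_string("beta") returns at most one object, while find_by_lookup_id(10) returns every object sharing that lookup_id. Each read costs two round trips, which is the trade-off option (2) removes.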

Note that if you can make id the same as lookup_id, then you have one less index to maintain.

High-level clients such as Digg's lazyboy ( http://github.com/digg/lazyboy ) will automate maintaining the index CFs for you. Cassandra itself does not do this automatically (yet).

(2) is similar to (1), but you duplicate the entire foo objects into the subcolumns of the index rows (that is, the top-level columns of the index rows are super columns). If you never actually query by the foo id itself, you do not need to store the objects in their own CF at all.
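Option (2) can be sketched the same way, again with plain dicts standing in for super column families rather than real client calls. The names full_by_string, full_by_lookup_id and insert_foo_denormalized are hypothetical. Each index row holds one super column per foo id, and the subcolumns are a full copy of that object's fields.

```python
# Hypothetical in-memory stand-ins for two super column families:
# row key -> {super column name (foo id) -> {subcolumn name: value}}
full_by_string = {}       # row key: foo_string
full_by_lookup_id = {}    # row key: lookup_id


def insert_foo_denormalized(foo_id, foo_string, metadata_string, lookup_id):
    """Write a full copy of the object into both index rows."""
    fields = {
        "foo_string": foo_string,
        "metadata_string": metadata_string,
        "lookup_id": lookup_id,
    }
    # Separate copies, since each index row stores its own duplicate.
    full_by_string.setdefault(foo_string, {})[foo_id] = dict(fields)
    full_by_lookup_id.setdefault(lookup_id, {})[foo_id] = dict(fields)


def find_by_string_denormalized(foo_string):
    # A single GET returns the whole object; no second lookup by id.
    return full_by_string.get(foo_string, {})


def find_by_lookup_id_denormalized(lookup_id):
    # A single GET returns every object sharing this lookup_id.
    return full_by_lookup_id.get(lookup_id, {})


insert_foo_denormalized(1, "alpha", "first", 10)
insert_foo_denormalized(2, "beta", "second", 10)
```

Reads drop to one round trip per query, at the cost of storing every object twice and updating both copies on each write.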
