We decided to move 5 years of data from Apache Cassandra to Google BigQuery. The problem was not just the transfer or the export/import: the problem was a very old Cassandra!
After extensive research, we planned the migration: export the data to CSV, then upload it to Google Cloud Storage for import into BigQuery.
, Cassandra 1.1 ! , - - ! , 2.2.
3.4 , , 2.2 ! , , , .
We could not go straight to 2.2. We upgraded to 2.0 first, then to 2.2. The upgrade details are on docs.datastax.com.
, :
- , .
- ( SSTables, ..)
SSTable, .
nodetool upgradesstables
nodetool drain
node
- ( )
- Cassandra, stables ( 3) node.
Installing Cassandra:
/etc/yum.repos.d/datastax.repo
[datastax]
name = DataStax Repo for Apache Cassandra
baseurl = https://rpm.datastax.com/community
enabled = 1
gpgcheck = 0
Then install and start it:
yum install dsc20
service cassandra start
Cassandra 2+ can export to CSV.
Connect with cqlsh and look at the schema:
cqlsh -u username -p password
describe tables;
describe table abcd;
describe schema;
, , . , .
vi commands.list
In it, put the export command:
COPY keyspace.tablename TO '/backup/export.csv';
Then run the file:
cqlsh -u username -p password -f /backup/commands.list
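With more than a handful of tables, writing commands.list by hand gets tedious. A minimal sketch of generating it instead, assuming a hypothetical keyspace `mykeyspace` with tables `users` and `events` (illustrative names, not from our cluster), and writing one CSV per table rather than a single export.csv:

```shell
#!/bin/sh
# Hypothetical names for illustration; replace with your own keyspace and tables.
KEYSPACE="mykeyspace"
TABLES="users events"
# Defaults to the current directory so the sketch runs anywhere;
# set OUT_DIR=/backup to match the paths used in this post.
OUT_DIR="${OUT_DIR:-.}"

# Start with an empty command file.
: > "$OUT_DIR/commands.list"

# Append one COPY statement per table, each writing to its own CSV file.
for table in $TABLES; do
    printf "COPY %s.%s TO '%s/%s.csv';\n" \
        "$KEYSPACE" "$table" "$OUT_DIR" "$table" >> "$OUT_DIR/commands.list"
done

# Show the result for a quick visual check before handing it to cqlsh -f.
cat "$OUT_DIR/commands.list"
```

One file per table keeps each export separate, which should make the later import easier since each CSV then has a single column layout.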
That gives you the CSV files. Next, upload them to Google Cloud Storage:
gsutil rsync /backup gs://bucket
From there, the Google API can load the CSV files into Google BigQuery. See the Google documentation at cloud.google.com.
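The final load can also be done from the command line with the `bq` tool from the Google Cloud SDK, without writing API code. A rough sketch, assuming a hypothetical dataset `mydataset` and a target table named `tablename` (both are placeholders, not our real names). By default it only prints the command; set DRY_RUN=0 to actually run it:

```shell
#!/bin/sh
# Hypothetical names; none of these come from the original setup.
DATASET="mydataset"
TABLE="tablename"
SOURCE="gs://bucket/export.csv"
# Default to printing the command so the sketch is safe to run as-is.
DRY_RUN="${DRY_RUN:-1}"

load_csv() {
    # --autodetect asks BigQuery to infer the schema from the CSV;
    # pass an explicit schema instead if the inferred types are wrong.
    set -- bq load --source_format=CSV --autodetect "$DATASET.$TABLE" "$SOURCE"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

load_csv
```

Schema autodetection is a convenience here, not a recommendation: for 5 years of data it is probably worth checking the inferred column types on a small file first.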