How to export data from Cassandra to BigQuery

I have Apache Cassandra running on 4 virtual machines on Google Cloud. I find this too expensive and want to export all the data to BigQuery. Cassandra holds about 2 TB (60 billion rows). Any suggestions on how I can do this?

Thanks in advance.

+4
2 answers

We decided to move 5 years of data from Apache Cassandra to Google BigQuery. The problem was not just the data transfer or the export/import; the problem was the very old Cassandra version!

After extensive research, we planned the migration as follows: export the data to csv, then upload it to Google Cloud Storage for import into BigQuery.

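The pain point was how Cassandra 1.1 copes with a large number of records! There is no pagination, so at some point an export is going to run out of resources! If I am not mistaken, pagination was introduced in version 2.2.

After all my attempts to upgrade to the latest version, 3.4, failed, I tried other versions, and luckily 2.2 worked! By worked I mean that I could follow the upgrade steps to the end and the data was still accessible.

A direct upgrade to 2.2 did not work for me either. So I had no choice but to upgrade to 2.0 first, and then to 2.2. Since this is an extremely delicate task, I would rather point you to the official documentation and only give a summary here. Please check docs.datastax.com and follow their instructions.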

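To give an overview, these are the steps:

  • Make sure all nodes are stable and there are no dead nodes.
  • Make a backup (your SSTables, configuration, etc.)
  • Upgrade your SSTables successfully before moving on to the next step. Simply use: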

    nodetool upgradesstables

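  • Drain the node with nodetool drain

  • Then simply stop the node

  • Install the new version (a fresh installation is described below)
  • Apply the same configuration as your old Cassandra, start it, and run upgradesstables again (as in step 3) on each node. Installing Cassandra: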

Add the following to /etc/yum.repos.d/datastax.repo:

[datastax]
name = DataStax Repo for Apache Cassandra
baseurl = https://rpm.datastax.com/community
enabled = 1
gpgcheck = 0

Then install and start the service:

yum install dsc20
service cassandra start
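
Put together, the per-node sequence from the list above might look like the sketch below. It uses only the commands already mentioned in this answer; `DRY_RUN` and the `run` helper are hypothetical additions of mine, so that by default the commands are only printed and nothing touches the node. Treat it as a checklist in script form, not a tested upgrade procedure, and follow the official instructions for your exact versions.

```shell
# Per-node upgrade sequence, built only from the commands named in this answer.
# DRY_RUN is a hypothetical safety switch: by default commands are printed, not run.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_node() {
  run nodetool upgradesstables   # upgrade SSTables on the old version first
  run nodetool drain             # drain the node
  run service cassandra stop     # stop the node
  run yum install dsc20          # install the new version from the DataStax repo
  # (apply the same configuration as the old Cassandra here, by hand)
  run service cassandra start
  run nodetool upgradesstables   # upgrade SSTables again on the new version
}

upgrade_node
```

Run it once per node, and only set `DRY_RUN=0` after the backup from the earlier step exists.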

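Once you are on Cassandra 2+, you can export the data to csv without the pagination or crashing issues.

For the record, a few commands to get the necessary information about the data structure: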

cqlsh -u username -p password
describe tables;
describe table abcd;
describe schema;

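Once we know which tables we want to export, we use them together with their keyspace. First, put all your commands in one file to create a batch.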

vi commands.list

For example, a command to export one table:

COPY keyspace.tablename TO '/backup/export.csv';
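
With many tables, the batch file can be generated instead of typed by hand. A minimal sketch, assuming a hypothetical keyspace `mykeyspace` and hypothetical tables `users` and `events` (none of these names come from the setup above), with one csv per table under /backup:

```shell
# Print one COPY statement per table (plain POSIX sh).
# Usage: make_copy_commands KEYSPACE OUTDIR TABLE...
make_copy_commands() {
  keyspace=$1
  outdir=$2
  shift 2
  for table in "$@"; do
    printf "COPY %s.%s TO '%s/%s.csv';\n" "$keyspace" "$table" "$outdir" "$table"
  done
}

# Hypothetical keyspace and tables; use the names reported by `describe tables;`.
make_copy_commands mykeyspace /backup users events
```

Redirect the output to the batch file, for example `make_copy_commands mykeyspace /backup users events > /backup/commands.list`.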

And finally, run the commands from the file:

cqlsh -u username -p password -f /backup/commands.list

By now, you have exported the tables to csv file(s). All that is left is to upload the files to Google Cloud Storage:

gsutil rsync /backup gs://bucket

Later, you can use the Google API to import the CSV files into Google BigQuery. See the Google documentation at cloud.google.com.
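
One way to do the import without writing API code is the `bq` command-line tool, which is my substitution here rather than what we actually used. A minimal sketch: the dataset `mydataset`, the table name, and the inline schema `id:STRING,value:FLOAT` are all placeholders, and `DRY_RUN` is a hypothetical switch so that the command is only printed by default. Since cqlsh COPY writes no header row unless asked to, the schema is passed explicitly.

```shell
# Build (and optionally run) a `bq load` command for one exported csv.
# Dataset, table, bucket path and schema are placeholders, not the real setup.
load_csv() {
  table=$1    # destination, e.g. mydataset.tablename
  uri=$2      # source file in Cloud Storage, e.g. gs://bucket/export.csv
  schema=$3   # inline schema, e.g. id:STRING,value:FLOAT
  set -- bq load --source_format=CSV "$table" "$uri" "$schema"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # default: only show what would be run
  else
    "$@"
  fi
}

load_csv mydataset.tablename gs://bucket/export.csv id:STRING,value:FLOAT
```

With 60 billion rows you would call this once per exported file, and check the BigQuery load limits in the Google documentation before running it for real.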

+7

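You could also accomplish this by building a data pipeline with Apache Beam and running it on Cloud Dataflow.

Beam has built-in IO connectors for Apache Cassandra and Google BigQuery: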

CassandraIO (Java)

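With Beam, you also get the option to perform any data cleanup and/or transformations you want as you export the data.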

+3

Source: https://habr.com/ru/post/1687736/

