I need to read a huge number of records from one table and write them to another table. So I wrote a Java + Scala program that uses an RDD to scan the source table and write each record to the target table (a rough sketch of the job is included below). The program is submitted to a Spark cluster, which is connected to a Cassandra cluster hosted on Amazon with the following setup:
The Spark cluster has one master and four slaves, each with 8 cores and 16 GB of RAM.
The Cassandra cluster has three nodes, each with 8 cores, 32 GB of RAM, standard HDDs holding the source and destination tables, and an SSD for the commit log.
The keyspace is distributed across the three Cassandra nodes, which means that this cluster is not fault tolerant. Everything works fine for the first 8-16 hours, and resource usage does not suggest that Cassandra is overloaded (at least not at first; I cannot say for certain what happens overnight).
Unfortunately, after a few hours, one node ends up with an inconsistent schema version, and as a result the cluster fails. I am trying to determine the possible causes of this; any advice is welcome!
Thank you for your time.
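In outline, the copy logic is just a full scan of the source table written back out to the target table, roughly like this minimal sketch (keyspace, table, and host names are placeholders, and it uses the DataStax spark-cassandra-connector API):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object TableCopy {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("table-copy")
      // placeholder contact point for the Cassandra cluster
      .set("spark.cassandra.connection.host", "cassandra-node-1")
    val sc = new SparkContext(conf)

    // Full scan of the source table as an RDD of CassandraRow,
    // written row by row to the target table.
    sc.cassandraTable("my_keyspace", "source_table")
      .saveToCassandra("my_keyspace", "target_table")

    sc.stop()
  }
}
```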
Additional configuration information, as requested:
concurrent_compactors: default (the smaller of the number of disks or the number of cores, with a minimum of 2 and a maximum of 8 per CPU core)
compaction_throughput_mb_per_sec: Default (16)
compaction strategy: default (SizeTieredCompactionStrategy)
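For completeness, the schema disagreement and the settings above can also be read back directly from Cassandra's system tables. This is only a sketch: it assumes a Cassandra 3.x system_schema layout, the driver session exposed by the connector, and placeholder keyspace/table names, with `sc` being the SparkContext of the job above:

```scala
import scala.collection.JavaConverters._
import com.datastax.spark.connector.cql.CassandraConnector

// Diagnostic sketch: print each node's schema version plus the effective
// replication and compaction settings (names are placeholders).
CassandraConnector(sc.getConf).withSessionDo { session =>
  // Schema version reported by the local node and its peers; a healthy
  // cluster shows a single version across all nodes.
  val local = session.execute("SELECT schema_version FROM system.local").one()
  println(s"local schema_version: ${local.getUUID("schema_version")}")
  session.execute("SELECT peer, schema_version FROM system.peers").all().asScala.foreach { row =>
    println(s"${row.getInet("peer")}: ${row.getUUID("schema_version")}")
  }

  // Effective replication of the keyspace and compaction options of the
  // target table (Cassandra 3.x system_schema tables).
  val ks = session.execute(
    "SELECT replication FROM system_schema.keyspaces WHERE keyspace_name = 'my_keyspace'").one()
  println(s"replication: ${ks.getMap("replication", classOf[String], classOf[String])}")
  val tbl = session.execute(
    "SELECT compaction FROM system_schema.tables " +
    "WHERE keyspace_name = 'my_keyspace' AND table_name = 'target_table'").one()
  println(s"compaction: ${tbl.getMap("compaction", classOf[String], classOf[String])}")
}
```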