I have a 3-node Cassandra cluster with 1 seed node, plus 1 Spark master and 3 slave nodes, each with 8 GB of RAM and 2 cores. Here is the configuration for my Spark jobs:
    spark.cassandra.input.split.size_in_mb 67108864
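For reference, the property is set on the job's SparkConf. A minimal sketch of how the job is configured, assuming a standard spark-cassandra-connector setup (the connection host and app name below are placeholders, not my real values):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical host and app name; the split-size value is the one
    // from the configuration above.
    val conf = new SparkConf()
      .setAppName("cassandra-read-test")
      .set("spark.cassandra.connection.host", "10.0.0.1")
      .set("spark.cassandra.input.split.size_in_mb", "67108864")

    val sc = new SparkContext(conf)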
When I run with this configuration, reading about 89.1 MB of data (approximately 1,706,765 rows) produces around 768 Spark partitions. I cannot understand why so many partitions are being created. I am using the Spark Cassandra Connector version 1.4, so the known bug regarding the input split size should already be fixed.
There are only 11 unique partition keys. My partition key consists of an application name, which is always "test", and a random number, which is always between 0 and 10, so there are only 11 distinct partition keys.
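For example, a read along these lines (keyspace, table, and column names are placeholders, not my real schema) is how I observe the partition count and the number of distinct keys:

    import com.datastax.spark.connector._

    // Hypothetical keyspace/table/column names, shown only to illustrate
    // how the counts can be inspected.
    val rdd = sc.cassandraTable("my_keyspace", "my_table")

    // Number of Spark partitions created by the connector (~768 in my case)
    println(rdd.partitions.length)

    // Number of distinct Cassandra partition keys (should be 11)
    val distinctKeys = rdd
      .map(row => (row.getString("app_name"), row.getInt("random_number")))
      .distinct()
      .count()
    println(distinctKeys)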
Why are there so many partitions, and how does the connector decide how many partitions to create?
Nipun