It's Spark 2.2.0-SNAPSHOT.
Why is the number of partitions after the groupByKey transformation 200 in the following example?
scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res0: Int = 200
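
As far as I can tell, the value matches spark.sql.shuffle.partitions (which defaults to 200), since overriding that setting changes the result (a REPL check of my own, not proof of where 200 itself comes from):

scala> spark.conf.get("spark.sql.shuffle.partitions")
res1: String = 200

scala> spark.conf.set("spark.sql.shuffle.partitions", "8")

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res2: Int = 8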
What is so special about 200? Why not some other number, for example 1024?
I was pointed to "Why are there always 200 tasks in a groupByKey operation?", which asks specifically about groupByKey, but my question is about the "riddle" of why 200 was chosen as the default rather than some other default number of partitions.