Why is the number of partitions after groupBy 200? Why 200 and not some other number?

This is Spark 2.2.0-SNAPSHOT.

Why is the number of partitions after the groupByKey transformation 200 in the following example?

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res0: Int = 200

What is so special about 200? Why not some other number, for example 1024?

I was pointed to "Why are there always 200 tasks in a groupByKey operation?", which asks specifically about groupByKey, but my question is about the riddle behind choosing 200 as the default, not about the fact that the default is 200 partitions.

1 answer

This is specified by spark.sql.shuffle.partitions.

This property applies here because groupByKey on a Dataset is a Spark SQL operation, so its shuffle takes its partition count from the SQL configuration.

There is nothing special about 200 (i.e., it is simply an arbitrary default).

See http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options
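
A minimal sketch of how to inspect and override the default in the same spark-shell session (the exact res numbers are illustrative):

scala> // the default comes from Spark SQL's shuffle configuration
scala> spark.conf.get("spark.sql.shuffle.partitions")
res1: String = 200

scala> // override it at runtime; subsequent Dataset shuffles pick it up
scala> spark.conf.set("spark.sql.shuffle.partitions", "8")

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res3: Int = 8

Note that this setting only affects Dataset/SQL shuffles; shuffles in the plain RDD API take their default partition count from spark.default.parallelism instead.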


Source: https://habr.com/ru/post/1665089/

