Why is the number of partitions after groupBy 200? Why 200 and not some other number?

This is Spark 2.2.0-SNAPSHOT.

Why is the number of partitions after the groupByKey transformation 200 in the following example?

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res0: Int = 200

What is so special about 200? Why not some other number, for example 1024?

I was pointed to "Why are there always 200 tasks in a groupByKey operation?", which asks specifically about groupByKey, but my question is about the riddle behind choosing 200 as the default, not about the fact that the default is 200 partitions.

1 answer

This is specified by spark.sql.shuffle.partitions.

This property applies here because groupByKey on a Dataset is a Spark SQL operation, so its shuffle takes its partition count from the SQL configuration.

There is nothing special about 200 (i.e., it is simply an arbitrary default).

See http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options
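
A minimal sketch of how to inspect and override the default in the same spark-shell session (the exact res numbers are illustrative):

scala> // the default comes from Spark SQL's shuffle configuration
scala> spark.conf.get("spark.sql.shuffle.partitions")
res1: String = 200

scala> // override it at runtime; subsequent Dataset shuffles pick it up
scala> spark.conf.set("spark.sql.shuffle.partitions", "8")

scala> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions
res3: Int = 8

Note that this setting only affects Dataset/SQL shuffles; shuffles in the plain RDD API take their default partition count from spark.default.parallelism instead.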


Source: https://habr.com/ru/post/1665089/

