Spark: what are the advantages of having multiple executors per node for a job?

I am running my job on an AWS EMR cluster. It is a 40-node cluster of cr1.8xlarge instances. Each cr1.8xlarge has 240G of memory and 32 cores. I can run with either of the following configurations:

--driver-memory 180g --driver-cores 26 --executor-memory 180g --executor-cores 26 --num-executors 40 --conf spark.default.parallelism=4000

or

--driver-memory 180g --driver-cores 26 --executor-memory 90g --executor-cores 13 --num-executors 80 --conf spark.default.parallelism=4000

Judging from the job-tracker website, the number of tasks running simultaneously is basically just the number of cores (CPUs) available. So I'm wondering whether there are any advantages, or specific scenarios, in which we would want more than one executor per node?
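As a quick sanity check on that observation, here is a minimal Python sketch that counts the concurrent task slots for the two configurations above. The variable names are illustrative only, and it assumes each task uses one core (the default spark.task.cpus=1).

```python
import math

# (num_executors, executor_cores) taken from the two spark-submit configurations above
configs = {
    "40 executors x 26 cores": (40, 26),
    "80 executors x 13 cores": (80, 13),
}
default_parallelism = 4000  # spark.default.parallelism from the same commands

slots = {}
for name, (num_executors, executor_cores) in configs.items():
    # Each executor core runs one task at a time (assuming spark.task.cpus=1)
    slots[name] = num_executors * executor_cores
    waves = math.ceil(default_parallelism / slots[name])
    print(f"{name}: {slots[name]} concurrent tasks, "
          f"about {waves} waves for {default_parallelism} tasks")
```

Both configurations come out at 1040 concurrent tasks, so the difference between them has to come from something other than raw task parallelism.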

Thanks!

1 answer

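Yes, there are advantages to running multiple executors per node, especially on large instances like yours. I recommend reading this blog post from Cloudera.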

Here is the part of the post that should interest you in particular:

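To make this more concrete, here is a worked example of configuring a Spark application to use as much of the cluster as possible. Imagine a cluster with six nodes running NodeManagers, each with 16 cores and 64 GB of memory. The NodeManager capacities, yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, should probably be set to 63 * 1024 = 64512 (megabytes) and 15, respectively. We avoid allocating 100% of the resources to YARN containers because the node needs some resources to run the OS and the Hadoop daemons. In this case, we leave a gigabyte and a core for these system processes. Cloudera Manager helps by accounting for these and configuring the YARN properties automatically.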
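The capacity arithmetic can be sketched in a few lines of Python. This is only an illustration: the variable names are made up, and the reservation of 1 GB and 1 core for system processes is the assumption implied by the figures 63 and 15 in the example.

```python
# Figures from the worked example: 64 GB and 16 cores per node.
node_memory_gb = 64
node_cores = 16

# Assumption implied by the example: keep 1 GB and 1 core
# for the OS and the Hadoop daemons.
reserved_memory_gb = 1
reserved_cores = 1

yarn_memory_mb = (node_memory_gb - reserved_memory_gb) * 1024
yarn_vcores = node_cores - reserved_cores

print(f"yarn.nodemanager.resource.memory-mb = {yarn_memory_mb}")
print(f"yarn.nodemanager.resource.cpu-vcores = {yarn_vcores}")
```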

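The likely first impulse would be to use --num-executors 6 --executor-cores 15 --executor-memory 63G. However, this is the wrong approach because:

63 GB plus the executor memory overhead will not fit within the 63 GB capacity of the NodeManagers.

The application master takes up a core on one of the nodes, so there will not be room for a 15-core executor on that node.

15 cores per executor can lead to poor HDFS I/O throughput.

A better option would be to use --num-executors 17 --executor-cores 5 --executor-memory 19G. Why?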

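This configuration results in three executors on every node except the one running the AM, which will have two. --executor-memory was derived as 63 / 3 executors per node = 21; 21 * 0.07 = 1.47; 21 - 1.47 ~ 19.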
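The derivation above can be reproduced with a small Python helper. This is a sketch under stated assumptions, not part of Spark or YARN: the function name size_executors and its parameters are hypothetical, the six-node count follows from the executor numbers in the example, and the 0.07 overhead fraction is the one used in the calculation above (the overhead factor in your Spark version may differ).

```python
import math

def size_executors(nodes, yarn_memory_gb, yarn_vcores, cores_per_executor,
                   overhead_fraction=0.07):
    """Return (num_executors, executor_cores, executor_memory_gb)."""
    executors_per_node = yarn_vcores // cores_per_executor
    # Leave one executor slot free for the YARN application master (AM).
    num_executors = nodes * executors_per_node - 1
    container_gb = yarn_memory_gb / executors_per_node
    # A container holds the heap plus the memory overhead, so subtract
    # the overhead share and round down to whole gigabytes.
    heap_gb = math.floor(container_gb - container_gb * overhead_fraction)
    return num_executors, cores_per_executor, heap_gb

num_executors, executor_cores, executor_memory_gb = size_executors(
    nodes=6, yarn_memory_gb=63, yarn_vcores=15, cores_per_executor=5)

print(f"--num-executors {num_executors} --executor-cores {executor_cores} "
      f"--executor-memory {executor_memory_gb}G")
```

With the example's inputs this prints --num-executors 17 --executor-cores 5 --executor-memory 19G. The same helper can be pointed at other node sizes, but the result is only a starting point to validate against the actual workload.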

Source: https://habr.com/ru/post/1620669/

