I work on an AWS EMR cluster of 40 nodes using cr1.8xlarge instances. Each cr1.8xlarge has 240G of memory and 32 cores. I can run with either of the following configurations:
--driver-memory 180g --driver-cores 26 --executor-memory 180g --executor-cores 26 --num-executors 40 --conf spark.default.parallelism=4000
or
--driver-memory 180g --driver-cores 26 --executor-memory 90g --executor-cores 13 --num-executors 80 --conf spark.default.parallelism=4000
The number of tasks running simultaneously, as shown on the job-tracker web page, is basically just the total number of cores (CPUs), which is the same for both configurations. So I'm wondering: are there any advantages, or specific scenarios, where we would want more than one executor per node?
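To make the comparison concrete, here is a quick sketch of the arithmetic behind the two configurations. The `layout` helper is just something I made up for illustration, not a Spark API, and it assumes the executors are spread evenly over the 40 nodes and ignores the driver and any per-executor memory overhead:

```python
# Hypothetical helper (not part of Spark): summarizes an executor layout.
# Assumes executors are distributed evenly across nodes; ignores the driver
# and any memory overhead added on top of --executor-memory.
def layout(num_executors, executor_cores, executor_memory_gb, nodes=40):
    executors_per_node = num_executors // nodes
    return {
        "task_slots": num_executors * executor_cores,
        "executors_per_node": executors_per_node,
        "cores_per_node": executors_per_node * executor_cores,
        "memory_per_node_gb": executors_per_node * executor_memory_gb,
    }

# --executor-memory 180g --executor-cores 26 --num-executors 40
one_per_node = layout(40, 26, 180)
# --executor-memory 90g --executor-cores 13 --num-executors 80
two_per_node = layout(80, 13, 90)

print(one_per_node)
print(two_per_node)
```

Under those assumptions both layouts come out to the same number of concurrent task slots and the same cores and memory used per node; the only difference is how many executor JVMs that capacity is split across.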
Thanks!