Running Spark on AWS EMR, how to run the driver on the master node?

It seems that by default, EMR deploys the Spark driver to one of the CORE nodes, leaving the MASTER node essentially idle. Is it possible to run the driver program on the MASTER node instead? I experimented with arguments to --deploy-mode, to no avail.
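
For reference, the flag I mean is the standard spark-submit option; a typical invocation looks something like this (deploy mode value left as a placeholder, class and jar are the ones from my step below):

spark-submit --deploy-mode <client|cluster> \
  --class com.name.of.Class \
  /home/hadoop/myjar-assembly-1.0.jar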

Here are my JSON group definitions:

[
  {
    "InstanceGroupType": "MASTER",
    "InstanceCount": 1,
    "InstanceType": "m3.xlarge",
    "Name": "Spark Master"
  },
  {
    "InstanceGroupType": "CORE",
    "InstanceCount": 3,
    "InstanceType": "m3.xlarge",
    "Name": "Spark Executors"
  }
]

Here are my JSON settings:

[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    },
    "Configurations": []
  },
  {
    "Classification": "spark-env",
    "Properties": {
    },
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
        },
        "Configurations": [
        ]
      }
    ]
  }
]

Here are my JSON definitions:

[
  {
    "Name": "example",
    "Type": "SPARK",
    "Args": [
      "--class", "com.name.of.Class",
      "/home/hadoop/myjar-assembly-1.0.jar"
    ],
    "ActionOnFailure": "TERMINATE_CLUSTER"
  }
]

I use aws emr create-cluster with --release-label emr-4.3.0.
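
For completeness, the full launch command looks roughly like this (file names and key pair are placeholders; the files contain the JSON shown above):

aws emr create-cluster \
  --name "Spark cluster" \
  --release-label emr-4.3.0 \
  --applications Name=Spark \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --instance-groups file://./instance-groups.json \
  --configurations file://./configurations.json \
  --steps file://./steps.json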

1 answer

Setting driver location

When submitting with spark-submit, the --deploy-mode flag selects where the driver runs.

In client mode, the driver runs on the machine where spark-submit is invoked, which on EMR is the master node. In cluster mode, YARN launches the driver inside an application master container on one of the core nodes. EMR submits step applications in cluster mode by default (which is why your driver ends up on a CORE node), so pass --deploy-mode client to keep the driver on the master.
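
Applied to your step definition, that means prepending the flag to Args; a sketch, with the rest of your step unchanged:

[
  {
    "Name": "example",
    "Type": "SPARK",
    "Args": [
      "--deploy-mode", "client",
      "--class", "com.name.of.Class",
      "/home/hadoop/myjar-assembly-1.0.jar"
    ],
    "ActionOnFailure": "TERMINATE_CLUSTER"
  }
]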

https://blogs.aws.amazon.com/bigdata/post/Tx578UTQUV7LRP/Submitting-User-Applications-with-spark-submit
