Running Spark on AWS EMR, how to run the driver on the master node?

It seems that by default, EMR deploys the Spark driver to one of the CORE nodes, leaving the MASTER node essentially idle. Is it possible to run the driver program on the MASTER node instead? I experimented with arguments to --deploy-mode, to no avail.
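
For reference, the flag I mean is the standard spark-submit option; a typical invocation looks something like this (deploy mode value left as a placeholder, class and jar are the ones from my step below):

spark-submit --deploy-mode <client|cluster> \
  --class com.name.of.Class \
  /home/hadoop/myjar-assembly-1.0.jar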

Here are my JSON group definitions:

[
  {
    "InstanceGroupType": "MASTER",
    "InstanceCount": 1,
    "InstanceType": "m3.xlarge",
    "Name": "Spark Master"
  },
  {
    "InstanceGroupType": "CORE",
    "InstanceCount": 3,
    "InstanceType": "m3.xlarge",
    "Name": "Spark Executors"
  }
]

Here are my JSON settings:

[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    },
    "Configurations": []
  },
  {
    "Classification": "spark-env",
    "Properties": {
    },
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
        },
        "Configurations": [
        ]
      }
    ]
  }
]

Here are my JSON definitions:

[
  {
    "Name": "example",
    "Type": "SPARK",
    "Args": [
      "--class", "com.name.of.Class",
      "/home/hadoop/myjar-assembly-1.0.jar"
    ],
    "ActionOnFailure": "TERMINATE_CLUSTER"
  }
]

I use aws emr create-cluster with --release-label emr-4.3.0.
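
For completeness, the full launch command looks roughly like this (file names and key pair are placeholders; the files contain the JSON shown above):

aws emr create-cluster \
  --name "Spark cluster" \
  --release-label emr-4.3.0 \
  --applications Name=Spark \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --instance-groups file://./instance-groups.json \
  --configurations file://./configurations.json \
  --steps file://./steps.json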

1 answer

Setting driver location

When submitting with spark-submit, the --deploy-mode flag selects where the driver runs.

In client mode, the driver runs on the machine where spark-submit is invoked, which on EMR is the master node. In cluster mode, YARN launches the driver inside an application master container on one of the core nodes. EMR submits step applications in cluster mode by default (which is why your driver ends up on a CORE node), so pass --deploy-mode client to keep the driver on the master.
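
Applied to your step definition, that means prepending the flag to Args; a sketch, with the rest of your step unchanged:

[
  {
    "Name": "example",
    "Type": "SPARK",
    "Args": [
      "--deploy-mode", "client",
      "--class", "com.name.of.Class",
      "/home/hadoop/myjar-assembly-1.0.jar"
    ],
    "ActionOnFailure": "TERMINATE_CLUSTER"
  }
]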

https://blogs.aws.amazon.com/bigdata/post/Tx578UTQUV7LRP/Submitting-User-Applications-with-spark-submit
