It seems that by default, EMR deploys the Spark driver to one of the CORE nodes, leaving the MASTER node virtually idle. Is it possible to run the driver program on the MASTER node instead? I experimented with arguments to --deploy-mode, to no avail.
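For instance, one variation I tried passes --deploy-mode directly in the step arguments (a sketch; "client" is one of the values I experimented with, and the class and jar match the step definition further below):

[
  {
    "Name": "example",
    "Type": "SPARK",
    "Args": [
      "--deploy-mode", "client",
      "--class", "com.name.of.Class",
      "/home/hadoop/myjar-assembly-1.0.jar"
    ],
    "ActionOnFailure": "TERMINATE_CLUSTER"
  }
]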
Here are my instance group definitions (JSON):
[
  {
    "InstanceGroupType": "MASTER",
    "InstanceCount": 1,
    "InstanceType": "m3.xlarge",
    "Name": "Spark Master"
  },
  {
    "InstanceGroupType": "CORE",
    "InstanceCount": 3,
    "InstanceType": "m3.xlarge",
    "Name": "Spark Executors"
  }
]
Here are my configuration settings (JSON):
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    },
    "Configurations": []
  },
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {},
        "Configurations": []
      }
    ]
  }
]
Here is my step definition (JSON):
[
  {
    "Name": "example",
    "Type": "SPARK",
    "Args": [
      "--class", "com.name.of.Class",
      "/home/hadoop/myjar-assembly-1.0.jar"
    ],
    "ActionOnFailure": "TERMINATE_CLUSTER"
  }
]
I launch the cluster with aws emr create-cluster, using --release-label emr-4.3.0.
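For completeness, the full launch command looks roughly like this (a sketch: the cluster name, key name, and file names are placeholders, and the file:// JSON files hold the definitions shown above):

aws emr create-cluster \
  --name "Spark cluster" \
  --release-label emr-4.3.0 \
  --applications Name=Spark \
  --use-default-roles \
  --ec2-attributes KeyName=my-key \
  --instance-groups file://instance-groups.json \
  --configurations file://configurations.json \
  --steps file://steps.json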