Configure Zeppelin Spark Interpreter on EMR when starting a cluster

I create clusters on EMR and configure Zeppelin to read its notebooks from S3. For this I use a JSON configuration object that looks like this:

[
  {
    "Classification": "zeppelin-env",
    "Properties": {

    },
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
        "ZEPPELIN_NOTEBOOK_STORAGE":"org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET":"hs-zeppelin-notebooks",
          "ZEPPELIN_NOTEBOOK_USER":"user"
        },
        "Configurations": [

        ]
      }
    ]
  }
]
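As a side note, the same configuration can also be passed when creating the cluster from the AWS CLI instead of pasting it into the console. A minimal sketch, assuming the JSON above is saved as `zeppelin-config.json` (the cluster name, release label, and instance settings are placeholders, not values from the question):

```shell
# Launch an EMR cluster with the zeppelin-env classification above.
# Name, release label, and instance sizes are illustrative placeholders.
aws emr create-cluster \
  --name "zeppelin-cluster" \
  --release-label emr-5.2.0 \
  --applications Name=Spark Name=Zeppelin \
  --configurations file://zeppelin-config.json \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles
```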

I paste this object into the Software Configuration page of the EMR console when creating the cluster. My question is: how/where can I configure the Spark interpreter directly, without having to configure it manually from Zeppelin every time I start the cluster?

1 answer

This is a bit involved: you need to do two things:

  • Edit Zeppelin's interpreter.json
  • Restart interpreter

To do this at cluster creation time, write a shell script and add an extra EMR step that executes that script.

Zeppelin's configuration is JSON, and jq (a command-line tool) is handy for manipulating JSON. I don't know exactly what you want to change, but here is an example that adds the (apparently missing) DepInterpreter:

#!/bin/bash

set -e

# 1. Edit the Spark interpreter: append the DepInterpreter to its interpreter group.
# The interpreter id (2ANGGHHMQ here) is cluster-specific; check your own interpreter.json.
# Write to a temp file first: piping the output straight back into the file being read
# (cat file | jq ... | tee file) can truncate it before it is fully read.
jq '.interpreterSettings."2ANGGHHMQ".interpreterGroup |= . + [{"class":"org.apache.zeppelin.spark.DepInterpreter", "name":"dep"}]' \
  /etc/zeppelin/conf/interpreter.json > /tmp/interpreter.json
sudo -u zeppelin cp /tmp/interpreter.json /etc/zeppelin/conf/interpreter.json


# Trigger restart of Spark interpreter
curl -X PUT http://localhost:8890/api/interpreter/setting/restart/2ANGGHHMQ
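To see what the jq update does in isolation, here is the same `|= . + [...]` filter applied to a toy stand-in for interpreter.json (the structure is heavily simplified, and the interpreter id is the placeholder from the script):

```shell
# Minimal demo of the jq append-to-array update used in the script above.
echo '{"interpreterSettings":{"2ANGGHHMQ":{"interpreterGroup":[{"class":"org.apache.zeppelin.spark.SparkInterpreter","name":"spark"}]}}}' \
  | jq '.interpreterSettings."2ANGGHHMQ".interpreterGroup |= . + [{"class":"org.apache.zeppelin.spark.DepInterpreter","name":"dep"}]'
```

The `|=` operator replaces the `interpreterGroup` array with itself plus the new entry, leaving the rest of the document untouched.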

Put the shell script in an S3 bucket, then reference it in an extra EMR step when starting the cluster:

--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://mybucket/script.sh]
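For completeness, the same step can also be attached to an already-running cluster with `aws emr add-steps`; a sketch, where the cluster id and script bucket are placeholders:

```shell
# Add the Zeppelin-configuration step to a running cluster.
# The cluster id and s3://mybucket path are placeholders.
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps Type=CUSTOM_JAR,Name=ConfigureZeppelin,ActionOnFailure=CONTINUE,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://mybucket/script.sh]
```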

Source: https://habr.com/ru/post/1673296/

