Very simple answer:
Basically, you can use the SparkLauncher class to launch Spark applications and add some listeners to track progress.
However, you may also be interested in Livy, a RESTful server for Spark jobs. If that style fits your project, Livy is worth a look.
You can also use Spark's own REST interface to check an application's state; the information it returns is more precise, and the same interface lets you submit jobs via a plain REST API.
There are three main approaches, and the honest answer is: check for yourself ;) It strongly depends on your project and requirements. The two main options:
- SparkLauncher + the Spark REST interface
- Livy server and Spark Job Server
should both serve you well; just check which one is easier and better to use in your project.
Extended answer
You can use Spark from your application in different ways, depending on what you need and what you prefer.
SparkLauncher
SparkLauncher is a class from the spark-launcher artifact. It launches already-prepared Spark jobs, just like spark-submit does.
Typical usage is:
1) Build the project with your Spark job and copy the JAR file to all nodes
2) From your client application, e.g. a web application, create a SparkLauncher that points to the prepared JAR file:
SparkAppHandle handle = new SparkLauncher()
    .setSparkHome(SPARK_HOME)
    .setJavaHome(JAVA_HOME)
    .setAppResource(pathToJARFile)
    .setMainClass(MainClassFromJarWithJob)
    .setMaster("MasterAddress")
    .startApplication();
    // or: .launch().waitFor()
startApplication returns a SparkAppHandle, which lets you add listeners and stop the application. It also provides getAppId.
SparkLauncher works well together with the Spark REST API: query http://driverNode:4040/api/v1/applications/*ResultFromGetAppId*/jobs and you get the current status of the application.
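For illustration, here is a minimal sketch of wiring a listener into startApplication; the Spark home, JAR path, main class, and master URL are placeholders you would replace with your own values:

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherWithListener {
    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
            .setSparkHome("/opt/spark")                    // placeholder
            .setAppResource("/jobs/spark-job-1.0.jar")     // placeholder
            .setMainClass("spark.ExampleJobInPreparedJar") // placeholder
            .setMaster("spark://spark-master-host:7077")   // placeholder
            .startApplication(new SparkAppHandle.Listener() {
                @Override
                public void stateChanged(SparkAppHandle h) {
                    // fires on every state transition: CONNECTED, RUNNING, FINISHED, ...
                    System.out.println("State: " + h.getState());
                }

                @Override
                public void infoChanged(SparkAppHandle h) {
                    // the application id becomes available once the app connects
                    System.out.println("App id: " + h.getAppId());
                }
            });

        // block until the application reaches a final state
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
    }
}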
Spark REST API
Spark also lets you submit jobs directly via a RESTful API. Usage is very similar to SparkLauncher, but done in a pure RESTful way.
An example request:
curl -X POST http://spark-master-host:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "myAppArgument1" ],
  "appResource" : "hdfs:///filepath/spark-job-1.0.jar",
  "clientSparkVersion" : "1.5.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "spark.ExampleJobInPreparedJar",
  "sparkProperties" : {
    "spark.jars" : "hdfs:///filepath/spark-job-1.0.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "ExampleJobInPreparedJar",
    "spark.eventLog.enabled": "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://spark-cluster-ip:6066"
  }
}'
This command submits the job from the ExampleJobInPreparedJar class to a cluster with the given Spark Master. The response contains a submissionId field, which you can then use to check the application's status with another call: curl http://spark-cluster-ip:6066/v1/submissions/status/submissionIdFromResponse. That's it, nothing more to code.
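If you prefer to do that status check from code rather than curl, a minimal Java sketch could look like this; the host and the submissionId in the URL are placeholders:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SubmissionStatusCheck {
    public static void main(String[] args) throws Exception {
        // placeholders: your master host and the submissionId from the create response
        String url = "http://spark-cluster-ip:6066/v1/submissions/status/driver-20160101000000-0000";

        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");

        // the response is a small JSON document; its "driverState" field
        // reports e.g. RUNNING, FINISHED or FAILED
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}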
Livy REST Server and Spark Job Server
Livy REST Server and Spark Job Server are RESTful applications that let you submit jobs via a RESTful web service. One major difference between these two and Spark's REST interface is that Livy and SJS don't require the job to be prepared earlier and packed into a JAR file; you just submit code that will be executed in Spark.
Usage is very simple. The code samples are taken from the Livy repository, with some cuts to improve readability.
1) Case 1: submitting a precompiled job that sits on the local machine
import java.io.File;
import java.net.URI;

import org.apache.livy.LivyClient;
import org.apache.livy.LivyClientBuilder;

LivyClient client = new LivyClientBuilder()
    .setURI(new URI(livyUrl))
    .build();

try {
    // upload the JAR containing PiJob, then submit the job and wait for its result
    client.uploadJar(new File(piJar)).get();
    double pi = client.submit(new PiJob(samples)).get();
} finally {
    client.stop(true);
}
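PiJob here is the class from Livy's examples that implements Livy's Job interface. A condensed sketch along the lines of the Pi example in the Livy repository (the org.apache.livy package names assume the Apache release of Livy):

import java.util.ArrayList;
import java.util.List;

import org.apache.livy.Job;
import org.apache.livy.JobContext;

public class PiJob implements Job<Double> {

    private final int samples;

    public PiJob(int samples) {
        this.samples = samples;
    }

    @Override
    public Double call(JobContext ctx) throws Exception {
        List<Integer> sampleList = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            sampleList.add(i);
        }
        // Monte Carlo estimate: count random points inside the unit circle
        long inside = ctx.sc().parallelize(sampleList)
            .filter(i -> {
                double x = Math.random();
                double y = Math.random();
                return x * x + y * y < 1;
            })
            .count();
        return 4.0 * inside / samples;
    }
}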
2) Case 2: dynamic code creation and execution
# example in Python; 'code' contains Scala code that will be executed in Spark
import json
import pprint
import textwrap

import requests

headers = {'Content-Type': 'application/json'}
# placeholder: the statements endpoint of an already-created Livy session
statements_url = 'http://livy-host:8998/sessions/0/statements'

data = {
  'code': textwrap.dedent("""\
    val NUM_SAMPLES = 100000;
    val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random();
      val y = Math.random();
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _);
    println(\"Pi is roughly \" + 4.0 * count / NUM_SAMPLES)
    """)
}

r = requests.post(statements_url, data=json.dumps(data), headers=headers)
pprint.pprint(r.json())
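The snippet above assumes a Livy session already exists; statements_url points at that session's statements endpoint. For completeness, a sketch of creating such a session through Livy's REST API from Java (livy-host is a placeholder; 8998 is Livy's default port):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CreateLivySession {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://livy-host:8998/sessions"); // placeholder host
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        // ask Livy for an interactive Scala (Spark) session
        byte[] body = "{\"kind\": \"spark\"}".getBytes("UTF-8");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }

        // 201 Created on success; the JSON response carries the new session id
        System.out.println("HTTP " + conn.getResponseCode());
    }
}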
As you can see, both precompiled jobs and ad-hoc code snippets can be executed in Spark.
Hydrosphere Mist
Another "Spark as a Service" application, very similar to Livy and Spark Job Server. Usage is almost the same:
1) Create a job file:
import io.hydrosphere.mist.MistJob

object MyCoolMistJob extends MistJob {
    def doStuff(parameters: Map[String, Any]): Map[String, Any] = {
        val rdd = context.parallelize(1 to 100) // build an RDD from your input
        ...
        result.asInstanceOf[Map[String, Any]]
    }
}
2) Package the job file into a JAR
3) Send a request to Mist:
curl --header "Content-Type: application/json" -X POST http://mist_http_host:mist_http_port/jobs --data '{"path": "/path_to_jar/mist_examples.jar", "className": "SimpleContext$", "parameters": {"digits": [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]}, "namespace": "foo"}'
A strong point I can see in Mist is its out-of-the-box support for streaming jobs via MQTT.
Apache Toree
Apache Toree was created to enable easy interactive analytics for Spark. It doesn't require any JAR to be built. It works via the IPython protocol, and not only Python is supported.
Currently the documentation focuses on Jupyter notebook support, but there is also a REST-style API.
Comparison and conclusions
I've covered a few options:
- SparkLauncher
- Spark REST API
- Livy REST Server and Spark Job Server
- Apache Toree
All of them are good for different use cases. I'd distinguish a few categories:
- tools that require JAR files with the job: SparkLauncher, Spark REST API
- tools for both interactive and pre-packaged jobs: Livy, SJS, Mist
- tools focused on interactive analytics: Toree (there may be some support for pre-packaged jobs, but no documentation is published yet)
SparkLauncher is very simple and is part of the Spark project itself. You write the job configuration in plain code, which can be easier to set up than building JSON objects.
For fully RESTful-style submission, consider the Spark REST API, Livy, SJS, and Mist. Three of them are stable projects with some production use cases. The REST API requires jobs to be pre-packaged, while Livy and SJS do not. Also remember that the Spark REST API ships with every Spark distribution, whereas Livy/SJS do not. I don't know much about Mist, but in time it should become a very good tool for integrating all types of Spark jobs.
Toree focuses on interactive jobs. It's still in incubation, but you can already check out its capabilities.
Why use a custom, additional REST service when there is a built-in REST API? A Spark-as-a-Service tool like Livy is a single entry point to Spark: it manages the Spark context and sits on just one node, which can be placed away from the cluster, and it enables interactive analytics. Apache Zeppelin, for example, uses Livy to submit user code to Spark.