Best practice for running Spark apps through a web app?

I want to expose Spark applications to users through a web application.

In general, the user decides which action to run and enters several variables that need to be passed to the Spark application. For example: the user fills in a few fields and then clicks a button that performs the following action: "Run sparkApp1 with the parameters min_x, max_x, min_y, max_y."

The Spark application should be launched with the parameters specified by the user. Once it finishes, the web application may need to retrieve the results (from HDFS or MongoDB) and display them to the user. While the job is running, the web application should show the current status of the Spark application.

My question is:

  • How can a web application launch a Spark application? It could probably shell out to the command line under the hood, but there may be a better way to do this.
  • How can the web application access the current status of a running Spark application? Is fetching the status from the Spark WebUI / REST API the way to go?

I am running a Spark 1.6.1 cluster with YARN / Mesos (not decided yet) and MongoDB.

+26
3 answers

Very simple answer:

Basically, you can use the SparkLauncher class to launch Spark applications and add some listeners to track progress.

However, you may also be interested in Livy, a RESTful server for Spark jobs. As far as I know, Zeppelin uses Livy to submit jobs and retrieve their status.

You can also use the Spark REST interface to check the application state; the information is then more precise. There is also an example below of how to submit a job via the REST API.

You have three options, and the answer is: check them for yourself ;) It depends heavily on your project and requirements. The two main options:

  • SparkLauncher + the Spark REST interface
  • Livy server or Spark Job Server

should both work for you; just check which one is easier and better to use in your project.

Extended answer

You can use Spark from your application in different ways, depending on what you need and what you prefer.

SparkLauncher

SparkLauncher is a class from the spark-launcher artifact. It is used to launch already prepared Spark jobs, just like spark-submit does.

Typical usage is:

1) Build the project with your Spark job and copy the JAR file to all nodes.
2) From your client application, i.e. the web application, create a SparkLauncher that points to the prepared JAR file:

SparkAppHandle handle = new SparkLauncher()
    .setSparkHome(SPARK_HOME)
    .setJavaHome(JAVA_HOME)
    .setAppResource(pathToJARFile)
    .setMainClass(MainClassFromJarWithJob)
    .setMaster("MasterAddress")
    .startApplication();
    // or: .launch().waitFor()

startApplication returns a SparkAppHandle, which lets you add listeners and stop the application. It also provides getAppId.
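For illustration, here is a minimal sketch of such a listener; it reuses the placeholder names from the snippet above (pathToJARFile, MainClassFromJarWithJob, "MasterAddress") and only adds the status tracking:

// Classes come from org.apache.spark.launcher (SparkLauncher, SparkAppHandle)

// A listener that logs every state transition of the launched application.
SparkAppHandle.Listener statusListener = new SparkAppHandle.Listener() {
    @Override
    public void stateChanged(SparkAppHandle handle) {
        // States include CONNECTED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED
        System.out.println("Spark app state: " + handle.getState());
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {
        System.out.println("Spark app id: " + handle.getAppId());
    }
};

// Listeners can be passed directly to startApplication
SparkAppHandle handle = new SparkLauncher()
    .setAppResource(pathToJARFile)
    .setMainClass(MainClassFromJarWithJob)
    .setMaster("MasterAddress")
    .startApplication(statusListener);

In a web application you would typically push these state changes to the UI instead of printing them.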

SparkLauncher should be used together with the Spark REST API. You can query http://driverNode:4040/api/v1/applications/*ResultFromGetAppId*/jobs and get information about the current status of the application.

Spark REST API

It is also possible to submit Spark jobs directly via a RESTful API. The usage is very similar to SparkLauncher, but everything is done in a purely RESTful way.

An example request:

curl -X POST http://spark-master-host:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "myAppArgument1" ],
  "appResource" : "hdfs:///filepath/spark-job-1.0.jar",
  "clientSparkVersion" : "1.5.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "spark.ExampleJobInPreparedJar",
  "sparkProperties" : {
    "spark.jars" : "hdfs:///filepath/spark-job-1.0.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "ExampleJobInPreparedJar",
    "spark.eventLog.enabled": "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://spark-cluster-ip:6066"
  }
}'

This command submits the job from the ExampleJobInPreparedJar class to the cluster with the given Spark Master. The response contains a submissionId field, which is useful for checking the application status - simply call another service: curl http://spark-cluster-ip:6066/v1/submissions/status/submissionIdFromResponse. That's it, nothing more to code.
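As a rough sketch of how a web application could consume that status endpoint programmatically (host, port and submission id are placeholders; parse the driverState field from the returned JSON with whatever library you use):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SubmissionStatusPoller {
    // Returns the raw JSON status for a given submission id.
    public static String fetchStatus(String submissionId) throws Exception {
        URL url = new URL("http://spark-cluster-ip:6066/v1/submissions/status/" + submissionId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString(); // contains e.g. the "driverState" field
    }
}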

Livy REST Server and Spark Job Server

Livy REST Server and Spark Job Server are RESTful applications that let you submit jobs via a RESTful web service. One major difference between them and Spark's REST interface is that Livy and SJS do not require the job to be prepared in advance and packed into a JAR file - you simply submit code that will be executed in Spark.

Usage is very simple. The code below is taken from the Livy repository, with some cuts to improve readability.

1) Case 1: submitting a job that is located on the local machine

// creating client
LivyClient client = new LivyClientBuilder()
  .setURI(new URI(livyUrl))
  .build();

try {
  // sending and submitting JAR file
  client.uploadJar(new File(piJar)).get();
  // PiJob is a class that implements Livy Job
  double pi = client.submit(new PiJob(samples)).get();
} finally {
  client.stop(true);
}
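The PiJob class referenced above is not shown in the snippet; a minimal sketch of what it could look like against Livy's Job interface (the package is org.apache.livy in recent versions, com.cloudera.livy in older ones):

import java.util.ArrayList;
import java.util.List;
import org.apache.livy.Job;
import org.apache.livy.JobContext;

// Estimates Pi with the classic Monte Carlo approach, executed inside Spark via Livy.
public class PiJob implements Job<Double> {
    private final int samples;

    public PiJob(int samples) {
        this.samples = samples;
    }

    @Override
    public Double call(JobContext ctx) throws Exception {
        List<Integer> sampleList = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            sampleList.add(i);
        }
        long hits = ctx.sc().parallelize(sampleList).filter(i -> {
            double x = Math.random();
            double y = Math.random();
            return x * x + y * y < 1;
        }).count();
        return 4.0 * hits / samples;
    }
}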

2) Case 2: dynamic code submission

// example in Python. Data contains code in Scala, that will be executed in Spark
data = {
  'code': textwrap.dedent("""\
    val NUM_SAMPLES = 100000;
    val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random();
      val y = Math.random();
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _);
    println(\"Pi is roughly \" + 4.0 * count / NUM_SAMPLES)
    """)
}

r = requests.post(statements_url, data=json.dumps(data), headers=headers)
pprint.pprint(r.json()) 

As you can see, both pre-compiled jobs and ad-hoc queries against Spark are possible.

Mist

Another similar tool is Hydrosphere Mist. It is very close to Livy and Spark Job Server, and the usage is very similar:

1) Create the job file:

import io.hydrosphere.mist.MistJob

object MyCoolMistJob extends MistJob {
    def doStuff(parameters: Map[String, Any]): Map[String, Any] = {
        val rdd = context.parallelize()
        ...
        return result.asInstanceOf[Map[String, Any]]
    }
} 

2) Pack the job file into a JAR.
3) Send a request to Mist:

curl --header "Content-Type: application/json" -X POST http://mist_http_host:mist_http_port/jobs --data '{"path": "/path_to_jar/mist_examples.jar", "className": "SimpleContext$", "parameters": {"digits": [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]}, "namespace": "foo"}'

One strong point I see in Mist is that it has out-of-the-box support for streaming jobs via MQTT.

Apache Toree

Apache Toree was created to make interactive analytics with Spark easy. It does not require any JAR to be built. It works via the IPython protocol, though not only Python is supported.

Currently the documentation focuses on Jupyter notebook support, but there is also a REST-style API.

Comparison and conclusions

I have listed several options:

  • SparkLauncher
  • Spark REST API
  • Livy REST Server and Spark Job Server
  • Apache Toree

All of them are good for different use cases. I can distinguish a few categories:

  • Tools that require JAR files with the job: SparkLauncher, Spark REST API
  • Tools for both interactive and pre-packaged jobs: Livy, SJS, Mist
  • Tools focused on interactive analytics: Toree (there may also be some support for pre-packaged jobs, but no documentation is published at the moment)

SparkLauncher is very simple and ships with the Spark distribution. You can launch a Spark job in any configuration, but it requires more coding than the REST-based options - you have to track the status yourself, parse the JSON responses, and so on.

RESTful job submission is provided by the Spark REST API, Livy, SJS and Mist. They are very similar concepts, with one major difference: the REST API requires the job to be pre-packaged, while Livy and SJS do not. Also remember that the Spark REST API ships with every Spark distribution, while Livy/SJS do not. I do not know much about Mist, but it looks like it should become a very good tool for integrating all kinds of Spark jobs.

Toree focuses on interactive jobs. It is still in incubation, but you can already check out its capabilities.

Why use a custom, additional REST service when there is a built-in REST API? A service like Livy provides a single entry point to Spark: it manages the Spark context and can run on a node outside the cluster, which also makes communication easier. Apache Zeppelin, for example, uses Livy to submit user code to Spark.

+38

Here is an example of using SparkLauncher, as T.Gawęda mentioned:

SparkAppHandle handle = new SparkLauncher()
    .setSparkHome(SPARK_HOME)
    .setJavaHome(JAVA_HOME)
    .setAppResource(SPARK_JOB_JAR_PATH)
    .setMainClass(SPARK_JOB_MAIN_CLASS)
    .addAppArgs("arg1", "arg2")
    .setMaster("yarn-cluster")
    .setConf("spark.dynamicAllocation.enabled", "true")
    .startApplication();

This launches the Spark job from your Java web application. SparkLauncher returns a SparkAppHandle, which you can use to track the state of the job. If you need progress information, you can use the Spark rest-api:

http://driverHost:4040/api/v1/applications/[app-id]/jobs

The only dependency you will need for SparkLauncher is:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-launcher_2.10</artifactId>
    <version>2.0.1</version>
</dependency>
+7

You can also take a look at Apache PredictionIO, an open-source machine learning server built on top of Spark: https://github.com/apache/predictionio

0

Source: https://habr.com/ru/post/1662424/

