Spark has a REST API for submitting jobs: you invoke it against the Spark master directly (port 6066 by default).
Submit Application:
curl -X POST http://spark-cluster-ip:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "myAppArgument1" ],
  "appResource" : "file:/myfilepath/spark-job-1.0.jar",
  "clientSparkVersion" : "1.5.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.mycompany.MyJob",
  "sparkProperties" : {
    "spark.jars" : "file:/myfilepath/spark-job-1.0.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyJob",
    "spark.eventLog.enabled" : "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://spark-cluster-ip:6066"
  }
}'
Dispatch Response:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20151008145126-0000",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true
}
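If you script against this API, you will usually want to capture the submissionId from the response for later status checks. A minimal shell sketch, assuming python is on the PATH for JSON parsing and that the request body above is saved in a file named create-submission.json (a hypothetical file name):

SUBMISSION_ID=$(curl -s -X POST http://spark-cluster-ip:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data @create-submission.json \
  | python -c "import sys, json; print(json.load(sys.stdin)['submissionId'])")
echo "Submitted as $SUBMISSION_ID"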
Get the status of a submitted application:
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000
Status Response:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true,
  "workerHostPort" : "192.168.3.153:46894",
  "workerId" : "worker-20151007093409-192.168.3.153-46894"
}
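Because the driver runs asynchronously in cluster mode, a submission script typically polls this endpoint until the driver reaches a terminal state. A minimal sketch, reusing the submission id from the example above (FINISHED, FAILED, KILLED, and ERROR are the terminal driver states in standalone mode):

SUBMISSION_ID=driver-20151008145126-0000
while true; do
  STATE=$(curl -s http://spark-cluster-ip:6066/v1/submissions/status/$SUBMISSION_ID \
    | python -c "import sys, json; print(json.load(sys.stdin)['driverState'])")
  echo "Driver state: $STATE"
  case "$STATE" in
    FINISHED|FAILED|KILLED|ERROR) break ;;   # stop polling on a terminal state
  esac
  sleep 5
done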
The Spark application you submit should do all of its processing and save the output to a data source; the client then reads the results through the Thrift server, since there is not much data left to transfer at that point. (If you do need to move data between your MVC application's database and the Hadoop cluster, consider Sqoop.)
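For example, once the job has written its results to a table, a client can read them through the Thrift server with beeline. A sketch assuming the Thrift server listens on the default port 10000 and the job wrote a table named my_job_output (both the port and the table name are assumptions for illustration):

beeline -u jdbc:hive2://spark-cluster-ip:10000 \
  -e "SELECT * FROM my_job_output LIMIT 10"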
Links: link1, link2