If you want to use spark-submit itself, I would use crontab or something similar to run a bash script that calls it.
To get the same "spark-submit" behaviour from Java, use the SparkLauncher class from the org.apache.spark.launcher package:
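import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;
...
public void startApacheSparkApplication() {
    try {
        // startApplication() launches the app as a child process and returns a handle for monitoring it
        SparkAppHandle handler = new SparkLauncher()
            .setAppResource("pathToYourSparkApp.jar")
            .setMainClass("your.package.main.Class")
            .setMaster("local")
            .setConf(...)
            .startApplication();
    } catch (IOException e) {
        // startApplication() declares IOException, so it has to be handled or rethrown
        throw new UncheckedIOException("Could not launch the Spark application", e);
    }
}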
...
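To run it on a schedule, you can use a timer from java.util (java.util.Timer with java.util.TimerTask) or the Quartz Job Scheduling Library (for example through the Spring Quartz Scheduler support).
Spring has integration classes for both the Timer, part of the JDK since 1.3, and the Quartz Scheduler (http://quartz-scheduler.org).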
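For the plain JDK option, here is a minimal sketch using java.util.Timer and java.util.TimerTask. The class name TimerSparkScheduler and the method schedulePeriodically are hypothetical names for illustration, and the Runnable in main is only a placeholder for a call to startApacheSparkApplication().

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Spark or the JDK.
class TimerSparkScheduler {

    /** Runs the job immediately and then every periodMillis on a daemon timer thread. */
    static Timer schedulePeriodically(Runnable job, long periodMillis) {
        Timer timer = new Timer("spark-launcher-timer", true);
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // An uncaught exception would end the timer thread and stop all later runs.
                    System.err.println("Scheduled run failed: " + e);
                }
            }
        }, 0L, periodMillis);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch threeRuns = new CountDownLatch(3);
        // Placeholder job: a real setup would call startApacheSparkApplication() here.
        Timer timer = schedulePeriodically(() -> {
            System.out.println("Launching Spark application (placeholder)");
            threeRuns.countDown();
        }, 200L);
        threeRuns.await(5, TimeUnit.SECONDS); // wait for three runs, at most 5 seconds
        timer.cancel();                       // stop scheduling further runs
    }
}
```

A Timer only offers fixed delays and fixed rates. If you need calendar-style schedules (for example "every weekday at 02:00"), that is where Quartz and its cron triggers come in.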
Quartz supports cron-style scheduling.
Maven dependency:
<dependency>
<groupId>org.quartz-scheduler</groupId>
<artifactId>quartz</artifactId>
<version>2.2.3</version>
</dependency>
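Then define the job and its trigger:
public class SparkLauncherQuartzJob implements Job {
    // Quartz calls execute() every time the trigger fires
    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        startApacheSparkApplication();
    }
}
...
// Triggers are created through TriggerBuilder, not with "new Trigger()"
Trigger trigger = TriggerBuilder.newTrigger()
    .withIdentity("sparkJob1Trigger", "sparkJobsGroup")
    .withSchedule(
        CronScheduleBuilder.cronSchedule("0 * * * * ?")) // second 0 of every minute
    .build();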
JobDetail sparkQuartzJob = JobBuilder.newJob(SparkLauncherQuartzJob.class)
    .withIdentity("SparkLauncherQuartzJob", "sparkJobsGroup")
    .build();
Scheduler scheduler = new StdSchedulerFactory().getScheduler();
scheduler.start();
scheduler.scheduleJob(sparkQuartzJob, trigger);
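A note on the cron expression "0 * * * * ?" used in the trigger: unlike Unix cron, a Quartz expression starts with a seconds field, followed by minutes, hours, day-of-month, month, day-of-week and an optional year. The "?" means "no specific value" and is used in one of the two day fields. The small sketch below only labels the fields by position to make this readable; QuartzCronFields and label are hypothetical names, and this is not how Quartz itself parses expressions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it does not validate field values.
class QuartzCronFields {

    // Field order of a Quartz cron expression; the seventh field (year) is optional.
    private static final String[] FIELD_NAMES = {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    /** Splits a Quartz cron expression on whitespace and labels each field by position. */
    static Map<String, String> label(String cronExpression) {
        String[] parts = cronExpression.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException(
                "Expected 6 or 7 fields but got " + parts.length + ": " + cronExpression);
        }
        Map<String, String> labeled = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            labeled.put(FIELD_NAMES[i], parts[i]);
        }
        return labeled;
    }

    public static void main(String[] args) {
        // The expression from the trigger: fires at second 0 of every minute.
        System.out.println(label("0 * * * * ?"));
        // prints {seconds=0, minutes=*, hours=*, day-of-month=*, month=*, day-of-week=?}
    }
}
```

So to run the Spark job every five minutes instead of every minute, the minutes field would change, for example "0 0/5 * * * ?".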
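If you use Spring, you can instead turn on scheduling with @EnableScheduling on a configuration class and annotate the method:
@Scheduled(fixedRate = 300000) // every 300000 ms, i.e. every 5 minutes
public void periodicalRunningSparkJob() {
    log.info("Periodic Spark job execution");
    startApacheSparkApplication();
}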