Spark Java Job Scheduling

I have a Spark job that reads an HBase table, performs some aggregations, and saves the data in MongoDB. The job is currently run manually using a spark-submit script. I want to schedule it to run at a fixed interval.

How can I achieve this with Java?

Is there a library for this? Or can I do it with a Thread in Java?

Any suggestions appreciated!

1 answer

If you want to keep using spark-submit, I would use crontab or something similar to run a bash script, for example.

If you want to do the equivalent of "spark-submit" from Java, look at the package org.apache.spark.launcher and its SparkLauncher class.

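import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

...

     public void startApacheSparkApplication(){
        try {
            SparkAppHandle handler = new SparkLauncher()
             .setAppResource("pathToYourSparkApp.jar")
             .setMainClass("your.package.main.Class")
             .setMaster("local")
             .setConf(...)
             .startApplication(); // <-- and start spark job app
        } catch (IOException e) {
            // startApplication() declares IOException; rethrow unchecked so callers stay simple
            throw new UncheckedIOException(e);
        }
     }
...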
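If you would rather keep calling the existing spark-submit script from Java instead of using SparkLauncher, the JDK's ProcessBuilder is one possible alternative. The following is only a sketch under assumptions: the class SparkSubmitProcess and its methods are hypothetical names, spark-submit is assumed to be on the PATH, and the jar and main class are the same placeholders as above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: runs the spark-submit script as a child process.
public class SparkSubmitProcess {

    /** Builds the argument list for a basic spark-submit call. */
    public static List<String> buildCommand(String sparkSubmit, String mainClass,
                                            String master, String appJar) {
        List<String> command = new ArrayList<>();
        command.add(sparkSubmit);   // path to the spark-submit script (assumed to exist)
        command.add("--class");
        command.add(mainClass);
        command.add("--master");
        command.add(master);
        command.add(appJar);        // the application jar comes after the options
        return command;
    }

    /** Starts the command, forwards its output to this JVM's console, and returns the exit code. */
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> command = buildCommand(
                "spark-submit", "your.package.main.Class", "local", "pathToYourSparkApp.jar");
        // Only prints the command here; call run(command) on a machine where spark-submit is installed.
        System.out.println(String.join(" ", command));
    }
}
```

Unlike SparkLauncher, this blocks the calling thread until the job finishes, which may or may not be what you want inside a scheduler.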

For the scheduling itself you can use Timer and Date from java.util (see java.util.TimerTask), or the Quartz Job Scheduling Library (or Spring's Quartz Scheduler integration).

Spring provides integration classes for scheduling with the Timer, part of the JDK since 1.3, and with the Quartz Scheduler (http://quartz-scheduler.org)....
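For the plain JDK option, a minimal sketch with java.util.Timer and TimerTask might look like this. The class name TimerSparkScheduler and the Runnable parameter are assumptions made for illustration; in a real application the Runnable would call startApacheSparkApplication(), and here a placeholder that only prints a message stands in for it.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a job launcher at a fixed rate on a dedicated timer thread.
public class TimerSparkScheduler {

    public static Timer schedule(Runnable jobLauncher, long initialDelayMillis, long periodMillis) {
        Timer timer = new Timer("spark-job-timer", false); // non-daemon thread keeps the JVM alive
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    jobLauncher.run();
                } catch (RuntimeException e) {
                    // An uncaught exception would terminate the timer thread and stop all future runs.
                    System.err.println("Spark job launch failed: " + e);
                }
            }
        }, initialDelayMillis, periodMillis);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch threeRuns = new CountDownLatch(3);
        // Short period for demonstration only; an hourly job would use TimeUnit.HOURS.toMillis(1).
        Timer timer = schedule(() -> {
            System.out.println("launching Spark job (placeholder)");
            threeRuns.countDown();
        }, 0L, 100L);
        threeRuns.await(5, TimeUnit.SECONDS);
        timer.cancel(); // stop scheduling so the JVM can exit
    }
}
```

This also answers the "Thread" part of the question: Timer manages the background thread for you, so there is no need to write a sleep loop by hand.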

With Quartz you can use cron-style scheduling, for example.

Maven dependency:

<!-- https://mvnrepository.com/artifact/org.quartz-scheduler/quartz -->
<dependency>
    <groupId>org.quartz-scheduler</groupId>
    <artifactId>quartz</artifactId>
    <version>2.2.3</version>
</dependency>

Create the Quartz job, then a trigger and a scheduler for it:

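   import org.quartz.CronScheduleBuilder;
   import org.quartz.Job;
   import org.quartz.JobBuilder;
   import org.quartz.JobDetail;
   import org.quartz.JobExecutionContext;
   import org.quartz.JobExecutionException;
   import org.quartz.Scheduler;
   import org.quartz.Trigger;
   import org.quartz.TriggerBuilder;
   import org.quartz.impl.StdSchedulerFactory;

   public class SparkLauncherQuartzJob implements Job {
       // Quartz calls execute() each time the trigger fires
       @Override
       public void execute(JobExecutionContext context) throws JobExecutionException {
           startApacheSparkApplication();
       }
   ...

 // trigger runs every hour, on the hour (fields: second minute hour day-of-month month day-of-week)
 Trigger trigger = TriggerBuilder.newTrigger()
             .withIdentity("sparkJob1Trigger", "sparkJobsGroup")
             .withSchedule(
                 CronScheduleBuilder.cronSchedule("0 0 * * * ?"))
             .build();

  JobDetail sparkQuartzJob = JobBuilder.newJob(SparkLauncherQuartzJob.class)
              .withIdentity("SparkLauncherQuartzJob", "sparkJobsGroup")
              .build();

  Scheduler scheduler = new StdSchedulerFactory().getScheduler();
  scheduler.start();
  scheduler.scheduleJob(sparkQuartzJob, trigger);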

If you use Spring, you can enable scheduling with @EnableScheduling and annotate a method with @Scheduled:

@Scheduled(fixedRate = 300000)
public void periodicalRunningSparkJob() {
    log.info("Spark job periodically execution");
    startApacheSparkApplication();
}

Source: https://habr.com/ru/post/1656777/

