How to run two Spark jobs in parallel in standalone mode

I have a Spark job in which I process a file and then do the following steps:

1. Load the file into a DataFrame
2. Push the DataFrame to Elasticsearch
3. Run some aggregations on the DataFrame and save the result to Cassandra

I wrote a Spark job for this, in which I make the following function calls:

writeToES(df)
writeToCassandra(df)
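
For context, the two functions might look roughly like this, assuming the elasticsearch-hadoop and spark-cassandra-connector data sources are on the classpath; the index, keyspace, table, and the aggregation itself are placeholders, not part of the question:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

// Hypothetical sketch: write the DataFrame to Elasticsearch via the
// elasticsearch-hadoop data source ("myindex/mytype" is a placeholder).
void writeToES(Dataset<Row> df) {
    df.write()
      .format("org.elasticsearch.spark.sql")
      .option("es.resource", "myindex/mytype")
      .mode(SaveMode.Append)
      .save();
}

// Hypothetical sketch: aggregate, then write to Cassandra via the
// spark-cassandra-connector data source (keyspace/table are placeholders).
void writeToCassandra(Dataset<Row> df) {
    df.groupBy("key").count()   // stands in for "some aggregations"
      .write()
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "myks")
      .option("table", "mytable")
      .mode(SaveMode.Append)
      .save();
}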

Right now these two operations run one after the other, but they could run in parallel.

How can I do this within a single Spark application?

I could submit two separate Spark jobs, one writing to ES and one to Cassandra, but each job would use its own set of ports, which I want to avoid.

1 answer

Yes, you can run both writes in parallel within a single application, as long as they are submitted from separate threads.

The Spark documentation on scheduling within an application explains why this works:

Inside a given Spark application (a single SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By "job", the documentation means a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark's scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users).

So you can trigger the two actions from concurrent threads (in Java, for example, with the CompletableFuture API, but use whatever async mechanism fits your execution environment):

CompletableFuture.runAsync(() -> writeToES(df));
CompletableFuture.runAsync(() -> writeToCassandra(df));
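
One way to flesh this out is to keep references to the futures and block until both complete; the dedicated thread pool here is an illustrative choice (by default runAsync runs on the shared ForkJoinPool):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative: a small dedicated pool instead of the common ForkJoinPool.
ExecutorService pool = Executors.newFixedThreadPool(2);

CompletableFuture<Void> esWrite =
    CompletableFuture.runAsync(() -> writeToES(df), pool);
CompletableFuture<Void> cassandraWrite =
    CompletableFuture.runAsync(() -> writeToCassandra(df), pool);

// Block until both writes finish; otherwise the driver could stop the
// SparkContext while the jobs are still running.
CompletableFuture.allOf(esWrite, cassandraWrite).join();
pool.shutdown();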

Make sure you keep references to those futures and wait for them to complete, otherwise the driver may shut down while the writes are still running. Also note that by default Spark schedules concurrent jobs in FIFO order; if you want the two jobs to share cluster resources evenly, enable the FAIR scheduler:

conf.set("spark.scheduler.mode", "FAIR")
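
For completeness, here is a sketch of how that setting might be wired into the driver; the application name and pool name are placeholders, and setLocalProperty("spark.scheduler.pool", ...) is the documented way to route jobs submitted from the current thread into a named pool:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf()
    .setAppName("es-cassandra-writer")   // placeholder name
    .set("spark.scheduler.mode", "FAIR");

SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

// Optional: assign the jobs submitted from this thread to a named pool
// (pools are defined in a fairscheduler.xml file).
spark.sparkContext().setLocalProperty("spark.scheduler.pool", "writes");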

Source: https://habr.com/ru/post/1695747/

