I have a Spark job in which I process a file and then perform the following steps:
1. Load the file into a DataFrame
2. Push the DataFrame to Elasticsearch
3. Run some aggregations on the DataFrame and save the results to Cassandra
I wrote a Spark job for this with the following function calls:
writeToES(df)
writeToCassandra(df)
Right now these two operations run one after another, but they could run in parallel. How can I do this within a single Spark job?
I could submit two separate Spark jobs, one writing to ES and one to Cassandra, but they would use multiple ports, which I want to avoid.
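One common pattern for this is to trigger each write from its own driver-side thread: Spark actions block the thread that calls them, so submitting the two writes from separate threads lets the scheduler run both jobs concurrently against the same SparkContext (caching the DataFrame first avoids re-reading the file twice). Below is a minimal sketch using `concurrent.futures`; `writeToES` and `writeToCassandra` are stand-ins here so the pattern runs standalone, and in the real job they would wrap the Elasticsearch and Cassandra `df.write` calls.

```python
from concurrent.futures import ThreadPoolExecutor, wait

results = []

# Stand-in writers: in the real job these would call something like
#   df.write.format("org.elasticsearch.spark.sql").save(...)
#   df.write.format("org.apache.spark.sql.cassandra").save(...)
def writeToES(df):
    results.append(("es", df))

def writeToCassandra(df):
    results.append(("cassandra", df))

# Placeholder for the DataFrame; in the real job you would also call
# df.cache() so both writes reuse the loaded data.
df = "my_dataframe"

# Spark actions block the calling thread, so run each write in its own
# driver thread; both Spark jobs can then be scheduled concurrently
# within the single application.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(writeToES, df), pool.submit(writeToCassandra, df)]
    wait(futures)
```

Because both jobs share one SparkContext, they also share its ports and resources, so this avoids the overhead of submitting two separate applications.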