I have a use case where a Spark Streaming job receives its input from a Kafka topic. I also have about 1 million rows of reference data, which are updated every hour. I currently load the reference data on the driver and broadcast it to the workers. I would like to update this broadcast variable on the driver every hour and push the new version out to the workers.
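To make the setup concrete, this is roughly what the job looks like today (the topic name, broker address, and the shape of the reference data are simplified placeholders, and the loader is stubbed out):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object StreamingJob {

  // Stub: in reality this loads ~1 million rows from our reference source.
  def loadReferenceData(): Map[String, String] =
    Map("some-key" -> "some-value")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-job").setMaster("local[*]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Loaded once on the driver and broadcast to the workers.
    val referenceData: Broadcast[Map[String, String]] =
      ssc.sparkContext.broadcast(loadReferenceData())

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",              // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "streaming-job"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Workers enrich each record by looking it up in the broadcast reference data.
    // The broadcast is fixed for the lifetime of the job, which is the problem.
    stream
      .map(record => (record.value, referenceData.value.get(record.value)))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```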
What would be the best way to do this inside Spark itself, without introducing an external store such as HBase, Redis, or Cassandra?
And how reliable is this?
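For reference, the kind of approach I was imagining is sketched below: keep the Broadcast handle in a driver-side variable, re-broadcast a fresh snapshot on an hourly schedule while unpersisting the old one, and read the handle inside transform() so each micro-batch picks up the latest version. The object and method names here are my own illustration, not an existing Spark API.

```scala
import java.util.concurrent.{Executors, TimeUnit}

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.dstream.DStream

// Illustrative wrapper (my own naming), not an existing Spark API.
object RefreshableReferenceData {

  // Driver-side handle to the current broadcast; replaced atomically on refresh.
  @volatile private var current: Broadcast[Map[String, String]] = _

  def start(sc: SparkContext, load: () => Map[String, String]): Unit = {
    current = sc.broadcast(load())

    // Driver-side timer that re-broadcasts a fresh snapshot every hour.
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = {
        val old = current
        current = sc.broadcast(load())   // ship the new hourly snapshot
        old.unpersist(blocking = false)  // let executors drop the stale copy
      }
    }, 1, 1, TimeUnit.HOURS)
  }

  // transform() is evaluated on the driver for every micro-batch, so the closure
  // re-captures whichever Broadcast handle is current at that moment.
  def enrich(stream: DStream[ConsumerRecord[String, String]]): DStream[(String, Option[String])] =
    stream.transform { rdd =>
      val snapshot = current
      rdd.map(record => (record.value, snapshot.value.get(record.value)))
    }
}
```

In the job above I would then call RefreshableReferenceData.start(ssc.sparkContext, loadReferenceData) once on the driver and replace the fixed stream.map(...) with RefreshableReferenceData.enrich(stream).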
Let me know if additional information is needed. Thank you in advance. =)