Joining streams with static datasets is a useful feature of Structured Streaming. But in each batch, the static dataset is read again from its source. Since these sources are not always dynamic, there would be a performance gain in caching the static dataset for a certain period of time (or number of batches): after the specified period / number of batches the dataset is reloaded from the source, otherwise it is served from the cache.
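The refresh-every-N-batches bookkeeping described above can be sketched independently of Spark. This is a minimal illustration, not Spark API code: `load_fn` is a hypothetical callable that would wrap the actual work (unpersisting the old DataFrame, re-reading from the source, and persisting the new one), and `refresh_every` is the chosen batch interval.

```python
class RefreshingCache:
    """Reload a dataset every `refresh_every` batches, otherwise reuse the cached copy."""

    def __init__(self, load_fn, refresh_every):
        self.load_fn = load_fn          # hypothetical loader; in Spark it would
                                        # unpersist the old DataFrame, spark.read
                                        # the source again, then persist the result
        self.refresh_every = refresh_every
        self._cached = None
        self._batches_seen = 0

    def get(self):
        # Reload on the first batch and then every `refresh_every` batches.
        if self._cached is None or self._batches_seen % self.refresh_every == 0:
            self._cached = self.load_fn()
        self._batches_seen += 1
        return self._cached
```

In Structured Streaming this logic could live inside a `foreachBatch` sink (available since Spark 2.4), which hands each micro-batch a `batchId` that can drive the refresh decision.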
With Spark Streaming (DStreams) I handled this by caching the static dataset and unpersisting it after a certain number of batches, but for some reason this no longer works with Structured Streaming.
Any suggestions on how to achieve this with Structured Streaming?
Chris