Structured Streaming with a periodically updated static dataset

Joining streams with static datasets is a great feature of Structured Streaming, but in each micro-batch the static data is re-read from its source. Since these sources do not always change, there would be a performance gain in caching the static dataset for a certain period of time (or a certain number of batches): after the specified period / number of batches the dataset is reloaded from the source, otherwise it is served from the cache.
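For concreteness, this is roughly the setup I mean, sketched in PySpark. The Parquet path, the rate source, the join key and the sinks are placeholders of my own, not part of my real job: without cache() the static side is re-read every micro-batch, and with cache() it never picks up changes at the source.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-static-join").getOrCreate()

STATIC_PATH = "/data/reference"  # placeholder: static reference data stored as Parquet

# Cached once at start-up: read only once, but changes at the source are never seen.
# Without cache() the static side is re-read in every micro-batch instead.
static_df = spark.read.parquet(STATIC_PATH).cache()

stream_df = (
    spark.readStream
    .format("rate")              # placeholder for the real streaming source (Kafka, files, ...)
    .option("rowsPerSecond", 10)
    .load()
    .withColumnRenamed("value", "key")
)

# Stream-static join; ideally the static side would be served from the cache
# for most batches and reloaded from STATIC_PATH every N batches.
joined = stream_df.join(static_df, on="key", how="left")

query = (
    joined.writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/stream-static")  # placeholder
    .start()
)
query.awaitTermination()
```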

In Spark Streaming (DStreams) I handled this by caching the static dataset and unpersisting it after a certain number of batches, but for some reason this approach no longer carries over to Structured Streaming.
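This is more or less the DStream-era pattern I mean, again sketched with placeholder paths, source and key extraction of my own: the reload happens on the driver inside transform(), keyed off a simple batch counter.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="dstream-periodic-refresh")
ssc = StreamingContext(sc, 10)            # placeholder: 10-second batch interval

STATIC_PATH = "/data/reference"           # placeholder: static data as CSV-like text
REFRESH_EVERY_N_BATCHES = 10              # placeholder refresh interval

def load_static():
    # (key, line) pairs from the static source; the key extraction is a placeholder.
    return sc.textFile(STATIC_PATH).map(lambda line: (line.split(",")[0], line)).cache()

# Driver-side state: the cached static RDD and a batch counter.
static_rdd = load_static()
batch_count = 0

lines = ssc.socketTextStream("localhost", 9999)   # placeholder streaming source
pairs = lines.map(lambda line: (line.split(",")[0], line))

def join_with_static(rdd):
    global static_rdd, batch_count
    batch_count += 1
    # Every N batches, drop the cache and reload the static data from its source.
    if batch_count % REFRESH_EVERY_N_BATCHES == 0:
        static_rdd.unpersist()
        static_rdd = load_static()
    return rdd.join(static_rdd)

joined = pairs.transform(join_with_static)
joined.pprint()

ssc.start()
ssc.awaitTermination()
```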

Any suggestions on how to do this with Structured Streaming?

Source: https://habr.com/ru/post/1690698/

