How do I control the size of the state of a Spark Streaming application? The Storage tab in the driver UI displays only the result of the mapWithState operation (MapWithStateRDD), not the actual Spark state RDD.
In Grafana, we noticed that the overall memory usage of the Spark Streaming application grows with each processed batch of the incoming stream. The memory used by the worker nodes (shared cluster) shown in Grafana is much higher than the size of the MapWithStateRDD (the result of the mapWithState operation) shown on the Storage tab of the driver UI.
I stopped providing input for about 30 minutes, but the memory usage never decreased. I suspect that most of the memory is consumed by the Spark state. Is there a way to control the size of the Spark state?
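For context, here is a minimal sketch of how state eviction can be configured with `StateSpec.timeout`, which drops keys that receive no data for the given duration and so bounds the state size. The running-count state function, socket source, port, and checkpoint path are placeholder assumptions, not taken from the application described above:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, State, StateSpec, StreamingContext}

object StateTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("state-timeout-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoint") // mapWithState requires checkpointing

    // Hypothetical per-key running count; the real state function will differ.
    val trackCount = (key: String, value: Option[Int], state: State[Int]) => {
      if (state.isTimingOut()) {
        // The key is being evicted because it was idle longer than the timeout;
        // calling state.update() here would throw.
        None
      } else {
        val sum = state.getOption().getOrElse(0) + value.getOrElse(0)
        state.update(sum)
        Some((key, sum))
      }
    }

    val lines = ssc.socketTextStream("localhost", 9999) // placeholder input source
    val pairs = lines.flatMap(_.split("\\s+")).map((_, 1))

    // Keys idle for more than 30 minutes are removed from the state,
    // bounding the size of the MapWithStateRDD.
    val spec = StateSpec.function(trackCount).timeout(Minutes(30))
    pairs.mapWithState(spec).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that even with a timeout, old MapWithStateRDD versions are retained for several batches until checkpointing truncates the lineage, so memory may not drop immediately after keys time out.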