When restarting Spark Streaming after a long period of inactivity (3 days):
val ssc = StreamingContext.getOrCreate(checkpointDir, newStreamingContext _, createOnError = createOnError)
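For reference, here is roughly how the context is created and recovered. This is only a minimal sketch, assuming the kafka-0-10 direct API; the checkpoint path, broker, topic, group id and the processing inside foreachRDD are placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val checkpointDir = "hdfs:///checkpoints/my-streaming-app" // placeholder
val createOnError = false

// Placeholder Kafka settings; only the structure matters here.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// Builds a fresh context; only called when no checkpoint exists yet.
def newStreamingContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("my-streaming-app")
  val ssc = new StreamingContext(conf, Seconds(30)) // 30 s batch interval
  val stream = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))
  stream.foreachRDD { rdd =>
    // placeholder processing; the real job logic goes here
    println(s"batch size: ${rdd.count()}")
  }
  ssc.checkpoint(checkpointDir) // enable metadata checkpointing
  ssc
}

// Recover from the checkpoint if it exists, otherwise build a new context.
val ssc = StreamingContext.getOrCreate(checkpointDir, newStreamingContext _, createOnError = createOnError)
ssc.start()
ssc.awaitTermination()
```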
I am finding that the restart is painful.
The streaming job takes 45 minutes before the log reports that the checkpoint has finished loading (quite a long time just to load the last batch from the checkpoint file).
After that, it shows 1000 batches with 0 events. When I restart after only a few minutes of downtime, it shows only the missed batches (10 batches of 30 seconds for a downtime of about 5 minutes) and it loads fast.
So it makes me think that loading my checkpoint takes so long because it replays these 1000 batches.
Since 1000 batches of 30 s do not add up to 3 days, I wonder what happens once these 1000 batches have been processed: will it resume at the current time, or load more missed batches? Is there some limit of 1000 batches?
edit: after these 1000 batches, nothing happens; the direct Kafka stream does not create any new batches. I don't think this is the expected behaviour, but I'm hesitant to open a Spark JIRA ticket about it.
Since problems never come alone, I also suspect that these 1000 batches are loaded into the driver's memory.
After a few batches there is an OOM, and when that does not happen I see the total delay increasing even though the average processing time stays below the batch interval. This makes me think my driver is close to OOM and is struggling to dispatch batches to the executors.
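For context, this is the kind of rate limiting I am considering so that each batch (and the recovery backlog) stays bounded; a minimal sketch using the standard Spark Streaming settings, with placeholder values, and I'm not sure it addresses the 1000-batch behaviour at all:

```scala
import org.apache.spark.SparkConf

// Sketch only: settings to keep batch sizes and the recovery backlog bounded.
val conf = new SparkConf()
  .setAppName("my-streaming-app")
  // Cap records read per Kafka partition per batch (direct stream).
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // Let Spark lower the ingestion rate when batches start queuing up.
  .set("spark.streaming.backpressure.enabled", "true")
```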