Is there a way to monitor pod status and the restart count of containers running in a GKE cluster using Stackdriver?
While I can see CPU, memory, and disk usage metrics for all containers in Stackdriver, there seems to be no way to get metrics about crashing pods, or about pods in a replica set that are restarted due to crashes.
I use a Kubernetes replica set to manage the pods, so they are respawned and created with a new name when they fail. As far as I can tell, metrics in Stackdriver are keyed by the pod name (which is unique to the pod's lifetime), which doesn't seem very sensible.
Alerting on pod failures seems like such a natural thing to want that it is hard to believe it is not supported at the moment. The monitoring and alerting capabilities that I get from Stackdriver for Google Container Engine, as they stand, seem rather useless, since they are all tied to pods, whose lifetime can be very short.
So, if this doesn't work out of the box, are there any known workarounds or recommendations for tracking continuously crashing pods?
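To make the question concrete, here is a rough sketch of the kind of view I'm after: restart counts summed per owning replica set rather than per pod name. It assumes the input has the shape of `kubectl get pods -o json` output (`metadata.ownerReferences` and `status.containerStatuses[].restartCount`). The function name `restarts_by_owner` and the sample data are made up for illustration, and it only sees pods that currently exist, so it is not a real replacement for proper alerting.

```python
from collections import defaultdict


def restarts_by_owner(pod_list):
    """Sum container restart counts per owning controller, not per pod name."""
    totals = defaultdict(int)
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        owners = meta.get("ownerReferences") or []
        if owners:
            # Group by the first owner, e.g. "ReplicaSet/web-rs".
            key = f'{owners[0]["kind"]}/{owners[0]["name"]}'
        else:
            # Pods without an owner are reported under their own name.
            key = f'Pod/{meta.get("name", "<unknown>")}'
        for status in pod.get("status", {}).get("containerStatuses") or []:
            totals[key] += status.get("restartCount", 0)
    return dict(totals)


# Hypothetical sample data, shaped like `kubectl get pods -o json` output.
SAMPLE = {
    "items": [
        {
            "metadata": {
                "name": "web-rs-abc12",
                "ownerReferences": [{"kind": "ReplicaSet", "name": "web-rs"}],
            },
            "status": {"containerStatuses": [{"name": "web", "restartCount": 3}]},
        },
        {
            "metadata": {
                "name": "web-rs-def34",
                "ownerReferences": [{"kind": "ReplicaSet", "name": "web-rs"}],
            },
            "status": {"containerStatuses": [{"name": "web", "restartCount": 2}]},
        },
        {
            "metadata": {"name": "standalone"},
            "status": {"containerStatuses": [{"name": "job", "restartCount": 1}]},
        },
    ]
}

if __name__ == "__main__":
    for owner, count in sorted(restarts_by_owner(SAMPLE).items()):
        print(owner, count)
```

In practice the input could come from something like `json.load(sys.stdin)` with the `kubectl` output piped in, but I would much rather have this as a metric I can alert on in Stackdriver.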