Node not ready, pods pending

I run a cluster on GKE and sometimes it gets into a hung state. Right now I'm working with two nodes and have auto-scaling enabled on the cluster. One of the nodes goes into NotReady status and simply stays there. Because of this, half of my pods are Pending due to insufficient CPU.

How I got there

I deployed a pod that has fairly high CPU usage from the moment it starts. When I scaled it to 2 replicas, CPU usage was around 1.0; when I scaled the deployment to 3 replicas, I expected the third replica to sit in Pending until the cluster autoscaler added another node, and then to be scheduled there. What happened instead is that the node went NotReady, and all the pods that were on it are now Pending. The node is not rebooting or anything; it just isn't being used by Kubernetes. GKE then thinks there are enough resources, since the VM shows 0 CPU usage, and won't scale up to 3 nodes. I can't even SSH into the instance from the console; it just hangs on the loading screen.
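
For context, the scale-up and the checks afterwards look roughly like this (my-cpu-heavy-app is a placeholder for the actual deployment name):

# scale the CPU-hungry deployment from 2 to 3 replicas
kubectl scale deployment my-cpu-heavy-app --replicas=3

# one of the nodes shows NotReady
kubectl get nodes

# the Conditions section shows why the kubelet stopped reporting Ready
kubectl describe node <node-name>

# pods that lived on the NotReady node are now Pending
kubectl get pods -o wide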

I can manually delete the instance and then things start working again, but that hardly counts as fully managed.

One thing I noticed, though I don't know if it's related: in the GCE console, when I look at the VM instances, the Ready node is listed as in use by the instance group and by the load balancer (which is the service fronting the nginx entrypoint), but the NotReady node is listed as in use only by the instance group, not by the load balancer.

In addition, kubectl get events had this line:

Warning   CreatingLoadBalancerFailed   {service-controller }          Error creating load balancer (will retry): Failed to create load balancer for service default/proxy-service: failed to ensure static IP 104.199.xx.xx: error creating gce static IP address: googleapi: Error 400: Invalid value for field 'resource.address': '104.199.xx.xx'. Specified IP address is already reserved., invalid

I specified loadBalancerIP: 104.199.xx.xx in the proxy-service definition to make sure that the service gets the same (reserved) static IP address every time it is recreated.
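
For reference, the relevant part of the service looks roughly like this (the name, selector and port are placeholders; the IP is the one already reserved as a regional static address):

apiVersion: v1
kind: Service
metadata:
  name: proxy-service
spec:
  type: LoadBalancer
  # pre-reserved regional static IP; the event above complains that
  # the service-controller cannot reserve it again
  loadBalancerIP: 104.199.xx.xx
  selector:
    app: nginx-proxy   # placeholder selector
  ports:
  - port: 80
    targetPort: 80

How the address is currently reserved can be checked with gcloud compute addresses list.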

Any ideas on how to prevent this? Ideally a node that gets stuck in NotReady would at least reboot, but even better it wouldn't get into that state in the first place.

Thanks.

+4

The most likely cause is that the pod is eating up all of the node's resources. With no requests or limits set, a single container can consume every available CPU cycle, or all of the memory and trigger the OOM killer, which starves the kubelet and the other system components, and the node goes NotReady.

The fix is to set resource requests and limits on your containers, so the scheduler knows how much each pod needs and never packs more onto a node than it can handle. For example, if a container typically needs 200m of CPU (20% of a core), request 200m and cap it at 300m (30%):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    resources:
      # hard caps: CPU is throttled at 300m, memory above 200Mi gets OOM-killed
      limits:
        cpu: 300m
        memory: 200Mi
      # what the scheduler reserves on the node when placing the pod
      requests:
        cpu: 200m
        memory: 100Mi

More on requests, limits, and per-namespace defaults: http://kubernetes.io/docs/admin/limitrange/
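
If you don't want to repeat requests and limits in every pod spec, that page describes the LimitRange object, which injects defaults for all containers in a namespace. A minimal sketch (the name is a placeholder, values mirror the pod above):

apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
spec:
  limits:
  - type: Container
    # used as the limit when a container does not declare one
    default:
      cpu: 300m
      memory: 200Mi
    # used as the request when a container does not declare one
    defaultRequest:
      cpu: 200m
      memory: 100Mi

Once requests are in place, kubectl describe node <node-name> shows the allocated requests and limits, so you can see how close a node is to being overcommitted.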

+3

Source: https://habr.com/ru/post/1661060/

