Which component are you using?:
cluster-autoscaler
What version of the component are you using?:
Component version: 1.21.2
What k8s version are you using (kubectl version)?:
1.21.2-eks
kubectl version output:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:41:42Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
What environment is this in?:
We're running ClusterAPI on EKS with a MachineDeployment that we want to scale using cluster-autoscaler.
A minimal set of manifests is included under "How to reproduce it" below.
We're seeing the strange behaviour that nodes are removed completely from the cluster (it scales back to 1 node) and then cluster-autoscaler kicks in again. We first suspected our GitOps reconciliation was the issue, but we can also reproduce it on a fresh setup with everything else disabled.
We also see lots of messages like
I0105 11:42:15.900121 1 request.go:597] Waited for 194.315819ms due to client-side throttling, not priority and fairness, request: GET:https://10.100.0.1:443/apis/cluster.x-k8s.io/v1beta1/namespaces/flux-system/machinedeployments/<redacted>/scale
...
in our logs, repeated 5-10 times.
We continued the investigation by enabling audit logging on the Kubernetes API server and were able to identify several get & update operations per second (2-6 calls per second!) from the cluster-autoscaler service account, which explains the client-side rate limiting.
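For reference, the pattern we filtered for corresponds roughly to the audit policy sketched below. On EKS the audit log comes from control-plane logging rather than a configurable policy, so this is only an illustration of the calls we were looking at; the service account name matches the manifests further down.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log requests made by the cluster-autoscaler service account against the
  # Cluster API resources and the scale subresource it polls and updates.
  - level: Metadata
    users: ["system:serviceaccount:flux-system:gh-demo-ca"]
    resources:
      - group: cluster.x-k8s.io
        resources:
          - machinedeployments
          - machinedeployments/scale
          - machinesets
          - machines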
We tried playing around with the reconciliation intervals and related settings (see the excerpt below), but nothing seems to help and we're running out of ideas. Is this a known bug in the cluster-autoscaler?
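For completeness, the scan-interval and scale-down flags we varied on the cluster-autoscaler container looked roughly like this (an illustrative excerpt, not our exact values; none of the combinations we tried changed the behaviour):
# excerpt of the cluster-autoscaler container args (illustrative values)
args:
  - --scan-interval=30s
  - --scale-down-delay-after-add=10m
  - --scale-down-delay-after-delete=10m
  - --scale-down-unneeded-time=10m
  - --max-node-provision-time=20m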
What did you expect to happen?:
Autoscaling works & nodes are not replaced every 5-15 minutes
What happened instead?:
Heavy load on the API server even though the system is small, and nodes get replaced completely every 5-15 minutes.
How to reproduce it (as minimally and precisely as possible):
This is a minimal set of YAML files (requires cluster-api to be set up with the AWS provider) to replicate it.
Manifests used to replicate
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gh-demo-ca
  namespace: flux-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gh-demo-ca
rules:
  - apiGroups:
      - cluster.x-k8s.io
    resources:
      - machinedeployments
      - machinedeployments/scale
      - machines
      - machinesets
    verbs:
      - get
      - list
      - update
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gh-demo-ca
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gh-demo-ca
subjects:
  - kind: ServiceAccount
    name: gh-demo-ca
    namespace: flux-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gh-demo-ca
  namespace: flux-system
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: gh-demo-ca
  template:
    metadata:
      labels:
        name: gh-demo-ca
    spec:
      containers:
        - args:
            - --v=9
            - --stderrthreshold=info
            - --cloud-provider=clusterapi
            - --expander=least-waste
            - --kubeconfig=/kubeconf/value
            - --clusterapi-cloud-config-authoritative
            - --skip-nodes-with-local-storage=false
            - --leader-elect=true
            - --leader-elect-lease-duration=30s
            - --leader-elect-renew-deadline=20s
            - --leader-elect-retry-period=2s
            - --leader-elect-resource-lock=leases
            - --scan-interval=30s
            - --regional=true
            - --node-group-auto-discovery=clusterapi:namespace=flux-system,clusterName=gh-demo
          command:
            - /cluster-autoscaler
          image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.22.2
          imagePullPolicy: IfNotPresent
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 250m
              memory: 150Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - mountPath: /kubeconf
              name: gh-demo-kubeconfig
      serviceAccountName: gh-demo-ca
      volumes:
        - name: gh-demo-kubeconfig
          secret:
            defaultMode: 256
            secretName: gh-demo-kubeconfig
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: gh-demo-md-0
  namespace: flux-system
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3a.medium
      sshKeyName: containous
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: AWSManagedControlPlane
metadata:
  name: gh-demo-cp
  namespace: flux-system
spec:
  disableVPCCNI: false
  iamAuthenticatorConfig:
    mapUsers: [] # redacted
  region: us-central-1
  sshKeyName: containous
  version: v1.21.5
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: gh-demo
  namespace: flux-system
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: AWSManagedControlPlane
    name: gh-demo-cp
  infrastructureRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: AWSManagedControlPlane
    name: gh-demo-cp
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: EKSConfigTemplate
metadata:
  name: gh-demo
  namespace: flux-system
spec:
  template:
    spec:
      containerRuntime: containerd
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
  name: gh-demo-md-0
  namespace: flux-system
spec:
  clusterName: gh-demo
  selector:
    matchLabels: {}
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: EKSConfigTemplate
          name: gh-demo
      clusterName: gh-demo
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: gh-demo-md-0
      version: v1.21.5
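To trigger a scale-up after applying the manifests, any workload whose resource requests don't fit on a single node will do when applied against the workload cluster (gh-demo). The deployment below is only an illustrative example; the name, namespace and sizes are arbitrary, chosen so that the replicas cannot all be scheduled on one t3a.medium.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gh-demo-load # hypothetical workload, only used to force pending pods
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels:
      app: gh-demo-load
  template:
    metadata:
      labels:
        app: gh-demo-load
    spec:
      containers:
        - name: pause
          image: k8s.gcr.io/pause:3.5
          resources:
            requests:
              cpu: "1" # 4 replicas x 1 vCPU exceed a single 2-vCPU t3a.medium
              memory: 512Mi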
Anything else we need to know?: