
@alexchiri
Created December 15, 2019 18:52
K8s maintenance stuff

This is super cool and useful: https://medium.com/faun/kubectl-commands-cheatsheet-43ce8f13adfb

Tips & tricks: https://hackernoon.com/top-10-kubernetes-tips-and-tricks-27528c2d0222

Configure bash completion: echo "source <(kubectl completion bash)" >> ~/.bashrc

List every pod with its namespace and QoS class:

kubectl get pods --all-namespaces -o custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace,QOS-CLASS:.status.qosClass
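To narrow that listing to a single QoS class, the output can be piped through awk. This is only a minimal sketch: the function name pods_with_qos is made up for illustration, and it assumes kubectl is already pointed at the right cluster.

```shell
# Print "namespace/name" for every pod whose QoS class equals the first argument.
# Usage: pods_with_qos <Guaranteed|Burstable|BestEffort>
pods_with_qos() {
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,QOS:.status.qosClass \
  | awk -v qos="$1" '$3 == qos { print $1 "/" $2 }'
}
```

For example, pods_with_qos BestEffort lists the pods that set no CPU or memory requests or limits, which are typically among the first to go when a node runs low on resources.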

etcd guides:

https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#example-hardware-configurations

https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#cpus

DataDog setup

Check agent status: https://docs.datadoghq.com/agent/guide/agent-commands/?tab=agentv6#agent-status-and-information

When the Datadog agent cannot detect the kubelet and the dashboard is empty: DataDog/integrations-core#2582

More generic agent troubleshooting: https://docs.datadoghq.com/agent/troubleshooting/

Manage Datadog monitors using code: https://github.com/trueaccord/DogPush

Resource requests and limits

https://www.replex.io/blog/kubernetes-in-production-readiness-checklist-and-best-practices-for-resource-management

https://www.magalix.com/blog/kubernetes-resource-requests-and-limits-101

https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits

Pod Disruption Budgets, number 6 here: https://hackernoon.com/top-10-kubernetes-tips-and-tricks-27528c2d0222
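A Pod Disruption Budget can also be created straight from the command line. The sketch below wraps kubectl create poddisruptionbudget; the function name create_pdb and the example values used afterwards (web-pdb, app=web, 1) are hypothetical.

```shell
# Create a PDB that keeps at least <min-available> matching pods running
# during voluntary disruptions such as node drains.
# Usage: create_pdb <name> <label-selector> <min-available>
create_pdb() {
  kubectl create poddisruptionbudget "$1" \
    --selector="$2" \
    --min-available="$3"
}
```

For example, create_pdb web-pdb app=web 1 would ask the cluster to keep at least one app=web pod up while nodes are being drained.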

Monitoring

Some statistics reported by Docker should also be monitored, as they provide deeper (and more accurate) insights. The CPU throttling metric is a great example, as it represents the number of times a container hit its specified limit.
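The throttling counters ultimately come from the container's cgroup cpu.stat file, so a rough throttling percentage can be computed by hand. This is a minimal sketch, not how Docker or any monitoring agent does it: the function name throttle_pct is made up, and the path to cpu.stat is an assumption that depends on the cgroup version and the container runtime.

```shell
# Print the percentage of CFS scheduling periods in which the cgroup was throttled.
# Usage: throttle_pct /path/to/cpu.stat
throttle_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }    # enforcement periods elapsed
    $1 == "nr_throttled" { throttled = $2 }  # periods in which the limit was hit
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else             print "0.0"
    }
  ' "$1"
}
```

Inside a container this might be called as throttle_pct /sys/fs/cgroup/cpu/cpu.stat (cgroup v1) or throttle_pct /sys/fs/cgroup/cpu.stat (cgroup v2); check which path exists first. A percentage that stays high suggests the CPU limit is too tight for the workload.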

About nodes and resources: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/

How to change eviction values: https://medium.com/faun/kubelet-pod-the-node-was-low-on-resource-diskpressure-384f590892f5

Kubernetes metrics in production: https://www.replex.io/blog/kubernetes-in-production-the-ultimate-guide-to-monitoring-resource-metrics

Kubernetes varia

Draining nodes: https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/

Taints and tolerations: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
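The two operations above boil down to a couple of kubectl calls. The helpers below are a sketch under assumptions: the function names drain_node and taint_node are invented here, and real clusters often need extra drain flags (for pods with local storage, for instance) that are left out.

```shell
# Drain a node for maintenance, leaving DaemonSet-managed pods in place.
# Usage: drain_node <node-name>
drain_node() {
  local node="$1"
  kubectl drain "$node" --ignore-daemonsets || return 1
  echo "Drained $node. Run 'kubectl uncordon $node' once maintenance is done."
}

# Keep new pods off a node unless they tolerate the taint.
# Usage: taint_node <node-name> <key=value>
taint_node() {
  kubectl taint nodes "$1" "$2:NoSchedule"
}
```

For example, taint_node node-1 dedicated=infra adds a NoSchedule taint; running kubectl taint nodes node-1 dedicated=infra:NoSchedule- (note the trailing dash) removes it again.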
