docker run --rm -v $(pwd):/data -e ETCDCTL_API=3 -w /data quay.io/coreos/etcd etcdctl snapshot restore snapshot.db
docker run --name etcd -d -v $(pwd):/data -e ETCDCTL_API=3 -w /data quay.io/coreos/etcd
#!/bin/bash
find /proc/*/fd -lname anon_inode:inotify |
cut -d/ -f3 |
xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
uniq -c |
sort -nr
## check for max_user_watches and/or max_user_instances
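The closing comment of the script refers to the kernel's per-user inotify limits. A minimal sketch for reading them, assuming a Linux host that exposes them under /proc/sys/fs/inotify (the helper name inotify_limit is made up for illustration):

```shell
#!/bin/sh
# Sketch: print the current per-user inotify limits (Linux only).
# inotify_limit is a hypothetical helper, not part of the original script.
inotify_limit() {
  # $1 is the limit name, e.g. max_user_watches or max_user_instances
  cat "/proc/sys/fs/inotify/$1" 2>/dev/null || echo "unknown"
}

echo "max_user_watches:   $(inotify_limit max_user_watches)"
echo "max_user_instances: $(inotify_limit max_user_instances)"
```

If the process counts reported by the script come close to these values, the limits can be raised with sysctl, for example sysctl -w fs.inotify.max_user_watches=<value>, with a value that suits the host.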
If your etcd logs start showing messages like the following, your storage might be too slow for etcd or the server might be doing too much for etcd to operate properly.
2019-08-11 23:27:04.344948 W | etcdserver: read-only range request "key:\"/registry/services/specs/default/kubernetes\" " with result "range_response_count:1 size:293" took too long (1.530802357s) to execute
If your storage is really slow, you will even see alerts in your monitoring system. What can you do to verify the performance of your storage? If the storage is not performing correctly, how can you fix it? After researching this, I found an IBM article that covers the topic extensively. Its findings on how to test were very helpful. The biggest factor is your storage latency: if the 99th percentile is not well below 10ms, you will see warnings in the etcd logs. We can test this with a tool called fio, which I will outline below.
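As a rough sketch of such a test, the script below runs a small fio write job that calls fdatasync after every write, then checks a 99th percentile value against the 10ms guideline. The directory, the job name, and the size and block size values are assumptions for illustration. Point TEST_DIR at the disk that backs your etcd data.

```shell
#!/bin/sh
# Sketch: measure fdatasync latency with fio and compare the 99th percentile
# against the 10ms guideline. Directory, job name and sizes are assumptions.

TEST_DIR="${TEST_DIR:-./fio-etcd-test}"   # assumed: a directory on the etcd disk
THRESHOLD_USEC=10000                      # 10ms expressed in microseconds

# Run a small sequential write job that issues fdatasync after every write.
run_fio() {
  mkdir -p "$TEST_DIR" &&
  fio --name=etcd-fsync-test --directory="$TEST_DIR" \
      --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
}

# Succeeds when the given 99th percentile latency (whole microseconds)
# is below the threshold.
p99_ok() {
  [ "$1" -lt "$THRESHOLD_USEC" ]
}

case "${1:-}" in
  run)   run_fio ;;
  check) if p99_ok "$2"; then echo "OK"; else echo "TOO SLOW"; fi ;;
esac
```

Saved under a name of your choice, the script is called with run to start the job. Recent fio versions print a sync percentiles section. Read the 99.00th value there, convert it to microseconds if fio reports another unit, and pass it to check, for example check 2500.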
#!/bin/sh
# Backup your data
# Use at your own risk
# Usage ./extended-cleanup-rancher2.sh
# Include clearing all iptables: ./extended-cleanup-rancher2.sh flush
docker rm -f $(docker ps -qa)
docker rmi -f $(docker images -q)
docker volume rm $(docker volume ls -q)
for mount in $(mount | grep tmpfs | grep '/var/lib/kubelet' | awk '{ print $3 }') /var/lib/kubelet /var/lib/rancher; do umount $mount; done
cleanupdirs="/etc/ceph /etc/cni /etc/kubernetes /opt/cni /opt/rke /run/secrets/kubernetes.io /run/calico /run/flannel /var/lib/calico /var/lib/etcd /var/lib/cni /var/lib/kubelet /var/lib/rancher/rke/log /var/log/containers /var/log/pods /var/run/calico"