Last active
December 4, 2020 19:53
-
-
Save w1ndy/1eaf7f86a92eb557fdb5ddd2259cac59 to your computer and use it in GitHub Desktop.
Repair Failed ISCSI Drives (XFS on LVM) in A Kubernetes Cluster
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Shutdown the Kubernetes cluster first (on every node) | |
systemctl stop kubelet | |
# Stop all docker containers (on every node) | |
docker stop $(docker ps -aq) | |
# Unmount all ISCSI disks (on every node) | |
mount | grep iqn | |
umount --all-targets /dev/sdxx # replace sdxx with each disk | |
# Stop the ISCSI server (on the storage node) | |
systemctl stop iscsid iscsid.socket | |
systemctl stop targetd | |
# Temporarily clear target configurations (on the storage node) | |
targetctl clear | |
targetcli ls # make sure everything is empty | |
# List all logical volumes in lvm (on the storage node) | |
lvs | |
# Repair each logical volume /dev/xxx/pvc-xxx (on the storage node) | |
mount /dev/xxx/pvc-xxx /mnt # mount first to recover metadata logs | |
umount /mnt | |
xfs_repair /dev/xxx/pvc-xxx # use -L if necessary (can be destructive!) | |
# Restore target configurations (on the storage node) | |
targetctl restore | |
targetcli ls # make sure everything is back | |
# Kickoff the Kubernetes cluster (on every node) | |
systemctl start kubelet | |
# Monitor and restart pods if necessary | |
kubectl get pods | |
kubectl delete pod xxx |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment