@chexov
Last active July 12, 2019 16:40
Inhouse Kubernetes Setup

Install docker-ce

// from https://gist.github.com/Brainiarc7/a8ab5f89494d053003454efc3be2d2ef

First, install the latest Docker Community Edition (docker-ce):

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce
sudo service docker restart

Install nvidia-docker2

First, if you have older nvidia-docker installations, purge the installation and all associated GPU containers:

docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

sudo apt-get install nvidia-container-runtime
sudo apt-get install nvidia-docker2

Install kubeadm, kubelet

apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

Prepare docker

vim /etc/systemd/system/multi-user.target.wants/docker.service
# ExecStart=/usr/bin/dockerd -H fd:// --exec-opt native.cgroupdriver=systemd
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
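
Editing the unit file in place may be undone by a package upgrade. A systemd drop-in is one alternative. The sketch below assumes the same dockerd flags as the ExecStart line above; the function name write_docker_dropin and the file name override.conf are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that replaces dockerd's ExecStart.
# $1 is the drop-in directory, normally /etc/systemd/system/docker.service.d
write_docker_dropin() {
    mkdir -p "$1" || return 1
    # The empty "ExecStart=" clears the packaged command before the new one is set.
    cat > "$1/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
}
```

Run as root, this would be called as write_docker_dropin /etc/systemd/system/docker.service.d, followed by the usual daemon-reload and docker restart.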

Reload docker

sudo systemctl daemon-reload
sudo systemctl restart docker

Join the cluster party

swapoff -a
kubeadm join 10.0.1.111:6443 --token h1g0io.rkumkap1hg0f3lo3 --discovery-token-ca-cert-hash sha256:f30c5cc979f72da8a68a2e32b2faff4adfe0c2e49b3dddff32f5c6045415f7e3
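
swapoff -a only lasts until the next reboot, and by default the kubelet refuses to start while swap is enabled, so it can help to check for active swap before joining. A minimal sketch, assuming the standard /proc/swaps layout (one header line, then one line per swap device); the function name swap_active is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit 0) if the given swaps table lists at least one active swap device.
# $1 defaults to /proc/swaps; the first line of that file is a header.
swap_active() {
    [ "$(tail -n +2 "${1:-/proc/swaps}" | grep -c .)" -gt 0 ]
}
```

For example: if swap_active; then echo "swap is still on"; fi. To keep swap off across reboots, the swap entry in /etc/fstab would also need to be commented out.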

Drain the node

kubectl drain <node-name> --delete-local-data --force --ignore-daemonsets
kubectl drain tanh --delete-local-data --force --ignore-daemonsets
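
A small wrapper can guard against running drain without a node name. This is only a sketch: drain_node and the KUBECTL override variable are hypothetical names, and the flags are the ones used in this section.

```shell
#!/bin/sh
# Hypothetical wrapper: refuse to run "kubectl drain" without a node name.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print the command instead of running it.
drain_node() {
    if [ -z "$1" ]; then
        echo "usage: drain_node NODE" >&2
        return 1
    fi
    ${KUBECTL:-kubectl} drain "$1" --delete-local-data --force --ignore-daemonsets
}
```

After maintenance, kubectl uncordon tanh makes the node schedulable again.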

Install prometheus

git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus/
kubectl create -f manifests/
kubectl apply -f manifests/; sleep 4.2; kubectl apply -f manifests/

Monitoring graphs

kubectl --namespace monitoring port-forward svc/grafana 3000
kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090

Install MetalLB

kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml
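
The manifest alone does not give MetalLB any addresses to hand out; in the 0.7.x series that comes from a ConfigMap named config in the metallb-system namespace. A minimal layer 2 sketch follows. The function name metallb_l2_config and the address range in the usage line are assumptions, so pick a free range on your own network.

```shell
#!/bin/sh
# Print a layer 2 MetalLB ConfigMap for the address range given as $1.
metallb_l2_config() {
    cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - $1
EOF
}
```

Usage, with a made-up range: metallb_l2_config 10.0.1.240-10.0.1.250 | kubectl apply -f -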

Install cert-manager

kubectl apply -f kube/cert-manager/0.7-cert-manager.yaml

Install nginx-ingress

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml

GPU support

sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Environment="KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true"

vim /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "insecure-registries": ["blender.local:5000"],
  "default-runtime": "nvidia"
}

sudo systemctl daemon-reload
sudo systemctl restart kubelet.service
sudo systemctl restart docker

Check if docker runs via nvidia runtime by default

docker run --rm --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.0.0-beta
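
A lighter check is to read the default runtime straight from docker info, which prints a "Default Runtime:" line. A minimal sketch; the function name default_runtime is hypothetical.

```shell
#!/bin/sh
# Read "docker info" text on stdin and print the value of its "Default Runtime" line.
default_runtime() {
    sed -n 's/^ *Default Runtime: *//p'
}
```

Usage: docker info 2>/dev/null | default_runtime. If the daemon.json change took effect, this should report nvidia rather than runc.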

Deploy the NVIDIA device plugin

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta/nvidia-device-plugin.yml
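
Once the device plugin is running, GPU nodes advertise the nvidia.com/gpu resource, and a throwaway pod that requests one GPU is a quick way to confirm scheduling works. This is a sketch under assumptions: the function name gpu_test_pod, the pod name gpu-smoke-test, and the image in the usage line are illustrative and not taken from this setup.

```shell
#!/bin/sh
# Print a one-shot pod that requests one GPU and runs nvidia-smi.
# $1 is the container image; it is assumed to be able to run nvidia-smi
# under the nvidia runtime. The pod name gpu-smoke-test is made up.
gpu_test_pod() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: $1
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
}
```

Usage, with an assumed image tag: gpu_test_pod nvidia/cuda:9.0-base | kubectl create -f -, then kubectl logs gpu-smoke-test once the pod has completed.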

NVIDIA GPU monitoring

git clone https://github.com/NVIDIA/gpu-monitoring-tools.git
cd gpu-monitoring-tools/exporters/prometheus-dcgm/k8s
kubectl -n monitoring create -f node-exporter/gpu-only-node-exporter-daemonset.yaml

Dashboard

kubectl --context=vgk8s apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta1/aio/deploy/recommended.yaml

How to join new node

kubeadm join 10.0.1.111:6443 --token h1g0io.rkumkap1hg0f3lo3 \
    --discovery-token-ca-cert-hash sha256:f30c5cc979f72da8a68a2e32b2faff4adfe0c2e49b3dddff32f5c6045415f7e3
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
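
Bootstrap tokens expire after 24 hours by default, so the token above may no longer be valid. Running kubeadm token create --print-join-command on the control-plane node prints a fresh join command. Before pasting a token, a quick format check can catch copy errors. The sketch below assumes the documented token shape of six characters, a dot, then sixteen characters, all lowercase letters or digits; the function name valid_bootstrap_token is hypothetical.

```shell
#!/bin/sh
# Succeeds if $1 looks like a kubeadm bootstrap token:
# 6 characters, a dot, then 16 characters, all from [a-z0-9].
valid_bootstrap_token() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}
```

For example: valid_bootstrap_token h1g0io.rkumkap1hg0f3lo3 && echo "token format ok". This only checks the shape of the token, not whether the cluster still accepts it.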