In-house Kubernetes Setup

Install docker-ce

// from

First, make sure the latest Docker Community Edition is installed by following the steps below:

curl -fsSL | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository "deb [arch=amd64] $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce
sudo service docker restart
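
Before moving on, it can help to confirm that the docker binary actually landed on the PATH. The snippet below is a minimal sketch of such a check; the helper name have_cmd is hypothetical and not part of Docker.

```shell
# have_cmd NAME: succeed if NAME can be found on the PATH.
# (have_cmd is a hypothetical helper, not a Docker command.)
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
  docker --version
else
  echo "docker not found on PATH" >&2
fi
```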

Install nvidia-docker2

First, if you have older nvidia-docker installations, purge the installation and all associated GPU containers:

docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

sudo apt-get install nvidia-container-runtime
sudo apt-get install nvidia-docker2

Install kubeadm, kubelet, kubectl

apt-get update && apt-get install -y apt-transport-https curl
curl -s | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

Prepare docker

vim /etc/systemd/system/
//# ExecStart=/usr/bin/dockerd -H fd:// --exec-opt native.cgroupdriver=systemd
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime

Reload docker

sudo systemctl daemon-reload
sudo systemctl restart docker

Join the cluster party

swapoff -a
kubeadm join --token h1g0io.rkumkap1hg0f3lo3 --discovery-token-ca-cert-hash sha256:f30c5cc979f72da8a68a2e32b2faff4adfe0c2e49b3dddff32f5c6045415f7e3
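
By default the kubelet refuses to start while swap is enabled, and swapoff -a does not survive a reboot. The sketch below checks whether any swap area is still active by reading /proc/swaps; the helper name swap_active is hypothetical, and how you persist the change (for example by editing /etc/fstab) depends on your setup.

```shell
# swap_active [FILE]: succeed if FILE (default /proc/swaps) lists a swap area.
# (swap_active is a hypothetical helper name.)
swap_active() {
  file="${1:-/proc/swaps}"
  # The first line is a header; any further non-empty line is an active swap area.
  tail -n +2 "$file" 2>/dev/null | grep -q .
}

if swap_active; then
  echo "swap is still enabled; run 'swapoff -a' before 'kubeadm join'" >&2
fi
```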

Drain the node

kubectl drain --delete-local-data --force --ignore-daemonsets
kubectl drain tanh --delete-local-data --force --ignore-daemonsets
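
A drained node stays unschedulable until it is uncordoned. The sketch below wraps the drain command from above and pairs it with kubectl uncordon; the function names drain_node and uncordon_node are hypothetical, and tanh is the node name used in the example above. Newer kubectl releases rename --delete-local-data to --delete-emptydir-data, so adjust the flag if your client rejects it.

```shell
# drain_node NODE: evict pods from NODE using the flags from the notes above.
# (drain_node and uncordon_node are hypothetical helper names.)
drain_node() {
  if [ -z "${1:-}" ]; then
    echo "usage: drain_node NODE" >&2
    return 2
  fi
  kubectl drain "$1" --delete-local-data --force --ignore-daemonsets
}

# uncordon_node NODE: mark NODE schedulable again after maintenance.
uncordon_node() {
  kubectl uncordon "$1"
}

# Usage (not run here):
#   drain_node tanh
#   uncordon_node tanh
```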

Install prometheus

git clone
cd kube-prometheus/
kubectl create -f manifests/
kubectl apply -f manifests/; sleep 4.2; kubectl apply -f manifests/
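
The repeated apply with a short sleep is presumably there because the custom resource definitions from the first pass need a moment to register before the resources that depend on them can be created. A more explicit alternative is a small retry loop; the sketch below is one way to write it, and the function name retry is hypothetical.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. (retry is a hypothetical helper.)
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (not run here):
#   retry 5 4 kubectl apply -f manifests/
```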

Monitoring graphs

kubectl --namespace monitoring port-forward svc/grafana 3000
kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090

Install MetalLB

kubectl apply -f

Install cert-manager

kubectl apply -f kube/cert-manager/0.7-cert-manager.yaml

Install nginx-ingress

kubectl apply -f

GPU support

sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Environment="KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true"

vim /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "insecure-registries": ["blender.local:5000"],
  "default-runtime": "nvidia"
}
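
Docker will not start if daemon.json is not valid JSON, so it is worth checking the file before restarting the service. The sketch below uses Python's built-in json.tool module for that; it assumes python3 is installed on the node, and the helper name validate_json is hypothetical.

```shell
# validate_json FILE: succeed if FILE parses as JSON.
# (validate_json is a hypothetical helper; it assumes python3 is available.)
validate_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Usage (not run here):
#   validate_json /etc/docker/daemon.json && sudo systemctl restart docker
```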

sudo systemctl daemon-reload
sudo systemctl restart kubelet.service
sudo systemctl restart docker

Check that Docker uses the nvidia runtime by default

docker run --rm --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.0.0-beta
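
A quicker way to see which runtime Docker uses by default is to read the "Default Runtime" line from docker info. The sketch below extracts that value; the helper name default_runtime is hypothetical.

```shell
# default_runtime: print the value of the "Default Runtime" line of `docker info`.
# (default_runtime is a hypothetical helper name.)
default_runtime() {
  docker info 2>/dev/null | sed -n 's/^ *Default Runtime: *//p'
}

# Usage (not run here):
#   [ "$(default_runtime)" = "nvidia" ] || echo "nvidia is not the default runtime" >&2
```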


kubectl create -f
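
Once the NVIDIA device plugin is running, GPU nodes advertise a resource named nvidia.com/gpu. The sketch below filters kubectl describe node output for that resource; the helper name node_gpu_lines is hypothetical, and tanh is the node name used earlier in these notes.

```shell
# node_gpu_lines NODE: print the lines of `kubectl describe node NODE`
# that mention the nvidia.com/gpu resource.
# (node_gpu_lines is a hypothetical helper name.)
node_gpu_lines() {
  kubectl describe node "$1" | grep -F 'nvidia.com/gpu'
}

# Usage (not run here):
#   node_gpu_lines tanh
```

If the function prints nothing, the node is not advertising GPUs yet.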

NVIDIA GPU monitoring

git clone
cd exporters/prometheus-dcgm/k8s
kubectl -n monitoring create -f node-exporter/gpu-only-node-exporter-daemonset.yaml


kubectl --context=vgk8s apply -f

How to join a new node

kubeadm join --token h1g0io.rkumkap1hg0f3lo3 \
    --discovery-token-ca-cert-hash sha256:f30c5cc979f72da8a68a2e32b2faff4adfe0c2e49b3dddff32f5c6045415f7e3
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
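
Bootstrap tokens expire after 24 hours by default, so the token shown above may no longer be accepted. A fresh join command can be generated on the control-plane node with kubeadm token create; the sketch below wraps that call, and the function name print_join_command is hypothetical.

```shell
# print_join_command: print a complete `kubeadm join ...` line with a new token.
# Run this on the control-plane node.
# (print_join_command is a hypothetical helper name.)
print_join_command() {
  kubeadm token create --print-join-command
}

# Usage (not run here):
#   print_join_command
```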
