Last active
October 14, 2019 20:41
Tested on OCP 4.2, build 4.2.0-0.nightly-2019-10-03-224032.
Deploy NFD:
cd $GOPATH/src/github.com/openshift
git clone https://github.com/openshift/cluster-nfd-operator.git
cd cluster-nfd-operator
git checkout release-4.2
make deploy
Verify that the NFD operator is running in the openshift-nfd-operator namespace.
Verify that the nfd-master and nfd-worker pods for their respective nodes are deployed in the openshift-nfd namespace:
# oc get pods -n openshift-nfd-operator
NAME                           READY   STATUS    RESTARTS   AGE
nfd-operator-b7f4fbff8-sspvz   1/1     Running   0          3d23h
# oc get pods -n openshift-nfd
NAME               READY   STATUS    RESTARTS   AGE
nfd-master-jlfbs   1/1     Running   0          3d23h
nfd-master-lzw2r   1/1     Running   0          3d23h
nfd-master-qlhsj   1/1     Running   0          3d23h
nfd-worker-2xbqn   1/1     Running   2          3d23h
nfd-worker-9ng5z   1/1     Running   2          3d23h
nfd-worker-rz4jl   1/1     Running   3          3d23h
nfd-worker-xqr9h   1/1     Running   2          3d23h
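The pod check above can also be scripted rather than read by eye. The following is a minimal sketch, not part of the original notes: unhealthy_pods is a hypothetical helper name, and it assumes the default column layout of oc get pods (NAME, READY, STATUS, RESTARTS, AGE). It only looks at the STATUS column, so it does not catch a pod that is Running but not fully ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original notes).
# Reads `oc get pods --no-headers` output on stdin and prints the name of
# every pod whose STATUS is neither Running nor Completed.
# Exit status: 0 if no such pod was found, 1 otherwise.
unhealthy_pods() {
    awk '$3 != "Running" && $3 != "Completed" { print $1; bad = 1 }
         END { exit bad + 0 }'
}

# Possible usage against the namespaces from these notes:
#   oc get pods -n openshift-nfd-operator --no-headers | unhealthy_pods
#   oc get pods -n openshift-nfd --no-headers | unhealthy_pods
```

The same helper should apply to any other namespace whose pods are expected to end up Running or Completed.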
Next, create a new worker machineset to add an NVIDIA GPU-enabled node (e.g. a g3.4xlarge or g3.8xlarge instance).
Save an existing worker machineset YAML file from the openshift-machine-api namespace and edit it (change the name, and set the instance type to g3.4xlarge). GPU-enabled instances must also be available in that machineset's zone. Then run oc create -f <gpu_worker_machineset>.yaml.
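The manual edit described above could be approximated with sed. This is a rough sketch under several assumptions: gpu_machineset is a hypothetical helper name, the existing machineset name contains no regex metacharacters, and the manifest uses the usual two-space indentation under metadata. It also drops server-populated metadata fields such as resourceVersion and uid, which typically get in the way when creating a new object from an exported one. Review the generated file before creating it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original notes).
# Rewrites a worker MachineSet manifest read on stdin:
#   $1 = name of the existing machineset
#   $2 = name for the new GPU machineset
#   $3 = instance type, e.g. g3.4xlarge
gpu_machineset() {
    local old_name=$1 new_name=$2 instance_type=$3
    sed -e "s/${old_name}/${new_name}/g" \
        -e "s/^\( *instanceType:\).*/\1 ${instance_type}/" \
        -e '/^  creationTimestamp:/d' \
        -e '/^  generation:/d' \
        -e '/^  resourceVersion:/d' \
        -e '/^  selfLink:/d' \
        -e '/^  uid:/d'
}

# Possible usage (placeholders in <> must be filled in by hand):
#   oc get machineset <existing_worker_machineset> -n openshift-machine-api -o yaml \
#     | gpu_machineset <existing_worker_machineset> <gpu_worker_machineset> g3.4xlarge \
#     > <gpu_worker_machineset>.yaml
#   oc create -f <gpu_worker_machineset>.yaml
```

Replacing every occurrence of the old name is deliberate, since the name also appears in the selector and template labels of the machineset.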
Wait a few minutes for the new GPU worker node to be deployed. Verify with oc get nodes and oc describe node.
Once the new GPU worker node has joined the cluster, NFD labels it, and you should see one label specific to the NVIDIA GPU: "feature.node.kubernetes.io/pci-10de.present=true".
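Instead of scanning the oc describe node output for the label, a label selector can list the matching nodes directly. A small sketch, where gpu_nodes and GPU_LABEL are hypothetical names introduced here for illustration (10de is the PCI vendor ID for NVIDIA):

```shell
#!/usr/bin/env bash
# Label that NFD sets on nodes with an NVIDIA PCI device (vendor ID 10de).
GPU_LABEL='feature.node.kubernetes.io/pci-10de.present=true'

# Hypothetical helper (an assumption, not from the original notes).
# Prints the nodes carrying the NFD GPU label, one per line.
gpu_nodes() {
    oc get nodes -l "$GPU_LABEL" -o name
}

# Possible usage:
#   gpu_nodes
```

An empty result would suggest that either the GPU node has not joined yet or NFD has not labeled it.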
Now deploy SRO:
cd $GOPATH/src/github.com/openshift-psap
git clone https://github.com/openshift-psap/special-resource-operator.git
cd special-resource-operator
git checkout release-4.2
make deploy
Verify that the NVIDIA driver and device plugin container stack is deployed:
# oc get pods -n openshift-sro
NAME                                         READY   STATUS      RESTARTS   AGE
nvidia-dcgm-exporter-49bgx                   2/2     Running     0          3d21h
nvidia-device-plugin-daemonset-khq4n         1/1     Running     0          3d21h
nvidia-device-plugin-validation              0/1     Completed   0          3d21h
nvidia-driver-daemonset-9tmb9                1/1     Running     0          3d21h
nvidia-driver-validation                     0/1     Completed   0          3d21h
nvidia-feature-discovery-4f5q4               1/1     Running     0          3d21h
nvidia-grafana-67bdb6d6-s62dl                1/1     Running     0          3d21h
special-resource-operator-77cd96658f-b2mk5   1/1     Running     0          3d21h