State as of 2017-11-08.
You can also check a guide to upgrade CUDA on a [PC with GTX 980 Ti and Ubuntu 16.04](https://gist.github.com/bzamecnik/61b293a3891e166797491f38d579d060).
- NVIDIA driver 384.81
- CUDA Toolkit 9.0
- cuDNN 7.0
We'll see how to install the individual components, and also how to install everything with just one reboot. In total it takes around 3 GB of disk space.
- https://docs.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup
- https://askubuntu.com/questions/886445/how-do-i-properly-install-cuda-8-on-an-azure-vm-running-ubuntu-14-04-lts
Tested on Azure NC6 with 1x Tesla K80.
$ lspci | grep -i NVIDIA
8ed6:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
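If you want to script against this check (e.g. to assert the VM really has a GPU before continuing), the device count can be grepped out. The sketch below reuses the sample line above; on the VM, pipe real `lspci` output instead:

```shell
# Count NVIDIA devices; shown here against the sample line above.
# On the VM, use: lspci | grep -ci nvidia
sample='8ed6:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)'
n_gpus=$(printf '%s\n' "$sample" | grep -ci nvidia)
echo "$n_gpus"
```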
NOTE: Removing the nouveau driver manually is not necessary; installing cuda-drivers
does that automatically:
Setting up nvidia-375 (375.66-0ubuntu1) ...
update-alternatives: using /usr/lib/nvidia-375/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf (x86_64-linux-gnu_gl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-375/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_EGL.conf (x86_64-linux-gnu_egl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-375/alt_ld.so.conf to provide /etc/ld.so.conf.d/i386-linux-gnu_GL.conf (i386-linux-gnu_gl_conf) in auto mode
update-alternatives: using /usr/lib/nvidia-375/alt_ld.so.conf to provide /etc/ld.so.conf.d/i386-linux-gnu_EGL.conf (i386-linux-gnu_egl_conf) in auto mode
update-alternatives: using /usr/share/nvidia-375/glamor.conf to provide /usr/share/X11/xorg.conf.d/glamoregl.conf (glamor_conf) in auto mode
update-initramfs: deferring update (trigger activated)
A modprobe blacklist file has been created at /etc/modprobe.d to prevent Nouveau from loading. This can be reverted by deleting /etc/modprobe.d/nvidia-graphics-drivers.conf.
A new initrd image has also been created. To revert, please replace /boot/initrd-4.4.0-87-generic with /boot/initrd-$(uname -r)-backup.
*****************************************************************************
*** Reboot your computer and verify that the NVIDIA graphics driver can ***
*** be loaded. ***
*****************************************************************************
We will install the NVIDIA Tesla Driver via deb package.
wget http://us.download.nvidia.com/tesla/384.81/nvidia-diag-driver-local-repo-ubuntu1604-384.81_1.0-1_amd64.deb
sudo dpkg -i nvidia-diag-driver-local-repo-ubuntu1604-384.81_1.0-1_amd64.deb
sudo apt-key add /var/nvidia-diag-driver-local-repo-384.81/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-drivers
sudo reboot
https://developer.nvidia.com/cuda-downloads
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
TensorFlow 1.2.1 needs cuDNN 5.1 (not 6.0).
cuDNN needs to be downloaded with a registered NVIDIA account: https://developer.nvidia.com/rdp/cudnn-download
Download it in a browser and then copy it to the target machine via SCP:
sudo dpkg -i libcudnn7-dev_7.0.3.11-1+cuda9.0_amd64.deb
Add to ~/.profile:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH
. ~/.profile
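A sketch of making that append idempotent, so re-running the setup doesn't accumulate duplicate lines in `~/.profile` (the CUDA path assumes the default install location):

```shell
# Append the export line to ~/.profile only if it's not already there.
PROFILE="$HOME/.profile"
LINE='export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH'
grep -qxF "$LINE" "$PROFILE" 2>/dev/null || echo "$LINE" >> "$PROFILE"
```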
Note that cuda-drivers installs a lot of unnecessary X11 stuff (3.5 GB in total!).
We can drop the dependency on lightdm to save some space if we don't use a GUI.
sudo dpkg -i nvidia-diag-driver-local-repo-ubuntu1604-384.81_1.0-1_amd64.deb
sudo apt-key add /var/nvidia-diag-driver-local-repo-384.81/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-get update
# this installs 3.5 GB of dependencies
sudo apt-get install cuda-drivers cuda
# possible to remove lightdm and save 0.5 GB
sudo apt-get install cuda-drivers cuda lightdm-
sudo reboot
We should see the GPU information:
nvidia-smi
Wed Nov 8 19:07:24 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00008ED6:00:00.0 Off | 0 |
| N/A 32C P0 69W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
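For scripting, values like the driver version can be pulled out of the `nvidia-smi` text, although `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is the more robust interface. A sketch using the sample header line above:

```shell
# Extract "Driver Version: X" from nvidia-smi-style output.
# On the VM, replace the printf with a real `nvidia-smi` call.
header='| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |'
driver=$(printf '%s\n' "$header" | sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p')
echo "$driver"
```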
Let's run a simple "hello world" MNIST MLP in Keras/TensorFlow:
pip install tensorflow-gpu==1.2.1 keras==2.0.6
wget https://raw.githubusercontent.com/fchollet/keras/master/examples/mnist_mlp.py
python mnist_mlp.py
We should see that it uses the GPU and trains properly:
Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: a450:00:00.0)
That's it. Happy training!