I've been doing a lot of research into Deep Reinforcement Learning, Machine Learning and other areas, without much non-Apple hardware lying around. I have spent countless hours rebuilding TensorFlow 1.3-1.7 from source with all of the known .diff changes, and in constant rebuilding efforts to get things working. I tried moving to Linux on the machines, but sadly neither the Mac Pro nor the MacBook Pro 2017 supports Linux that well, and previous versions of Ubuntu did not support the NVIDIA eGPU.
But then came 2018: I got one of my Mac Minis back and decided to try with it. I had heard of people having luck with Ubuntu 17.10 on a NUC and thought it would be a good idea to try.
To make a long story short, things worked OK and everything installed, but I had a lot of issues getting the kernel to load the GPU correctly over the eGPU. Since it was late, I decided to enable SSH on the Mini, unplug the monitor and reboot the machine. Eureka: Linux was having some sort of conflict between the onboard Intel video card and the NVIDIA one. By then I had played around so much with the installation that I decided to let it be and start from scratch the day after, without a monitor past the basic installation.
On forums I found the solution below for the on-board card on a NUC, but in all honesty I did not try it, mostly because my machine was going to be accessed remotely anyhow. We also got very weird reboot issues where the machine would remain on a black screen without booting when the eGPU was on and a monitor was connected.
If executing e.g. nvidia-smi fails, launch the NVIDIA X Server Settings application. Check that the NVIDIA driver is selected, rather than the default on-board GPU, and then log out and back in for the changes to apply.
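A headless sketch of the same check and switch, assuming the nvidia-prime package is installed (prime-select ships with it) — I did not run this on the Mini myself:

```shell
# Headless equivalent of the GUI fix: switch the renderer to the NVIDIA
# card with prime-select (assumes the nvidia-prime package is present).
if command -v prime-select >/dev/null 2>&1; then
    prime-select query                       # shows intel or nvidia
    sudo -n prime-select nvidia 2>/dev/null  # then log out/in to apply
else
    echo "prime-select not installed; use NVIDIA X Server Settings instead"
fi
```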
I re-installed ubuntu on the TensorMini, enabled SSH and powered the machine off.
- With the Mini fully off, plugged in the eGPU (Akitio Node Thunderbolt 3 + Titan Xp)
- Turned on the Mac Mini
Follow the steps from https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1704&target_type=deblocal
sudo apt update
sudo apt -y upgrade
sudo dpkg -i cuda-repo-ubuntu1704_9.1.85-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
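Before moving on, it is worth checking that the CUDA toolchain is on your path; the cuda-9.1 directory below matches the package installed above, so adjust it if your version differs:

```shell
# Put the CUDA 9.1 toolchain on the path (matches the package above) and
# confirm nvcc answers; adjust the directory if your CUDA version differs.
export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version          # should report release 9.1
else
    echo "nvcc not found; check the CUDA install"
fi
```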
Then re-select the proper NVIDIA driver, since the CUDA install updated your NVIDIA drivers.
- Launch System Setting > Software & Updates > Additional Drivers
- Select the corresponding Nvidia Driver.
Note: At this point the card will be recognized, but nvidia-smi will not work, since the card is not functioning properly due to the conflict we mentioned above.
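For a headless box, the driver selection above can also be sketched from the CLI with the stock ubuntu-drivers tool; nvidia-390 below is just an example package name, install whatever the tool marks as recommended:

```shell
# CLI alternative to the Additional Drivers panel; nvidia-390 is only an
# example, install whatever ubuntu-drivers marks as "recommended".
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices                  # lists candidate drivers
    # sudo apt-get install -y nvidia-390    # e.g. the recommended one
else
    echo "ubuntu-drivers not available; use the GUI instead"
fi
```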
- In order to install cuDNN you will need an NVIDIA account: https://developer.nvidia.com/rdp/cudnn-download
- Authenticate, then download the latest versions of the three following packages: libcudnn[version], libcudnn[version]-doc, libcudnn[version]-dev
sudo dpkg -i libcudnn7_7.0.3.11-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-doc_7.0.3.11-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.0.3.11-1+cuda9.1_amd64.deb
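cuDNN can be sanity-checked with the mnistCUDNN sample that ships in the -doc package installed above; the path below is the standard one for cuDNN 7, and the test of course needs the GPU to be working:

```shell
# Sanity-check cuDNN with the bundled sample (path is the cuDNN 7 default).
if [ -d /usr/src/cudnn_samples_v7 ]; then
    cp -r /usr/src/cudnn_samples_v7 "$HOME"
    cd "$HOME/cudnn_samples_v7/mnistCUDNN"
    make clean && make
    ./mnistCUDNN        # should finish with "Test passed!"
else
    echo "cudnn samples not installed; skipping"
fi
```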
- Make sure SSH works and that you know the IP, then unplug the monitor and reboot the machine with the eGPU plugged in. The machine will work.
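Before pulling the monitor, this is roughly what I make sure of first (service names assume a stock Ubuntu install):

```shell
# Make sure sshd comes back after reboot, and note the machine's address.
if command -v sudo >/dev/null 2>&1 && command -v systemctl >/dev/null 2>&1; then
    sudo -n systemctl enable --now ssh 2>/dev/null || true  # start now + on boot
fi
hostname -I 2>/dev/null || hostname   # the IP to connect to headlessly
```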
- Now that things are working, you will need to set up nvidia-docker:
Removing 1.0: if you have nvidia-docker 1.0, remove it along with all existing GPU containers.
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker
Installing 2.0: installing from the 16.04 repositories worked fine for me, without any issues.
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
/etc/apt/sources.list.d/nvidia-docker.list should look like
deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/$(ARCH) /
matt@TensorMini:~$ docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:0A:00.0 Off | N/A |
| 23% 26C P8 8W / 250W | 11761MiB / 12196MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
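As a final end-to-end smoke test you can run a GPU-enabled TensorFlow image through nvidia-docker; the 1.7.0-gpu tag below is an assumption on my part, but any -gpu tag from this era should behave the same:

```shell
# Final smoke test: a GPU-enabled TensorFlow image should see the Titan Xp
# (the 1.7.0-gpu tag is an example from this era).
if command -v docker >/dev/null 2>&1; then
    docker run --runtime=nvidia --rm tensorflow/tensorflow:1.7.0-gpu \
        python -c "import tensorflow as tf; print(tf.test.gpu_device_name())" \
        || echo "nvidia runtime not set up yet"
    # a working setup prints something like /device:GPU:0
else
    echo "docker not available on this machine"
fi
```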
I tried to replicate the setup on a Mac Pro with the newly gained insights about the card and the setup, but we ran into the same issues with the Radeon cards. I considered manually unplugging the video cards, but it would have been pointless given that I would like to have them working if possible, so the Mac Pro could become a fully working deep learning station; as of today it is not working. I will try one more time when Ubuntu 18 comes out, but I am not sure it will make much of a difference.