Getting a Tesla K80 working on an Ubuntu 20.04.3 VM.
Turns out the easiest route is via conda.
conda create -n gpu python=3.10
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Check for a heartbeat.
python -c "import torch;print(torch.rand(5, 3));print(torch.cuda.is_available())"
Should result in:
tensor([[0.4617, 0.7625, 0.4423],
[0.8141, 0.4264, 0.2836],
[0.2107, 0.0038, 0.1685],
[0.6512, 0.5361, 0.1323],
[0.9526, 0.5774, 0.5037]])
False
So... PyTorch itself is OK, but CUDA is not available.
PyTorch can't use CUDA because we never installed the NVIDIA drivers.
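To tell a "CPU-only build" apart from a "missing driver", a slightly richer diagnostic than the one-liner above helps. This is a sketch; the function name `cuda_report` is mine, not part of PyTorch:

```python
# Hypothetical helper: summarise what PyTorch can see.
# torch.version.cuda is None on a CPU-only build; if it is set but
# cuda_available is False, the driver side is the problem (our case here).
def cuda_report():
    try:
        import torch
    except ImportError:
        return {"torch": None}  # torch not installed at all
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count() if torch.cuda.is_available() else 0,
    }

if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

On the VM at this point you'd expect `built_with_cuda` to be set (e.g. "11.3") while `cuda_available` is False.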
Look for drivers:
# first, see if anything is already installed
dkms status
# use headless because we don't need a GUI on the VM
sudo apt search nvidia-driver | grep headless
sudo apt search nvidia-utils | grep server
Choose the highest version (510 in this case):
sudo apt install nvidia-headless-510-server nvidia-utils-510-server -y
sudo reboot # optional
Test we can see the GPU:
nvidia-smi
If the above looks OK, then do:
conda activate gpu
python -c "import torch;print(torch.cuda.is_available());print(torch.cuda.get_device_name(0))"
Which should give:
True
Tesla K80
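Beyond `is_available()`, it's worth doing a tiny computation on the device to confirm the GPU actually works end to end. A minimal sketch (the function name `smoke_test` is mine); it falls back to CPU so it runs anywhere:

```python
# Hypothetical smoke test: run a small matmul on whichever device exists.
def smoke_test(n=256):
    try:
        import torch
    except ImportError:
        return None  # torch not installed
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    return device, (a @ b).shape

if __name__ == "__main__":
    print(smoke_test())
```

If this returns `('cuda', torch.Size([256, 256]))` without errors, the driver, runtime, and PyTorch build are all talking to each other.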
# useful module
conda install pynvml -c conda-forge
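pynvml exposes the same NVML counters that nvidia-smi reads. A sketch of querying GPU memory with it (the function name `gpu_memory_mib` is mine; it returns None on machines without an NVIDIA driver):

```python
# Hypothetical helper: read GPU 0 memory via NVML, in MiB.
def gpu_memory_mib():
    try:
        import pynvml
        pynvml.nvmlInit()
    except Exception:
        return None  # no pynvml, or no driver/GPU present
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return {
            "total": info.total // 2**20,
            "used": info.used // 2**20,
            "free": info.free // 2**20,
        }
    finally:
        pynvml.nvmlShutdown()
```

Handy inside a notebook, where shelling out to nvidia-smi is awkward.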
Then doing:
python -c "import torch;print(torch.cuda.list_gpu_processes());print(torch.cuda.memory_summary())"
Should give you something like:
GPU:0
no processes are running
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| GPU reserved memory   |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Allocations           |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|
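All zeros is expected on a fresh device. To see the counters move, allocate something and compare `torch.cuda.memory_allocated()` before and after. A sketch (the function name `allocation_demo` is mine; it returns None without a GPU):

```python
# Hypothetical demo: watch the CUDA caching allocator register a tensor.
def allocation_demo():
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    before = torch.cuda.memory_allocated()
    x = torch.zeros(1024, 1024, device="cuda")  # ~4 MiB of float32
    after = torch.cuda.memory_allocated()
    return after - before  # bytes attributed to x (rounded up by the allocator)
```

The delta should be roughly 4 MiB, and the same amount shows up under "Allocated memory" in `memory_summary()`.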
Now install everything else:
conda install jupyter pandas numpy matplotlib
conda install humanize # optional
conda clean --all -y # clean it all up
Then start Jupyter:
jupyter notebook --no-browser
Set up an SSH tunnel:
ssh -f -N -L 8888:localhost:8888 USER@VM
# -f backgrounds ssh after authentication
# -N means run no remote command - tunnel only
Then open localhost:8888 in your browser.
On first connection, Jupyter will ask you to log in with a token or set a password
- copy/paste the token from the Jupyter logs on the VM
- it looks something like
73a7598019be7b8f0fb6XXc66fc1f93ce963e698541713e1
- then set a strong password
Useful checks on your local machine to confirm the tunnel is up:
sudo lsof -i -n -P | egrep '\<ssh\>'
ps aux | grep ssh
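Because the tunnel was started with -f, there's no terminal to close; you have to find the process to kill it. The same check can be scripted with the stdlib if you want it in a notebook (the function name `find_ssh_processes` is mine):

```python
# Hypothetical helper: list running processes whose command line mentions ssh,
# using only the stdlib (shells out to ps, as the commands above do).
import subprocess

def find_ssh_processes():
    out = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True
    ).stdout
    return [line.strip() for line in out.splitlines() if "ssh " in line]

if __name__ == "__main__":
    for proc in find_ssh_processes():
        print(proc)
```

Kill the matching PID (`kill <pid>`) to tear the tunnel down.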
Okay now go do the tutorial at: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
While running the training, you can check GPU activity on the VM via:
watch -n0.1 nvidia-smi
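If you'd rather sample utilization from inside Python (say, to log it alongside training metrics), pynvml exposes the same numbers. A sketch (the function name `gpu_utilization` is mine; returns None without a driver):

```python
# Hypothetical helper: sample GPU and memory-controller utilization (percent)
# for device 0 via NVML - the same counters nvidia-smi displays.
def gpu_utilization():
    try:
        import pynvml
        pynvml.nvmlInit()
    except Exception:
        return None  # no pynvml or no NVIDIA driver on this machine
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        return util.gpu, util.memory
    finally:
        pynvml.nvmlShutdown()
```

During the tutorial's training loop you should see the first number climb well above zero.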