This walkthrough describes setting up the Detectron (third-party PyTorch implementation) and Graph Convolutional Network (GCN) repositories on Gypsum, the UMass cluster. Most commands are specific to that setting.
$ module list
Currently Loaded Modulefiles:
  1) slurm/16.05.8
  2) openmpi/gcc/64/1.10.1
  3) hdf5/1.6.10
  4) fftw2/openmpi/open64/64/float/2.1.5
  5) gcc5/5.4.0
  6) cuda80/toolkit/8.0.61
  7) cudnn/5.1
  8) hdf5_18/1.8.17
Make sure that only these modules are loaded. Multiple versions of CUDA (or of cuDNN, etc.) can cause build conflicts later on.
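One way to check for conflicting versions is to scan the module list output. The following is a minimal sketch, assuming module names follow the patterns in the listing above (cuda80/toolkit/..., cudnn/...); the function name check_single_cuda is hypothetical. Environment Modules usually prints the listing on stderr, hence the 2>&1 in the usage line.

```sh
# Hypothetical helper: read `module list` output on stdin and fail if more
# than one CUDA toolkit module or more than one cuDNN module is loaded.
# Assumes names like "cuda80/toolkit/8.0.61" and "cudnn/5.1".
check_single_cuda() (
  # Put each whitespace-separated token of the listing on its own line.
  tokens=$(tr -s ' \t' '\n\n')
  n_cuda=$(printf '%s\n' "$tokens" | grep -cE '^cuda[0-9]*/toolkit/')
  n_cudnn=$(printf '%s\n' "$tokens" | grep -cE '^cudnn/')
  if [ "$n_cuda" -gt 1 ] || [ "$n_cudnn" -gt 1 ]; then
    echo "Conflict: $n_cuda CUDA toolkit and $n_cudnn cuDNN modules loaded" >&2
    exit 1
  fi
  echo "OK: $n_cuda CUDA toolkit and $n_cudnn cuDNN module(s) loaded"
)
```

Usage: module list 2>&1 | check_single_cuda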
Create and activate a conda environment with Python 3.5 (if you need to install conda on the Gypsum cluster first, follow these instructions):
conda create -n detectron-context python=3.5
source activate detectron-context
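Every pip install below should land in this environment. A small guard can stop you from installing into the base environment by mistake. This is a sketch that assumes conda sets CONDA_DEFAULT_ENV to the name of the active environment on activation (as current conda versions do); require_conda_env is a hypothetical name.

```sh
# Hypothetical guard: fail unless the named conda environment is active.
# Relies on conda exporting CONDA_DEFAULT_ENV when an environment is activated.
require_conda_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
    echo "Expected conda env '$1', found '${CONDA_DEFAULT_ENV:-none}'." >&2
    echo "Run: source activate $1   (or: conda activate $1)" >&2
    return 1
  fi
  echo "Using conda env '$1'"
}
```

Usage: require_conda_env detectron-context && pip install numpy -I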
Install PyTorch 0.4.0 (the CUDA 8.0 build for Python 3.5), then reinstall NumPy:
pip install https://download.pytorch.org/whl/cu80/torch-0.4.0-cp35-cp35m-linux_x86_64.whl
pip install numpy -I
Start Python at the command line and check that torch imports without errors:
$ python
>>> import torch
Install the remaining packages:
pip install torchvision
pip install matplotlib
pip install scipy
pip install pyyaml
pip install cython
pip install pycocotools
pip install opencv-python
conda install cffi
pip install tensorboardX
pip install tensorboard_logger
pip install tensorboard
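To confirm that all of these installed correctly, each package can be imported in turn. A minimal sketch follows; check_imports is a hypothetical helper, and PYTHON defaults to the python on your PATH (the conda environment's, once activated). Note that the import names differ from the pip names for pyyaml (yaml), cython (Cython) and opencv-python (cv2).

```sh
# Hypothetical helper: try to import each named module and report the result.
# Returns non-zero if any import fails. Runs in a subshell so variables don't leak.
check_imports() (
  py=${PYTHON:-python}
  status=0
  for mod in "$@"; do
    if "$py" -c "import $mod" >/dev/null 2>&1; then
      printf '%-8s%s\n' ok "$mod"
    else
      printf '%-8s%s\n' FAILED "$mod"
      status=1
    fi
  done
  exit "$status"
)
```

Usage: check_imports torch torchvision matplotlib scipy yaml Cython pycocotools cv2 cffi tensorboardX tensorboard_logger tensorboard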
From the root of the Detectron project folder, run the build script in lib on a GPU node:
cd lib
srun --pty --gres gpu:1 --mem 60000 sh make.sh
Make sure that the output of make.sh contains no fatal errors. Failures are most commonly caused by multiple versions of CUDA or cuDNN being loaded as modules.
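The build output can be saved with tee and then searched, e.g. srun --pty --gres gpu:1 --mem 60000 sh make.sh 2>&1 | tee make.log. Below is a minimal sketch of such a search. The helper name and the patterns are assumptions, and a match is only a hint to look closer, not proof of a failed build.

```sh
# Hypothetical helper: print (with line numbers) log lines that look like
# compiler or Python errors, and return 1 if any were found, 2 if unreadable.
scan_build_log() {
  if [ ! -r "$1" ]; then
    echo "Cannot read log file: $1" >&2
    return 2
  fi
  if grep -niE 'fatal|error:' "$1"; then
    echo "Possible errors found in $1 (see lines above)." >&2
    return 1
  fi
  echo "No error lines found in $1."
}
```

Usage: scan_build_log make.log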
Put the ImageNet pre-trained models in data/pretrained_model, for example by running:
python tools/download_imagenet_weights.py
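A quick sanity check before moving on, sketched under the assumption that the weights sit directly in data/pretrained_model relative to the project root. The helper name is hypothetical, and it only checks that the directory exists and is non-empty, not that the files are valid.

```sh
# Hypothetical check: the pretrained-model directory exists and is non-empty.
check_pretrained_dir() (
  dir=${1:-data/pretrained_model}
  if [ ! -d "$dir" ]; then
    echo "Missing directory: $dir" >&2
    exit 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "Directory is empty: $dir" >&2
    exit 1
  fi
  echo "Pretrained weights in $dir:"
  ls -lh "$dir"
)
```

Usage (from the project root): check_pretrained_dir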
Then verify the setup by running inference on COCO 2017:
CFG_PATH=configs/baselines/e2e_faster_rcnn_R-50-C4_1x.yaml
WT_PATH=/mnt/nfs/work1/elm/arunirc/Research/detectron-video/mask-rcnn.pytorch/data/detectron_trained_model/e2e_faster_rcnn_R-50-C4_1x.pkl
srun --pty -p m40-long --gres gpu:4 --mem 100000 python tools/test_net.py \
--set TEST.SCORE_THRESH 0.1 TRAIN.JOINT_TRAINING False TRAIN.GT_SCORES False \
--multi-gpu-testing \
--dataset coco2017 \
--cfg ${CFG_PATH} \
--load_detectron ${WT_PATH} \
--output_dir Outputs
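Since this command requests four GPUs, it can save queue time to confirm that both paths are readable before calling srun. WT_PATH above points into another user's work directory, so it may not be readable from your account. A minimal sketch (require_files is a hypothetical helper):

```sh
# Hypothetical pre-flight check: every argument must be a readable file path.
# Reports each unreadable path and returns non-zero if any are missing.
require_files() (
  missing=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "Not readable: $f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
)
```

Usage: require_files "${CFG_PATH}" "${WT_PATH}" && srun --pty -p m40-long ... (the full command as above).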
To set up the GCN repository, install the pygcn package:
cd pygcn-master
srun --pty python setup.py install
Check that everything is working:
cd pygcn
srun --pty --mem 60000 python train.py
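For longer runs, the same training command could be submitted as a batch job instead of an interactive srun session. A minimal sketch that writes an sbatch script with the same memory request as above; the helper name, job name and file names are placeholders.

```sh
# Hypothetical helper: write a Slurm batch script that runs pygcn's train.py
# with the same --mem request as the interactive srun command, then print its path.
write_gcn_job() (
  out=${1:-train_gcn.sbatch}
  cat > "$out" <<'EOF'
#!/bin/bash
#SBATCH --job-name=pygcn-train
#SBATCH --mem=60000
#SBATCH --output=pygcn-train-%j.out
# The job starts in the directory sbatch was called from, and sbatch passes the
# submitting shell's environment (including the active conda env) by default.
python train.py
EOF
  printf '%s\n' "$out"
)
```

Usage (from the pygcn directory, with detectron-context active): sbatch "$(write_gcn_job)"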