!pip install -U torch torchvision
!pip install cython; pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
!pip install git+https://github.com/facebookresearch/fvcore.git
!git clone https://github.com/facebookresearch/detectron2 detectron2_repo
!pip install -e detectron2_repo
!pip install opencv-python
!pip install torchprof
Requirement already up-to-date: torch in ./venv/lib/python3.6/site-packages (1.4.0)
Requirement already up-to-date: torchvision in ./venv/lib/python3.6/site-packages (0.5.0)
Requirement already satisfied, skipping upgrade: six in ./venv/lib/python3.6/site-packages (from torchvision) (1.14.0)
Requirement already satisfied, skipping upgrade: numpy in ./venv/lib/python3.6/site-packages (from torchvision) (1.18.2)
Requirement already satisfied, skipping upgrade: pillow>=4.1.1 in ./venv/lib/python3.6/site-packages (from torchvision) (7.1.1)
Requirement already satisfied: cython in ./venv/lib/python3.6/site-packages (0.29.16)
Collecting git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
Cloning https://github.com/cocodataset/cocoapi.git to /tmp/pip-req-build-1gei8zd8
Running command git clone -q https://github.com/cocodataset/cocoapi.git /tmp/pip-req-build-1gei8zd8
Requirement already satisfied, skipping upgrade: setuptools>=18.0 in ./venv/lib/python3.6/site-packages (from pycocotools==2.0) (46.1.3)
Requirement already satisfied, skipping upgrade: cython>=0.27.3 in ./venv/lib/python3.6/site-packages (from pycocotools==2.0) (0.29.16)
Requirement already satisfied, skipping upgrade: matplotlib>=2.1.0 in ./venv/lib/python3.6/site-packages (from pycocotools==2.0) (3.2.1)
Requirement already satisfied, skipping upgrade: cycler>=0.10 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (0.10.0)
Requirement already satisfied, skipping upgrade: python-dateutil>=2.1 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (2.8.1)
Requirement already satisfied, skipping upgrade: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (2.4.6)
Requirement already satisfied, skipping upgrade: numpy>=1.11 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (1.18.2)
Requirement already satisfied, skipping upgrade: kiwisolver>=1.0.1 in ./venv/lib/python3.6/site-packages (from matplotlib>=2.1.0->pycocotools==2.0) (1.2.0)
Requirement already satisfied, skipping upgrade: six in ./venv/lib/python3.6/site-packages (from cycler>=0.10->matplotlib>=2.1.0->pycocotools==2.0) (1.14.0)
Building wheels for collected packages: pycocotools
Building wheel for pycocotools (setup.py) ... done
Created wheel for pycocotools: filename=pycocotools-2.0-cp36-cp36m-linux_x86_64.whl size=275365 sha256=553f5859bf8f249f440289a978ef900af362993192f13a27d6936e0dfbbb39d6
Stored in directory: /tmp/pip-ephem-wheel-cache-exjzl9jo/wheels/25/c1/63/8bee2969883497d2785c9bdbe4e89cae5efc59521553d528bf
Successfully built pycocotools
Installing collected packages: pycocotools
Attempting uninstall: pycocotools
Found existing installation: pycocotools 2.0
Uninstalling pycocotools-2.0:
Successfully uninstalled pycocotools-2.0
Successfully installed pycocotools-2.0
Collecting git+https://github.com/facebookresearch/fvcore.git
Cloning https://github.com/facebookresearch/fvcore.git to /tmp/pip-req-build-ug3z1m8c
Running command git clone -q https://github.com/facebookresearch/fvcore.git /tmp/pip-req-build-ug3z1m8c
Requirement already satisfied (use --upgrade to upgrade): fvcore==0.1 from git+https://github.com/facebookresearch/fvcore.git in ./venv/lib/python3.6/site-packages
Requirement already satisfied: numpy in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (1.18.2)
Requirement already satisfied: yacs>=0.1.6 in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (0.1.6)
Requirement already satisfied: pyyaml>=5.1 in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (5.3.1)
Requirement already satisfied: tqdm in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (4.45.0)
Requirement already satisfied: portalocker in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (1.6.0)
Requirement already satisfied: termcolor>=1.1 in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (1.1.0)
Requirement already satisfied: Pillow in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (7.1.1)
Requirement already satisfied: tabulate in ./venv/lib/python3.6/site-packages (from fvcore==0.1) (0.8.7)
Building wheels for collected packages: fvcore
Building wheel for fvcore (setup.py) ... done
Created wheel for fvcore: filename=fvcore-0.1-py3-none-any.whl size=42662 sha256=70e39b821f6026b8a78b0dae41046da2418d78373ca0dd522530ec5177ec088b
Stored in directory: /tmp/pip-ephem-wheel-cache-8nwff8wt/wheels/00/33/f4/a95dac09ddd48a293cc942b75ca598e4b7facf86176ec92f8d
Successfully built fvcore
fatal: destination path 'detectron2_repo' already exists and is not an empty directory.
Obtaining file:///home/alexander/sandbox/src/git.udia.ca/alex/detectron2-test/detectron2_repo
Requirement already satisfied: termcolor>=1.1 in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (1.1.0)
Requirement already satisfied: Pillow in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (7.1.1)
Requirement already satisfied: yacs>=0.1.6 in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (0.1.6)
Requirement already satisfied: tabulate in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (0.8.7)
Requirement already satisfied: cloudpickle in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (1.3.0)
Requirement already satisfied: matplotlib in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (3.2.1)
Requirement already satisfied: tqdm>4.29.0 in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (4.45.0)
Requirement already satisfied: tensorboard in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (2.2.0)
Requirement already satisfied: fvcore in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (0.1)
Requirement already satisfied: future in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (0.18.2)
Requirement already satisfied: pydot in ./venv/lib/python3.6/site-packages (from detectron2==0.1.1) (1.4.1)
Requirement already satisfied: PyYAML in ./venv/lib/python3.6/site-packages (from yacs>=0.1.6->detectron2==0.1.1) (5.3.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (2.4.6)
Requirement already satisfied: cycler>=0.10 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (0.10.0)
Requirement already satisfied: numpy>=1.11 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (1.18.2)
Requirement already satisfied: kiwisolver>=1.0.1 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (1.2.0)
Requirement already satisfied: python-dateutil>=2.1 in ./venv/lib/python3.6/site-packages (from matplotlib->detectron2==0.1.1) (2.8.1)
Requirement already satisfied: absl-py>=0.4 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (0.9.0)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (0.34.2)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (0.4.1)
Requirement already satisfied: google-auth<2,>=1.6.3 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.13.1)
Requirement already satisfied: grpcio>=1.24.3 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.28.1)
Requirement already satisfied: setuptools>=41.0.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (46.1.3)
Requirement already satisfied: requests<3,>=2.21.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (2.23.0)
Requirement already satisfied: six>=1.10.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.14.0)
Requirement already satisfied: werkzeug>=0.11.15 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.0.1)
Requirement already satisfied: markdown>=2.6.8 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (3.2.1)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (1.6.0.post2)
Requirement already satisfied: protobuf>=3.6.0 in ./venv/lib/python3.6/site-packages (from tensorboard->detectron2==0.1.1) (3.11.3)
Requirement already satisfied: portalocker in ./venv/lib/python3.6/site-packages (from fvcore->detectron2==0.1.1) (1.6.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in ./venv/lib/python3.6/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard->detectron2==0.1.1) (1.3.0)
Requirement already satisfied: rsa<4.1,>=3.1.4 in ./venv/lib/python3.6/site-packages (from google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (4.0)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in ./venv/lib/python3.6/site-packages (from google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (4.0.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in ./venv/lib/python3.6/site-packages (from google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (0.2.8)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in ./venv/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (1.25.8)
Requirement already satisfied: idna<3,>=2.5 in ./venv/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (2.9)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (2020.4.5)
Requirement already satisfied: chardet<4,>=3.0.2 in ./venv/lib/python3.6/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (3.0.4)
Requirement already satisfied: oauthlib>=3.0.0 in ./venv/lib/python3.6/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard->detectron2==0.1.1) (3.1.0)
Requirement already satisfied: pyasn1>=0.1.3 in ./venv/lib/python3.6/site-packages (from rsa<4.1,>=3.1.4->google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (0.4.8)
Installing collected packages: detectron2
Attempting uninstall: detectron2
Found existing installation: detectron2 0.1.1
Uninstalling detectron2-0.1.1:
Successfully uninstalled detectron2-0.1.1
Running setup.py develop for detectron2
Successfully installed detectron2
Requirement already satisfied: opencv-python in ./venv/lib/python3.6/site-packages (4.2.0.34)
Requirement already satisfied: numpy>=1.11.3 in ./venv/lib/python3.6/site-packages (from opencv-python) (1.18.2)
Collecting torchprof
Downloading torchprof-1.0.0-py3-none-any.whl (8.3 kB)
Requirement already satisfied: torch<2,>=1.1.0 in ./venv/lib/python3.6/site-packages (from torchprof) (1.4.0)
Installing collected packages: torchprof
Successfully installed torchprof-1.0.0
# get image
!wget http://images.cocodataset.org/val2017/000000439715.jpg -O input.jpg
import cv2  # imported here because this cell uses cv2 before the main import cell runs
im = cv2.imread("./input.jpg")
--2020-04-05 10:22:46-- http://images.cocodataset.org/val2017/000000439715.jpg
Resolving images.cocodataset.org (images.cocodataset.org)... 52.216.138.203
Connecting to images.cocodataset.org (images.cocodataset.org)|52.216.138.203|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 209222 (204K) [image/jpeg]
Saving to: ‘input.jpg’
input.jpg 100%[===================>] 204.32K 1013KB/s in 0.2s
2020-04-05 10:22:46 (1013 KB/s) - ‘input.jpg’ saved [209222/209222]
# import some common detectron2 utilities
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
import cv2
import torchprof
# Create config
cfg = get_cfg()
cfg.merge_from_file("./detectron2_repo/configs/COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # set threshold for this model
cfg.MODEL.WEIGHTS = "detectron2://COCO-Detection/faster_rcnn_R_101_FPN_3x/137851257/model_final_f6e8b1.pkl"
# Create predictor
predictor = DefaultPredictor(cfg)
paths = [("GeneralizedRCNN", "proposal_generator", "rpn_head", "conv"),]
with torchprof.Profile(predictor.model, paths=paths, use_cuda=True) as prof:
    predictor(im)
print(prof.display(show_events=False))
print("=" * 40)
trace, event_lists_dict = prof.raw()
# trace[262] # Trace(path=('GeneralizedRCNN', 'proposal_generator', 'rpn_head', 'conv'), leaf=True, module=Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
for evl_run in event_lists_dict[trace[262].path]:
    print(evl_run)
Module | Self CPU total | CPU total | CUDA total
------------------------|----------------|-----------|-----------
GeneralizedRCNN | | |
├── backbone | | |
│├── fpn_lateral2 | | |
│├── fpn_output2 | | |
│├── fpn_lateral3 | | |
│├── fpn_output3 | | |
│├── fpn_lateral4 | | |
│├── fpn_output4 | | |
│├── fpn_lateral5 | | |
│├── fpn_output5 | | |
│├── top_block | | |
│├── bottom_up | | |
││├── stem | | |
│││├── conv1 | | |
││││└── norm | | |
││├── res2 | | |
│││├── 0 | | |
││││├── shortcut | | |
│││││└── norm | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 1 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 2 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
││├── res3 | | |
│││├── 0 | | |
││││├── shortcut | | |
│││││└── norm | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 1 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 2 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 3 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
││├── res4 | | |
│││├── 0 | | |
││││├── shortcut | | |
│││││└── norm | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 1 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 2 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 3 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 4 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 5 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 6 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 7 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 8 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 9 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 10 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 11 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 12 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 13 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 14 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 15 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 16 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 17 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 18 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 19 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 20 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 21 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 22 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
││├── res5 | | |
│││├── 0 | | |
││││├── shortcut | | |
│││││└── norm | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 1 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 2 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││└── conv3 | | |
││││ └── norm | | |
├── proposal_generator | | |
│├── anchor_generator | | |
││└── cell_anchors | | |
│├── rpn_head | | |
││├── conv | 393.380us | 1.385ms | 23.805ms
││├── objectness_logits | | |
││└── anchor_deltas | | |
└── roi_heads | | |
├── box_pooler | | |
│├── level_poolers | | |
││├── 0 | | |
││├── 1 | | |
││├── 2 | | |
││└── 3 | | |
├── box_head | | |
│├── fc1 | | |
│└── fc2 | | |
└── box_predictor | | |
├── cls_score | | |
└── bbox_pred | | |
========================================
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d 7.30% 5.930us 100.00% 81.220us 81.220us 25.06% 4.042ms 4.042ms 1 []
convolution 6.39% 5.190us 92.70% 75.290us 75.290us 25.02% 4.036ms 4.036ms 1 []
_convolution 14.27% 11.590us 86.31% 70.100us 70.100us 25.00% 4.032ms 4.032ms 1 []
contiguous 3.92% 3.180us 3.92% 3.180us 3.180us 0.02% 3.008us 3.008us 1 []
cudnn_convolution 68.12% 55.330us 68.12% 55.330us 55.330us 24.91% 4.018ms 4.018ms 1 []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 81.220us
CUDA time total: 16.131ms
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d 6.76% 5.340us 100.00% 79.050us 79.050us 25.17% 1.168ms 1.168ms 1 []
convolution 6.91% 5.460us 93.24% 73.710us 73.710us 25.10% 1.165ms 1.165ms 1 []
_convolution 14.00% 11.070us 86.34% 68.250us 68.250us 24.99% 1.160ms 1.160ms 1 []
contiguous 3.61% 2.850us 3.61% 2.850us 2.850us 0.05% 2.112us 2.112us 1 []
cudnn_convolution 68.73% 54.330us 68.73% 54.330us 54.330us 24.70% 1.147ms 1.147ms 1 []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 79.050us
CUDA time total: 4.643ms
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d 6.81% 5.300us 100.00% 77.880us 77.880us 25.53% 420.864us 420.864us 1 []
convolution 6.86% 5.340us 93.19% 72.580us 72.580us 25.22% 415.744us 415.744us 1 []
_convolution 14.23% 11.080us 86.34% 67.240us 67.240us 24.97% 411.648us 411.648us 1 []
contiguous 3.92% 3.050us 3.92% 3.050us 3.050us 0.12% 2.048us 2.048us 1 []
cudnn_convolution 68.19% 53.110us 68.19% 53.110us 53.110us 24.16% 398.336us 398.336us 1 []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 77.880us
CUDA time total: 1.649ms
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d 6.63% 5.120us 100.00% 77.190us 77.190us 26.00% 212.992us 212.992us 1 []
convolution 6.48% 5.000us 93.37% 72.070us 72.070us 25.38% 207.872us 207.872us 1 []
_convolution 13.91% 10.740us 86.89% 67.070us 67.070us 24.99% 204.672us 204.672us 1 []
contiguous 3.38% 2.610us 3.38% 2.610us 2.610us 0.25% 2.048us 2.048us 1 []
cudnn_convolution 69.59% 53.720us 69.59% 53.720us 53.720us 23.38% 191.488us 191.488us 1 []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 77.190us
CUDA time total: 819.072us
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d 6.70% 5.230us 100.00% 78.040us 78.040us 26.49% 149.504us 149.504us 1 []
convolution 6.66% 5.200us 93.30% 72.810us 72.810us 25.58% 144.384us 144.384us 1 []
_convolution 14.49% 11.310us 86.64% 67.610us 67.610us 24.86% 140.288us 140.288us 1 []
contiguous 3.77% 2.940us 3.77% 2.940us 2.940us 0.39% 2.176us 2.176us 1 []
cudnn_convolution 68.38% 53.360us 68.38% 53.360us 53.360us 22.68% 128.000us 128.000us 1 []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 78.040us
CUDA time total: 564.352us
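The loop above printed five event tables for the single profiled path, and their "CUDA time total" footers appear to add up to the CUDA total shown for `conv` in the module tree (23.805ms), within rounding. The sketch below checks that arithmetic. The footer strings are copied from the output above; the helper name `to_microseconds` and the unit table are illustrative assumptions, not part of torchprof.

```python
# Footer values copied from the five event tables printed above.
cuda_totals = ["16.131ms", "4.643ms", "1.649ms", "819.072us", "564.352us"]

# Conversion factors to microseconds for the unit suffixes seen in the output.
UNIT_TO_US = {"us": 1.0, "ms": 1_000.0, "s": 1_000_000.0}


def to_microseconds(text):
    """Convert a string such as '4.643ms' or '819.072us' to microseconds."""
    # Check 'us' and 'ms' before the bare 's' suffix, which both of them end with.
    for unit in ("us", "ms", "s"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNIT_TO_US[unit]
    raise ValueError(f"unrecognised time string: {text!r}")


total_us = sum(to_microseconds(t) for t in cuda_totals)
print(f"sum of per-run CUDA totals: {total_us / 1000:.3f}ms")
```

The first run dominates the sum, so the tree's single CUDA figure for `conv` says little about the cost of the later, much cheaper runs; the per-run tables are the place to look for that.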
# paths = [("GeneralizedRCNN", "proposal_generator", "rpn_head", "conv"),]
# with torchprof.Profile(predictor.model, paths=paths, use_cuda=True) as prof:
#     predictor(im)
print(prof.display(show_events=True))
# print("=" * 40)
trace, event_lists_dict = prof.raw()
# trace[262] # Trace(path=('GeneralizedRCNN', 'proposal_generator', 'rpn_head', 'conv'), leaf=True, module=Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
for evl_run in event_lists_dict[trace[262].path]:
    print(evl_run)
Module | Self CPU total | CPU total | CUDA total
-------------------------|----------------|-----------|-----------
GeneralizedRCNN | | |
├── backbone | | |
│├── fpn_lateral2 | | |
│├── fpn_output2 | | |
│├── fpn_lateral3 | | |
│├── fpn_output3 | | |
│├── fpn_lateral4 | | |
│├── fpn_output4 | | |
│├── fpn_lateral5 | | |
│├── fpn_output5 | | |
│├── top_block | | |
│├── bottom_up | | |
││├── stem | | |
│││├── conv1 | | |
││││└── norm | | |
││├── res2 | | |
│││├── 0 | | |
││││├── shortcut | | |
│││││└── norm | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 1 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 2 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
││├── res3 | | |
│││├── 0 | | |
││││├── shortcut | | |
│││││└── norm | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 1 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 2 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 3 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
││├── res4 | | |
│││├── 0 | | |
││││├── shortcut | | |
│││││└── norm | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 1 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 2 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 3 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 4 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 5 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 6 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 7 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 8 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 9 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 10 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 11 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 12 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 13 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 14 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 15 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 16 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 17 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 18 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 19 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 20 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 21 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 22 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
││├── res5 | | |
│││├── 0 | | |
││││├── shortcut | | |
│││││└── norm | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 1 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││├── conv3 | | |
│││││└── norm | | |
│││├── 2 | | |
││││├── conv1 | | |
│││││└── norm | | |
││││├── conv2 | | |
│││││└── norm | | |
││││└── conv3 | | |
││││ └── norm | | |
├── proposal_generator | | |
│├── anchor_generator | | |
││└── cell_anchors | | |
│├── rpn_head | | |
││├── conv | | |
│││├── conv2d | 5.090us | 76.560us | 148.480us
│││├── convolution | 4.890us | 71.470us | 143.360us
│││├── _convolution | 10.410us | 66.580us | 140.288us
│││├── contiguous | 2.730us | 2.730us | 2.912us
│││└── cudnn_convolution | 53.440us | 53.440us | 128.000us
││├── objectness_logits | | |
││└── anchor_deltas | | |
└── roi_heads | | |
├── box_pooler | | |
│├── level_poolers | | |
││├── 0 | | |
││├── 1 | | |
││├── 2 | | |
││└── 3 | | |
├── box_head | | |
│├── fc1 | | |
│└── fc2 | | |
└── box_predictor | | |
├── cls_score | | |
└── bbox_pred | | |
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d 6.40% 5.290us 100.00% 82.600us 82.600us 25.05% 4.041ms 4.041ms 1 []
convolution 6.57% 5.430us 93.60% 77.310us 77.310us 25.03% 4.037ms 4.037ms 1 []
_convolution 14.99% 12.380us 87.02% 71.880us 71.880us 25.00% 4.031ms 4.031ms 1 []
contiguous 3.75% 3.100us 3.75% 3.100us 3.100us 0.02% 3.072us 3.072us 1 []
cudnn_convolution 68.28% 56.400us 68.28% 56.400us 56.400us 24.91% 4.017ms 4.017ms 1 []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 82.600us
CUDA time total: 16.129ms
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | CUDA total % | CUDA total | CUDA time avg | Number of Calls | Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d | 6.65% | 5.150us | 100.00% | 77.440us | 77.440us | 25.17% | 1.166ms | 1.166ms | 1 | []
convolution | 6.55% | 5.070us | 93.35% | 72.290us | 72.290us | 25.08% | 1.162ms | 1.162ms | 1 | []
_convolution | 14.24% | 11.030us | 86.80% | 67.220us | 67.220us | 24.97% | 1.157ms | 1.157ms | 1 | []
contiguous | 3.56% | 2.760us | 3.56% | 2.760us | 2.760us | 0.07% | 3.072us | 3.072us | 1 | []
cudnn_convolution | 69.00% | 53.430us | 69.00% | 53.430us | 53.430us | 24.71% | 1.145ms | 1.145ms | 1 | []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 77.440us
CUDA time total: 4.633ms
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | CUDA total % | CUDA total | CUDA time avg | Number of Calls | Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d | 6.93% | 5.360us | 100.00% | 77.340us | 77.340us | 25.47% | 414.720us | 414.720us | 1 | []
convolution | 6.88% | 5.320us | 93.07% | 71.980us | 71.980us | 25.22% | 410.624us | 410.624us | 1 | []
_convolution | 14.61% | 11.300us | 86.19% | 66.660us | 66.660us | 24.96% | 406.336us | 406.336us | 1 | []
contiguous | 3.71% | 2.870us | 3.71% | 2.870us | 2.870us | 0.19% | 3.072us | 3.072us | 1 | []
cudnn_convolution | 67.87% | 52.490us | 67.87% | 52.490us | 52.490us | 24.15% | 393.216us | 393.216us | 1 | []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 77.340us
CUDA time total: 1.628ms
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | CUDA total % | CUDA total | CUDA time avg | Number of Calls | Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d | 6.76% | 5.250us | 100.00% | 77.710us | 77.710us | 25.88% | 211.968us | 211.968us | 1 | []
convolution | 6.16% | 4.790us | 93.24% | 72.460us | 72.460us | 25.38% | 207.872us | 207.872us | 1 | []
_convolution | 14.18% | 11.020us | 87.08% | 67.670us | 67.670us | 24.88% | 203.776us | 203.776us | 1 | []
contiguous | 3.47% | 2.700us | 3.47% | 2.700us | 2.700us | 0.36% | 2.976us | 2.976us | 1 | []
cudnn_convolution | 69.42% | 53.950us | 69.42% | 53.950us | 53.950us | 23.49% | 192.352us | 192.352us | 1 | []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 77.710us
CUDA time total: 818.944us
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | CUDA total % | CUDA total | CUDA time avg | Number of Calls | Input Shapes
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
conv2d | 6.65% | 5.090us | 100.00% | 76.560us | 76.560us | 26.37% | 148.480us | 148.480us | 1 | []
convolution | 6.39% | 4.890us | 93.35% | 71.470us | 71.470us | 25.46% | 143.360us | 143.360us | 1 | []
_convolution | 13.60% | 10.410us | 86.96% | 66.580us | 66.580us | 24.92% | 140.288us | 140.288us | 1 | []
contiguous | 3.57% | 2.730us | 3.57% | 2.730us | 2.730us | 0.52% | 2.912us | 2.912us | 1 | []
cudnn_convolution | 69.80% | 53.440us | 69.80% | 53.440us | 53.440us | 22.73% | 128.000us | 128.000us | 1 | []
--------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
Self CPU time total: 76.560us
CUDA time total: 563.040us
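The five tables above mix `ms` and `us` units, which makes the `conv2d` calls hard to compare at a glance. The sketch below is one way to put them on a common scale. The helper `to_microseconds` and the list `conv2d_cuda_totals` are hypothetical names introduced here for illustration; the values are copied by hand from the `CUDA total` column of the `conv2d` rows, not read from the profiler objects.

```python
import re

# Multipliers to convert each profiler unit into microseconds.
_UNIT_TO_US = {"us": 1.0, "ms": 1_000.0, "s": 1_000_000.0}


def to_microseconds(text):
    """Convert a profiler time string such as '4.041ms' to microseconds."""
    match = re.fullmatch(r"([0-9.]+)(us|ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time string: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_US[unit]


# conv2d "CUDA total" values copied from the five tables above, in order
# (hand-copied for illustration; not extracted programmatically).
conv2d_cuda_totals = ["4.041ms", "1.166ms", "414.720us", "211.968us", "148.480us"]

times_us = [to_microseconds(t) for t in conv2d_cuda_totals]
smallest = min(times_us)

# Print each call's time in microseconds and its ratio to the fastest call.
for label, t in zip(conv2d_cuda_totals, times_us):
    print(f"{label:>10} = {t:10.3f} us  ({t / smallest:5.1f}x the fastest call)")
```

The last table's `conv2d` row (76.560us CPU total, 148.480us CUDA total) is the same call that the tree view lists under `rpn_head` / `conv`, so the ratios give a rough sense of how much more expensive the earlier calls were.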