mingfeima / pytorch_performance_profiling.md
Last active September 7, 2024 06:21
How to do performance profiling on PyTorch

(Internal Training Material)

The first step in performance optimization is usually profiling, i.e. identifying the performance hotspots of a workload. This gist covers the basics of performance profiling on PyTorch. It answers the following questions:

  • How do I find the bottleneck operator?
  • How do I trace the source file of a particular operator?
  • How do I identify threading issues (oversubscription)?
  • How do I tell whether a specific operator is running efficiently?

This tutorial uses one of my recent projects, pssp-transformer, as an example to guide you through the path of PyTorch CPU performance optimization. The focus will be on Part 1 & Part 2.
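
As a quick taste of the first question (finding the bottleneck operator), here is a minimal sketch using `torch.autograd.profiler`. The toy model and input shapes are assumptions made up for illustration; they are not taken from pssp-transformer, and the numbers you see will depend on your machine and PyTorch build.

```python
import torch
import torch.nn as nn
from torch.autograd import profiler

# Hypothetical toy model standing in for a real workload (assumption).
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()
x = torch.randn(32, 64)

with torch.no_grad():
    model(x)  # warm-up run, so one-time setup cost is not profiled
    with profiler.profile() as prof:
        for _ in range(10):
            model(x)

# Aggregate events by operator name and sort by time spent in the
# operator itself (excluding time spent in child operators).
stats = prof.key_averages()
report = stats.table(sort_by="self_cpu_time_total", row_limit=10)
print(report)
```

The rows at the top of the table are the operators with the largest self CPU time, which is usually where to start looking for a hotspot.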