Skip to content

Instantly share code, notes, and snippets.

View manmartgarc's full-sized avatar
🐢

Manuel Martinez manmartgarc

🐢
View GitHub Profile
@TengdaHan
TengdaHan / ddp_notes.md
Last active September 18, 2024 22:58
Multi-node-training on slurm with PyTorch

Multi-node-training on slurm with PyTorch

What's this?

  • A simple note for how to start multi-node-training on slurm scheduler with PyTorch.
  • Useful especially when scheduler is too busy that you cannot get multiple GPUs allocated, or you need more than 4 GPUs for a single job.
  • Requirement: Have to use PyTorch DistributedDataParallel(DDP) for this purpose.
  • Warning: might need to re-factor your own code.
  • Warning: might be secretly condemned by your colleagues because using too many GPUs.
@mosquito
mosquito / README.md
Last active September 9, 2024 17:44
Add doker-compose as a systemd unit

Docker compose as a systemd unit

Create file /etc/systemd/system/docker-compose@.service. SystemD calling binaries using an absolute path. In my case is prefixed by /usr/local/bin, you should use paths specific for your environment.

[Unit]
Description=%i service with docker compose
PartOf=docker.service
After=docker.service
@bsweger
bsweger / useful_pandas_snippets.md
Last active April 19, 2024 18:04
Useful Pandas Snippets

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)