dbpprt / oom.md
Last active September 6, 2021, 13:27
A simple helper function for handling OOM errors while training with PyTorch. On my Windows system I sometimes get strange OutOfMemory errors in the middle of a training job. The wrapper tries to recover by freeing up as much memory as possible and splitting the batch in half.

Usage

import torch.nn.functional as F

optimizer.zero_grad()

# The criterion performs the backward pass itself, so gradients accumulate
# correctly even when the wrapper splits a batch and calls it several times.
def criterion(output, target, steps, batch_size):
    loss = F.cross_entropy(output, target)
    loss.backward()
    return loss
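
For illustration, here is a minimal sketch of what such an OOM-recovery wrapper could look like. The name `forward_backward_with_oom_retry`, its signature, and the `max_splits` limit are assumptions made for this sketch, not the gist's actual implementation; it only relies on the `criterion` signature shown above, frees cached CUDA memory on an out-of-memory error, and retries with the batch split in half.

```python
import torch


def forward_backward_with_oom_retry(model, criterion, data, target,
                                    max_splits=3, _depth=0):
    # Sketch only: hypothetical name and signature, not the gist's exact code.
    # Run the forward pass and the user-supplied criterion for one batch; on a
    # CUDA out-of-memory error, free cached memory and retry with the batch
    # split in half, up to max_splits times.
    try:
        output = model(data)
        # criterion calls loss.backward() itself (see Usage above), so the
        # gradients simply accumulate across the split halves.
        return criterion(output, target, _depth, data.size(0))
    except RuntimeError as exc:
        if ("out of memory" not in str(exc)
                or _depth >= max_splits
                or data.size(0) < 2):
            raise
    # Retry outside the except block so the exception object (and the tensors
    # referenced by its traceback) can be garbage collected before the retry.
    torch.cuda.empty_cache()
    half = data.size(0) // 2
    loss_first = forward_backward_with_oom_retry(
        model, criterion, data[:half], target[:half], max_splits, _depth + 1)
    loss_second = forward_backward_with_oom_retry(
        model, criterion, data[half:], target[half:], max_splits, _depth + 1)
    return (loss_first + loss_second) / 2
```

With the criterion from the usage snippet, a training step would then call optimizer.zero_grad(), invoke the wrapper once for the full batch, and finish with optimizer.step().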