@nilesh0109
Last active December 21, 2020 12:34
Accumulating gradients for large-batch training on a single GPU.
TARGET_BATCH_SIZE, BATCH_FIT_IN_MEMORY = 256, 32
accumulation_steps = TARGET_BATCH_SIZE // BATCH_FIT_IN_MEMORY

network.zero_grad()                                  # Reset gradient tensors
for i, (imgs, labels) in enumerate(dataloader):
    preds = network(imgs)                            # Forward pass
    loss = loss_function(preds, labels)              # Compute loss function
    loss = loss / accumulation_steps                 # Normalize our loss (if averaged)
    loss.backward()                                  # Backward pass (gradients accumulate)
    if (i + 1) % accumulation_steps == 0:            # Wait for several backward steps
        optim.step()                                 # Perform an optimizer step
        network.zero_grad()                          # Reset gradient tensors
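
The snippet assumes that network, dataloader, loss_function, and optim are already defined elsewhere. A minimal, purely illustrative setup (the model, data, and hyperparameters below are placeholders, not part of the gist) could look like this:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

network = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # any model would do
loss_function = nn.CrossEntropyLoss()                           # averages over the mini-batch
optim = torch.optim.SGD(network.parameters(), lr=0.1)

# Random stand-in data; each loaded mini-batch holds BATCH_FIT_IN_MEMORY samples.
dataset = TensorDataset(torch.randn(1024, 1, 28, 28),
                        torch.randint(0, 10, (1024,)))
dataloader = DataLoader(dataset, batch_size=BATCH_FIT_IN_MEMORY, shuffle=True)

Because loss_function averages over each small mini-batch, dividing the loss by accumulation_steps makes the gradients summed over the accumulated backward passes match the gradient of the mean loss over the full TARGET_BATCH_SIZE samples, while only one small batch of activations has to fit in GPU memory at a time.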