@philschmid · Created October 18, 2023 22:11
# Multi-node launch (Llama 2 70B): 2 nodes with 32 workers each; run the same command on the second node with --node_rank 1
torchrun --nnodes 2 --nproc_per_node 32 --master_addr algo-1 --master_port 7777 --node_rank 0 train_llama.py \
--model_id "meta-llama/Llama-2-70b-hf" \
--lr 5e-5 \
--per_device_train_batch_size 16 \
--bf16 True \
--epochs 3

# Single-node launch (Llama 2 7B): 32 workers on one node
torchrun --nproc_per_node=32 train_llama.py \
--model_id "meta-llama/Llama-2-7b-hf" \
--lr 5e-5 \
--per_device_train_batch_size 16 \
--bf16 True \
--epochs 3
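
Both commands hand their flags to train_llama.py, which is not included in this gist. A minimal sketch of how such a script could parse those flags and forward them to NeuronTrainingArguments is shown below; the output_dir value and the boolean parsing helper are assumptions added for illustration, not part of the original script.

import argparse

from optimum.neuron import NeuronTrainingArguments


def parse_args():
    # Flags mirror the torchrun invocations above
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_id", type=str, default="meta-llama/Llama-2-7b-hf")
    parser.add_argument("--lr", type=float, default=5e-5)
    parser.add_argument("--per_device_train_batch_size", type=int, default=16)
    parser.add_argument("--bf16", type=lambda v: v.lower() == "true", default=True)
    parser.add_argument("--epochs", type=int, default=3)
    return parser.parse_args()


args = parse_args()
training_args = NeuronTrainingArguments(
    output_dir="llama_output",  # hypothetical output path, not part of the gist
    learning_rate=args.lr,
    per_device_train_batch_size=args.per_device_train_batch_size,
    bf16=args.bf16,
    num_train_epochs=args.epochs,
)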
from transformers import AutoTokenizer, AutoModelForCausalLM
from optimum.neuron import NeuronTrainer, NeuronTrainingArguments
# Prepare and tokenize dataset
# ....
# Load Llama model
llama = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# Define Hyperparameters
training_args = NeuronTrainingArguments(...)
# Create Trainer instance
trainer = NeuronTrainer(
    model=llama,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
# Start training
trainer.train()
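
After trainer.train() finishes, the fine-tuned weights usually need to be persisted so they can be reloaded for inference. A minimal follow-up sketch using the standard save_model and save_pretrained calls from the regular transformers API is shown below; the output directory name is an assumption and is not part of the original gist.

# Persist the fine-tuned model and tokenizer;
# "llama-2-7b-finetuned" is a hypothetical output directory
output_dir = "llama-2-7b-finetuned"
trainer.save_model(output_dir)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.save_pretrained(output_dir)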