Fine-Tuning GPT-2 on a Custom Dataset
@alluringstorms

What would the format be for the dataset? Could it just be sentences? JSON format or plain text? This is the kind of thing beginners in machine learning need to know. If you're still active on GitHub, I would love a reply. Thanks!

@MattPitlyk
Author

Just a text file with each example sentence on its own line. No JSON needed.
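
For anyone landing here later: a minimal sketch of what fine-tuning on such a file can look like, assuming the Hugging Face transformers library (the gist itself may use a different toolkit) and a hypothetical file name train.txt with one example per line:

```python
# A rough sketch, not the gist's own code: fine-tuning GPT-2 on a plain
# text file with Hugging Face transformers (an assumed toolkit; the
# file name "train.txt" is hypothetical).
from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
    TextDataset,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# One example per line, no JSON, exactly as described above.
dataset = TextDataset(tokenizer=tokenizer, file_path="train.txt", block_size=128)

# mlm=False gives standard left-to-right language modeling for GPT-2.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt2-finetuned",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```

The block size and hyperparameters above are placeholders; tune them to your dataset and hardware.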

@seungjun-green

If I want to create a text-summarization model by fine-tuning GPT-2, how should the text file be formatted?

@anujsahani01

Will the text get tokenized on its own? If we just have to pass the text file, what kind of formatting should be done? For example:

Question: 'the ques'

Answer: 'the answer'

Will this format work?
And secondly, my Colab session crashes when we train the model; what could be the solution to this?

@AhmedAskar12

It requires more than 20 GB of memory, so you can subscribe to Colab Pro or use a free-trial virtual machine.
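
If upgrading the runtime isn't an option, shrinking the training footprint sometimes avoids the crash instead. A minimal sketch, assuming the Hugging Face Trainer setup sketched earlier in the thread (itself an assumption about the toolkit this gist uses):

```python
# A hedged sketch of common ways to reduce GPU memory use during
# fine-tuning, continuing the transformers-based example above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gpt2-finetuned",
    per_device_train_batch_size=1,   # smaller batches use less GPU memory
    gradient_accumulation_steps=8,   # preserves the effective batch size
    fp16=True,                       # half precision on a GPU runtime
)
```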

@AhmedAskar12


Is this GPT-2 fine-tuning approach effective, by the way?
