- llama.cpp
- git clone git@github.com:ggerganov/llama.cpp.git
- pip install -r llama.cpp/requirements.txt
- Compile llama.cpp: cd llama.cpp && make
- wikiextractor (used to turn a Wikipedia dump into the raw text that feeds imatrix generation)
- git clone git@github.com:attardi/wikiextractor.git
- cd wikiextractor
- wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
- this is a 20 GB+ file, so it may take a while to download
- python -m wikiextractor.WikiExtractor -o extracted enwiki-latest-pages-articles.xml.bz2
- cat extracted/*/* > wiki.train.raw
- mv wiki.train.raw ../
- cd ../
- rm -rf wikiextractor
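The concatenation step above can also be done in Python, which avoids shell argument-length limits on very large extractions; `concat_extracted` is a helper name of my own, not from the gist:

```python
from pathlib import Path

def concat_extracted(extracted_dir: str, out_file: str) -> int:
    """Merge every wikiextractor output file (extracted/AA/wiki_00, ...)
    into one raw training file; returns the number of files merged."""
    files = sorted(Path(extracted_dir).glob("*/wiki_*"))
    with open(out_file, "w", encoding="utf-8") as out:
        for f in files:
            out.write(f.read_text(encoding="utf-8"))
    return len(files)
```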
- Download the HF model (see download script)
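A minimal sketch of the download script referenced here, assuming the `huggingface_hub` package is installed; the repo id in the usage comment and both helper names are placeholders of mine:

```python
def default_local_dir(repo_id: str) -> str:
    """Derive a local folder name from a repo id, e.g. 'org/model' -> 'model'."""
    return repo_id.split("/")[-1]

def download_hf_model(repo_id: str) -> str:
    """Fetch the full HF checkpoint; returns the local snapshot path."""
    from huggingface_hub import snapshot_download  # deferred: needs network/auth
    return snapshot_download(repo_id=repo_id, local_dir=default_local_dir(repo_id))

# usage (hypothetical repo id):
# path = download_hf_model("mistralai/Mistral-7B-v0.1")
```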
- Convert HF model to gguf format
- python llama.cpp/convert.py {hf_model} --outfile {gguf_model} --outtype fp16
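The conversion call above, wrapped as a command builder; the output-naming convention is my own, not the gist's:

```python
from pathlib import Path

def convert_cmd(hf_model: str, outtype: str = "fp16") -> list:
    """Build the llama.cpp convert.py invocation for a local HF model dir."""
    gguf_model = f"{Path(hf_model).name}-{outtype}.gguf"
    return ["python", "llama.cpp/convert.py", hf_model,
            "--outfile", gguf_model, "--outtype", outtype]

# run it with e.g. subprocess.run(convert_cmd("models/My-7B"), check=True)
```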
- [Optional, only needed for IQ-style quantization] Generate the raw data for creating an imatrix file
- the wikiextractor steps above already produced this as wiki.train.raw; if you skipped them, run them now
- [Optional, only needed for IQ-style quantization] Generate the imatrix file
- ./llama.cpp/imatrix -m {gguf_model} -f wiki.train.raw -o imatrix_{gguf_model}.dat --chunks 100
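The same invocation as a command builder, mirroring the gist's `imatrix_{gguf_model}.dat` output naming; the helper name is mine:

```python
def imatrix_cmd(gguf_model: str, train_file: str = "wiki.train.raw",
                chunks: int = 100) -> list:
    """Build the llama.cpp imatrix invocation for a gguf model."""
    return ["./llama.cpp/imatrix", "-m", gguf_model, "-f", train_file,
            "-o", f"imatrix_{gguf_model}.dat", "--chunks", str(chunks)]
```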
- Quantize
- [IQ] ./llama.cpp/quantize --imatrix imatrix_{gguf_model}.dat {gguf_model} quantized_model.gguf iq2_xxs
- [legacy] ./llama.cpp/quantize {gguf_model} quantized_model_Q4_K_M.gguf Q4_K_M
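Both quantize variants share one shape, differing only in the `--imatrix` flag; a sketch that derives the output name from the quant type (the naming convention is mine):

```python
def quantize_cmd(gguf_model: str, qtype: str, imatrix: str = "") -> list:
    """Build the llama.cpp quantize invocation; pass an imatrix file
    for IQ-style quants, omit it for legacy quants like Q4_K_M."""
    out = gguf_model.replace(".gguf", f"_{qtype}.gguf")
    cmd = ["./llama.cpp/quantize"]
    if imatrix:
        cmd += ["--imatrix", imatrix]
    return cmd + [gguf_model, out, qtype]
```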
- Upload to HF (see upload script)
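A sketch of the upload script referenced here, assuming `huggingface_hub` is installed and you have already run `huggingface-cli login`; the repo id in the usage comment and both helper names are placeholders of mine:

```python
def repo_filename(gguf_path: str) -> str:
    """File name to use inside the HF repo, e.g. 'out/m.gguf' -> 'm.gguf'."""
    return gguf_path.rsplit("/", 1)[-1]

def upload_quantized(repo_id: str, gguf_path: str) -> None:
    """Create the repo if needed and push one quantized gguf file."""
    from huggingface_hub import HfApi  # deferred: needs auth + network
    api = HfApi()
    api.create_repo(repo_id, exist_ok=True)
    api.upload_file(path_or_fileobj=gguf_path,
                    path_in_repo=repo_filename(gguf_path),
                    repo_id=repo_id)

# usage (hypothetical repo id):
# upload_quantized("your-user/model-iq2_xxs", "quantized_model.gguf")
```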
Created May 16, 2024 21:31