Today on Hacker News, the top article was "LLaMA2 Chat 70B outperformed ChatGPT", linking to a leaderboard of LLMs. As of today, July 27, 2023, the top 12 are as follows:
Model Name | Win Rate | Length |
---|---|---|
GPT-4 | 95.28% | 1365 |
LLaMA2 Chat 70B | 92.66% | 1790 |
Claude 2 | 91.36% | 1069 |
ChatGPT | 89.37% | 827 |
WizardLM 13B V1.2 | 89.17% | 1635 |
Vicuna 33B v1.3 | 88.99% | 1479 |
Claude | 88.39% | 1082 |
OpenChat V2-W 13B | 87.13% | 1566 |
WizardLM 13B V1.1 | 86.32% | 1525 |
OpenChat V2 13B | 84.97% | 1564 |
Vicuna 13B v1.3 | 82.11% | 1132 |
LLaMA2 Chat 13B | 81.09% | 1513 |
Incidentally, I have been playing around with inference on my own Mac, randomly trying different models, and the leaderboard has been a good place to focus my experimentation.
I'm currently running a MacBook Pro with an M1 Max chip (64 GB RAM). Much to my surprise, I can run inference on most open-source LLMs using llama.cpp. Here's the setup:
- Download and install the text generation UI
- Follow these instructions to use the llama.cpp backend
- Download GGML models (llama.cpp-formatted files) from TheBloke on HF and `ln -s` them to the `models` directory
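Concretely, the linking step looks something like the following. The directory paths are assumptions; adjust them to wherever you keep your downloads and wherever the text generation UI is installed:

```shell
# Paths are illustrative -- point these at your own download folder
# and the UI's install location.
GGML_DIR="$HOME/Downloads/ggml-models"
UI_MODELS_DIR="$HOME/text-generation-webui/models"

mkdir -p "$UI_MODELS_DIR"
# Symlink every downloaded GGML .bin into the UI's models directory so
# the files show up in the model dropdown without duplicating ~40 GB of
# weights on disk. -f replaces any stale link from a previous run.
for f in "$GGML_DIR"/*.bin; do
  ln -sf "$f" "$UI_MODELS_DIR/$(basename "$f")"
done
```

Symlinking rather than moving the files also lets several tools share one copy of each model.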
Here is the list of models I'm experimenting with:
- TheBloke/Llama-2-70B-Chat-GGML
- TheBloke/WizardLM-13B-V1.2-GGML
- TheBloke/vicuna-33B-GGML
- TheBloke/Vicuna-33B-1-3-SuperHOT-8K-GGML
- TheBloke/WizardLM-13B-V1.1-GGML
- TheBloke/WizardLM-13B-V1-1-SuperHOT-8K-GGML
- TheBloke/vicuna-13b-v1.3.0-GGML
Note: I'm using the `q4_K_M.bin` version of these models; the model cards on HF and the llama.cpp repo have a more detailed discussion of the different quantization levels.
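For a rough sense of why quantization matters on a 64 GB machine, here's a back-of-the-envelope size estimate. The ~4.5 bits-per-weight figure for `q4_K_M` is an approximation I've seen in llama.cpp discussions, not an exact spec, and this ignores context/KV-cache overhead:

```shell
# Approximate model file size: parameters (billions) * bits per weight / 8
# gives gigabytes directly. 4.5 bits/weight for q4_K_M is an assumption.
awk -v params_b=13 -v bits=4.5 \
    'BEGIN { printf "~%.1f GB\n", params_b * bits / 8 }'
```

By the same arithmetic, a 70B model at ~4.5 bits/weight lands around 40 GB, which is why it still fits in 64 GB of unified memory while a full fp16 copy (~140 GB) would not.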
Typically for 13B models I'm achieving 5-7 tokens/sec, whereas with the larger models I'm getting 1-2 tokens/sec. For scale, at 5 tokens/sec a typical 1,500-token response takes about five minutes. I have yet to dive deeper into parameter tuning for performance.
As for the results, my main use case is data engineering tasks such as parsing SQL, reformatting code, converting unstructured data to structured formats, and so on. So far, I've been pleasantly surprised with the results compared to OpenAI's models. I will continue to experiment with these models and find new tasks for these free and open AIs!
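To make the structured-extraction use case concrete, here's the shape of prompt I feed these models. The sample record and the prompt wording are illustrative, my own rather than from any benchmark:

```shell
# Build an extraction prompt asking the model to emit JSON.
# The record below is made-up sample data.
record='Order #1042 shipped to Springfield on 2023-07-20 for $19.99'
prompt="Convert the following record to JSON with keys order_id, city, date, amount:
$record
JSON:"
printf '%s\n' "$prompt"
# Paste the prompt into the text generation UI, or pass it straight to
# llama.cpp's CLI:  ./main -m models/<model>.bin -p "$prompt"
```

Ending the prompt with `JSON:` nudges the model to start its completion with the structured output instead of conversational preamble.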