Get the model file from Hugging Face: TheBloke. I chose llama-2-13b-chat.ggmlv3.q4_K_S.bin. It's a good tradeoff between size, speed, and quality. It's about 12G on my card and I get 60 tokens/second throughput.
Install and start the llama.cpp server:
$ llama.cpp/server -t 8 -ngl 128 -m ./llama-2-13b-chat.ggmlv3.q4_K_S.bin -eps 1e-5 -c 4096 -b 1024 --port 8009
Here's the data format for the examples I want to use in my few-shot prompt:
[{"text": "Here's a bunch of text that I want to extract info from.", "facts": ["this is text", "there are facts in this"]},
 {"text": "Another blob of text with lots of great facts", "facts": ["example fact 1", "another example fact", "fact alpha"]}, ...]
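If you want to sanity-check the file before building prompts, a small validation pass catches malformed records early. A minimal sketch (the inline `raw` string stands in for `data/text_and_facts.json` so it runs standalone):

```python
import json

# Inline stand-in for data/text_and_facts.json
raw = """[
  {"text": "Here's a bunch of text that I want to extract info from.",
   "facts": ["this is text", "there are facts in this"]}
]"""

samples = json.loads(raw)
for i, s in enumerate(samples):
    # Every record needs a string "text" and a list of string "facts"
    assert isinstance(s.get("text"), str), f"record {i}: bad 'text'"
    facts = s.get("facts")
    assert isinstance(facts, list) and all(isinstance(f, str) for f in facts), \
        f"record {i}: bad 'facts'"
print(f"{len(samples)} records OK")
```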
Here's the Python code that builds a few-shot prompt from your examples and then runs it against a few more of them.
import json
import requests
import random
# Load the examples and shuffle so each run draws different few-shot picks
with open("data/text_and_facts.json", "r") as f:
    all_samples = json.load(f)
random.shuffle(all_samples)
# llama.cpp server running llama-2-13b-chat.ggmlv3.q4_K_S.bin
SERVER_ADDRESS = "http://localhost:8009/completion"
HEADERS = {"Content-Type": "application/json"}
def get_completion(message):
    data = {
        "prompt": message,
        "temperature": 0.8,
        "repeat_penalty": 1.4,
        "top_p": 0.90,
        "n_predict": 4096,
        "stop": ["\n"],  # each example's facts are one line, so stop at a newline
    }
    r = requests.post(SERVER_ADDRESS, headers=HEADERS, data=json.dumps(data))
    d = r.json()
    return d["content"].strip()
# Llama-2 chat formatting tokens:
# B_INST, E_INST = "[INST]", "[/INST]"
# B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
SYSTEM_PROMPT = """<<SYS>>\nGiven the following text, generate a list of potential facts from the text. Start every fact with "F: ". Do not repeat facts.\n<</SYS>>\n\n"""
TEMPLATE = """[INST] {TEXT} [/INST]\n{FACTS}"""
N_SHOTS = 2
few_shots = SYSTEM_PROMPT
for sample in all_samples[:N_SHOTS]:
    text = sample["text"]
    facts = " ".join(f"F: {q}" for q in sample["facts"])
    few_shots += TEMPLATE.format(TEXT=text, FACTS=facts) + "\n"
print("Your few shot prompt:")
print(few_shots)
N_SAMPLES = 6
for sample in all_samples[N_SHOTS : N_SHOTS + N_SAMPLES]:
    text = sample["text"]
    user_content = few_shots + TEMPLATE.format(TEXT=text, FACTS="")
    print("============================================================")
    print(text)
    print("==========")
    print(get_completion(user_content))
    print("============================================================")
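Since the system prompt forces every fact to start with "F: ", the raw completion can be split back into a Python list. A sketch (`parse_facts` is my own helper name, not part of the script above):

```python
def parse_facts(completion: str) -> list[str]:
    # Split on the "F: " marker the system prompt enforces; the chunk
    # before the first marker is empty, so filter out blank pieces.
    return [part.strip() for part in completion.split("F: ") if part.strip()]

# A completion in the format the prompt requests
out = "F: example fact 1 F: another example fact F: fact alpha"
print(parse_facts(out))
# ['example fact 1', 'another example fact', 'fact alpha']
```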