Get the model file from Hugging Face: TheBloke. I chose llama-2-13b-chat.ggmlv3.q4_K_S.bin. It's a good tradeoff between size, speed, and quality. It's about 12G on my card and I get 60 tokens/second throughput.
Install and start the llama.cpp server:
$ llama.cpp/server -t 8 -ngl 128 -m ./llama-2-13b-chat.ggmlv3.q4_K_S.bin -eps 1e-5 -c 4096 -b 1024 --port 8009
Here's the data format for the examples I want to use in my few-shot prompt:
[{"text": "Here's a bunch of text that I want to extract info from.", "facts": ["this is text", "there are facts in this"]},
 {"text": "Another blob of text with lots of great facts", "facts": ["example fact 1", "another example fact", "fact alpha"]}, ...]
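If you want to sanity-check the file before building prompts, a small validation pass catches malformed records early. A minimal sketch (the inline `raw` string stands in for `data/text_and_facts.json` so it runs standalone):

```python
import json

# Inline stand-in for data/text_and_facts.json
raw = """[
  {"text": "Here's a bunch of text that I want to extract info from.",
   "facts": ["this is text", "there are facts in this"]}
]"""

samples = json.loads(raw)
for i, s in enumerate(samples):
    # Every record needs a string "text" and a list of string "facts"
    assert isinstance(s.get("text"), str), f"record {i}: bad 'text'"
    facts = s.get("facts")
    assert isinstance(facts, list) and all(isinstance(f, str) for f in facts), \
        f"record {i}: bad 'facts'"
print(f"{len(samples)} records OK")
```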
Here's the Python code that builds a few-shot prompt from your examples and then runs it against a few more of them.
import json
import requests
import random
# Load the examples and shuffle so each run draws different few-shot picks
with open("data/text_and_facts.json", "r") as f:
    all_samples = json.load(f)
random.shuffle(all_samples)
# llama.cpp server running llama-2-13b-chat.ggmlv3.q4_K_S.bin
SERVER_ADDRESS = "http://localhost:8009/completion"
HEADERS = {"Content-Type": "application/json"}
def get_completion(message):
    data = {
        "prompt": message,
        "temperature": 0.8,
        "repeat_penalty": 1.4,
        "top_p": 0.90,
        "n_predict": 4096,
        "stop": ["\n"],  # each example's facts are one line, so stop at a newline
    }
    r = requests.post(SERVER_ADDRESS, headers=HEADERS, data=json.dumps(data))
    d = r.json()
    return d["content"].strip()
# Llama-2 chat formatting tokens:
# B_INST, E_INST = "[INST]", "[/INST]"
# B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
SYSTEM_PROMPT = """<<SYS>>\nGiven the following text, generate a list of potential facts from the text. Start every fact with "F: ". Do not repeat facts.\n<</SYS>>\n\n"""
TEMPLATE = """[INST] {TEXT} [/INST]\n{FACTS}"""
N_SHOTS = 2
few_shots = SYSTEM_PROMPT
for sample in all_samples[:N_SHOTS]:
    text = sample["text"]
    facts = " ".join(f"F: {q}" for q in sample["facts"])
    few_shots += TEMPLATE.format(TEXT=text, FACTS=facts) + "\n"
print("Your few shot prompt:")
print(few_shots)
N_SAMPLES = 6
for sample in all_samples[N_SHOTS : N_SHOTS + N_SAMPLES]:
    text = sample["text"]
    user_content = few_shots + TEMPLATE.format(TEXT=text, FACTS="")
    print("============================================================")
    print(text)
    print("==========")
    print(get_completion(user_content))
    print("============================================================")
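Since the system prompt forces every fact to start with "F: ", the raw completion can be split back into a Python list. A sketch (`parse_facts` is my own helper name, not part of the script above):

```python
def parse_facts(completion: str) -> list[str]:
    # Split on the "F: " marker the system prompt enforces; the chunk
    # before the first marker is empty, so filter out blank pieces.
    return [part.strip() for part in completion.split("F: ") if part.strip()]

# A completion in the format the prompt requests
out = "F: example fact 1 F: another example fact F: fact alpha"
print(parse_facts(out))
# ['example fact 1', 'another example fact', 'fact alpha']
```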