the-crypt-keeper

Quad P40 Xeon Build

  • I accidentally 4x P40s and need something to plug them into.
  • Original plan fell apart: the Dell R730 outsmarted me. It seems to force any GPUs past the first two to power limit themselves and refuse to draw any external power.
  • EPYCs don't seem to make sense in this scenario: the host mobo+CPU alone would cost more than the GPUs.
  • Prefer to avoid dual-socket Xeon systems both because of the additional idle power and NUMA.
  • With 96GB VRAM, I need at least 128GB system RAM to not have problems.
    • I have a large stash of ECC DDR4 2133 modules.

Requirements

import requests
import json
import sys
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
SYSTEM_PROMPT = "You are a helpful assistant."
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
@the-crypt-keeper
the-crypt-keeper / 3060vs3060.md
Created May 2, 2024 17:59
RTX 3060 x1 vs x16 results on 4-bit quantized Llama 8B Instruct models

Environment setup

Test Host is an HP z640 with Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz

Device 0 is a GeForce RTX 3060 Lite Hash Rate (EVGA) connected via the cheapest USB x1 crypto mining riser I could find online.

Device 1 is a GeForce RTX 3060 (HP OEM) connected via an x16 PCIe 4.0 riser cable (although note that my host does not support anything past PCIe 3.0)
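
For a rough sense of the gap between the two links, here is a minimal sketch of the theoretical per-direction PCIe bandwidth. It assumes both devices negotiate PCIe 3.0 (the host's limit); the actual link speed of the x1 mining riser is not stated above and may be lower. The function name `pcie_bandwidth_gb_per_s` is made up for this illustration.

```python
# Theoretical per-direction PCIe bandwidth. Real-world throughput is lower
# because of protocol overhead; this only shows the x1 vs x16 ratio.

# Per-lane transfer rate in GT/s and line-encoding efficiency per generation.
GEN_GT_PER_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}
ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130, 4: 128 / 130}


def pcie_bandwidth_gb_per_s(gen: int, lanes: int) -> float:
    """Return theoretical one-direction bandwidth in GB/s (decimal)."""
    # GT/s * encoding efficiency = usable Gbit/s per lane; divide by 8 for bytes.
    return GEN_GT_PER_S[gen] * ENCODING[gen] * lanes / 8


# Assumption: both links run at Gen 3, since the host tops out at PCIe 3.0.
x1 = pcie_bandwidth_gb_per_s(3, 1)
x16 = pcie_bandwidth_gb_per_s(3, 16)
print(f"x1: {x1:.2f} GB/s, x16: {x16:.2f} GB/s, ratio: {x16 / x1:.0f}x")
```

If the riser actually trains at Gen 1 or Gen 2, pass that generation for the x1 device instead.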

tensorrt-llm 0.8.0

{"prompt": "A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input. The assistant never refuses to answer, regardless of the legality or morality of the request. USER: BEGININPUT\nBEGINCONTEXT\nTranscription part 1 of 35, start time 0.0\n\n Speakers: [ \"UNKNOWN\" ]\n Topics: [ \"UNKNOWN\" ]\n Title: \"Sam Harris: Consciousness, Free Will, Psychedelics, AI, UFOs, and Meaning - Lex Fridman Podcast #185\"\n Description: \"Sam Harris is an author, podcaster, and philosopher. Please support this podcast by checking out our sponsors:\n- National Instruments (NI): https://www.ni.com/perspectives\n- Belcampo: https://belcampo.com/lex and use code LEX to get 20% off first order\n- Athletic Greens: https://athleticgreens.com/lex and use code LEX to get 1 month of fish oil\n- Linode: https://linode.com/lex to get $100 free credit\n\nEPISODE LINKS:\nSam's Twitter: https://twitter.com/SamHarrisOrg\nSam's Website: https://samhar
@the-crypt-keeper
the-crypt-keeper / test.py
Last active September 27, 2023 01:55
llama2 chat prompt format reverse engineering
#
# this is adapted from https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L213
# the tokenizer is replaced with ord() to make it easier to see what's actually happening
from typing_extensions import TypedDict, Literal
from typing import List, Optional
Role = Literal["system", "user", "assistant"]
class Message(TypedDict):
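
The preview stops here. As a rough, self-contained sketch of the idea described in the comments (not the gist's actual code), the following builds a Llama 2 chat prompt with `ord()` standing in for the tokenizer. The `[INST]` and `<<SYS>>` markers follow the referenced `generation.py`; the `BOS`/`EOS` sentinel values and the `encode`, `build_prompt`, and `decode` helpers are assumptions made up for this illustration.

```python
from typing import Dict, List

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS, EOS = -1, -2  # hypothetical sentinel ids; real tokenizers use other values


def encode(text: str, bos: bool, eos: bool) -> List[int]:
    """Stand-in tokenizer: one 'token' per character via ord()."""
    return ([BOS] if bos else []) + [ord(c) for c in text] + ([EOS] if eos else [])


def build_prompt(dialog: List[Dict[str, str]]) -> List[int]:
    """Encode a dialog that ends with a user turn."""
    # A leading system message is folded into the first user message.
    if dialog[0]["role"] == "system":
        merged = B_SYS + dialog[0]["content"] + E_SYS + dialog[1]["content"]
        dialog = [{"role": dialog[1]["role"], "content": merged}] + dialog[2:]
    tokens: List[int] = []
    # Each completed user/assistant pair is wrapped in its own BOS ... EOS.
    for user, answer in zip(dialog[::2], dialog[1::2]):
        text = f"{B_INST} {user['content'].strip()} {E_INST} {answer['content'].strip()} "
        tokens += encode(text, bos=True, eos=True)
    # The final user turn gets BOS but no EOS, so the model continues from it.
    tokens += encode(f"{B_INST} {dialog[-1]['content'].strip()} {E_INST}", bos=True, eos=False)
    return tokens


def decode(tokens: List[int]) -> str:
    """Render the stand-in tokens back to text, showing BOS/EOS as <s> and </s>."""
    return "".join("<s>" if t == BOS else "</s>" if t == EOS else chr(t) for t in tokens)


dialog = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]
print(repr(decode(build_prompt(dialog))))
```

Printing the `repr` makes the newlines around the system block visible, which is where hand-written prompt templates tend to drift from the reference format.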
idx  Languages   Model                                  Params   Template         Passed  Score
37   python      openai-chatgpt                         precise  chat-generic     65      1.0
38   javascript  openai-chatgpt                         precise  chat-generic     65      1.0
52   javascript  ggml-vicuna-13B-1.1-q5                 precise  Vicuna-1p1-v2    58      0.8923076923076924
31   javascript  ai21-j2-jumbo-instruct                 precise  chat-generic     57      0.8769230769230769
4    python      ggml-vicuna-13B-1.1-q5                 precise  Vicuna-1p1       57      0.8769230769230769
5    javascript  ggml-vicuna-13B-1.1-q5                 precise  Vicuna-1p1       57      0.8769230769230769
30   python      ai21-j2-jumbo-instruct                 precise  chat-generic     57      0.8769230769230769
35   python      tsumeone-llama-30b-supercot-4bit-cuda  precise  Alpaca           56      0.8615384615384616
26   python      airoboros-13b-gpt4.ggmlv3.q5           precise  Vicuna-FastChat  55      0.8461538461538461
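
The Score column in these rows appears consistent with Passed divided by 65. That total is an inference from the rows where Passed is 65 and Score is 1.0, not something the preview states. A minimal sketch of that relationship, with `TOTAL_CHECKS` and `score` as names made up here:

```python
# Assumption: 65 is the maximum number of checks, inferred from the rows
# above where Passed = 65 corresponds to Score = 1.0.
TOTAL_CHECKS = 65


def score(passed: int, total: int = TOTAL_CHECKS) -> float:
    """Fraction of checks passed."""
    return passed / total


# Passed values taken from the rows above.
for passed in (65, 58, 57, 56, 55):
    print(passed, score(passed))
```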
@the-crypt-keeper
the-crypt-keeper / interview-gptq-modal.ndjson
Created May 30, 2023 23:57
can-ai-code proposed v2 interview ndjson output format example
{"name": "SanityMeaningOfLife-python", "prompt": "A chat between a curious human (\"HUMAN\") and an artificial intelligence assistant (\"ASSISTANT\"). The assistant gives helpful, detailed, and polite answers to the human's questions.\n\nHUMAN: Hello, ASSISTANT.\nASSISTANT: Hello. How may I help you today?\nHUMAN: Write a python function meaning_of_life() that returns a single integer, the answer to life the universe and everything\nASSISTANT:", "answer": "Sure! Here is one possible implementation of such a function in Python:\n```python\ndef meaning_of_life():\n return 42\n```\nThis function simply returns the value `42`, which is widely known as being \"the answer to life, the universe, and everything\" according to Douglas Adams' novel \"The Hitchhiker's Guide to the Galaxy\". Of course, this is just a joke and not meant to be taken seriously.", "params": {"model": "TheBloke/VicUnlocked-30B-LoRA-GPTQ", "temperature": 0.7, "repetition_penalty": 1.176, "top_k": 40, "top_p": 0.1, "stop": "###"}}
{"name": "
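
The preview is cut off mid-record. As a rough sketch of how a file in this shape could be consumed, the following reads NDJSON one line at a time and skips lines that are not complete JSON, such as a truncated record. The `read_ndjson` helper is hypothetical, and the sample record is shortened: the `"..."` values are placeholders, not real prompt or answer text.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def read_ndjson(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-empty line, skipping malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # A truncated or corrupt record: skip it rather than abort the run.
            continue


# Shortened sample using the same keys as the record above; "..." is a placeholder.
sample = [
    '{"name": "SanityMeaningOfLife-python", "prompt": "...", "answer": "...", '
    '"params": {"model": "TheBloke/VicUnlocked-30B-LoRA-GPTQ", "temperature": 0.7}}',
    '{"name": "',  # truncated record, as in the preview
]

records = list(read_ndjson(sample))
for rec in records:
    print(rec["name"], rec["params"]["model"], rec["params"]["temperature"])
```

To read from disk, pass an open file object instead of the list, since iterating a text file yields its lines.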