Andreas Kirsch BlackHC

@BlackHC
BlackHC / verify_oai_fine_tuning_jsonl.py
Created August 27, 2024 17:23
Verify OAI fine-tuning JSONL
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "tiktoken",
#     "typer",
#     "numpy",
# ]
# ///
"""
Verify and analyze a JSONL dataset for fine-tuning with OpenAI models.
BlackHC / imagenet_v2.py
Created August 4, 2024 11:59
ImageNet v2 Loader for PyTorch
import typing
import datasets
import torch
import torch.utils.data

def load_imagenet_v2(
    split: typing.Literal[
        "threshold0.7", "top-images", "matching-frequency"
    ] = "threshold0.7"
BlackHC / spike-aaronson-oracle.py
Created March 11, 2024 16:50
Scott Aaronson Oracle
# %%
import collections
import random

def predict_next_letter(model, five_gram):
    if len(five_gram) != 5:
        raise ValueError("five_gram must be of length 5")
    m = model[tuple(five_gram)]
    return m[True] > m[False]
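The preview above shows only the prediction step. A minimal sketch of how such a 5-gram predictor could be trained and scored follows. It assumes key presses are encoded as booleans (suggested by the `m[True]` / `m[False]` lookups) and that the model maps each 5-gram to a counter of the key that followed it. The names `make_model`, `update_model`, and `run_oracle` are hypothetical and do not come from the gist.

```python
import collections


def make_model():
    # Assumption: each 5-gram (tuple of bools) maps to counts of the next key.
    return collections.defaultdict(collections.Counter)


def update_model(model, five_gram, next_key):
    # Record that next_key followed this 5-gram once more.
    model[tuple(five_gram)][next_key] += 1


def predict_next_letter(model, five_gram):
    # Reproduced from the preview: guess True only if it was seen more often.
    if len(five_gram) != 5:
        raise ValueError("five_gram must be of length 5")
    m = model[tuple(five_gram)]
    return m[True] > m[False]


def run_oracle(presses):
    # Predict each key from the previous five, then learn from the real key.
    model = make_model()
    history = []
    correct = 0
    total = 0
    for key in presses:
        if len(history) == 5:
            guess = predict_next_letter(model, history)
            correct += guess == key
            total += 1
            update_model(model, history, key)
        history = (history + [key])[-5:]
    return correct, total
```

Calling `run_oracle` on a list of booleans returns the number of correct guesses and the number of guesses made. Ties, including 5-grams never seen before, fall back to guessing `False` because of the strict `>` comparison.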
#!/bin/bash
pdffile="$1"
prefix=$(basename "${pdffile}" .pdf)
convert -density 300 "${pdffile}" "${prefix}-%03d.png"
mogrify -background white -flatten "${prefix}-*.png"
total_pages=$(ls "${prefix}"-*.png | wc -l)
for ((i=0; i<$total_pages; i+=2)); do
    # Format page numbers
BlackHC / early_stopping.py
Created September 9, 2023 18:14
Generic Early Stopping Generator
"""
MIT License
Copyright (c) 2023 Andreas 'blackhc' Kirsch
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
BlackHC / ipython_html_console.py
Last active May 31, 2023 10:03
Render HTML to IPython terminals
# !pip install rich markdownify
# https://chat.openai.com/share/5f3f2019-2051-4217-93ea-c926fa3c2749
import markdownify
from IPython.core.getipython import get_ipython
from IPython.display import HTML
from rich import print
from rich.console import Console
from rich.markdown import Markdown
BlackHC / nesk.txt
Last active August 14, 2024 09:25
Numbers everyone (programmer) should know (2023)
Latency Comparison Numbers (~2023)
----------------------------------
L1 cache reference                        0.5 ns
Branch mispredict                           5 ns
L2 cache reference                          7 ns               14x L1 cache
Mutex lock/unlock                          25 ns
Main memory reference                     100 ns               20x L2 cache, 200x L1 cache
Compress 1K bytes with Snappy           3,000 ns       3 µs
Read 1 MB sequentially from memory     20,000 ns      20 µs    ~50GB/sec DDR5
Read 1 MB sequentially from NVMe      100,000 ns     100 µs    ~10GB/sec NVMe, 5x memory
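The throughput notes in the last two rows follow directly from the latencies. The short sketch below redoes that arithmetic, assuming 1 MB means 10^6 bytes (the table does not say whether it means 10^6 or 2^20). The helper name `bandwidth_gb_per_s` is made up for illustration.

```python
def bandwidth_gb_per_s(num_bytes, latency_ns):
    # Bytes per nanosecond equals GB per second (10^9 bytes / 10^9 ns).
    return num_bytes / latency_ns


MB = 1_000_000  # assumption: decimal megabyte

memory_gb_s = bandwidth_gb_per_s(MB, 20_000)   # 1 MB from memory in 20 µs
nvme_gb_s = bandwidth_gb_per_s(MB, 100_000)    # 1 MB from NVMe in 100 µs
slowdown = 100_000 / 20_000                    # NVMe latency relative to memory

print(memory_gb_s, nvme_gb_s, slowdown)
```

Under that assumption the results match the table's annotations: about 50 GB/sec for memory, about 10 GB/sec for NVMe, and a factor of 5 between them.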
BlackHC / latency.txt
Created May 18, 2023 18:53 — forked from jboner/latency.txt
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                        0.5 ns
Branch mispredict                           5 ns
L2 cache reference                          7 ns               14x L1 cache
Mutex lock/unlock                          25 ns
Main memory reference                     100 ns               20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy            3,000 ns       3 us
Send 1K bytes over 1 Gbps network      10,000 ns      10 us
Read 4K randomly from SSD*            150,000 ns     150 us    ~1GB/sec SSD
BlackHC / cached_chat_open_ai.py
Last active May 12, 2023 15:08
Until LangChain adds support for a ChatOpenAI cache, here is a drop-in class that adds support for it
# PoC to cache prompts. Drop in your code.
# Andreas 'blackhc' Kirsch, 2023
from typing import List, Optional
import langchain
from langchain import OpenAI
from langchain.cache import SQLiteCache
from langchain.schema import (
    AIMessage,
### Keybase proof
I hereby claim:
* I am blackhc on github.
* I am akirsch (https://keybase.io/akirsch) on keybase.
* I have a public key ASAL244d-fNnU5c_WrWOrPiQYhXXjYWWL9chxLqzoCIgkgo
To claim this, I am signing this object: