Cedric Chee (cedrickchee)

cedrickchee / whisperfile.md
Last active August 21, 2024 07:50
Trying whisperfile

Trying whisperfile

llamafile v0.8.13 (and whisperfile) is out:

This release introduces whisperfile, which is a single-file implementation of OpenAI's Whisper model. It lets you transcribe speech to text and even translate it. Our implementation is based on Georgi Gerganov's whisper.cpp project.

The project to turn it into a whisperfile was founded by CJ Pais, who has handed over maintenance of his awesome work.

I want to kick the tires of whisperfile by transcribing a podcast episode with it.
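
Here is a rough sketch of how I'd drive it from Python; the binary and model filenames below are placeholders, and the flags are assumed from whisper.cpp conventions (which whisperfile inherits), so check `--help` on the actual binary.

```python
# Rough sketch: invoke a whisperfile binary from Python to transcribe audio.
# Filenames are placeholders; -m/-f/--output-txt are assumed whisper.cpp-style flags.
import subprocess

AUDIO = "podcast-episode.wav"  # whisper.cpp expects 16 kHz mono WAV input

result = subprocess.run(
    [
        "./whisperfile",                # placeholder binary name
        "-m", "whisper-medium.en.bin",  # placeholder model weights
        "-f", AUDIO,                    # audio file to transcribe
        "--output-txt",                 # also write the transcript to a .txt file
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # transcript segments are printed to stdout
```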

cedrickchee / nvidia-llama-3.1-minitron.md
Last active August 16, 2024 08:46
NVIDIA developed a method to efficiently create Llama-3.1-Minitron, a smaller, accurate language model, using pruning and knowledge distillation

NVIDIA Developed A Method To Create A Smaller, Accurate LLM, Llama-3.1-Minitron 4B, Using Pruning & Distillation

Minitron is an interesting pruned-and-distilled derivative of Llama 3.1 from NVIDIA Research.

The group investigated whether pruning an existing LLM and then re-training it with a fraction (<3%) of the original training data could be an effective way to create smaller models, instead of full retraining. They hypothesized that this approach could significantly reduce training cost while maintaining good performance. They developed a method to efficiently create smaller, accurate language models using structured weight pruning and knowledge distillation, offering several advantages (a rough sketch of the distillation step follows the list):

  • 16% improvement in MMLU scores
  • Up to 40x fewer training tokens per model
  • Compute cost savings of 1.8x for training the full model family
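
To make the pruning-plus-distillation recipe concrete, here is a minimal, illustrative PyTorch sketch of the distillation step. This is not NVIDIA's training code; the temperature, loss weighting, and training-loop details are assumptions.

```python
# Minimal knowledge-distillation loss sketch (illustrative only).
# A pruned "student" model is trained to match the logits of the original
# "teacher" model on a small fraction of the original training data.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend soft-target KL loss (teacher -> student) with standard cross-entropy."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: next-token cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Training-step sketch: the teacher is frozen, only the pruned student is updated.
# (teacher, student, batch, optimizer are assumed to exist.)
# with torch.no_grad():
#     teacher_logits = teacher(batch["input_ids"]).logits
# student_logits = student(batch["input_ids"]).logits
# loss = distillation_loss(student_logits, teacher_logits, batch["labels"])
# loss.backward(); optimizer.step()
```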
cedrickchee / context-caching-claude.md
Created August 16, 2024 07:01
Prompt caching with Claude

Prompt caching with Anthropic Claude

🤯 The Claude API has introduced prompt caching, enabling you to mark and reuse portions of long prompts, such as large documents provided as context. Claude caches these prompts for up to 5 minutes, resulting in significantly faster processing times and discounted costs (~10% of the original cost) for any subsequent prompts that reuse the cached context.

✨ The ability to load vast amounts of data into the context window enables exciting possibilities, such as the following (a minimal API sketch appears after the list):

  • Caching content libraries, such as entire books or coding documentation, and retrieving specific information with ease through multiple API calls
  • Providing large examples for a specific task, thereby achieving results that surpass traditional fine-tuning methods with significantly less effort
  • Sharing entire codebases with the LLM, enabling more efficient collaboration
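
For illustration, here is a minimal sketch of marking a long document as cacheable with the Anthropic Python SDK. The model id, beta header, and placeholder document are assumptions based on the feature as announced; check the official docs for the current API.

```python
# Minimal prompt-caching sketch with the Anthropic Python SDK (illustrative).
# The large document is marked with cache_control so subsequent calls that
# reuse the same prefix hit the cache instead of reprocessing it.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_DOCUMENT = "...full text of a book or a codebase's docs..."  # placeholder

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model id
    max_tokens=1024,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta header at launch
    system=[
        {
            "type": "text",
            "text": LONG_DOCUMENT,
            "cache_control": {"type": "ephemeral"},  # mark this block as cacheable
        }
    ],
    messages=[{"role": "user", "content": "Summarize chapter 3."}],
)
print(response.content[0].text)
```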
cedrickchee / vram.rb
Created August 4, 2024 14:09 — forked from jrruethe/vram.rb
Calculate VRAM requirements for LLM models
#!/usr/bin/env ruby
# Calculate VRAM requirements for LLM models (see the calculator links below).
# https://asmirnov.xyz/vram
# https://vram.asmirnov.xyz
require "fileutils"
require "json"
require "open-uri"
# https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator/blob/main/index.html
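
For a rough sense of what such a calculator computes, here is a back-of-the-envelope sketch in Python (not a port of the Ruby script). The weights-plus-KV-cache breakdown and the example shape and numbers are assumptions.

```python
# Back-of-the-envelope VRAM estimate for running an LLM (illustrative sketch).
# Total memory is roughly model weights + KV cache, plus some runtime overhead.

def estimate_vram_gb(
    n_params_b: float,      # parameters in billions, e.g. 8 for an 8B model
    bytes_per_param: float, # 2.0 for FP16/BF16, roughly 0.55 for a 4-bit quant
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    kv_bytes: float = 2.0,  # FP16 KV cache
    overhead: float = 1.1,  # ~10% for activations/buffers (assumption)
) -> float:
    weights = n_params_b * 1e9 * bytes_per_param
    # K and V caches: 2 tensors per layer of shape [context_len, n_kv_heads * head_dim]
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes
    return (weights + kv_cache) * overhead / 1e9

# Example: an 8B-class model in FP16 at 8k context (illustrative numbers).
print(round(estimate_vram_gb(8, 2.0, 32, 8, 128, 8192), 1), "GB")
```

Quantizing the weights mostly shrinks the first term; long contexts and large batches mostly grow the second.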
cedrickchee / llama-31-405b.md
Last active July 24, 2024 04:54
🐐 Llama 3.1 405B matches or beats the best closed models

🐐 Llama 3.1 405B matches or beats the best closed models

Llama 3.1 405B, 70B, and 8B are officially out. Llama 3.1 405B is the first openly available model that matches or beats the best closed models across many benchmarks.

Model evaluations

The performance of the 405B model is very similar to Claude 3.5 Sonnet. It beats GPT-4 on every single benchmark but one.

The 70B model's performance is even more impressive. It is significantly better than GPT-3.5 Turbo and beats Nemotron 4 340B on many tests.

cedrickchee / analysis-llama-3-405b.md
Last active July 23, 2024 03:58
Llama 3.1 Leaks: SoTA Open Model 405B & What We Know So Far

Llama 3.1 Leaks: SoTA Open Model 405B & What We Know So Far

TLDR: 8B gets a big bump across the board, 70B instruct shows minor improvements, and 405B is the SoTA open model. But 405B still lags behind flagship models.

Here are the notable upgrades:

  • Every model now supports 128k context length (up from 8k)
  • Trained on a massive ~15T tokens of public data
  • Fine-tuning data includes publicly available instruction datasets and over 25M synthetically generated examples
  • Multilingual support for 7 languages: French, German, Hindi, Italian, Portuguese, Spanish, and Thai
cedrickchee / co-intelligence-book-review.md
Created July 16, 2024 09:26
Co-Intelligence: Living and Working with AI - A Book Review

Co-Intelligence: Living and Working with AI - A Book Review

In 200 Words

If you're just dipping your toes into the AI pool, Ethan Mollick's "Co-Intelligence" is a solid starting point. But let's be clear — when we're talking AI here, we're really discussing the cutting-edge innovations: those Large Language Model (LLM) powered Generative AI applications that are creating buzz in the tech world.

[Images: book cover and sample pages]
cedrickchee / vibechecks_latest_llms_coding_skills.md
Last active July 17, 2024 05:45
Vibe checking Claude 3.5, DeepSeek-Coder-V2, and GPT-4o for "alien" Coding Skills

Vibe checking Claude 3.5, DeepSeek-Coder-V2, and GPT-4o for "alien" Coding Skills

Introduction

In the world of AI and LLMs, it's often said that "vibe checks" can provide valuable insights into model performance. With this in mind, I've conducted a brief evaluation of Claude 3.5 Sonnet, DeepSeek-Coder-V2, and GPT-4o to assess their capabilities in solving complex coding problems. This evaluation aims to provide better intuition about these models' strengths and weaknesses and to verify findings published in public benchmarks and leaderboards.

Evaluation Design

For this assessment, I selected a challenging problem from recent competitive coding competitions (2023 onwards). The chosen problem, "Power of Heroes", is a dynamic programming challenge that requires advanced knowledge of algorithms and data structures. This problem was selected because:

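For readers unfamiliar with the problem: "Power of Heroes", as commonly stated, asks for the sum of max(group)^2 * min(group) over all non-empty groups, modulo 1e9+7. Here is one common O(n log n) solution sketch, included purely to illustrate the difficulty level, not as any model's output.

```python
# One common solution sketch for "Power of Heroes" (LeetCode-style statement).
# Sort ascending and treat each nums[i] as the maximum. Among earlier elements,
# nums[j] is the minimum of 2^(i-1-j) groups, so maintain that weighted sum s.
MOD = 10**9 + 7

def sum_of_power(nums: list[int]) -> int:
    nums.sort()
    ans = 0
    s = 0  # sum over j < i of nums[j] * 2^(i-1-j), kept mod MOD
    for x in nums:
        ans = (ans + x * x % MOD * (x + s)) % MOD  # x alone, plus groups with earlier mins
        s = (2 * s + x) % MOD                      # double old weights, add x for future i
    return ans

# Quick check: [2, 1, 4] -> 141 (8 + 1 + 64 + 4 + 32 + 16 + 16)
print(sum_of_power([2, 1, 4]))
```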
cedrickchee / ai-news-tldr-agent.md
Last active July 7, 2024 13:14
AI News TLDR App [Ideas, WIP]

AI News TLDR App

Can't keep up with the exponential progress of AI and LLMs?

Fret not. We got you!

This is a minimal working app that goes through the top tweets and Reddit posts, summarizes LLM/GenAI news and what people are talking about, and sends you a daily roundup.

You can think of it as a kind of AI-generated newsletter.

cedrickchee / claude-3.5-sonnet.md
Created June 21, 2024 06:29
Claude 3.5 Sonnet

Claude 3.5 Sonnet

Anthropic introduced Claude 3.5 Sonnet today: https://www.anthropic.com/news/claude-3-5-sonnet

👑 We now have a true challenger to GPT-4o. Claude 3.5 Sonnet takes the top spot on the leaderboards. It surpasses GPT-4o by 3.3 points on MixEval-Hard and leads in almost all sub-benchmarks.

🏆 MixEval leaderboard: https://mixeval.github.io/#leaderboard (no more waiting for days for the LMSys Arena leaderboard update)