Cedric Chee (cedrickchee)

cedrickchee / whisperfile.md
Last active August 21, 2024 07:50
Trying whisperfile

Trying whisperfile

llamafile v0.8.13 (and whisperfile) is out:

This release introduces whisperfile, which is a single-file implementation of OpenAI's Whisper model. It lets you transcribe speech to text and even translate it. Our implementation is based on Georgi Gerganov's whisper.cpp project.

The project to turn it into a whisperfile was founded by CJ Pais, who has handed over maintenance of his awesome work.

I want to kick the tires of whisperfile by transcribing a podcast episode with it.
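
Here is a rough sketch of how I'd drive it from Python; the binary and model filenames below are placeholders, and the flags are assumed from whisper.cpp conventions (which whisperfile inherits), so check `--help` on the actual binary.

```python
# Rough sketch: invoke a whisperfile binary from Python to transcribe audio.
# Filenames are placeholders; -m/-f/--output-txt are assumed whisper.cpp-style flags.
import subprocess

AUDIO = "podcast-episode.wav"  # whisper.cpp expects 16 kHz mono WAV input

result = subprocess.run(
    [
        "./whisperfile",                # placeholder binary name
        "-m", "whisper-medium.en.bin",  # placeholder model weights
        "-f", AUDIO,                    # audio file to transcribe
        "--output-txt",                 # also write the transcript to a .txt file
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # transcript segments are printed to stdout
```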

cedrickchee / nvidia-llama-3.1-minitron.md
Last active August 16, 2024 08:46
NVIDIA developed a method to efficiently create Llama-3.1-Minitron, a smaller, accurate language model, using pruning and knowledge distillation

NVIDIA Developed A Method To Create A Smaller, Accurate LLM, Llama-3.1-Minitron 4B, Using Pruning & Distillation

Minitron is an interesting pruned-and-distilled derivative of Llama 3.1 from NVIDIA Research.

The group investigated whether pruning an existing LLM and then re-training it with a fraction (<3%) of the original training data could be an effective way to create smaller models, instead of full retraining. They hypothesized that this approach could significantly reduce training cost while maintaining good performance. They developed a method to efficiently create smaller, accurate language models using structured weight pruning and knowledge distillation, offering several advantages (a rough sketch of the distillation step follows the list):

  • 16% improvement in MMLU scores
  • Up to 40x fewer training tokens per model
  • Compute cost savings of 1.8x for training the full model family
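
To make the pruning-plus-distillation recipe concrete, here is a minimal, illustrative PyTorch sketch of the distillation step. This is not NVIDIA's training code; the temperature, loss weighting, and training-loop details are assumptions.

```python
# Minimal knowledge-distillation loss sketch (illustrative only).
# A pruned "student" model is trained to match the logits of the original
# "teacher" model on a small fraction of the original training data.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend soft-target KL loss (teacher -> student) with standard cross-entropy."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: next-token cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Training-step sketch: the teacher is frozen, only the pruned student is updated.
# (teacher, student, batch, optimizer are assumed to exist.)
# with torch.no_grad():
#     teacher_logits = teacher(batch["input_ids"]).logits
# student_logits = student(batch["input_ids"]).logits
# loss = distillation_loss(student_logits, teacher_logits, batch["labels"])
# loss.backward(); optimizer.step()
```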
cedrickchee / context-caching-claude.md
Created August 16, 2024 07:01
Prompt caching with Claude

Prompt caching with Anthropic Claude

🤯 The Claude API has introduced prompt caching, enabling you to mark and reuse portions of long prompts, such as large documents provided as context. Claude caches these prompts for up to 5 minutes, resulting in significantly faster processing times and discounted costs (~10% of the original cost) for any subsequent prompts that reuse the cached context.

✨ The ability to load vast amounts of data into the context window enables exciting possibilities, such as the following (a minimal API sketch appears after the list):

  • Caching content libraries, such as entire books or coding documentation, and retrieving specific information with ease through multiple API calls
  • Providing large examples for a specific task, thereby achieving results that surpass traditional fine-tuning methods with significantly less effort
  • Sharing entire codebases with the LLM, enabling more efficient collaboration
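
For illustration, here is a minimal sketch of marking a long document as cacheable with the Anthropic Python SDK. The model id, beta header, and placeholder document are assumptions based on the feature as announced; check the official docs for the current API.

```python
# Minimal prompt-caching sketch with the Anthropic Python SDK (illustrative).
# The large document is marked with cache_control so subsequent calls that
# reuse the same prefix hit the cache instead of reprocessing it.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_DOCUMENT = "...full text of a book or a codebase's docs..."  # placeholder

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model id
    max_tokens=1024,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta header at launch
    system=[
        {
            "type": "text",
            "text": LONG_DOCUMENT,
            "cache_control": {"type": "ephemeral"},  # mark this block as cacheable
        }
    ],
    messages=[{"role": "user", "content": "Summarize chapter 3."}],
)
print(response.content[0].text)
```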
cedrickchee / vram.rb
Created August 4, 2024 14:09 — forked from jrruethe/vram.rb
Calculate VRAM requirements for LLM models
#!/usr/bin/env ruby
# Calculate VRAM requirements for LLM models (see the calculator links below).
# https://asmirnov.xyz/vram
# https://vram.asmirnov.xyz
require "fileutils"
require "json"
require "open-uri"
# https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator/blob/main/index.html
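
For a rough sense of what such a calculator computes, here is a back-of-the-envelope sketch in Python (not a port of the Ruby script). The weights-plus-KV-cache breakdown and the example shape and numbers are assumptions.

```python
# Back-of-the-envelope VRAM estimate for running an LLM (illustrative sketch).
# Total memory is roughly model weights + KV cache, plus some runtime overhead.

def estimate_vram_gb(
    n_params_b: float,      # parameters in billions, e.g. 8 for an 8B model
    bytes_per_param: float, # 2.0 for FP16/BF16, roughly 0.55 for a 4-bit quant
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    kv_bytes: float = 2.0,  # FP16 KV cache
    overhead: float = 1.1,  # ~10% for activations/buffers (assumption)
) -> float:
    weights = n_params_b * 1e9 * bytes_per_param
    # K and V caches: 2 tensors per layer of shape [context_len, n_kv_heads * head_dim]
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes
    return (weights + kv_cache) * overhead / 1e9

# Example: an 8B-class model in FP16 at 8k context (illustrative numbers).
print(round(estimate_vram_gb(8, 2.0, 32, 8, 128, 8192), 1), "GB")
```

Quantizing the weights mostly shrinks the first term; long contexts and large batches mostly grow the second.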
cedrickchee / llama-31-405b.md
Last active July 24, 2024 04:54
🐐 Llama 3.1 405B matches or beats the best closed models

🐐 Llama 3.1 405B matches or beats the best closed models

Llama 3.1 405B, 70B, and 8B are officially out. Llama 3.1 405B is the first openly available model that matches or beats the best closed models across many benchmarks.

Model evaluations

The performance of the 405B model is very similar to Claude 3.5 Sonnet. It beats GPT-4 on every single benchmark but one.

The 70B model's performance is even more impressive. It is significantly better than GPT-3.5 Turbo and beats Nemotron 4 340B on many tests.

cedrickchee / analysis-llama-3-405b.md
Last active July 23, 2024 03:58
Llama 3.1 Leaks: SoTA Open Model 405B & What We Know So Far

Llama 3.1 Leaks: SoTA Open Model 405B & What We Know So Far

TLDR: 8B gets a big bump across the board, 70B instruct shows minor improvements, and 405B is the SoTA open model. But 405B still lags behind flagship models.

Here are the notable upgrades:

  • Every model now supports 128k context length (up from 8k)
  • Trained on a massive ~15T tokens of public data
  • Fine-tuning data includes publicly available instruction datasets and over 25M synthetically generated examples
  • Multilingual support for 7 languages: French, German, Hindi, Italian, Portuguese, Spanish, and Thai
cedrickchee / co-intelligence-book-review.md
Created July 16, 2024 09:26
Co-Intelligence: Living and Working with AI - A Book Review

Co-Intelligence: Living and Working with AI - A Book Review

In 200 Words

If you're just dipping your toes into the AI pool, Ethan Mollick's "Co-Intelligence" is a solid starting point. But let's be clear — when we're talking AI here, we're really discussing the cutting-edge innovations: those Large Language Model (LLM) powered Generative AI applications that are creating buzz in the tech world.

[Images: book cover and sample pages]
cedrickchee / vibechecks_latest_llms_coding_skills.md
Last active July 17, 2024 05:45
Vibe checking Claude 3.5, DeepSeek-Coder-V2, and GPT-4o for "alien" Coding Skills

Vibe checking Claude 3.5, DeepSeek-Coder-V2, and GPT-4o for "alien" Coding Skills

Introduction

In the world of AI and LLMs, it's often said that "vibe checks" can provide valuable insights into model performance. With this in mind, I've conducted a brief evaluation of Claude 3.5 Sonnet, DeepSeek-Coder-V2, and GPT-4o to assess their capabilities in solving complex coding problems. This evaluation aims to provide better intuition about these models' strengths and weaknesses and to verify findings published in public benchmarks and leaderboards.

Evaluation Design

For this assessment, I selected a challenging problem from recent competitive coding competitions (2023 onwards). The chosen problem, "Power of Heroes", is a dynamic programming challenge that requires advanced knowledge of algorithms and data structures. This problem was selected because:

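For readers unfamiliar with the problem: "Power of Heroes", as commonly stated, asks for the sum of max(group)^2 * min(group) over all non-empty groups, modulo 1e9+7. Here is one common O(n log n) solution sketch, included purely to illustrate the difficulty level, not as any model's output.

```python
# One common solution sketch for "Power of Heroes" (LeetCode-style statement).
# Sort ascending and treat each nums[i] as the maximum. Among earlier elements,
# nums[j] is the minimum of 2^(i-1-j) groups, so maintain that weighted sum s.
MOD = 10**9 + 7

def sum_of_power(nums: list[int]) -> int:
    nums.sort()
    ans = 0
    s = 0  # sum over j < i of nums[j] * 2^(i-1-j), kept mod MOD
    for x in nums:
        ans = (ans + x * x % MOD * (x + s)) % MOD  # x alone, plus groups with earlier mins
        s = (2 * s + x) % MOD                      # double old weights, add x for future i
    return ans

# Quick check: [2, 1, 4] -> 141 (8 + 1 + 64 + 4 + 32 + 16 + 16)
print(sum_of_power([2, 1, 4]))
```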
cedrickchee / ai-news-tldr-agent.md
Last active July 7, 2024 13:14
AI News TLDR App [Ideas, WIP]

AI News TLDR App

Can't keep up with the exponential progress of AI and LLMs?

Fret not. We got you!

This is a minimal working app that goes through the top tweets and Reddit posts, summarizes LLM/GenAI news and what people are talking about, and sends you a daily roundup.

You can think of it as a kind of AI-generated newsletter.

cedrickchee / claude-3.5-sonnet.md
Created June 21, 2024 06:29
Claude 3.5 Sonnet

Claude 3.5 Sonnet

Anthropic introduced Claude 3.5 Sonnet today: https://www.anthropic.com/news/claude-3-5-sonnet

👑 We now have a true challenger to GPT-4o. Claude 3.5 Sonnet takes the top spot on the leaderboards. It surpasses GPT-4o by 3.3 points on MixEval-Hard and leads in almost all sub-benchmarks.

🏆 MixEval leaderboard: https://mixeval.github.io/#leaderboard (no more waiting for days for the LMSys Arena leaderboard update)