
Putting the finishing touches on my robot army

Mark Saroufim msaroufim

(ao) [marksaroufim@devvm4567.ash0 ~/ao/tutorials/quantize_vit (main)]$ python
Downloading: "" to /home/marksaroufim/.cache/torch/hub/checkpoints/vit_b_16-c867db91.pth
100%|█████████████████████████████████████████████████████████████████████████████████| 330M/330M [00:01<00:00, 209MB/s]
AUTOTUNE convolution(1x3x224x224, 768x3x16x16)
triton_convolution_4 0.1184 ms 100.0%
convolution 0.1450 ms 81.7%
triton_convolution_3 0.2024 ms 58.5%
triton_convolution_5 0.2268 ms 52.2%
triton_convolution_6 0.2445 ms 48.4%
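The percentage column appears to be the fastest candidate's time divided by each candidate's time. A short check reproduces it from the logged timings. The helper name `relative_speed` is hypothetical and is not part of the autotuner.

```python
# Timings (ms) copied from the AUTOTUNE log above.
timings_ms = {
    "triton_convolution_4": 0.1184,
    "convolution": 0.1450,
    "triton_convolution_3": 0.2024,
    "triton_convolution_5": 0.2268,
    "triton_convolution_6": 0.2445,
}


def relative_speed(timings: dict) -> dict:
    """Fastest time divided by each candidate's time, formatted as a percentage."""
    best = min(timings.values())
    return {name: f"{100 * best / t:.1f}%" for name, t in timings.items()}


# Print in the same "name time percent" layout as the log.
for name, pct in relative_speed(timings_ms).items():
    print(f"{name} {timings_ms[name]:.4f} ms {pct}")
```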
*Nim Sum Dim Sum*, a bustling local dumpling restaurant, has two game-theory-loving servers named, you guessed it, Alice and Bob. Its dining area can be represented as a two-dimensional grid of \(R\) rows (numbered \(1..R\) from top to bottom) by \(C\) columns (numbered \(1..C\) from left to right).
Currently, both of them are standing at coordinates \((1, 1)\) where there is a big cart of dim sum. Their job is to work together to push the cart to a customer at coordinates \((R, C)\). To make the job more interesting, they've turned it into a game.
Alice and Bob will take turns pushing the cart. On Alice's turn, the cart must be moved between \(1\) and \(A\) units down. On Bob's turn, the cart must be moved between \(1\) and \(B\) units to the right. The cart may not be moved out of the grid. If the cart is already at row \(R\) on Alice's turn or column \(C\) on Bob's turn, then that person loses their turn.
The "winner" is the person to ultimately move the cart to \((R, C)\) and thus get all the recognit
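A small brute-force search makes these rules concrete. This is a sketch, not the intended contest solution. It assumes that Alice pushes first, that both servers play optimally, and that the question (cut off in this excerpt) asks who ends up moving the cart onto \((R, C)\). The function name `winner` is hypothetical.

```python
from functools import lru_cache


def winner(R: int, C: int, A: int, B: int) -> str:
    """Return "Alice" or "Bob": who moves the cart onto (R, C) under optimal play.

    Assumes Alice pushes first and A, B >= 1. Exhaustive search, so small grids only.
    """
    if R == 1 and C == 1:
        raise ValueError("the cart already starts at (R, C)")

    @lru_cache(maxsize=None)
    def solve(r: int, c: int, alice_turn: bool) -> str:
        me, other = ("Alice", "Bob") if alice_turn else ("Bob", "Alice")
        if alice_turn:  # Alice pushes 1..A units down, staying inside the grid
            moves = [(r + k, c) for k in range(1, min(A, R - r) + 1)]
        else:  # Bob pushes 1..B units right, staying inside the grid
            moves = [(r, c + k) for k in range(1, min(B, C - c) + 1)]
        if not moves:  # already on row R (or column C): this player loses the turn
            return solve(r, c, not alice_turn)
        for nr, nc in moves:
            # Win by reaching (R, C) now, or by moving to a position the opponent loses.
            if (nr, nc) == (R, C) or solve(nr, nc, not alice_turn) == me:
                return me
        return other

    return solve(1, 1, True)


print(winner(3, 2, 2, 1))
```

The search never recurses into \((R, C)\) itself, so the "lose the turn" branch cannot loop: it only triggers when exactly one coordinate is maxed out, and the other player then has a legal move.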
import torch
# >>> import sys
# >>> size_of_bool = sys.getsizeof(True) # or sys.getsizeof(False)
# >>> print(size_of_bool)
# 28
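The snippet's purpose isn't shown beyond the import and the REPL comment. Assuming it contrasts the size of a Python `bool` object with packed boolean storage, here is a minimal sketch. `sys.getsizeof` reports the whole object (header plus integer payload, CPython- and platform-dependent), while a C `_Bool`, like one element of a `torch.bool` tensor, takes one byte.

```python
import ctypes
import sys

# Whole Python bool object; 28 on the 64-bit CPython shown above, but implementation-dependent.
py_bool_bytes = sys.getsizeof(True)
# One C _Bool: the per-element cost of a packed, byte-per-flag buffer.
c_bool_bytes = ctypes.sizeof(ctypes.c_bool)
print(py_bool_bytes, c_bool_bytes)

try:
    import torch

    # torch.bool tensors store one byte per element.
    print(torch.tensor([True, False]).element_size())
except ImportError:
    pass  # torch is optional for this comparison
```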
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
import os
import glob
from datetime import datetime
from setuptools import find_packages, setup
Autotune Best Config: BLOCK_M: 32, BLOCK_N: 128, BLOCK_K: 32, SPLIT_K: 1, num_warps: 4, num_ctas: 1, num_stages: 3
Autotune Best Config: BLOCK_M: 16, BLOCK_N: 32, BLOCK_K: 32, SPLIT_K: 1, num_warps: 2, num_ctas: 1, num_stages: 5
Autotune Best Config: BLOCK_M: 32, BLOCK_N: 128, BLOCK_K: 32, SPLIT_K: 1, num_warps: 4, num_ctas: 1, num_stages: 3
~ nvcc -O3 --use_fast_math -o attention_forward -lcublas
⚡ ~ ./attention_forward 1
Using kernel 1
-0.529510 -0.529510
0.889394 0.889394
0.881674 0.881674
0.651789 0.651789
-0.483486 -0.483486
Results match!
block_size 32 | time 7618.906250 ms
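The log compares a kernel's output against a CPU reference: it prints the first five value pairs, then reports "Results match!". Below is a minimal pure-Python sketch of that kind of check. It assumes the kernel computes single-head causal scaled dot-product attention on a `(T, d)` input, which the log does not actually show. The function names and the tolerance are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_attention_reference(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head causal softmax(q k^T / sqrt(d)) v on (T, d) nested lists."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for t in range(len(q)):
        # Position t attends only to positions 0..t (causal mask).
        scores = [scale * sum(a * b for a, b in zip(q[t], k[s])) for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - m) for x in scores]
        total = sum(weights)
        out.append([sum(w * v[s][j] for s, w in enumerate(weights)) / total
                    for j in range(len(v[0]))])
    return out


def results_match(reference: Matrix, candidate: Matrix, tol: float = 1e-4) -> bool:
    """Print the first five (reference, candidate) pairs, then check all are within tol."""
    ref = [x for row in reference for x in row]
    cand = [x for row in candidate for x in row]
    for a, b in list(zip(ref, cand))[:5]:
        print(f"{a:f} {b:f}")
    return len(ref) == len(cand) and all(abs(a - b) <= tol for a, b in zip(ref, cand))
```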
import time
from typing import Callable, List
import torch
# Llama-7B
SIZES = [torch.Size([32000, 4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([11008, 4096]), torch.Size([4096, 11008]), torch.Size([11008, 4096]), torch.Size([4096]), torch.Size([4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([11008, 4096]), torch.Size([4096, 11008]), torch.Size([11008, 4096]), torch.Size([4096]), torch.Size([4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([11008, 4096]), torch.Size([4096, 11008]), torch.Size([11008, 4096]), torch.Size([4096]), torch.Size([4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([4096, 4096]), torch.Size([11008, 4096]), torch.Size([4096, 11008]), torch.Size([11008, 4096]), torch.Size([40
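The imports suggest timing some callable over the Llama-7B parameter shapes in `SIZES`. Below is a minimal stdlib sketch of such a harness. The helpers `time_over_sizes` and `total_numel` are hypothetical, not the gist's code. `torch.Size` is a `tuple` subclass, so the same shapes work unchanged.

```python
import math
import time
from typing import Callable, Dict, Sequence, Tuple

Shape = Tuple[int, ...]


def total_numel(sizes: Sequence[Sequence[int]]) -> int:
    """Total element count across shapes (product of each shape, summed)."""
    return sum(math.prod(s) for s in sizes)


def time_over_sizes(
    fn: Callable[[Shape], object],
    sizes: Sequence[Sequence[int]],
    warmup: int = 1,
    iters: int = 5,
) -> Dict[Shape, float]:
    """Mean wall-clock seconds of fn(shape) per distinct shape, in first-seen order."""
    results: Dict[Shape, float] = {}
    for shape in dict.fromkeys(tuple(s) for s in sizes):  # dedupe repeated layer shapes
        for _ in range(warmup):  # untimed warm-up calls
            fn(shape)
        start = time.perf_counter()
        for _ in range(iters):
            fn(shape)
        results[shape] = (time.perf_counter() - start) / iters
    return results


if __name__ == "__main__":
    demo_sizes = [(4096, 4096), (11008, 4096), (4096,), (4096, 4096)]
    for shape, secs in time_over_sizes(lambda s: bytearray(math.prod(s)), demo_sizes).items():
        print(shape, f"{secs * 1e3:.3f} ms")
```

For one decoder layer's nine shapes in the list (four `4096×4096`, the `11008×4096`, `4096×11008`, `11008×4096` MLP projections, and two `4096` norms), `total_numel` gives 202,383,360. When timing CUDA work, call `torch.cuda.synchronize()` before reading the clock, because kernel launches are asynchronous.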
set -eo pipefail
# Mode: Select which components to install. PyTorch and Intel® Extension for PyTorch* are always installed.
# High bit: 8 7 6 5 4 3 2 1 :Low bit
#           | | | | | | | └- torch-ccl
#           | | | | | | └--- TorchAudio
#           | | | | | └----- TorchVision
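Read this way, bit n of the mode value (counting from 1 at the low end) has weight `1 << (n - 1)`. A mode of `0b101` would therefore select torch-ccl and TorchVision, and a mode of 0 would install only the always-installed PyTorch and Intel® Extension for PyTorch*. Below is a minimal Python sketch of that decoding. It covers only the three bits visible in this excerpt, and the function name is hypothetical.

```python
from typing import List

# Bit positions from the script's comment (bit 1 is the low bit).
# Higher bits exist but are cut off in this excerpt, so they are omitted here.
COMPONENTS = {1: "torch-ccl", 2: "TorchAudio", 3: "TorchVision"}


def selected_components(mode: int) -> List[str]:
    """Return the optional components whose bit is set in mode."""
    return [name for bit, name in COMPONENTS.items() if mode & (1 << (bit - 1))]


print(selected_components(0b101))
```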

How to build a Discord community

TL;DR: Be responsive, have a bold raison d’être, make sure people have both low- and high-effort things to do, impact the real world with as many artifacts as possible, and share that impact with external partners.

A lot of the leading applied research in ML these days is happening on Discord, so a common question I get asked is “Hey Mark, which Discord group should I join?” That’s an easy enough question to answer these days, just subscribe to, but then I always make sure to remind people: “You should probably create your own Discord community.” I get the feeling people don’t quite like it when I say this because, well, how do you create a Discord community from scratch?

I’ve created 3 communities so far, and each one has grown larger, and more quickly, than the last, so hopefully some of these lessons apply to you as well.

Robot Overlords: took about a year to reach 450 people.
NeurIPS LLM Efficiency Competition: took about 6 months to reach 1,300 people.
Learn m