Allen Guo gglin001

@fxkamd
fxkamd / TinyGrad-notes.md
Last active August 28, 2024 03:50
Observations about HSA and KFD backends in TinyGrad

This is Felix Kuehling, long-time KFD driver architect. I started looking into the TinyGrad source code yesterday, focusing on ops_kfd.py, ops_hsa.py and driver/hsa.py, to understand how TinyGrad talks to our HW and help with the ongoing debugging effort from the top down. This analysis is based on this commit: https://github.com/tinygrad/tinygrad/tree/3de855ea50d72238deac14fc05cda2a611497778

I'm intrigued by the use of Python for low-level programming. I think I can learn something from your use of ctypes and clang2py for fast prototyping and test development. I want to share some observations based on my initial review.

ops_kfd looks pretty new, and I see many problems with it based on my long experience working on KFD. I think it's interesting, but probably not relevant for the most pressing problems at hand, so I'll cover that last.

ops_hsa uses ROCr APIs to manage GPU memory, create a user-mode AQL queue for GPU kernel dispatch, perform async SDMA copies, and implement signal-based synchronization with barrier packets.
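The AQL packets placed on such a queue are fixed 64-byte records defined in the HSA headers. As a minimal, hand-written sketch in the same ctypes style (the notes above mention clang2py for generating such bindings), the kernel-dispatch and barrier-AND layouts and the 16-bit header encoding could be expressed as follows. The Python class names, constants and the make_header helper are hypothetical; the field order follows hsa_kernel_dispatch_packet_t and hsa_barrier_and_packet_t, with raw uint64 fields standing in for the pointer and hsa_signal_t handle types.

```python
import ctypes

# Header bit positions, mirroring the HSA AQL packet header layout.
HEADER_TYPE_SHIFT = 0            # bits 0-7: packet type
HEADER_BARRIER_SHIFT = 8         # bit 8: wait for preceding packets to complete
HEADER_ACQUIRE_SCOPE_SHIFT = 9   # bits 9-10: acquire fence scope
HEADER_RELEASE_SCOPE_SHIFT = 11  # bits 11-12: release fence scope

PACKET_TYPE_KERNEL_DISPATCH = 2
PACKET_TYPE_BARRIER_AND = 3
FENCE_SCOPE_SYSTEM = 2


class KernelDispatchPacket(ctypes.Structure):
    # Same field order as hsa_kernel_dispatch_packet_t (64 bytes total).
    _fields_ = [
        ("header", ctypes.c_uint16),
        ("setup", ctypes.c_uint16),              # bits 0-1: number of grid dimensions
        ("workgroup_size_x", ctypes.c_uint16),
        ("workgroup_size_y", ctypes.c_uint16),
        ("workgroup_size_z", ctypes.c_uint16),
        ("reserved0", ctypes.c_uint16),
        ("grid_size_x", ctypes.c_uint32),
        ("grid_size_y", ctypes.c_uint32),
        ("grid_size_z", ctypes.c_uint32),
        ("private_segment_size", ctypes.c_uint32),
        ("group_segment_size", ctypes.c_uint32),
        ("kernel_object", ctypes.c_uint64),      # code object handle
        ("kernarg_address", ctypes.c_uint64),    # void* in C, 64-bit address here
        ("reserved2", ctypes.c_uint64),
        ("completion_signal", ctypes.c_uint64),  # hsa_signal_t handle
    ]


class BarrierAndPacket(ctypes.Structure):
    # Same field order as hsa_barrier_and_packet_t (64 bytes total).
    # Completes once every non-null dep_signal has been observed at 0.
    _fields_ = [
        ("header", ctypes.c_uint16),
        ("reserved0", ctypes.c_uint16),
        ("reserved1", ctypes.c_uint32),
        ("dep_signal", ctypes.c_uint64 * 5),
        ("reserved2", ctypes.c_uint64),
        ("completion_signal", ctypes.c_uint64),
    ]


def make_header(packet_type, barrier=True, scope=FENCE_SCOPE_SYSTEM):
    """Pack type, barrier bit and acquire/release scopes into a 16-bit header."""
    return ((packet_type << HEADER_TYPE_SHIFT)
            | (int(barrier) << HEADER_BARRIER_SHIFT)
            | (scope << HEADER_ACQUIRE_SCOPE_SHIFT)
            | (scope << HEADER_RELEASE_SCOPE_SHIFT))
```

When submitting, the header is written last, with an atomic store-release, because the packet processor uses it to decide whether a queue slot holds a valid packet; only then is the queue's doorbell signal rung with the new write index.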

@bjacob
bjacob / README.md
Last active August 9, 2024 08:30
Exploring IREE CPU microkernels on a simple matmul example

Basic setup, command lines

Source file: matmul.mlir:

func.func @matmul_dynamic(%lhs: tensor<?x?xf32>, %rhs: tensor<?x?xf32>, %acc: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %result = linalg.matmul ins(%lhs, %rhs: tensor<?x?xf32>, tensor<?x?xf32>) outs(%acc: tensor<?x?xf32>) -> tensor<?x?xf32>
  return %result: tensor<?x?xf32>
}

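Note that linalg.matmul accumulates into its outs operand rather than overwriting it. A plain-Python reference of what @matmul_dynamic computes can be handy when checking microkernel output; this is a sketch of the op's semantics only, not of how IREE lowers it, and the function name is hypothetical.

```python
def matmul_accumulate(lhs, rhs, acc):
    """Reference semantics of linalg.matmul on row-major lists of lists:
    result[i][j] = acc[i][j] + sum_k lhs[i][k] * rhs[k][j].

    Shapes are dynamic (tensor<?x?xf32>), so they are checked at run time.
    """
    m, k = len(lhs), len(lhs[0])
    k2, n = len(rhs), len(rhs[0])
    if k != k2 or len(acc) != m or len(acc[0]) != n:
        raise ValueError("shape mismatch")
    # Tensors are values: %acc only supplies the initial value, so copy it.
    result = [row[:] for row in acc]
    for i in range(m):
        for j in range(n):
            s = result[i][j]
            for p in range(k):
                s += lhs[i][p] * rhs[p][j]
            result[i][j] = s
    return result
```

Python floats are doubles, so results can differ slightly from f32 microkernel output; compare with a tolerance rather than exactly.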
Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you have heard of ChatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). I also assume you have heard that it is "a large language model", and maybe that it "solved natural language understanding". Here is a short personal perspective on this (and similar) models, and on where we stand with respect to language understanding.

Intro

Around 2014-2017, right at the rise of neural-network-based methods for NLP, I was giving a semi-academic, semi-popsci lecture built around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time, I was also asked on an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs", to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

@Chillee
Chillee / 1-pw_op_fusion.py
Last active August 1, 2024 18:11
PT 2.0 Benchmarks
import torch
import torch._inductor.config
import time
torch._inductor.config.triton.cudagraphs = False
torch.set_float32_matmul_precision('high')
def bench(f, name=None, iters=100, warmup=5, display=True, profile=False):
    for _ in range(warmup):
        f()
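The bench helper is cut off in this preview. A torch-free sketch of the usual warmup-then-time pattern follows; it is not Chillee's implementation, and the name time_fn and its sync hook are assumptions. When timing CUDA work you would pass torch.cuda.synchronize as sync, because kernel launches return before the GPU has finished.

```python
import time


def time_fn(f, iters=100, warmup=5, sync=None):
    """Return mean wall-clock seconds per call of f().

    The first `warmup` calls are untimed (e.g. to absorb compilation).
    `sync` is an optional zero-argument callable run before each clock
    read so that asynchronous device work is included in the measurement.
    """
    for _ in range(warmup):
        f()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        f()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters
```

Averaging over many iterations between only two synchronizations keeps the sync overhead out of the per-call figure.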
@Narsil
Narsil / pure_torch.py
Created November 10, 2022 15:06
Loading a safetensors file with pure torch only
import mmap
import torch
import json
import os
from huggingface_hub import hf_hub_download
def load_file(filename, device):
    with open(filename, mode="r", encoding="utf8") as file_obj:
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as m:
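The file format that load_file walks through is simple: an 8-byte little-endian header length N, then N bytes of UTF-8 JSON giving each tensor's dtype, shape and data_offsets (relative to the end of the header), then the raw tensor bytes; an optional "__metadata__" key holds string metadata. A torch-free round-trip sketch with hypothetical helper names is below. It returns raw bytes and leaves dtype conversion (for example via torch.frombuffer) to the caller.

```python
import json
import mmap
import struct


def encode_safetensors(tensors):
    """Serialize {name: (dtype_str, shape, raw_bytes)} in safetensors layout."""
    header, blobs, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": list(shape),
                        "data_offsets": [offset, offset + len(raw)]}
        blobs.append(raw)
        offset += len(raw)
    hdr = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header size, JSON header, then the data buffer.
    return struct.pack("<Q", len(hdr)) + hdr + b"".join(blobs)


def parse_safetensors(buf):
    """Decode a bytes-like object (bytes or mmap) into {name: (dtype, shape, raw)}."""
    (n,) = struct.unpack("<Q", buf[:8])
    header = json.loads(bytes(buf[8:8 + n]).decode("utf-8"))
    data_start = 8 + n
    out = {}
    for name, info in header.items():
        if name == "__metadata__":  # string-to-string map, not a tensor
            continue
        begin, end = info["data_offsets"]
        out[name] = (info["dtype"], info["shape"],
                     bytes(buf[data_start + begin:data_start + end]))
    return out


def load_safetensors_file(path):
    """Parse a file through a read-only mmap, as load_file above does."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as m:
            return parse_safetensors(m)
```

Opening the file in binary mode is the natural fit here, since the mmap only needs the file descriptor and the header is decoded explicitly.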
@res0nat0r
res0nat0r / colima.md
Last active September 13, 2024 10:48
Set proxy in Colima Docker container

abiosoft/colima#294 (comment)


Note: this assumes Colima v0.4.0 or newer.

SSH into the VM: colima ssh

Edit the Docker init script: sudo vi /etc/init.d/docker

@b01
b01 / download-vs-code-server.sh
Last active September 5, 2024 21:23
Linux script to download the latest VS Code Server, good for Docker (tested in Alpine).
#!/bin/sh
# Copyright 2023 Khalifah K. Shabazz
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the “Software”),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
@x0nu11byt3
x0nu11byt3 / elf_format_cheatsheet.md
Created February 27, 2021 05:26
ELF Format Cheatsheet

Introduction

The Executable and Linkable Format (ELF) is the default binary format on Linux-based systems.
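Every ELF file starts with a 16-byte e_ident array (the magic \x7fELF, then class, data encoding, version, OS ABI and padding), followed by e_type and e_machine in the file's own byte order. A small sketch that decodes just those fields follows; the function and table names are hypothetical, and only a few common e_machine values are listed.

```python
import struct

ELF_CLASS = {1: "ELF32", 2: "ELF64"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}
ELF_TYPE = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}
ELF_MACHINE = {3: "EM_386", 40: "EM_ARM", 62: "EM_X86_64",
               183: "EM_AARCH64", 243: "EM_RISCV"}


def parse_elf_ident(data):
    """Decode the first 20 bytes of an ELF header: e_ident, e_type, e_machine."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    # EI_DATA decides the byte order of every multi-byte field that follows.
    endian = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack(endian + "HH", data[16:20])
    return {
        "class": ELF_CLASS.get(ei_class, ei_class),
        "data": ELF_DATA.get(ei_data, ei_data),
        "type": ELF_TYPE.get(e_type, e_type),
        "machine": ELF_MACHINE.get(e_machine, e_machine),
    }
```

Feeding it the first 20 bytes of /bin/ls on an x86_64 Debian system would typically report ELF64, little-endian, ET_DYN (position-independent executables use ET_DYN) and EM_X86_64.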

ELF

Compilation

@apivovarov
apivovarov / riscv-debian.md
Last active May 23, 2024 05:49
Run RISC-V Debian via QEMU #riscv #qemu

Run RISC-V Debian GNU/Linux bullseye/sid via QEMU.

  1. Run a recent version of Debian on a regular x86_64 box (at least version 10 Buster; version 11 Bullseye is better).
  2. If the opensbi and u-boot-qemu packages are not found, add the testing apt repository (aka bullseye), or even unstable (aka sid):
sudo vi /etc/apt/sources.list

# Add testing repo (or unstable)
deb http://cdn-aws.deb.debian.org/debian testing main
deb-src http://cdn-aws.deb.debian.org/debian testing main
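Step 2 edits /etc/apt/sources.list by hand. If you script the setup instead, a hedged Python sketch that appends the testing repository lines only when they are missing (so re-running it is harmless) could look like this. The function name is hypothetical, and writing the result back to /etc/apt/sources.list as root, followed by sudo apt update, is left to the caller.

```python
def add_apt_sources(text, lines):
    """Return sources.list text with each entry of `lines` appended if absent.

    Lines are compared after stripping surrounding whitespace, so calling
    this twice with the same input leaves the text unchanged.
    """
    existing = {line.strip() for line in text.splitlines()}
    missing = [line for line in lines if line.strip() not in existing]
    if not missing:
        return text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + "\n".join(missing) + "\n"


# The repository lines from step 2 (swap "testing" for "unstable" for sid).
TESTING_LINES = [
    "deb http://cdn-aws.deb.debian.org/debian testing main",
    "deb-src http://cdn-aws.deb.debian.org/debian testing main",
]
```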
@geerlingguy
geerlingguy / nvidia-gt710-arm-pi-setup.sh
Last active April 14, 2024 16:26
Set up the Nvidia GeForce GT 710 on Raspberry Pi Compute Module 4
#!/bin/bash
# Attempt to set up the Nvidia GeForce GT 710 on a Pi CM4.
#
# I have tried both armv7l and aarch64 versions of the proprietary driver, in
# addition to the nouveau open source driver (which needs to be compiled into
# a custom Raspberry Pi kernel).
#
# tl;dr - None of the drivers worked :P