Allen Guo gglin001

This is Felix Kuehling, long time KFD driver architect. I started looking into the TinyGrad source code yesterday, focusing on ops_kfd.py, ops_hsa.py and driver/hsa.py, to understand how TinyGrad talks to our HW and help with the ongoing debugging effort from the top down. This analysis is based on this commit: https://github.com/tinygrad/tinygrad/tree/3de855ea50d72238deac14fc05cda2a611497778

I'm intrigued by the use of Python for low-level programming. I think I can learn something from your use of ctypes and clang2py for fast prototyping and test development. I want to share some observations based on my initial review.

ops_kfd looks pretty new, and I see many problems with it based on my long experience working on KFD. I think it's interesting, but probably not relevant for the most pressing problems at hand, so I'll cover that last.

ops_hsa uses ROCr APIs to manage GPU memory, create a user mode AQL queue for GPU kernel dispatch, async SDMA copies, and signal-based synchronization with barrier packets

Exploring IREE CPU microkernels on a simple matmul example

Basic setup, command lines

Source file: matmul.mlir:

func.func @matmul_dynamic(%lhs: tensor<?x?xf32>, %rhs: tensor<?x?xf32>, %acc: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %result = linalg.matmul ins(%lhs, %rhs: tensor<?x?xf32>, tensor<?x?xf32>) outs(%acc: tensor<?x?xf32>) -> tensor<?x?xf32>
  return %result: tensor<?x?xf32>

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you heard of chatGPT, maybe played with it a little, and was imressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective of my thoughts of this (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

abiosoft/colima#294 (comment)

Note: this assumes Colima v0.4.0 or newer.

SSH into the VM colima ssh

Edit docker init script sudo vi /etc/init.d/docker.

ELF Format Cheatsheet

Introduction

Executable and Linkable Format (ELF), is the default binary format on Linux-based systems.

Compilation

Run RISC-V Debian GNU/Linux bullseye/sid via QEMU.

Run the latest version of Debian on regular x86_64 box (at least ver 10 Buster, better to run ver 11 Bullseye)
If opensbi and u-boot-qemu packages are not found add testing apt repository (aka bullseye). Or even unstable (aka sid)

sudo vi /etc/apt/sources.list

# Add testing repo (or unstable)
deb http://cdn-aws.deb.debian.org/debian testing main
deb-src http://cdn-aws.deb.debian.org/debian testing main

	import torch
	import torch._inductor.config
	import time

	torch._inductor.config.triton.cudagraphs = False
	torch.set_float32_matmul_precision('high')

	def bench(f, name=None, iters=100, warmup=5, display=True, profile=False):
	for _ in range(warmup):
	f()

	import mmap
	import torch
	import json
	import os
	from huggingface_hub import hf_hub_download


	def load_file(filename, device):
	with open(filename, mode="r", encoding="utf8") as file_obj:
	with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as m:

	#!/bin/sh

	# Copyright 2023 Khalifah K. Shabazz
	#
	# Permission is hereby granted, free of charge, to any person obtaining a
	# copy of this software and associated documentation files (the “Software”),
	# to deal in the Software without restriction, including without limitation
	# the rights to use, copy, modify, merge, publish, distribute, sublicense,
	# and/or sell copies of the Software, and to permit persons to whom the
	# Software is furnished to do so, subject to the following conditions:

	#!/bin/bash

	# Attempt to set up the Nvidia GeForce GT 710 on a Pi CM4.
	#
	# I have tried both armv7l and aarch64 versions of the proprietary driver, in
	# addition to the nouveau open source driver (which needs to be compiled into
	# a custom Raspberry Pi kernel).
	#
	# tl;dr - None of the drivers worked :P