
@Innf107
Innf107 / data.md
Last active September 20, 2024 06:02

Programming is about information not data, or: you might not need dependent types

When I took "Fundamentals of Computer Science" in college, my professor was very adamant about the distinction between data and information and about how data doesn't have any inherent meaning. At the time, it seemed a bit silly to me how much emphasis he put on such a seemingly insignificant difference.

In retrospect, I think he was exactly right about this and I wish more programmers took it to heart.

Data is something you can store in a computer, such as, let's say, the byte 0b01000001.
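To make that concrete (my own illustration, not from the original essay), here is the same byte read several different ways in Python:

# One byte of data: 0b01000001. Which information it carries is up to us.
b = bytes([0b01000001])
print(b[0])               # read as an unsigned integer: 65
print(b.decode("ascii"))  # read as an ASCII character: 'A'
print(bin(b[0]))          # read as a bit pattern: 0b1000001
print(bool(b[0] & 0x01))  # read as flags: "bit 0 is set" -> True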

@VictorTaelin
VictorTaelin / dps_sup_nodes.md
Last active September 12, 2024 18:42
Accelerating Discrete Program Search with SUP Nodes

Accelerating Discrete Program Search

I am investigating how to use Bend (a parallel language) to accelerate Symbolic AI; in particular, Discrete Program Search. Basically, think of it as an alternative to LLMs, GPTs, and NNs that is also capable of generating code, but by entirely different means. This kind of approach was never scaled with mass compute before - it wasn't possible! - but Bend changes this. So, my idea was to do it, and see where it goes.

Now, while I was implementing some candidate algorithms in Bend, I realized that, rather than mass parallelism, I could use an entirely different mechanism to speed things up: SUP nodes. This is a feature that Bend inherited from its underlying model ("Interaction Combinators") which, in simple terms, allows us to combine multiple functions into a single superposed one and apply them all to an argument "at the same time". In short, it allows us to call N functions at a fraction of the expected cost. Or: why parallelize when we can share?

@HDCharles
HDCharles / microbenchmarks.py
Created February 24, 2024 16:46
microbenchmarks
import torch
import torch.nn.functional as F
import triton
import triton.language as tl
# Triton's reference matmul and its underlying kernel
# (triton.ops shipped with older Triton releases).
from triton.ops.matmul import matmul as triton_matmul
from triton.ops.matmul import _kernel
from triton import Config
from torch._inductor import config
from torch import _dynamo
# Enable Inductor's coordinate-descent autotuning for generated kernels.
torch._inductor.config.coordinate_descent_tuning = True
@madebyollin
madebyollin / Mamba_Diffusion_IADB_Colab.ipynb
Created December 6, 2023 04:47
Mamba Diffusion (IADB)
@madebyollin
madebyollin / notes_on_sd_vae.md
Last active September 14, 2024 09:11
notes_on_sd_vae

Notes / Links about Stable Diffusion VAE

Stable Diffusion's VAE is a neural network that encodes images into a compressed "latent" format and decodes them back. The encoder performs 48x lossy compression, and the decoder generates new detail to fill in the gaps.
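As a rough illustration of that encode/decode round trip (my own sketch using the diffusers library; the model id and shapes are assumptions, not from these notes):

import torch
from diffusers import AutoencoderKL

# Load a Stable Diffusion VAE checkpoint (model id assumed for illustration).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# A stand-in 512x512 RGB image batch, scaled to [-1, 1].
image = torch.rand(1, 3, 512, 512) * 2 - 1

with torch.no_grad():
    # Encode: 1x3x512x512 -> 1x4x64x64 latents.
    # 512*512*3 / (64*64*4) = 48, hence "48x compression".
    latents = vae.encode(image).latent_dist.sample()
    # Decode: the decoder invents plausible detail to fill in the gaps.
    reconstruction = vae.decode(latents).sample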

(Calling this model a "VAE" is sort of a misnomer - it's an encoder with some very slight KL regularization, and a conditional GAN decoder)

This document is a big pile of various links with more info.

#!/bin/zsh
# WARNING! The script is meant to show how and what can be disabled. Don’t use it as it is, adapt it to your needs.
# Credit: Original idea and script disable.sh by pwnsdx https://gist.github.com/pwnsdx/d87b034c4c0210b988040ad2f85a68d3
# Disabling unwanted services on macOS Big Sur (11), macOS Monterey (12), macOS Ventura (13) and macOS Sonoma (14)
# Disabling SIP is required ("csrutil disable" from Terminal in Recovery)
# Modifications are written in /private/var/db/com.apple.xpc.launchd/disabled.plist and disabled.501.plist
# To revert, delete disabled.plist and disabled.501.plist and reboot: sudo rm -r /private/var/db/com.apple.xpc.launchd/*
# user
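# For illustration only (not part of the original script): launchd services
# are disabled per-target, e.g.
#   sudo launchctl disable system/com.apple.analyticsd   # system domain
#   launchctl disable user/501/com.apple.assistantd      # per-user domain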

Variational Autoencoders Will Never Work

So you want to generate images with neural networks. You're in luck! VAEs are here to save the day. They're simple to implement, they generate images in one inference step (unlike those awful slow autoregressive models) and (most importantly) VAEs are 🚀🎉🎂🥳 theoretically grounded 🚀🎉🎂🥳 (unlike those scary GANs - don't look at the GANs)!

The idea

The idea of the VAE is so simple that even an AI chatbot could explain it:

  1. Your goal is to train a "decoder" neural network that consumes blobs of random noise from a fixed distribution (like torch.randn(1024)), interprets that noise as decisions about what to generate, and produces corresponding real-looking images. You want to train this network with nice simple image-space MSE loss against your dataset of real images.
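A minimal torch sketch of that naive setup (my own illustration; the layer sizes and dataset are made up):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Naive "decoder + MSE" setup from step 1 (illustrative sizes).
decoder = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 3 * 32 * 32), nn.Tanh(),
)

real_images = torch.rand(16, 3 * 32 * 32) * 2 - 1  # stand-in dataset batch
noise = torch.randn(16, 1024)                      # codes from a fixed distribution

fake_images = decoder(noise)
loss = F.mse_loss(fake_images, real_images)  # image-space MSE against real images
loss.backward()
# Note: nothing here says which noise blob should map to which image,
# which is exactly the gap the rest of the VAE machinery tries to fill.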

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language-model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much…

@madebyollin
madebyollin / list_of_good_image_generator_training_logs.md
Last active May 2, 2024 14:28

List of good image generator training logs

A list of public training logs from neural network image generation models, since I think they're interesting.

The Criteria

  • Publicly accessible link
  • Losses plotted every so often
  • Samples generated every so often
  • Nontrivial dataset (i.e. not MNIST - 64x64 output RGB or better)
function svg2png(svgContent, width, height, callback) {
  // svgContent should be base64-encoded SVG markup
  let svgData = svgContent;
  let canvas = document.createElement("canvas");
  let context = canvas.getContext("2d");
  canvas.width = width;
  canvas.height = height;
  let image = new Image();
  // Draw the SVG onto the canvas once it has loaded, then hand the
  // resulting PNG data URL to the caller.
  image.onload = function () {
    context.drawImage(image, 0, 0, width, height);
    callback(canvas.toDataURL("image/png"));
  };
  image.src = "data:image/svg+xml;base64," + svgData;
}