tdunning’s gists

tdunning / content-based recommendation paragraphs

Created May 23, 2023 18:23

	The idea of content-based recommendation is that instead of looking purely at a history of
	how users interact with items, where both users and items are treated as things we know
	nothing about (other than their interactions), we can consider the features of the items.
	By content, we might mean actual textual descriptions, but we might also mean more
	structured information about the items, such as their color or whether they are shoes,
	books or music.

	If we look at the content associated with items, we can restate the user x item history as
	a user x content-feature history. That is, we can look at which content features our users
	interacted with rather than which items. Essentially, we are recommending features
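A minimal sketch of that restatement, assuming each item is described by a 0/1 vector of content features. The toy catalog, the feature names (is_shoe, is_book, is_red) and the function name `user_feature_history` are hypothetical. The user x feature history is just the matrix product of the user x item history with the item x feature matrix: each user's count for a feature is the sum of their interactions with items that have it.

```python
def user_feature_history(user_item, item_features):
    """Turn a user x item interaction matrix into a user x feature matrix.

    user_item[u][i] counts interactions of user u with item i;
    item_features[i][f] is 1 if item i has content feature f, else 0.
    The result is the matrix product user_item @ item_features.
    """
    n_features = len(item_features[0])
    result = []
    for row in user_item:
        counts = [0] * n_features
        for item, n in enumerate(row):
            for f, has in enumerate(item_features[item]):
                counts[f] += n * has
        result.append(counts)
    return result


# hypothetical toy catalog; features are (is_shoe, is_book, is_red)
items = [
    [1, 0, 1],  # red shoe
    [1, 0, 0],  # black shoe
    [0, 1, 0],  # book
]
history = [
    [2, 1, 0],  # user 0: two red-shoe interactions, one black-shoe
    [0, 0, 3],  # user 1: three book interactions
]
print(user_feature_history(history, items))  # [[3, 0, 2], [0, 3, 0]]
```

Any method that works on a user x item history can then be run on these user x feature rows instead.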

tdunning / gist:d1970aec133e96fe4c8cbb4515ecb8aa

Created March 16, 2023 01:58

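	PRAGMA version;  -- report the engine version (DuckDB)

	CREATE TABLE distributors (
	    did  integer CHECK (did > 100),
	    name varchar(40)
	);

	INSERT INTO distributors VALUES (200, 'a');
	INSERT INTO distributors VALUES (201, 'b');

	-- COLUMNS('d.*') expands to every column whose name matches the regex 'd.*'
	-- (here only did), so this returns min(did) = 200
	SELECT min(COLUMNS('d.*')) FROM distributors;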
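The query relies on DuckDB's `COLUMNS(...)` star expression, which selects columns by a regular expression on their names and applies the surrounding aggregate to each match. A rough Python emulation of that behavior is below; the function name is hypothetical, and whether the engine uses full or partial regex matching is an assumption here (for this schema both give the same answer, since only `did` contains a `d`).

```python
import re


def min_over_matching_columns(rows, columns, pattern):
    """Emulate SELECT min(COLUMNS(pattern)): one min() per column whose name matches."""
    regex = re.compile(pattern)
    return {
        name: min(row[i] for row in rows)
        for i, name in enumerate(columns)
        if regex.search(name)
    }


columns = ["did", "name"]
rows = [(200, "a"), (201, "b")]
print(min_over_matching_columns(rows, columns, "d.*"))  # {'did': 200}
```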

tdunning / csv-sample.csv

Created January 23, 2022 21:11

random data in CSV form

x1,x2,x3
0.7231422916301575,0.819657781416707,0.6567508886461839
0.4020425739176958,0.1549076251851813,0.4282647678658029
0.4629109586444531,0.9094363294197141,0.1236688659876839
0.747467460858015,0.2428975528400832,0.6360313817514556

tdunning / alpha-is-not-a

Last active November 22, 2021 19:40

	julia> A = 3 # this is \Alpha
	3

	julia> Α = 4 # this is A
	4

	julia> Α == A # they aren't the same
	false

	julia> x′ = rand(2,2) # this is x\prime
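The two names print identically but are different Unicode code points: in the Julia REPL, `\Alpha<tab>` inserts U+0391 GREEK CAPITAL LETTER ALPHA and `\prime<tab>` inserts U+2032 PRIME. A quick Python check of the code points, which also shows that Python identifiers have the same look-alike trap:

```python
import unicodedata

latin_a = "A"            # U+0041, typed directly
greek_alpha = "\u0391"   # U+0391, what Julia's \Alpha<tab> inserts
prime = "\u2032"         # U+2032, what Julia's \prime<tab> inserts

for ch in (latin_a, greek_alpha, prime):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Same trap in Python: the two look-alike names are different variables.
namespace = {}
exec(f"{latin_a} = 3\n{greek_alpha} = 4", namespace)
print(namespace[latin_a] == namespace[greek_alpha])  # False
```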

tdunning / median-error.r

Created June 18, 2021 22:00

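	library(dplyr)

	# error measurements; the columns used below are error, delta and n0
	data = read.csv('median-error.csv')

	png("max-error-uniform.png", width=1200, height=1000, pointsize=25)
	# offset (scaled by 1/11) that shifts this series of boxes sideways from positions 1:4
	i = -3.8
	# boxplots of |error| by delta for the n0 == 20 subset; the default x axis is suppressed
	boxplot(abs(error) ~ delta, data %>% filter(n0 == 20), ylim=c(0, 0.05), xlim=c(0.6, 4.4),
	        boxwex=0.1, at=(1:4) + i/11, xaxt='n', xlab=expression(delta), cex.lab=1.4)
	# draw the x axis, labelling positions 1:4 with the delta values
	axis(side=1, at=1:4, labels=c(50, 100, 200, 500))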

	for (nx in c(20, 50, 100, 1000, 10000, 100000)) {

tdunning / figure.r

Created June 18, 2021 07:30

Snippet of R to recreate an analysis of t-digest interpolation on real data

	# Analysis of how two t-digests see some sample data
	png("figure.png", width=1200, height=1000, pointsize=30)
	# the first few actual data points with filler for the remainder
	d = c(241, 543, 575, 702, 890, 1530, 1940, 2166, 2168, rep(3000, 33))
	# the empirical cumulative distribution function (a step function)
	f = ecdf(d)
	# plot the actual CDF as a staircase (type='s'), zoomed in on 700 to 2300
	plot(x=d, y=f(d), xlim=c(700, 2300), ylim=c(0.08, 0.25), type='s',
	     xlab="Sample value", ylab="Cumulative Distribution Function",
	     cex.lab=1.3)
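R's `ecdf()` returns a right-continuous step function, F(x) = (number of samples <= x) / n. A minimal Python equivalent on the same 42 values makes the staircase concrete: each sample adds a jump of 1/42, so over the zoomed range the true CDF only climbs a few steps. The function name mirrors R's but is a hand-written sketch, not an R binding.

```python
from bisect import bisect_right


def ecdf(data):
    """Return the empirical CDF of data: F(x) = fraction of samples <= x."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


# the same sample as the R snippet: nine real values plus 33 filler values of 3000
d = [241, 543, 575, 702, 890, 1530, 1940, 2166, 2168] + [3000] * 33
f = ecdf(d)
print(len(d), f(700), f(2166), f(2300))
```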

tdunning / lorenz-animator.jl

Created May 4, 2021 00:49

Animates the evolution of an initially tight group of points ... my intro to Julia


	using DifferentialEquations
	using Plots
	using Statistics
	using LinearAlgebra

	function lorenz!(du, u, p, t)
	    x, y, z = u
	    σ, ρ, β = p
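The excerpt stops inside `lorenz!`, so the parameter values and the animation are not shown. Here is a minimal Python sketch of the same experiment, assuming the classic Lorenz parameters σ = 10, ρ = 28, β = 8/3 and replacing DifferentialEquations.jl with a hand-rolled fourth-order Runge-Kutta step. It starts five points within 0.004 of (1, 1, 1) and measures how far the group has spread after integrating to t = 20; the names `rk4_step` and `spread` are hypothetical helpers.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (classic parameter values by default)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance one state by dt with the classic fourth-order Runge-Kutta method."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + e)
                 for s, a, b, c, e in zip(state, k1, k2, k3, k4))


def spread(points):
    """Largest distance from any point to the centroid of the group."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    return max(math.dist(p, centroid) for p in points)


# an initially tight group: five points spaced 0.001 apart along x near (1, 1, 1)
points = [(1.0 + 0.001 * j, 1.0, 1.0) for j in range(5)]
initial_spread = spread(points)
dt, steps = 0.01, 2000  # integrate to t = 20
for _ in range(steps):
    points = [rk4_step(p, dt) for p in points]
print(initial_spread, spread(points))
```

Because nearby Lorenz trajectories separate roughly exponentially, the tiny initial spread grows to the scale of the attractor itself, which is what the animation shows visually.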

tdunning / shift-detection.r

Last active December 6, 2020 02:15

Sample code that shows how distributional changes in a single tail can be detected accurately using counts targeted at particular parts of a reference dataset

	### Draws a figure illustrating change detection in the distribution of synthetic data.
	### Each dot represents a single time period with 1000 samples. Before the change,
	### the data is sampled from a unit normal distribution. After the change, 20 samples
	### in each time period are taken from N(3,1). Comparing counts with a log-likelihood
	### ratio (G^2) test, whose statistic is asymptotically chi^2 distributed and which
	### holds up well even when expected counts are small, reliably detects this shift.

	### log-likelihood ratio test for multinomial data
	### (H() is not shown in this excerpt; it must be defined before llr() is called)
	llr = function(k) {
	    2 * sum(k) * (H(k) - H(rowSums(k)) - H(colSums(k)))
	}
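The excerpt does not show `H()`. Assuming H(k) is the sum of p·log(p) over the counts normalized to proportions (the sign convention under which 2·N·(H(k) − H(rows) − H(cols)) is the usual non-negative G² statistic), a Python sketch of the same test looks like this. The 2 x 3 tables in the usage (a reference period and a current period, with counts below, between and above two reference quantiles) are made-up illustrations, not data from the gist.

```python
import math


def H(k):
    """Sum of p * log(p) over the normalized counts (zero counts contribute 0)."""
    n = sum(k)
    return sum(x / n * math.log(x / n) for x in k if x > 0)


def llr(table):
    """G^2 (log-likelihood ratio) statistic for a contingency table given as rows."""
    cells = [x for row in table for x in row]
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return 2 * sum(cells) * (H(cells) - H(row_sums) - H(col_sums))


# hypothetical counts: [below q_low, between, above q_high] per period
same = [[10, 980, 10], [10, 980, 10]]
shifted_tail = [[10, 980, 10], [10, 960, 30]]
print(llr(same), llr(shifted_tail))
```

With 2 x 3 tables the statistic has 2 degrees of freedom, whose 5% chi^2 critical value is about 5.99; the second table, where only the upper tail gained extra counts, exceeds it, while identical rows give exactly zero.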

tdunning / mcem.r

Last active December 7, 2020 22:59

Implementation of a Monte Carlo EM algorithm for reconstructing a normal distribution from censored observations

	### This is a demonstration of a Monte Carlo Expectation Maximization
	### (MCEM) algorithm that can recover the mean and standard deviation of
	### censored normally distributed data. We draw 10,000 samples from
	### a unit normal distribution, but every sample below 0.5 is replaced
	### by 0.5 and every sample above 2.5 is replaced by 2.5.
	### These limits were chosen to get quick and visually appealing convergence,
	### but the algorithm converges for any choice of limits. Convergence
	### could be very, very slow if there is little information in the samples,
	### and the final answer could have substantial uncertainty. For instance,
	### if we censored at 4 and 6, almost all samples would be piled up at
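The excerpt ends before the algorithm itself, so here is a minimal Python sketch of the idea, not the gist's code. Each iteration replaces every censored value with a random draw from the currently fitted normal restricted to the censored tail (the Monte Carlo E-step, sampled by inverting the CDF), then refits mean and standard deviation to the completed data (the M-step). Using one draw per censored value per iteration, averaging the estimates after a burn-in, and 2,000 samples instead of 10,000 are choices made for this sketch; `censor` and `mcem` are hypothetical names.

```python
import random
import statistics
from statistics import NormalDist


def censor(xs, lo, hi):
    """Clip each value into [lo, hi]: values outside are replaced by the limit."""
    return [min(max(x, lo), hi) for x in xs]


def mcem(observed, lo, hi, iterations=100, burn_in=70, seed=1):
    """Estimate (mean, sd) of a normal from data censored at lo and hi."""
    rng = random.Random(seed)
    mu, sd = statistics.fmean(observed), statistics.pstdev(observed)
    kept = []
    for it in range(iterations):
        dist = NormalDist(mu, sd)
        p_lo, p_hi = dist.cdf(lo), dist.cdf(hi)
        completed = []
        for x in observed:
            if x <= lo:
                u = p_lo * rng.random()                   # tail (-inf, lo]
            elif x >= hi:
                u = p_hi + (1.0 - p_hi) * rng.random()    # tail [hi, inf)
            else:
                completed.append(x)                       # uncensored: keep as is
                continue
            u = min(max(u, 1e-12), 1.0 - 1e-12)           # inv_cdf needs 0 < u < 1
            completed.append(dist.inv_cdf(u))
        # M-step: maximum-likelihood mean and sd of the completed data
        mu, sd = statistics.fmean(completed), statistics.pstdev(completed)
        if it >= burn_in:
            kept.append((mu, sd))
    return (statistics.fmean([m for m, _ in kept]),
            statistics.fmean([s for _, s in kept]))


rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(2000)]
data = censor(raw, 0.5, 2.5)
mu_hat, sd_hat = mcem(data, 0.5, 2.5)
print(statistics.fmean(data), statistics.pstdev(data), mu_hat, sd_hat)
```

The naive mean and standard deviation of the censored data are badly off (the mean is necessarily above 0.5), while the MCEM estimates should land near the true values of 0 and 1.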

tdunning / tesla-range-sim

Last active July 27, 2020 23:21

	### This code builds a simple physical model of the range of an 85 kWh Tesla Model S and
	### compares it to real data. The data here is digitized from
	### https://www.tesla.com/blog/model-s-efficiency-and-range

	### The model here accounts for aerodynamic drag, viscous drag, constant
	### friction, and constant power drain.

	### First the digitized data
	x = read.csv(text="v,range
	10.22976354700292, 393.9005561997566
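The excerpt stops in the middle of the digitized data, before the model. Below is a minimal Python sketch of a model with the four listed terms, under my reading of them: aerodynamic drag grows with v², viscous drag is proportional to v, constant friction is a fixed force, and a constant power drain P costs P / v of energy per metre travelled. Range is battery energy divided by energy per metre. Speeds are taken in mph and range reported in miles, matching the data's scale. Every parameter value below is a hypothetical placeholder, not fitted to the Tesla data.

```python
MPH = 0.44704        # metres per second in one mile per hour
MILE = 1609.344      # metres per mile

# Hypothetical placeholder parameters (NOT fitted to the digitized data)
RHO_AIR = 1.2          # air density, kg/m^3
CDA = 0.24 * 2.34      # drag coefficient x frontal area, m^2
C_VISCOUS = 2.0        # viscous drag coefficient, N per (m/s)
F_FRICTION = 200.0     # constant friction force, N
P_DRAIN = 1000.0       # constant power drain, W
BATTERY_J = 85 * 3.6e6  # 85 kWh expressed in joules


def energy_per_metre(v):
    """Joules (equivalently newtons) needed to travel one metre at speed v in m/s."""
    aero = 0.5 * RHO_AIR * CDA * v * v
    viscous = C_VISCOUS * v
    drain = P_DRAIN / v  # constant power spread over distance
    return aero + viscous + F_FRICTION + drain


def range_miles(speed_mph):
    """Predicted range in miles at a constant speed given in mph."""
    v = speed_mph * MPH
    return BATTERY_J / energy_per_metre(v) / MILE


for mph in (10, 25, 40, 55, 70, 85):
    print(mph, round(range_miles(mph)))
```

The shape is the interesting part: the constant drain dominates at very low speed and aerodynamic drag at high speed, so range peaks at an intermediate speed. Fitting the five parameters to the digitized points would be the next step.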

Ted Dunning tdunning