# Get % of variance explained by Principal Component Analysis
library(psych)

pca_variance_explained <- function(data, n_components) {
  pca <- psych::principal(data, n_components, rotate = "none")
  list(
    by_variable = colSums(cor(pca$scores, data)^2),
    # scale. = TRUE so that prcomp, like psych::principal, works on the correlation matrix
    total = summary(prcomp(data, scale. = TRUE))$importance["Cumulative Proportion", paste0("PC", n_components)]
  )
}
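The per-variable decomposition can be cross-checked with base R alone; a minimal sketch using `prcomp` on standardized data (the `mtcars` columns and the choice of two components are illustrative assumptions):

```r
# Variance explained per variable by the first k unrotated components,
# using only base R's prcomp on standardized data
k <- 2
X <- scale(mtcars[, c("mpg", "disp", "hp", "wt")])
pca <- prcomp(X)
by_variable <- colSums(cor(pca$x[, 1:k], X)^2)  # squared component-variable correlations
total <- summary(pca)$importance["Cumulative Proportion", paste0("PC", k)]
by_variable  # proportion of each variable's variance reproduced by k components
total        # overall proportion of variance explained
```

For standardized variables, the mean of `by_variable` equals the cumulative proportion of variance, which ties the two outputs of the function above together.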
# Compare Cohen's h and the phi coefficient as effect sizes for comparing proportions
library(effectsize)

# Test data: 2x2 contingency tables; the second column is always the same
matrices <- lapply(1:999, function(i) matrix(c(i, 1000 - i, 999, 1), ncol = 2))
phis <- sapply(matrices, function(x) effectsize::phi(x)$phi)
hs <- sapply(matrices, function(x) effectsize::cohens_h(x)$Cohens_h)
plot(abs(hs), phis, type = "l")
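For reference, both effect sizes can be computed by hand in base R (a sketch; `effectsize` returns the same quantities): Cohen's h is the difference of arcsine-transformed proportions, and phi is equivalent to the chi-square statistic scaled by the sample size.

```r
# Hand-rolled versions of the two effect sizes for a 2x2 frequency table,
# with the two compared groups in the columns
cohens_h <- function(m) {
  p1 <- m[1, 1] / sum(m[, 1])
  p2 <- m[1, 2] / sum(m[, 2])
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))  # difference of arcsine-transformed proportions
}
phi_coef <- function(m) {
  # equivalent to sqrt(chi-square / n) for a 2x2 table, with sign
  (m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1]) /
    sqrt(prod(rowSums(m)) * prod(colSums(m)))
}
m <- matrix(c(300, 700, 999, 1), ncol = 2)
c(h = cohens_h(m), phi = phi_coef(m))
```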
# Test if splitting data via anticlustering leads to group means that are closer
# to the *true* population mean than a random split (e.g., for cross-validation)
library(anticlust)

simulate <- function(N = 100, split = c(1, 3) / 4) { # default: split 75/25
  data <- rnorm(N)
  groups <- anticlustering(
    data,
    K = round(N * split),
    objective = "variance"
  )
  c(
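The preview cuts off at the return value, but the comparison it describes can be sketched without the `anticlust` dependency by using a crude stand-in for the anticlustered split (round-robin assignment over sorted values; this is an illustration, not the anticlust algorithm):

```r
# Rough base-R stand-in for an anticlustered 25/75 split: every 4th value
# in sorted order goes to the small group, so both groups cover the full
# range of the data; compare the deviation of the group means from the
# true population mean (0) with a random split of the same sizes
one_run <- function(N = 100) {
  data <- rnorm(N)
  balanced <- ifelse(rank(data, ties.method = "first") %% 4 == 1, 1, 2)
  random <- sample(balanced) # same group sizes, random assignment
  c(balanced = mean(abs(tapply(data, balanced, mean))),
    random = mean(abs(tapply(data, random, mean))))
}
set.seed(123)
rowMeans(replicate(500, one_run())) # the balanced split deviates less
```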
## 1. Load - and, if required, install - package `anticlust`
if (!requireNamespace("anticlust", quietly = TRUE)) {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("m-Py/anticlust")
}
library(anticlust)
# Show that an interaction in glm() changes the nature of a main effect
# (only if a categorical predictor is dummy coded - not contrast coded).
# Returns the p-value associated with a predictor main effect, once
# with and once without an interaction with a (non-predictive) categorical
# independent variable

simulate_glm <- function(N = 100, contrast_coding = FALSE) {
  iv1 <- rnorm(N) # related to DV
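The gist preview cuts the function off here; a sketch of how the rest presumably looks, following the description above (everything beyond `iv1` is my assumption):

```r
# Sketch of the full simulation: iv2 is a non-predictive two-level factor;
# the returned p-values for iv1 come from a model without and with the
# iv1:iv2 interaction
simulate_glm <- function(N = 100, contrast_coding = FALSE) {
  iv1 <- rnorm(N) # related to DV
  iv2 <- factor(rep(c("a", "b"), length.out = N)) # unrelated to DV
  if (contrast_coding) contrasts(iv2) <- contr.sum(2) # 1/-1 instead of 0/1
  dv <- iv1 + rnorm(N)
  p_main <- coef(summary(glm(dv ~ iv1 + iv2)))["iv1", "Pr(>|t|)"]
  p_interaction <- coef(summary(glm(dv ~ iv1 * iv2)))["iv1", "Pr(>|t|)"]
  c(without_interaction = p_main, with_interaction = p_interaction)
}
set.seed(1)
simulate_glm()
```

With dummy coding, the `iv1` coefficient in the interaction model is the simple slope at the reference level of `iv2`; with sum-to-zero contrasts it stays an average slope, which is the contrast the gist draws.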
# Author: Martin Papenberg
# Year: 2019

# Fast KNN classification, using RANN for the nearest-neighbour search
library("RANN")
library("data.table")

# param data: the numeric data matrix used
# param labels: the labels to predict
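The preview ends before the classifier itself; a brute-force base-R sketch of the same idea (in the real gist, `RANN::nn2` would replace the full distance matrix with a fast kd-tree search):

```r
# Leave-one-out KNN classification by brute force over the distance matrix
knn_predict <- function(data, labels, k = 3) {
  d <- as.matrix(dist(data))
  diag(d) <- Inf # a point must not be its own neighbour
  apply(d, 1, function(row) {
    nn <- order(row)[seq_len(k)] # indices of the k nearest neighbours
    names(which.max(table(labels[nn]))) # majority vote
  })
}
predictions <- knn_predict(iris[, 1:4], iris$Species, k = 5)
mean(predictions == iris$Species) # leave-one-out accuracy
```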
## This document illustrates that Type I sums of squares lead to increased alpha
## error rates when a predictive covariate is included in the regression model.

# Estimate the p-value for a (null) treatment effect via linear regression,
# including a covariate that is predictive of the outcome
#
# param N: sample size, default 100
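A sketch of such a simulation (the gist's exact setup is cut off; the names and effect sizes here are assumptions): the treatment has no true effect, the covariate predicts the outcome, and `aov()` reports sequential (Type I) sums of squares with the treatment entered first.

```r
# Null treatment effect, predictive covariate, treatment entered first:
# the sequential (Type I) treatment SS ignores the covariate, while the
# error term benefits from it, which inflates the treatment F statistic
simulate_p <- function(N = 100) {
  treatment <- rep(c("control", "treatment"), each = N / 2)
  covariate <- rnorm(N)
  outcome <- covariate + rnorm(N) # treatment does not enter
  summary(aov(outcome ~ treatment + covariate))[[1]][1, "Pr(>F)"]
}
set.seed(1)
p <- replicate(1000, simulate_p())
mean(p < 0.05) # alpha error rate, clearly above the nominal 5%
```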
## Warning: This code is just for fun / educational purposes; the file contains
## functions to find out how severely the p value in a t-test can be minimized
## by systematic removal of data points.

## SIX OUT OF THIRTY - Martin's approach
## Based on @juli_tkotz's (https://twitter.com/juli_tkotz/status/1085446224117985281)
## idea that removing the most extreme values is the best approach.

#' Simulate t-tests and store best p values
#'
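A greedy variant of that idea can be sketched in a few lines (an illustration, not the gist's actual implementation): repeatedly drop the single observation whose removal lowers the p value most.

```r
# Greedily remove up to n_remove observations, always dropping the one
# whose removal lowers the two-sample t-test p value the most
hack_p <- function(x, y, n_remove = 6) {
  for (i in seq_len(n_remove)) {
    candidates <- c(
      vapply(seq_along(x), function(j) t.test(x[-j], y)$p.value, numeric(1)),
      vapply(seq_along(y), function(j) t.test(x, y[-j])$p.value, numeric(1))
    )
    best <- which.min(candidates)
    if (best <= length(x)) x <- x[-best] else y <- y[-(best - length(x))]
  }
  t.test(x, y)$p.value
}
set.seed(42)
x <- rnorm(30); y <- rnorm(30)
c(honest = t.test(x, y)$p.value, hacked = hack_p(x, y))
```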
## Author: Martin Papenberg
## Year: 2018

## This code is released into the public domain. Anybody may use, alter
## and distribute the code without restriction. The author makes no
## guarantees, and takes no liability of any kind for use of this code.

#' Compute ordinal scores from continuous data
#'
#' Might be useful for data exploration with highly skewed data
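Such a scoring function is essentially a thin wrapper around base R's `rank()`; a sketch (the function name is my assumption):

```r
# Replace continuous values by their ordinal position; ties get the
# average of the ranks they span, so the scores stay monotone in the data
ordinal_scores <- function(x) rank(x, ties.method = "average")
skewed <- c(0.1, 0.2, 0.2, 1.5, 40, 2000) # highly skewed toy data
ordinal_scores(skewed) # 1.0 2.5 2.5 4.0 5.0 6.0
```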