Ian Cook ianmcook

ianmcook /
Last active August 20, 2024 21:20
Examples demonstrating whether systems maintain row order

This is a set of examples demonstrating whether various Python and R dataframe libraries and OLAP query engines preserve (or do not preserve) the original order of the records in the data.

Example data

The examples all use this dataset describing the 28 times when a person walked on the moon:

year  mission    name            minutes
1969  Apollo 11  Neil Armstrong  151
1969  Apollo 11  Buzz Aldrin     151
Line #    Mem usage    Increment  Occurrences   Line Contents
     7    147.8 MiB    147.8 MiB            1   @profile
     8                                          def my_func():
     9                                              # Load Vaex example
    10    173.0 MiB     25.3 MiB            1       df = vaex.example()
    11                                              # Create a virtual column
    12    173.0 MiB      0.0 MiB            1       df.add_virtual_column("r", "sqrt(x**2 + y**2 + z**2)")
    14                                              # Create a __dataframe__ instance
illepic /
Last active August 3, 2024 16:44
Download the latest release binary from a private GitHub repo (for example, a .tar.gz that you manually uploaded to a GitHub release). Update OAUTH_TOKEN, OWNER, REPO, and FILE_NAME with your own values.
#!/usr/bin/env bash
# Authorize to GitHub to get the latest release tar.gz
# Requires: oauth token,
# Requires: jq package to parse json
# Your oauth token goes here, see link above
# Repo owner (user id)
piccolbo /
Last active June 23, 2018 03:58
Dplyr backends: the ultimate collection

Dplyr is a well-known R package for working with structured data, whether in memory, in a database or, more recently, on a cluster. The in-memory implementations generally have capabilities not found in the others, so the notion of a backend is used here with a bit of poetic license. Even the different database and cluster backends differ in subtle ways. But it sure is better than writing SQL directly! Here I provide a list of backends, with links to the packages that implement them where necessary. I've done my best to link to active projects, but I am not endorsing any of them. Do your own testing. Enjoy, and please contribute any corrections or additions in the comments.

Backend              Package
data.frame           builtin
data.table           builtin
arrays               builtin
SQLite               builtin
PostgreSQL/Redshift  builtin
hadley / s3.r
Created May 7, 2013 13:16
Implementation of request signing for Amazon's S3 in R.
s3_request <- function(verb, bucket, path = "/", query = NULL,
content = NULL, date = NULL) {
verb = verb,
bucket = bucket,
path = path,
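
The preview above is truncated, so the signing scheme it implements is not visible here. As an illustration only, the following Python sketch assumes the legacy AWS Signature Version 2 scheme for S3 (an HMAC-SHA1 over a newline-joined string to sign, base64-encoded). The function name `sign_s3_v2` and the `secret_key` parameter are hypothetical; `verb`, `bucket`, `path`, and `date` mirror the argument names in the R signature. The sketch omits `x-amz-*` headers and query sub-resources, which a full implementation would fold into the string to sign.

```python
import base64
import hashlib
import hmac


def sign_s3_v2(secret_key: str, verb: str, bucket: str, path: str = "/",
               date: str = "", content_md5: str = "",
               content_type: str = "") -> str:
    """Return a base64 HMAC-SHA1 signature in the Signature Version 2 style.

    Assumption: no x-amz-* headers and no query sub-resources are present.
    """
    # Canonicalized resource: the bucket followed by the object path.
    resource = f"/{bucket}{path}"
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")
```

Under that scheme the result would be sent as `Authorization: AWS <access key id>:<signature>`. Newer AWS regions require Signature Version 4 instead, which this sketch does not cover.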
chitchcock /
Created October 12, 2011 15:53
Stevey's Google Platforms Rant

I was at Amazon for about six and a half years, and now I've been at Google for that long. One thing that struck me immediately about the two companies -- an impression that has been reinforced almost daily -- is that Amazon does everything wrong, and Google does everything right. Sure, it's a sweeping generalization, but a surprisingly accurate one. It's pretty crazy. There are probably a hundred or even two hundred different ways you can compare the two companies, and Google is superior in all but three of them, if I recall correctly. I actually did a spreadsheet at one point but Legal wouldn't let me show it to anyone, even though recruiting loved it.

I mean, just to give you a very brief taste: Amazon's recruiting process is fundamentally flawed by having teams hire for themselves, so their hiring bar is incredibly inconsistent across teams, despite various efforts they've made to level it out. And their operations are a mess; they don't real

sandro / bookmarklet_template.js
Created January 29, 2009 05:01
bookmarklet template
// TODO: remove spaces and newlines
// TODO: bookmarklet code needs to be in an anchor tag
// <a href="javascript:(function(){s=document.createElement('script');s.type='text/javascript';s.src='';document.body.appendChild(s);})();">My Bookmarklet</a>