@erinshellman
Last active November 27, 2017 23:32
A collection of little helper functions for quick data cleaning in R
clean_headers = function(headers) {
  # Make lowercase
  headers = tolower(headers)
  # Replace symbols: drop spaces, and turn dots (e.g. the ones read.csv()
  # substitutes for invalid characters) into underscores
  headers = gsub(' ', '', headers, fixed = TRUE)
  headers = gsub('.', '_', headers, fixed = TRUE)
  headers = gsub('[^[:alnum:]_]', '', headers) # remove all symbols except '_'
  headers = gsub('_+', '_', headers)           # collapse any run of '_' into one
  headers = gsub('_$', '', headers)            # if last char is '_', remove
  return(headers)
}
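To illustrate how the underscore-collapsing step behaves, the sketch below runs the same cleaning steps on a few made-up headers (the sample values are hypothetical). A single `gsub('__', '_', ..., fixed = TRUE)` pass only merges non-overlapping pairs, so a run of three underscores survives as two; the regex `'_+'` collapses runs of any length.

```r
# Hypothetical raw headers, e.g. as they might come out of a spreadsheet export
headers = c('First.Name', 'Zip Code', 'Total ($)', 'Sales...2017')

x = tolower(headers)
x = gsub(' ', '', x, fixed = TRUE)   # drop spaces
x = gsub('.', '_', x, fixed = TRUE)  # dots become underscores
x = gsub('[^[:alnum:]_]', '', x)     # strip remaining symbols

# A fixed '__' pattern merges each pair of underscores once per pass
pairwise = gsub('__', '_', x, fixed = TRUE)
# The regex '_+' merges any number of consecutive underscores
runs = gsub('_+', '_', x)

print(pairwise)  # "sales___2017" ends up as "sales__2017"
print(runs)      # "sales___2017" ends up as "sales_2017"
```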
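remove_whitespace = function(df, side = c('both', 'left', 'right')) {
  # Goes over each character column of a df and strips out whitespace.
  # Defaults to stripping out both sides. Other columns are left as-is, since
  # str_trim() would otherwise coerce numbers to character.
  # Requires stringr and dplyr >= 1.0.0 (for across()).
  side = match.arg(side)
  trim = function(x) stringr::str_trim(x, side = side)
  # mutate_each() and funs() are deprecated in dplyr; across() replaces them
  df_no_whitespace = dplyr::mutate(df, dplyr::across(dplyr::where(is.character), trim))
  return(df_no_whitespace)
}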
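For comparison, here is a dependency-free sketch of the same idea using base R's `trimws()`, whose `which` argument plays the role of stringr's `side`. The function name `remove_whitespace_base` and the example data frame are hypothetical, not part of the original gist.

```r
# Hypothetical base-R variant of remove_whitespace(): trims character columns only
remove_whitespace_base = function(df, side = c('both', 'left', 'right')) {
  side = match.arg(side)
  df[] = lapply(df, function(col) {
    # Leave numeric, logical, factor, etc. columns untouched
    if (is.character(col)) trimws(col, which = side) else col
  })
  return(df)
}

# Made-up example data: padded strings plus a numeric column
df = data.frame(name = c('  Ann ', 'Bob  '), score = c(1.5, 2),
                stringsAsFactors = FALSE)

both = remove_whitespace_base(df)                 # name becomes "Ann", "Bob"
left = remove_whitespace_base(df, side = 'left')  # name becomes "Ann ", "Bob  "
```

Assigning through `df[] = lapply(...)` replaces the columns while keeping the data frame's class and column names, and restricting the trim to character columns keeps `score` numeric.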