explodecomputer/tidyverse.md

Last active December 18, 2020 10:09

Tidyverse notes

Website: https://www.tidyverse.org/packages/

Comparison of dplyr and base functions: https://cran.r-project.org/web/packages/dplyr/vignettes/base.html
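As a rough illustration of the kind of comparison that vignette makes, here is a minimal sketch that takes the same subset of the built-in `mtcars` data with dplyr verbs and with base R indexing. The thresholds and object names are arbitrary choices for this example, not taken from the vignette.

```r
library(dplyr)

# dplyr: verbs take the data frame first and refer to columns by bare name
dplyr_rows <- filter(mtcars, cyl > 6, mpg > 15)
dplyr_cols <- select(dplyr_rows, mpg, cyl, hp)

# base R: the same rows and columns using bracket indexing
base_rows <- mtcars[mtcars$cyl > 6 & mtcars$mpg > 15, ]
base_cols <- base_rows[, c("mpg", "cyl", "hp")]
```

Both approaches should select the same rows and columns; the dplyr version avoids repeating `mtcars$` for every column.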

Piping:

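library(dplyr)  # attaches the %>% pipe (re-exported from magrittr)
author <- " Person 1, Person 2, ..."

# Each step's result becomes the first argument of the next call;
# the dot in gsub() marks where the piped value goes instead.
author %>%
  as.character() %>%
  stringr::str_trim() %>%
  gsub("\\.\\.\\.", "et al", .)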

versus the equivalent nested call, which has to be read from the inside out:

gsub("\\.\\.\\.", "et al", stringr::str_trim(as.character(author)))
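The two placement rules used above can be seen in isolation in this small sketch. The input values and object names are made up for illustration.

```r
library(magrittr)

x <- c(1, 4, 9)

# By default the left-hand side becomes the first argument of the next call,
# so this is the same as sum(sqrt(x))
piped_default <- x %>% sqrt() %>% sum()

# A dot used as an argument places the left-hand side there instead,
# so this is the same as gsub(".", "-", "a.b", fixed = TRUE)
piped_dot <- "a.b" %>% gsub(".", "-", ., fixed = TRUE)
```

The dot is needed with `gsub()` because the string to modify is its third argument, not its first.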

What is "tidy data"?

Tidy datasets are all alike, but every messy dataset is messy in its own way.

The R for Data Science book describes "tidy data" (https://r4ds.had.co.nz/tidy-data.html) as data that follows three rules:

Each variable must have its own column.
Each observation must have its own row.
Each value must have its own cell.
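To make the three rules concrete, here is a minimal sketch using `tidyr::pivot_longer()` on a hypothetical table. The country names, years, and counts are invented for illustration. In the messy version the "year" variable is stored in the column names, so each row holds two observations.

```r
library(tidyr)

# Hypothetical messy table: one column per year
messy <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 22),
  check.names = FALSE  # keep "2019" and "2020" as column names
)

# Move the year columns into a "year" variable and their values into "cases",
# giving one row per country-year observation
tidy <- pivot_longer(
  messy,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "cases"
)
```

After pivoting, `country`, `year`, and `cases` each have their own column and every row is a single observation. Note that `year` comes out as a character column, because it was built from column names.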

A more in-depth discussion is in this paper: https://www.jstatsoft.org/article/view/v059i10

There are lots of tutorials on YouTube, e.g. https://www.youtube.com/watch?v=ZM04jn95YP0, which comes with this gist of examples: https://gist.github.com/larsentom/727da01476ad1fe5c066a53cc784417b
