@erictleung
Last active February 2, 2022 06:44
Correlation between data scientist and veganism
Category: All categories
Month,Data scientist: (United States)
2004-01,0
2004-02,0
2004-03,0
2004-04,0
2004-05,0
2004-06,0
2004-07,0
2004-08,0
2004-09,0
2004-10,0
2004-11,0
2004-12,0
2005-01,0
2005-02,5
2005-03,0
2005-04,0
2005-05,0
2005-06,0
2005-07,0
2005-08,5
2005-09,13
2005-10,8
2005-11,0
2005-12,0
2006-01,0
2006-02,4
2006-03,0
2006-04,0
2006-05,0
2006-06,0
2006-07,3
2006-08,0
2006-09,3
2006-10,0
2006-11,0
2006-12,3
2007-01,0
2007-02,2
2007-03,2
2007-04,2
2007-05,0
2007-06,2
2007-07,0
2007-08,0
2007-09,2
2007-10,0
2007-11,2
2007-12,2
2008-01,3
2008-02,3
2008-03,0
2008-04,3
2008-05,2
2008-06,3
2008-07,0
2008-08,0
2008-09,5
2008-10,3
2008-11,0
2008-12,3
2009-01,0
2009-02,0
2009-03,1
2009-04,1
2009-05,1
2009-06,3
2009-07,0
2009-08,0
2009-09,3
2009-10,0
2009-11,1
2009-12,0
2010-01,3
2010-02,1
2010-03,0
2010-04,2
2010-05,0
2010-06,1
2010-07,1
2010-08,2
2010-09,5
2010-10,6
2010-11,2
2010-12,3
2011-01,3
2011-02,2
2011-03,3
2011-04,4
2011-05,1
2011-06,1
2011-07,4
2011-08,6
2011-09,7
2011-10,5
2011-11,2
2011-12,6
2012-01,8
2012-02,6
2012-03,4
2012-04,10
2012-05,7
2012-06,8
2012-07,5
2012-08,6
2012-09,16
2012-10,9
2012-11,5
2012-12,10
2013-01,8
2013-02,12
2013-03,9
2013-04,18
2013-05,18
2013-06,12
2013-07,11
2013-08,15
2013-09,22
2013-10,16
2013-11,18
2013-12,12
2014-01,19
2014-02,21
2014-03,20
2014-04,16
2014-05,22
2014-06,19
2014-07,20
2014-08,34
2014-09,21
2014-10,26
2014-11,27
2014-12,27
2015-01,35
2015-02,24
2015-03,32
2015-04,31
2015-05,31
2015-06,39
2015-07,33
2015-08,31
2015-09,36
2015-10,46
2015-11,38
2015-12,40
2016-01,52
2016-02,34
2016-03,34
2016-04,36
2016-05,37
2016-06,40
2016-07,26
2016-08,39
2016-09,47
2016-10,47
2016-11,44
2016-12,36
2017-01,56
2017-02,62
2017-03,54
2017-04,53
2017-05,61
2017-06,45
2017-07,59
2017-08,52
2017-09,76
2017-10,66
2017-11,63
2017-12,49
2018-01,65
2018-02,76
2018-03,74
2018-04,72
2018-05,78
2018-06,69
2018-07,72
2018-08,77
2018-09,87
2018-10,75
2018-11,71
2018-12,69
2019-01,78
2019-02,79
2019-03,81
2019-04,77
2019-05,88
2019-06,78
2019-07,79
2019-08,90
2019-09,97
2019-10,86
2019-11,77
2019-12,67
2020-01,75
2020-02,83
2020-03,67
2020-04,71
2020-05,70
2020-06,72
2020-07,66
2020-08,73
2020-09,100
2020-10,76
2020-11,54
2020-12,72
2021-01,82
2021-02,69
2021-03,72
2021-04,74
2021-05,74
2021-06,67
2021-07,75
2021-08,79
2021-09,97
2021-10,85
2021-11,73
2021-12,66
2022-01,80
# Load libraries
library(ggplot2)   # CRAN v3.3.5
library(dplyr)     # CRAN v1.0.6
library(readr)     # CRAN v1.4.0
library(lubridate) # CRAN v1.7.10

# Load data
# skip = 2 drops the "Category: All categories" preamble of the Google Trends
# export; janitor is called with :: so it does not need to be attached.
search_ds <- janitor::clean_names(read_csv("data_scientist_search.csv", skip = 2))
search_ve <- janitor::clean_names(read_csv("veganism_search.csv", skip = 2))

# Clean up data: label each series, parse "YYYY-MM" into dates, and give the
# popularity column a common name
search_ds_clean <- search_ds %>%
  mutate(name = "Data Scientist",
         month = ym(month)) %>%
  rename(value = data_scientist_united_states)

search_ve_clean <- search_ve %>%
  mutate(name = "Veganism",
         month = ym(month)) %>%
  rename(value = veganism_united_states)

# Put together (long format, one row per month and series)
search <- bind_rows(search_ds_clean, search_ve_clean)

# Correlation (both series cover the same months in the same order)
r <- cor.test(search_ds_clean$value, search_ve_clean$value, method = "pearson")

# Plot results
search %>%
  ggplot(aes(x = month, y = value, color = name)) +
  geom_line(size = 2) +
  theme_minimal() +
  scale_color_manual("", values = c(1, 2)) +
  theme(legend.position = "bottom") +
  labs(
    title = "Veganism correlates with Data Scientists",
    subtitle = paste0("Pearson Correlation: ", round(r$estimate, 3) * 100, "%"),
    x = "Time",
    y = "Popularity on Google Trends",
    caption = "Data source: Google Trends https://trends.google.com/trends/"
  )

ggsave("ds_vegan_spurious.png", bg = "white", scale = 0.75)
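The correlation step in the script can be checked by hand. The sketch below is not part of the original gist: it uses only the January rows for 2012 to 2022 copied from the two CSV listings, computes Pearson's r from its definition, and compares it with what cor.test() returns. The helper name pearson_by_hand and the follow-up check on year-over-year changes are assumptions added for illustration; differencing is one common way to see how much of a correlation comes from a shared trend, not something the author ran.

```r
# Illustrative subset: the January rows from the two CSV listings, 2012 to 2022.
ds  <- c(8, 8, 19, 35, 52, 56, 65, 78, 75, 82, 80)    # "Data scientist" column
veg <- c(32, 37, 39, 34, 48, 58, 88, 79, 97, 78, 54)  # "Veganism" column

# Pearson's r from its definition: the sum of cross-products of the centered
# values, scaled by the square root of the product of their sums of squares.
# (Hypothetical helper, named here only for this sketch.)
pearson_by_hand <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual <- pearson_by_hand(ds, veg)
r_test   <- cor.test(ds, veg, method = "pearson")

# The two values should agree up to floating-point error.
print(all.equal(r_manual, unname(r_test$estimate)))

# Follow-up check (an assumption, not in the original script): correlate the
# year-over-year changes instead of the levels, which takes out most of a
# shared upward trend.
r_diff <- cor(diff(ds), diff(veg))

print(c(levels = r_manual, differences = r_diff))
```

If the coefficient on the differences comes out much weaker than the one on the levels, that suggests the headline figure is driven mostly by both series rising over the same years.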
Category: All categories
Month,Veganism: (United States)
2004-01,14
2004-02,14
2004-03,9
2004-04,12
2004-05,13
2004-06,12
2004-07,11
2004-08,13
2004-09,11
2004-10,14
2004-11,16
2004-12,16
2005-01,14
2005-02,12
2005-03,12
2005-04,11
2005-05,13
2005-06,12
2005-07,13
2005-08,16
2005-09,11
2005-10,14
2005-11,16
2005-12,16
2006-01,13
2006-02,13
2006-03,12
2006-04,12
2006-05,13
2006-06,12
2006-07,13
2006-08,13
2006-09,12
2006-10,13
2006-11,15
2006-12,13
2007-01,13
2007-02,14
2007-03,15
2007-04,14
2007-05,18
2007-06,14
2007-07,15
2007-08,15
2007-09,15
2007-10,15
2007-11,20
2007-12,19
2008-01,18
2008-02,18
2008-03,17
2008-04,16
2008-05,17
2008-06,17
2008-07,19
2008-08,16
2008-09,17
2008-10,16
2008-11,20
2008-12,19
2009-01,18
2009-02,17
2009-03,18
2009-04,18
2009-05,17
2009-06,16
2009-07,19
2009-08,18
2009-09,16
2009-10,18
2009-11,23
2009-12,20
2010-01,20
2010-02,20
2010-03,20
2010-04,20
2010-05,19
2010-06,19
2010-07,19
2010-08,20
2010-09,21
2010-10,21
2010-11,22
2010-12,22
2011-01,25
2011-02,28
2011-03,26
2011-04,27
2011-05,24
2011-06,23
2011-07,24
2011-08,26
2011-09,29
2011-10,29
2011-11,32
2011-12,31
2012-01,32
2012-02,33
2012-03,31
2012-04,31
2012-05,31
2012-06,32
2012-07,33
2012-08,31
2012-09,29
2012-10,29
2012-11,34
2012-12,34
2013-01,37
2013-02,36
2013-03,37
2013-04,34
2013-05,33
2013-06,33
2013-07,34
2013-08,34
2013-09,32
2013-10,32
2013-11,37
2013-12,37
2014-01,39
2014-02,36
2014-03,37
2014-04,35
2014-05,33
2014-06,32
2014-07,33
2014-08,32
2014-09,29
2014-10,30
2014-11,37
2014-12,32
2015-01,34
2015-02,32
2015-03,32
2015-04,30
2015-05,34
2015-06,35
2015-07,37
2015-08,37
2015-09,37
2015-10,40
2015-11,48
2015-12,42
2016-01,48
2016-02,49
2016-03,52
2016-04,51
2016-05,55
2016-06,56
2016-07,54
2016-08,51
2016-09,47
2016-10,54
2016-11,73
2016-12,89
2017-01,58
2017-02,54
2017-03,57
2017-04,56
2017-05,54
2017-06,61
2017-07,92
2017-08,93
2017-09,78
2017-10,78
2017-11,85
2017-12,81
2018-01,88
2018-02,84
2018-03,83
2018-04,76
2018-05,73
2018-06,76
2018-07,76
2018-08,74
2018-09,70
2018-10,72
2018-11,79
2018-12,77
2019-01,79
2019-02,76
2019-03,75
2019-04,75
2019-05,71
2019-06,77
2019-07,73
2019-08,76
2019-09,70
2019-10,75
2019-11,100
2019-12,92
2020-01,97
2020-02,88
2020-03,68
2020-04,71
2020-05,72
2020-06,74
2020-07,77
2020-08,77
2020-09,70
2020-10,71
2020-11,72
2020-12,75
2021-01,78
2021-02,73
2021-03,68
2021-04,70
2021-05,71
2021-06,69
2021-07,70
2021-08,67
2021-09,65
2021-10,54
2021-11,59
2021-12,54
2022-01,54