@vpnagraj
Last active May 14, 2024 18:35
Demo of anomaly detection in R with anomalize()
###############################################################################
## brief demo of anomaly detection in R using the timetk anomalize() function
## current as of 2024-05-13
###############################################################################
## set up
## load dplyr for data manipulation
library(dplyr)
## load timetk for anomaly detection functionality
library(timetk)
## load jsonlite to read in example data
library(jsonlite)
## we will use data from CDC RESP-NET to motivate the example
## https://www.cdc.gov/surveillance/resp-net/dashboard.html
## note that the CDC makes this data available for download via flat file or API
## https://data.cdc.gov/Public-Health-Surveillance/Rates-of-Laboratory-Confirmed-RSV-COVID-19-and-Flu/kvib-3txy/about_data
## when querying the API the default is to return 1000 rows ...
## overriding that default to a large number to get back all the data (about 41000 rows as of 2024-05-13)
respnet <- fromJSON("https://data.cdc.gov/resource/kvib-3txy.json?$limit=1000000")
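## a minimal sketch (not part of the original demo) for building the query URL with a variable limit
## note that paste0() alone would render 1000000 as "1e+06" ...
## ... so the limit is formatted without scientific notation first
## respnet_url() and hit_limit() are hypothetical helper names used only for illustration
respnet_url <- function(limit = 1000000) {
  paste0(
    "https://data.cdc.gov/resource/kvib-3txy.json?$limit=",
    format(limit, scientific = FALSE, trim = TRUE)
  )
}
## if the number of rows returned reaches the limit, the result was probably truncated
hit_limit <- function(n_rows, limit) n_rows >= limit
## e.g., hit_limit(nrow(respnet), 1000000) should be FALSE when all rows came back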
###############################################################################
## for this example we will just look at RSV data ...
## ... but RESP-NET includes flu and covid data as well
rsv <-
  respnet %>%
  ## filter for RSV-NET data source
  filter(surveillance_network == "RSV-NET") %>%
  ## convert the column that stores the date as a string to a Date object
  ## the column name starts with an underscore, which is not a syntactic R name ...
  ## ... so it must be wrapped in backticks ("`") to be referenced inside mutate()
  mutate(date = as.Date(`_weekenddate`))
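## a small sketch of the conversion above on a toy data frame (values are made up for illustration)
## this assumes the API returns the date as an ISO-style timestamp string ...
## ... and that numeric fields such as weekly_rate may come back as character strings
## if that is the case, weekly_rate would need as.numeric() before anomaly detection
toy_dates <- data.frame(
  `_weekenddate` = c("2024-04-27T00:00:00.000", "2024-05-04T00:00:00.000"),
  weekly_rate = c("0.4", "0.3"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
## as.Date() reads the leading YYYY-MM-DD portion and ignores the rest of the string
toy_dates$date <- as.Date(toy_dates[["_weekenddate"]])
toy_dates$weekly_rate <- as.numeric(toy_dates$weekly_rate)
str(toy_dates)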
###############################################################################
## the RSV data is returned overall and within strata
## first we'll look at anomalies overall
rsv_overall <-
  rsv %>%
  filter(age_group == "Overall" & race_ethnicity == "Overall" & sex == "Overall" & site == "Overall")
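## a quick sanity check (a sketch, not part of the original demo)
## anomaly detection on a single series generally expects one observation per date ...
## ... so it can help to confirm that the filter above did not leave duplicated dates
## has_unique_dates() is a hypothetical helper name used only for illustration
has_unique_dates <- function(dates) anyDuplicated(dates) == 0
## e.g., has_unique_dates(rsv_overall$date) would be expected to return TRUE
has_unique_dates(as.Date(c("2024-04-27", "2024-05-04")))
has_unique_dates(as.Date(c("2024-05-04", "2024-05-04")))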
## run the anomalize() function from timetk with defaults from documentation
rsv_overall_anomalies <-
  rsv_overall %>%
  anomalize(
    .date_var = date,
    .value = weekly_rate,
    .iqr_alpha = 0.05,
    .max_anomalies = 0.20,
    .message = FALSE
  )
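## a rough base R sketch of what .iqr_alpha controls (this is NOT the timetk implementation)
## assumption: the fence multiplier is 0.15 / alpha, as documented for the original anomalize package ...
## ... so alpha = 0.05 widens the 25th/75th percentile range by 3x the IQR on each side
## assumption: timetk applies a comparable rule to the remainder left after removing trend and seasonality
## iqr_flags() and toy_remainder are hypothetical names with made-up values
iqr_flags <- function(x, alpha = 0.05) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- (0.15 / alpha) * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}
toy_remainder <- c(-1, 0, 1, 0, -1, 1, 0, 25)
## only the last value falls outside the fences
which(iqr_flags(toy_remainder))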
## plot the anomalies detected
rsv_overall_anomalies %>%
  plot_anomalies(date, .interactive = FALSE)
###############################################################################
## now we can do something similar using strata for ages
## note that the data does include more granular age groups ...
## ... but for this demo we will just look at children vs adults
rsv_ages <-
  rsv %>%
  ## filter for the age groups of interest
  filter(age_group %in% c("0-17 years (Children)", "18+ years (Adults)")) %>%
  ## make sure we have 'Overall' category for other variables
  filter(site == "Overall" & sex == "Overall" & race_ethnicity == "Overall")
## run the anomaly detection algorithm
## using a group_by() along the way to make sure the analysis is performed *within* groups
rsv_grouped_anomalies <-
  rsv_ages %>%
  group_by(age_group) %>%
  anomalize(
    .date_var = date,
    .value = weekly_rate,
    .iqr_alpha = 0.05,
    .max_anomalies = 0.20,
    .message = FALSE
  )
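## a base R sketch of why the group_by() matters (made-up values, not the timetk implementation)
## when two groups sit on different scales, thresholds computed on the pooled data can miss ...
## ... spikes that are obvious within each group
## assumption: a simple IQR fence with multiplier 0.15 / alpha stands in for the detection step
## flag_iqr() and toy_groups are hypothetical names used only for illustration
flag_iqr <- function(x, alpha = 0.05) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- (0.15 / alpha) * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}
toy_groups <- data.frame(
  age_group = rep(c("children", "adults"), each = 8),
  value = c(10, 11, 9, 10, 12, 10, 11, 30,
            1, 1.1, 0.9, 1, 1.2, 1, 1.1, 3)
)
## pooled: one set of fences for all rows
toy_groups$flag_pooled <- flag_iqr(toy_groups$value)
## grouped: separate fences per age group (ave() returns 0/1, so convert back to logical)
toy_groups$flag_grouped <- as.logical(ave(toy_groups$value, toy_groups$age_group, FUN = flag_iqr))
## compare how many points each approach flags
colSums(toy_groups[, c("flag_pooled", "flag_grouped")])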
## plot the anomalies detected
rsv_grouped_anomalies %>%
  plot_anomalies(date, .interactive = FALSE, .facet_ncol = 2)
###############################################################################
## other considerations
##
## the timetk anomalize() function is an evolution of functionality delivered in the anomalize package
## the anomalize() features are generalized such that they can operate with any kind of time series signal
## the anomalize() parameters (e.g., .iqr_alpha and .max_anomalies) should be tuned to fit specific features of your data
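##
## a hedged sketch of what tuning can look like, using a plain IQR fence in base R as a stand-in
## assumption: the fence multiplier is 0.15 / alpha (as documented for the original anomalize package) ...
## ... so smaller alpha values flag fewer points and larger alpha values flag more
## count_flags() and toy_signal are hypothetical names with made-up values
count_flags <- function(x, alpha) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- (0.15 / alpha) * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence)
}
toy_signal <- c(-1, 0, 1, 0, -1, 1, 0, 3, 25)
toy_alphas <- c(0.025, 0.05, 0.1, 0.2)
## number of flagged points at each candidate alpha
flag_counts <- sapply(toy_alphas, function(a) count_flags(toy_signal, a))
setNames(flag_counts, toy_alphas)
## in practice the same comparison could be run by varying .iqr_alpha in anomalize() and inspecting the plots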
##
## for "multidimensional anomaly detection" our group has developed the rplanes package
## https://github.com/signaturescience/rplanes
## https://signaturescience.github.io/rplanes/
## this package is built to investigate forecasts and observed epidemiological signals and assess plausibility