Last active
February 27, 2024 18:34
Example subsampling configurations for public Nextstrain seasonal flu builds with different implementations. See the original configuration file for more context: https://github.com/nextstrain/seasonal-flu/blob/45bf4336d9485c1c9bfc44b09b384595d7685032/profiles/nextstrain-public.yaml#L78-L86
# Current implementation approach, where "subsamples" is defined in the build configuration YAML file.
# Build configuration parameters get passed to the optionally-templated subsample "filters" strings
# such that the same subsampling scheme can be shared across multiple builds by passing build-specific
# variables. For the full context of this subsampling scheme, see the original build configuration file:
# https://github.com/nextstrain/seasonal-flu/blob/45bf4336d9485c1c9bfc44b09b384595d7685032/profiles/nextstrain-public.yaml#L78-L86
subsamples: &subsampling-scheme
  regions_except_europe:
    filters: --query "(passage_category != 'egg') & (region != 'Europe') & (ha == True) & (na == True)" --group-by region year month --subsample-max-sequences 2700 --min-date {min_date} --exclude {exclude} --exclude-where passage=egg
    # Note that a priority of "titers" has a special meaning in the flu
    # workflow which is not portable to other pathogen workflows.
    priorities: "titers"
  europe:
    filters: --query "(passage_category != 'egg') & (region == 'Europe') & (ha == True) & (na == True)" --group-by country year month --subsample-max-sequences 300 --min-date {min_date} --exclude {exclude} --exclude-where passage=egg
    priorities: "titers"
  references:
    filters: --query "(is_reference == True)" --min-date {reference_min_date} --exclude {exclude} --exclude-where passage=egg
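
The templated "filters" strings above carry placeholders such as {min_date} and {exclude} that each build fills in. A minimal Python sketch of that expansion, assuming plain `str.format` substitution followed by shell-style splitting; the parameter values, the `config/outliers.txt` path, and the `expand_filters` helper are hypothetical and not taken from the workflow itself:

```python
import shlex

# Hypothetical build parameters; real values come from each build's configuration.
build_params = {
    "min_date": "2022-01-01",
    "reference_min_date": "2012-01-01",
    "exclude": "config/outliers.txt",
}

# The "europe" filters string from the configuration above, with its placeholders intact.
filters_template = (
    '--query "(passage_category != \'egg\') & (region == \'Europe\') & (ha == True) & (na == True)" '
    "--group-by country year month --subsample-max-sequences 300 "
    "--min-date {min_date} --exclude {exclude} --exclude-where passage=egg"
)


def expand_filters(template, params):
    """Fill in build-specific placeholders, then split the string into CLI arguments."""
    return shlex.split(template.format(**params))


args = expand_filters(filters_template, build_params)
print(args)
```

Splitting with `shlex` keeps the quoted `--query` expression together as a single argument, so the resulting list could be appended to an `augur filter` command without further quoting.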
# Seasonal flu public builds with `augur subsample` YAML config.
regions_except_europe:
  filter: --query "(passage_category != 'egg') & (region != 'Europe') & (ha == True) & (na == True)" --group-by region year month --subsample-max-sequences 2700 --min-date {min_date} --exclude {exclude} --exclude-where passage=egg
  priorities:
    # Note that this type of priority requires some pathogen-specific and data-specific calculation per
    # lineage of flu. A more generic implementation might require a filename of priorities that the workflow
    # could generate per lineage on demand.
    type: titers
europe:
  filter: --query "(passage_category != 'egg') & (region == 'Europe') & (ha == True) & (na == True)" --group-by country year month --subsample-max-sequences 300 --min-date {min_date} --exclude {exclude} --exclude-where passage=egg
  priorities:
    type: titers
references:
  filter: --query "(is_reference == True)" --min-date {reference_min_date} --exclude {exclude} --exclude-where passage=egg
# Note that in this implementation "output" has to be treated as a
# keyword by the subsample command, so users cannot define a subsampling
# group with the name "output".
output:
  - regions_except_europe
  - europe
  - references
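
Because "output" shares the top level with the subsampling groups, a reader of this config has to treat it as a reserved key. A minimal sketch of how a loader might separate the two and check that every listed group exists; `split_config` and `RESERVED_KEYS` are hypothetical names, the filter strings are abbreviated, and this is not the actual `augur subsample` implementation:

```python
# Top-level keys that the loader interprets itself instead of treating as group names.
RESERVED_KEYS = {"output"}


def split_config(config):
    """Separate subsampling groups from the reserved "output" list and validate it."""
    groups = {name: spec for name, spec in config.items() if name not in RESERVED_KEYS}
    output = list(config.get("output", []))
    unknown = [name for name in output if name not in groups]
    if unknown:
        raise ValueError(f"output lists undefined subsampling groups: {unknown}")
    return groups, output


# The parsed YAML above as a plain dictionary (filter strings shortened for brevity).
config = {
    "regions_except_europe": {"filter": "--min-date {min_date}", "priorities": {"type": "titers"}},
    "europe": {"filter": "--min-date {min_date}", "priorities": {"type": "titers"}},
    "references": {"filter": "--min-date {reference_min_date}"},
    "output": ["regions_except_europe", "europe", "references"],
}

groups, output = split_config(config)
print(sorted(groups), output)
```

Nesting the groups under their own key would avoid the reserved name, at the cost of one more level of indentation in every config.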
# augur subsample config approach but with a direct one-to-one mapping of config
# key/value pairs to augur filter arguments/flags using hyphens in the key names.
#
# Note that seasonal flu builds can pass build parameters like the minimum date
# to a templated subsampling string to enable sharing of complex subsampling
# logic across multiple lineages, time resolutions, and segments. In
# this augur subsample implementation, we would need to create a separate
# subsample configuration file for each of the 4 public tree resolutions
# (e.g., "subsample_6m.yaml", "subsample_2y.yaml", etc.).
# I've left the variable notation with curly brackets here to indicate
# where in the configuration we would need to hardcode parameters.
regions_except_europe:
  query: "(passage_category != 'egg') & (region != 'Europe') & (ha == True) & (na == True)"
  group-by: region year month
  subsample-max-sequences: 2700
  min-date: {min_date}
  exclude: {exclude}
  exclude-where: passage=egg
  # Note: In this config, "priorities" conflicts directly with
  # the augur filter argument of the same name.
  priorities:
    type: titers
europe:
  query: "(passage_category != 'egg') & (region == 'Europe') & (ha == True) & (na == True)"
  group-by: country year month
  subsample-max-sequences: 300
  min-date: {min_date}
  exclude: {exclude}
  exclude-where: passage=egg
  priorities:
    type: titers
references:
  query: "(is_reference == True)"
  min-date: {reference_min_date}
  exclude: {exclude}
  exclude-where: passage=egg
output:
  - regions_except_europe
  - europe
  - references
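
With hyphenated keys, each group can be turned into an `augur filter` argument list almost mechanically. A minimal sketch under stated assumptions: the placeholders have already been hardcoded (the date and exclude path below are made up), `to_filter_args` and `MULTI_VALUE_KEYS` are hypothetical names, and the nested "priorities" mapping is skipped because of the naming conflict noted in the config:

```python
# Options whose value is a space-separated list in the config and that
# augur filter accepts as multiple values.
MULTI_VALUE_KEYS = {"group-by", "exclude", "exclude-where"}


def to_filter_args(group):
    """Translate one subsampling group into a flat list of command-line arguments."""
    args = []
    for key, value in group.items():
        if key == "priorities":
            # Not a plain flag in this layout; it needs separate, workflow-specific handling.
            continue
        args.append(f"--{key}")
        if key in MULTI_VALUE_KEYS:
            args.extend(str(value).split())
        else:
            args.append(str(value))
    return args


# The "europe" group from the config above, with hypothetical hardcoded parameters.
europe = {
    "query": "(passage_category != 'egg') & (region == 'Europe') & (ha == True) & (na == True)",
    "group-by": "country year month",
    "subsample-max-sequences": 300,
    "min-date": "2022-01-01",
    "exclude": "config/outliers.txt",
    "exclude-where": "passage=egg",
    "priorities": {"type": "titers"},
}

europe_args = to_filter_args(europe)
print(europe_args)
```

The query string stays a single argument because it is never split, which is the main practical gain over the free-form "filter" strings in the earlier variants.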