Created December 22, 2021 00:14
Example Nextstrain builds definition for seasonal flu
# Define lineages for the analysis.
lineages:
  - h3n2
  - h1n1pdm
  - vic
  - yam
# Define genes to translate from nucleotide to amino acid sequences. These names
# must match coding regions defined in the reference.
genes:
  - SigPep
  - HA1
  - HA2
# Define parameters for specific steps of the workflow by name.
download_sequences:
  database: vdb
  virus: flu
  segment: ha
  fasta_fields: strain virus accession collection_date region country division location passage_category submitting_lab age gender
  resolve_method: split_passage
download_titers:
  databases: cdc_tdb vidrl_tdb tdb
  virus: flu
  select: assay_type:hi serum_passage_category:cell
filter_titers:
  exclude: niid
parse:
  fasta_fields: strain virus isolate_id date region country division location passage authors age gender
subsample:
  # Omit test strains that have egg passage annotated in their names. This can
  # happen even if the passage type is not set to 'egg'.
  query: "~strain.str.contains('egg')"
  # Exclude low-quality strains (missing key metadata) or those passaged through
  # eggs.
  exclude_where: country=? region=? passage=egg
  # For optimal date inference of internal nodes by TreeTime, omit strains with
  # date ambiguity at the month or year resolution.
  exclude_ambiguous_dates_by: month
  min_date: 2010-01-01
  max_date: 2020-01-01
  min_length: 900
  # Evenly sample strains across major geographic regions (e.g., Southeast Asia,
  # Europe, North America, etc.) and time.
  group_by: region year month
  subsample_max_sequences: 3000
filter:
  # Allow earlier strains that were included as references in the initial
  # subsampling.
  min_date: 2009-01-01
  max_date: 2020-01-01
refine:
  coalescent: const
  date_inference: marginal
  clock_filter_iqd: 4
  clock_rate: 0.0043
  clock_std_dev: 0.00086
ancestral:
  inference: joint
export:
  color_by_metadata: country region
frequencies:
  narrow_bandwidth: 0.1667
  proportion_wide: 0.0
  min_date: 2010-01-01
  max_date: 2020-01-01
  pivot_interval: 1
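The `subsample` settings above combine several filters with grouped sampling. The following Python sketch illustrates how those settings might interact on a list of metadata records. It is not the workflow's actual implementation: the record layout (dicts with `strain`, `country`, `region`, `passage`, and `date` keys) and the function names `is_ambiguous` and `subsample` are assumptions made for illustration, the date range is compared at month resolution only, the `min_length` check is omitted, and each group keeps its first records rather than a random or prioritized draw.

```python
from collections import defaultdict


def is_ambiguous(date, resolution="month"):
    """Return True if a date such as '2015-XX-XX' has masked ('X') fields
    at the given resolution ('year' or 'month') or coarser."""
    # Pad missing fields so that a bare year like '2015' counts as ambiguous.
    year, month, _day = (date.split("-") + ["XX", "XX"])[:3]
    if resolution == "year":
        return "X" in year
    return "X" in year or "X" in month


def subsample(records, min_date, max_date, max_sequences):
    """Hypothetical, simplified version of the `subsample` step."""
    groups = defaultdict(list)
    for rec in records:
        # query: "~strain.str.contains('egg')"
        if "egg" in rec["strain"]:
            continue
        # exclude_where: country=? region=? passage=egg
        if rec["country"] == "?" or rec["region"] == "?" or rec["passage"] == "egg":
            continue
        # exclude_ambiguous_dates_by: month
        if is_ambiguous(rec["date"], "month"):
            continue
        # min_date / max_date, compared on the 'YYYY-MM' prefix (a simplification).
        year_month = rec["date"][:7]
        if not (min_date[:7] <= year_month <= max_date[:7]):
            continue
        # group_by: region year month
        year, month = year_month.split("-")
        groups[(rec["region"], year, month)].append(rec)

    if not groups:
        return []

    # Spread the sequence budget evenly across groups (at least one per group).
    per_group = max(1, max_sequences // len(groups))
    selected = []
    for members in groups.values():
        selected.extend(members[:per_group])
    return selected
```

With the values from the config, a call would look like `subsample(records, "2010-01-01", "2020-01-01", 3000)`. Because every (region, year, month) group gets the same cap, heavily sampled regions or months cannot crowd out sparsely sampled ones.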