Created by @huddlej on December 22, 2021 00:14

Example Nextstrain builds definition for seasonal flu
# Define lineages for the analysis.
lineages:
- h3n2
- h1n1pdm
- vic
- yam
# Define genes to translate from nucleotide to amino acid sequences. These names
# must match coding regions defined in the reference.
genes:
- SigPep
- HA1
- HA2
# Define parameters for specific steps of the workflow by name.
download_sequences:
  database: vdb
  virus: flu
  segment: ha
  fasta_fields: strain virus accession collection_date region country division location passage_category submitting_lab age gender
  resolve_method: split_passage
download_titers:
  databases: cdc_tdb vidrl_tdb tdb
  virus: flu
  select: assay_type:hi serum_passage_category:cell
filter_titers:
  exclude: niid
parse:
  fasta_fields: strain virus isolate_id date region country division location passage authors age gender
subsample:
  # Omit test strains that have egg passage annotated in their names. This can
  # happen even if the passage type is not set to 'egg'.
  query: "~strain.str.contains('egg')"
  # Exclude strains missing key metadata (country or region) or passaged in
  # eggs.
  exclude_where: country=? region=? passage=egg
  # To improve TreeTime's date inference for internal nodes, omit strains whose
  # dates are ambiguous at month or year resolution.
  exclude_ambiguous_dates_by: month
  min_date: 2010-01-01
  max_date: 2020-01-01
  min_length: 900
  # Sample strains evenly across major geographic regions (e.g., Southeast
  # Asia, Europe, North America) and time.
  group_by: region year month
  subsample_max_sequences: 3000
filter:
  # Allow earlier strains that were included as references in the initial
  # subsampling.
  min_date: 2009-01-01
  max_date: 2020-01-01
refine:
  coalescent: const
  date_inference: marginal
  clock_filter_iqd: 4
  clock_rate: 0.0043
  clock_std_dev: 0.00086
ancestral:
  inference: joint
export:
  color_by_metadata: country region
frequencies:
  narrow_bandwidth: 0.1667
  proportion_wide: 0.0
  min_date: 2010-01-01
  max_date: 2020-01-01
  pivot_interval: 1
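
The `fasta_fields` lists under `download_sequences` and `parse` both name 12 header fields, and they line up by position: `accession` becomes `isolate_id`, `collection_date` becomes `date`, `passage_category` becomes `passage`, and `submitting_lab` becomes `authors`. The minimal sketch below shows that positional mapping, assuming `|`-delimited FASTA headers. `parse_header` and the example header are hypothetical and not part of the workflow.

```python
# Field lists copied from the config above (space-separated strings).
DOWNLOAD_FIELDS = "strain virus accession collection_date region country division location passage_category submitting_lab age gender".split()
PARSE_FIELDS = "strain virus isolate_id date region country division location passage authors age gender".split()


def parse_header(header, fields, separator="|"):
    """Split a FASTA header into a metadata record, matching values to names by position.

    Hypothetical helper; assumes '|' separates the header fields.
    """
    values = header.lstrip(">").split(separator)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


# Pair each download field with the parse field at the same position.
renamed = {old: new for old, new in zip(DOWNLOAD_FIELDS, PARSE_FIELDS) if old != new}

# Hypothetical header, for illustration only.
header = ">A/Example/1/2015|flu|EPI_EXAMPLE|2015-03-02|europe|spain|?|?|cell|Example Lab|?|?"
record = parse_header(header, PARSE_FIELDS)
print(record["date"], record["passage"])  # 2015-03-02 cell
print(renamed)
```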
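
The `subsample` rules can be read as a chain of per-strain checks. The following is a minimal pure-Python sketch, not augur's implementation. It assumes metadata records are dicts keyed by the `parse` field names, that `?` marks a missing value, that ambiguous date components are written as `XX` (e.g. `2015-XX-XX`), and that `min_length` counts only A, C, G and T. The `query` string is a pandas expression; here it is replaced by an equivalent substring test. The helper names and records are hypothetical.

```python
from datetime import date

# Values copied from the `subsample` block of the config above.
EXCLUDE_WHERE = "country=? region=? passage=egg"
MIN_DATE, MAX_DATE = date(2010, 1, 1), date(2020, 1, 1)
MIN_LENGTH = 900


def parse_conditions(spec):
    """Turn 'key=value key=value' into a list of (key, value) pairs."""
    return [tuple(item.split("=", 1)) for item in spec.split()]


def parse_date(text):
    """Return a date, or None when the year or month is missing or ambiguous ('XX').

    An ambiguous day is treated as the 1st of the month (an assumption of this sketch).
    """
    parts = text.split("-")
    if len(parts) != 3:
        return None
    year, month, day = parts
    if "X" in year or "X" in month:
        return None
    return date(int(year), int(month), 1 if "X" in day else int(day))


def keep(record, sequence, conditions):
    """Return True if a strain passes every subsampling rule in this sketch."""
    if any(record.get(key) == value for key, value in conditions):
        return False  # exclude_where
    if "egg" in record["strain"]:
        return False  # query: "~strain.str.contains('egg')"
    when = parse_date(record["date"])
    if when is None:
        return False  # exclude_ambiguous_dates_by: month
    if not MIN_DATE <= when <= MAX_DATE:
        return False  # min_date / max_date (inclusive here)
    acgt = sum(sequence.upper().count(base) for base in "ACGT")
    return acgt >= MIN_LENGTH  # min_length


# Hypothetical metadata records and sequences.
conditions = parse_conditions(EXCLUDE_WHERE)
base = {"country": "spain", "region": "europe", "passage": "cell"}
records = [
    ({**base, "strain": "A/Alpha/1/2015", "date": "2015-03-XX"}, "ACGT" * 300),
    ({**base, "strain": "A/Beta/2/2015-egg", "date": "2015-04-02"}, "ACGT" * 300),
    ({**base, "strain": "A/Gamma/3/2015", "date": "2015-05-02", "region": "?"}, "ACGT" * 300),
    ({**base, "strain": "A/Delta/4/2016", "date": "2016-XX-XX"}, "ACGT" * 300),
    ({**base, "strain": "A/Epsilon/5/2017", "date": "2017-01-10"}, "ACGT" * 100 + "N" * 800),
]
kept = [record["strain"] for record, sequence in records if keep(record, sequence, conditions)]
print(kept)  # ['A/Alpha/1/2015']
```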
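
With `group_by: region year month` and `subsample_max_sequences: 3000`, the sequence budget has to be spread over many region/month groups. One way to do this, sketched below, is to find the largest per-group cap `k` such that taking `min(k, size)` strains from every group stays within the budget, then sample randomly within each group. The functions and records are hypothetical, and augur's own subsampling logic may differ in detail.

```python
import random
from collections import defaultdict


def per_group_cap(group_sizes, max_sequences):
    """Largest k such that sum(min(k, size)) over all groups is at most max_sequences."""
    lo, hi = 0, max(group_sizes, default=0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if sum(min(mid, size) for size in group_sizes) <= max_sequences:
            lo = mid  # mid fits within the budget; try a larger cap
        else:
            hi = mid - 1  # mid overshoots the budget
    return lo


def subsample(records, group_by, max_sequences, seed=0):
    """Sample up to the same number of records from every group defined by group_by."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[key] for key in group_by)].append(record)
    cap = per_group_cap([len(members) for members in groups.values()], max_sequences)
    rng = random.Random(seed)
    chosen = []
    for members in groups.values():
        chosen.extend(rng.sample(members, min(cap, len(members))))
    return chosen


# Hypothetical groups of very different sizes.
sizes = {("europe", 2015, 1): 10, ("africa", 2015, 1): 2, ("north_america", 2015, 1): 50}
records = [
    {"strain": f"{region}-{i}", "region": region, "year": year, "month": month}
    for (region, year, month), n in sizes.items()
    for i in range(n)
]
print(per_group_cap(list(sizes.values()), 20))  # 9
sample = subsample(records, "region year month".split(), max_sequences=20)
print(len(sample))  # 20
```

If the budget is smaller than the number of groups, this cap falls to 0 and nothing is sampled, so a real implementation needs a fallback for that case.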
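
In `refine`, `clock_rate: 0.0043` and `clock_std_dev: 0.00086` supply a fixed clock rate (substitutions per site per year) and its standard deviation, which here is 20% of the rate (0.00086 / 0.0043 = 0.2). `clock_filter_iqd: 4` drops tips that sit too far from the expected root-to-tip divergence. The sketch below illustrates only that idea: it fits a least-squares root-to-tip regression and flags tips whose absolute residual exceeds 4 interquartile distances of the residuals. It is not TreeTime's code and it estimates the slope from the data rather than fixing it; `clock_outliers` and the tip values are hypothetical.

```python
from statistics import fmean, quantiles


def clock_outliers(tips, n_iqd=4):
    """Return tips whose root-to-tip residual exceeds n_iqd interquartile distances.

    tips maps name -> (decimal_date, root_to_tip_divergence). Hypothetical helper.
    """
    names = list(tips)
    x = [tips[name][0] for name in names]
    y = [tips[name][1] for name in names]
    x_bar, y_bar = fmean(x), fmean(y)
    # Ordinary least-squares fit of divergence against date.
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    residuals = {name: yi - (slope * xi + intercept) for name, xi, yi in zip(names, x, y)}
    q1, _, q3 = quantiles(residuals.values(), n=4)
    iqd = q3 - q1
    return sorted(name for name, r in residuals.items() if abs(r) > n_iqd * iqd)


RATE = 0.0043  # clock_rate from the config
# Hypothetical tips evolving near RATE with small alternating noise.
tips = {f"tip{i}": (2010 + i, RATE * (i + 1) + 0.001 * (-1) ** i) for i in range(10)}
# A hypothetical tip far more divergent than its date implies.
tips["outlier"] = (2014.5, RATE * 5.5 + 0.05)
print(clock_outliers(tips, n_iqd=4))  # ['outlier']
```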
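
In `frequencies`, `narrow_bandwidth: 0.1667` is in years, roughly two months (1/6 year), and `proportion_wide: 0.0` leaves only the narrow kernel. The sketch below is a simplified kernel density estimate in that spirit, assuming `pivot_interval: 1` means one month. Every sample contributes a Gaussian with standard deviation `narrow_bandwidth`, normalized over the pivots, and a clade's frequency at a pivot is its share of the summed weight. It is not augur's implementation; the function names and sample dates are hypothetical.

```python
from math import exp

NARROW_BANDWIDTH = 0.1667  # years (about two months), from the config
PIVOT_INTERVAL_MONTHS = 1  # assumed unit for pivot_interval: 1


def monthly_pivots(start_year, end_year, step_months=PIVOT_INTERVAL_MONTHS):
    """Decimal-year pivots from January of start_year to January of end_year, inclusive."""
    total_months = (end_year - start_year) * 12
    return [start_year + month / 12 for month in range(0, total_months + 1, step_months)]


def kernel(pivots, center, bandwidth=NARROW_BANDWIDTH):
    """Gaussian weights of one sample at each pivot, normalized to sum to 1."""
    weights = [exp(-0.5 * ((pivot - center) / bandwidth) ** 2) for pivot in pivots]
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


def clade_frequencies(pivots, sample_dates, in_clade):
    """Fraction of the summed kernel weight at each pivot that comes from clade members."""
    clade = [0.0] * len(pivots)
    everyone = [0.0] * len(pivots)
    for sample_date, member in zip(sample_dates, in_clade):
        for i, weight in enumerate(kernel(pivots, sample_date)):
            everyone[i] += weight
            if member:
                clade[i] += weight
    return [c / e if e > 0 else 0.0 for c, e in zip(clade, everyone)]


# min_date 2010-01-01 and max_date 2020-01-01 from the config, as decimal years.
pivots = monthly_pivots(2010, 2020)
print(len(pivots))  # 121

# Hypothetical samples (decimal-year dates); only the 2016 samples belong to the clade.
dates = [2012.0, 2013.0, 2016.0, 2016.5, 2017.0]
members = [False, False, True, True, False]
freqs = clade_frequencies(pivots, dates, members)
print(round(freqs[78], 2))  # pivot 78 is 2016.5
```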