@huddlej
Created March 7, 2024 18:23
Example subsampling logic for multi-species seasonal flu build
# Example 1: subsample each host group (human, avian, swine) separately.
custom_rules:
  - profiles/gisaid/prepare_data.smk

metadata_fields:
  - Isolate_Name
  - Isolate_Id
  - Passage_History
  - Location
  - Authors
  - Originating_Lab
  - Collection_Date
  - Submission_Date

renamed_metadata_fields:
  - strain
  - accession
  - passage
  - location
  - authors
  - originating_lab
  - date
  - date_submitted

lat-longs: "config/lat_longs.tsv"

segments:
  - ha

submission_date_field: date_submitted
recency:
  date_bins: [7, 30, 90]
  date_bin_labels: ["last week", "last month", "last quarter"]
  upper_bin_label: older

builds:
  "h3n2":
    lineage: h3n2
    reference: "config/h3n2/{segment}/reference.fasta"
    annotation: "config/h3n2/{segment}/genemap.gff"
    tree_exclude_sites: "config/h3n2/{segment}/exclude-sites.txt"
    clades: "config/h3n2/ha/clades.tsv"
    subclades: "config/h3n2/ha/subclades.tsv"
    auspice_config: "config/h3n2/auspice_config.json"
    enable_lbi: true
    enable_glycosylation: true
    subsamples:
      human:
        filters: --query "host == 'human'" --subsample-max-sequences 100 --group-by country year month
      avian:
        filters: --query "host in ['duck', 'chicken', 'stork']" --subsample-max-sequences 100 --group-by region
      swine:
        filters: --query "host == 'swine'" --subsample-max-sequences 100 --group-by region

# Example 2: keep only the strains listed in an include file.
custom_rules:
  - profiles/gisaid/prepare_data.smk

metadata_fields:
  - Isolate_Name
  - Isolate_Id
  - Passage_History
  - Location
  - Authors
  - Originating_Lab
  - Collection_Date
  - Submission_Date

renamed_metadata_fields:
  - strain
  - accession
  - passage
  - location
  - authors
  - originating_lab
  - date
  - date_submitted

lat-longs: "config/lat_longs.tsv"

segments:
  - ha

submission_date_field: date_submitted
recency:
  date_bins: [7, 30, 90]
  date_bin_labels: ["last week", "last month", "last quarter"]
  upper_bin_label: older

builds:
  "h3n2":
    lineage: h3n2
    reference: "config/h3n2/{segment}/reference.fasta"
    annotation: "config/h3n2/{segment}/genemap.gff"
    tree_exclude_sites: "config/h3n2/{segment}/exclude-sites.txt"
    clades: "config/h3n2/ha/clades.tsv"
    subclades: "config/h3n2/ha/subclades.tsv"
    auspice_config: "config/h3n2/auspice_config.json"
    enable_lbi: true
    enable_glycosylation: true
    subsamples:
      specific_strains:
        filters: --exclude-all --include profiles/gisaid/specific_strains.txt
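
The gist does not show how the workflow consumes each `filters` string. As a rough sketch only, one plausible reading is that every entry under `subsamples` becomes its own `augur filter` call, with the string passed through as extra arguments. The function name `build_filter_command`, the `metadata.tsv` input, and the `<name>.txt` output below are hypothetical and chosen for illustration; only the `augur filter` flags themselves come from the configs above or from augur's documented options.

```python
import shlex

# Hypothetical in-memory copy of two `subsamples` entries from the configs above.
subsamples = {
    "human": {
        "filters": "--query \"host == 'human'\" "
                   "--subsample-max-sequences 100 --group-by country year month",
    },
    "specific_strains": {
        "filters": "--exclude-all --include profiles/gisaid/specific_strains.txt",
    },
}


def build_filter_command(name, subsample, metadata="metadata.tsv"):
    """Return an argument list for one `augur filter` call (illustrative only)."""
    return [
        "augur", "filter",
        "--metadata", metadata,               # assumed input path
        *shlex.split(subsample["filters"]),   # split the config string like a shell would
        "--output-strains", f"{name}.txt",    # assumed output path, one strain list per subsample
    ]


if __name__ == "__main__":
    for name, subsample in subsamples.items():
        # shlex.join re-quotes arguments so the printed line is safe to paste into a shell.
        print(shlex.join(build_filter_command(name, subsample)))
```

Splitting with `shlex` keeps the quoted `--query` expression as a single argument, which matters because the expression itself contains spaces and single quotes.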