@huddlej
Created January 13, 2022 23:09
Example of a composable config for Nextstrain seasonal flu builds using CUE definitions

Background

This gist shows how one might use a configuration language like CUE to abstract a Nextstrain build configuration across multiple resolutions, segments, and lineages. The attached CUE config defines 20 builds: two lineages (H3N2 and H1N1pdm), two segments (HA and NA), and five resolutions (6m, 2y, 3y, 6y, and 12y). Resolution- and lineage-specific parameters are defined once as CUE definitions and combined by unifying CUE structs. These 100 lines of CUE (with comments) evaluate to 181 lines of YAML. CUE also lets us define default values for fields like filter.sequences_per_group and override them in specific builds, which accommodates builds that need exceptions to the rules, like the h3n2_na_6m build.
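
To make the composition idea concrete, here is a minimal sketch, assuming a trimmed-down schema. The names #Build, #Res6M, #H3N2NA, regular, and exception, and the build name "h3n2_na_6m_default", are hypothetical and do not appear in the attached config; the field names and values are taken from it.

```cue
// Hypothetical, trimmed-down schema; field names follow the attached config.
#Build: {
    name:                         string
    "refine.clock_rate":          float
    "filter.min_date_offset":     string
    "filter.sequences_per_group": uint
}

// Resolution-specific values, with an overridable default (marked by *).
#Res6M: #Build & {
    "filter.min_date_offset":     "6M"
    "filter.sequences_per_group": uint | *360
}

// Lineage- and segment-specific values.
#H3N2NA: #Build & {
    "refine.clock_rate": 0.00267
}

// Unifying both definitions yields one complete build that keeps the default.
regular: #H3N2NA & #Res6M & {name: "h3n2_na_6m_default"}

// A concrete value takes precedence over the default of 360.
exception: #H3N2NA & #Res6M & {
    name:                         "h3n2_na_6m"
    "filter.sequences_per_group": 1
}
```

When exported, regular resolves filter.sequences_per_group to 360 and exception resolves it to 1. A value outside the declared type, such as a negative number for a uint field, would be reported as an error instead of being silently accepted.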

This example builds on the idea of a simple, flat YAML config for builds, like the following, in which rule-level configuration parameters are defined as top-level, period-delimited keys in the YAML and then optionally overridden by individual build definitions:

# Set global defaults.
parse.fasta_fields: accession strain isolate_id segment passage submitting_lab lineage date
filter.group_by: region year month

# Define builds for H3N2 HA and NA.
builds:
  - name: h3n2_ha
    sequences: data/gisaid_epiflu_sequence_h3n2_ha.fasta
  - name: h3n2_ha_europe
    sequences: data/gisaid_epiflu_sequence_h3n2_ha.fasta
    filter.group_by: country year month
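
For comparison, the flat YAML above could be expressed in CUE roughly as follows. This is a sketch under stated assumptions, not part of the attached config: the definition name #FlatBuild is hypothetical, and the global defaults are folded into each build as CUE defaults rather than kept as separate top-level keys.

```cue
// Hypothetical schema for the flat config; the defaults mirror the global YAML keys.
#FlatBuild: {
    name:                 string
    sequences:            string
    "parse.fasta_fields": string | *"accession strain isolate_id segment passage submitting_lab lineage date"
    "filter.group_by":    string | *"region year month"
}

// Every element of the list must satisfy the schema.
builds: [...#FlatBuild]

builds: [
    // Uses both defaults.
    {name: "h3n2_ha", sequences: "data/gisaid_epiflu_sequence_h3n2_ha.fasta"},
    // Overrides the grouping for this build only.
    {
        name:              "h3n2_ha_europe"
        sequences:         "data/gisaid_epiflu_sequence_h3n2_ha.fasta"
        "filter.group_by": "country year month"
    },
]
```

Note that the attached config declares "filter.group_by" as the fixed value "region year month" rather than as a default, so a per-build override like the one for h3n2_ha_europe would conflict there unless the field were changed to a default as in this sketch.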

Usage

Copy and paste the code from composable_flu_config.cue into the CUE playground to see the corresponding rendered/expanded YAML version that could be consumed by a hypothetical Nextstrain workflow.

#BUILD: {
    name: string
    sequences: string
    "filter.group_by": "region year month"
    "refine.clock_rate": float
    "refine.clock_std_dev": float
    "filter.min_date_offset": string
    "filter.sequences_per_group": uint
    "lbi.tau": float
    "lbi.time_window": float
}
// Define parameters for specific build resolutions
// like "6m", "2y", etc. These parameters are shared
// across lineages and segments.
#ALL_6M: #BUILD & {
    "filter.min_date_offset": "6M"
    "filter.sequences_per_group": uint | *360
    "lbi.tau": 0.3
    "lbi.time_window": 0.5
}
#ALL_2Y: #BUILD & {
    "filter.min_date_offset": "2Y"
    "filter.sequences_per_group": uint | *90
    "lbi.tau": 0.3
    "lbi.time_window": 0.5
}
#ALL_3Y: #BUILD & {
    "filter.min_date_offset": "3Y"
    "filter.sequences_per_group": uint | *60
    "lbi.tau": 0.4
    "lbi.time_window": 0.6
}
#ALL_6Y: #BUILD & {
    "filter.min_date_offset": "6Y"
    "filter.sequences_per_group": uint | *30
    "lbi.tau": 0.25
    "lbi.time_window": 0.75
}
#ALL_12Y: #BUILD & {
    "filter.min_date_offset": "12Y"
    "filter.sequences_per_group": uint | *15
    "lbi.tau": 0.25
    "lbi.time_window": 0.75
}
// Define parameters for specific lineage and segment
// combinations.
#H3N2_HA: #BUILD & {
    sequences: "data/gisaid_epiflu_sequence_h3n2_ha.fasta"
    "refine.clock_rate": 0.00382
    "refine.clock_std_dev": 0.000764
}
#H3N2_NA: #BUILD & {
    sequences: "data/gisaid_epiflu_sequence_h3n2_na.fasta"
    "refine.clock_rate": 0.00267
    "refine.clock_std_dev": 0.000534
}
#H1N1pdm_HA: #BUILD & {
    sequences: "data/gisaid_epiflu_sequence_h1n1pdm_ha.fasta"
    "refine.clock_rate": 0.00329
    "refine.clock_std_dev": 0.000658
}
#H1N1pdm_NA: #BUILD & {
    sequences: "data/gisaid_epiflu_sequence_h1n1pdm_na.fasta"
    "refine.clock_rate": 0.00326
    "refine.clock_std_dev": 0.000652
}
builds: [
    #H3N2_HA & #ALL_6M & {name: "h3n2_ha_6m"},
    #H3N2_HA & #ALL_2Y & {name: "h3n2_ha_2y"},
    #H3N2_HA & #ALL_3Y & {name: "h3n2_ha_3y"},
    #H3N2_HA & #ALL_6Y & {name: "h3n2_ha_6y"},
    #H3N2_HA & #ALL_12Y & {name: "h3n2_ha_12y"},
    #H3N2_NA & #ALL_6M & {
        name: "h3n2_na_6m"
        "filter.sequences_per_group": 1
    },
    #H3N2_NA & #ALL_2Y & {name: "h3n2_na_2y"},
    #H3N2_NA & #ALL_3Y & {name: "h3n2_na_3y"},
    #H3N2_NA & #ALL_6Y & {name: "h3n2_na_6y"},
    #H3N2_NA & #ALL_12Y & {name: "h3n2_na_12y"},
    #H1N1pdm_HA & #ALL_6M & {name: "h1n1pdm_ha_6m"},
    #H1N1pdm_HA & #ALL_2Y & {name: "h1n1pdm_ha_2y"},
    #H1N1pdm_HA & #ALL_3Y & {name: "h1n1pdm_ha_3y"},
    #H1N1pdm_HA & #ALL_6Y & {name: "h1n1pdm_ha_6y"},
    #H1N1pdm_HA & #ALL_12Y & {name: "h1n1pdm_ha_12y"},
    #H1N1pdm_NA & #ALL_6M & {name: "h1n1pdm_na_6m"},
    #H1N1pdm_NA & #ALL_2Y & {name: "h1n1pdm_na_2y"},
    #H1N1pdm_NA & #ALL_3Y & {name: "h1n1pdm_na_3y"},
    #H1N1pdm_NA & #ALL_6Y & {name: "h1n1pdm_na_6y"},
    #H1N1pdm_NA & #ALL_12Y & {name: "h1n1pdm_na_12y"},
]
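
The explicit list above is easy to read, but the same cross product of lineages and resolutions could also be generated with a list comprehension. The following is a possible alternative sketch, not the approach used in the config above. It assumes a trimmed-down #BUILD schema with only two resolutions and two lineage/segment combinations, and the hidden lookup fields _lineages and _resolutions are hypothetical names introduced for illustration.

```cue
// Trimmed-down schema for illustration; the attached config has more fields.
#BUILD: {
    name:                         string
    sequences:                    string
    "filter.min_date_offset":     string
    "filter.sequences_per_group": uint
}

#ALL_6M: #BUILD & {
    "filter.min_date_offset":     "6M"
    "filter.sequences_per_group": uint | *360
}
#ALL_2Y: #BUILD & {
    "filter.min_date_offset":     "2Y"
    "filter.sequences_per_group": uint | *90
}

#H3N2_HA: #BUILD & {sequences: "data/gisaid_epiflu_sequence_h3n2_ha.fasta"}
#H3N2_NA: #BUILD & {sequences: "data/gisaid_epiflu_sequence_h3n2_na.fasta"}

// Hypothetical lookup tables keyed by the parts of each build name.
// Hidden fields (leading underscore) are not included in the exported output.
_lineages: {
    h3n2_ha: #H3N2_HA
    h3n2_na: #H3N2_NA
}
_resolutions: {
    "6m": #ALL_6M
    "2y": #ALL_2Y
}

// One build per (lineage, resolution) pair, named "<lineage>_<resolution>".
builds: [
    for lineage, l in _lineages
    for resolution, r in _resolutions {
        l & r & {name: "\(lineage)_\(resolution)"}
        // Exception to the rule, mirroring the h3n2_na_6m build above.
        if lineage == "h3n2_na" && resolution == "6m" {
            "filter.sequences_per_group": 1
        }
    },
]
```

This sketch yields four builds instead of twenty. The trade-off is that exceptions have to be expressed as conditions inside the comprehension, which may be harder to scan than the explicit list when many builds deviate from the defaults.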