This example shows how a configuration language like CUE can abstract a Nextstrain build configuration across multiple resolutions, segments, or lineages. The attached CUE config defines 20 builds covering the H3N2 and H1N1pdm lineages, the HA and NA segments, and five resolutions. Resolution- and lineage-specific parameters are defined once as CUE definitions and combined through CUE structs. These 100 lines of CUE (with comments) evaluate to 181 lines of YAML. CUE also lets us define default values for fields like `filter.sequences_per_group` that specific builds can override, which makes it possible to create builds that need exceptions to the rules, like the `h3n2_na_6m` build.
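The composition described above might look roughly like the following sketch. It is not the attached `composable_flu_config.cue`: the hidden fields `_lineages`, `_segments`, and `_resolutions`, the `#Build` definition, the file paths, the `2y` resolution, and the numeric values are all assumptions chosen for illustration. Keying `builds` by name instead of using a list is also a choice made here so that a single build can be overridden by name.

```cue
// Illustrative sketch only; not the attached composable_flu_config.cue.
// All paths, the "2y" resolution, and the numbers are hypothetical.

// Hidden fields (leading underscore) are omitted from exported YAML.
_lineages: ["h3n2", "h1n1pdm"]
_segments: ["ha", "na"]
_resolutions: ["6m", "2y"]

// Schema shared by every build. The "*" marks a default that a build may override.
#Build: {
	name:                         string
	sequences:                    string
	"filter.sequences_per_group": int | *10
}

// Every entry in builds must satisfy #Build and takes its name from its key.
builds: [Name=string]: #Build & {name: Name}

// Generate one build per lineage, segment, and resolution (2 x 2 x 2 = 8 here).
builds: {
	for l in _lineages for s in _segments for r in _resolutions {
		"\(l)_\(s)_\(r)": sequences: "data/\(l)_\(s).fasta"
	}
}

// An exception to the rule: one build overrides the default.
builds: h3n2_na_6m: "filter.sequences_per_group": 20
```

Exporting a file like this with `cue export --out yaml` should produce a `builds` mapping in which `h3n2_na_6m` carries the overridden value and every other build carries the default. Adding a lineage or resolution to one of the hidden lists adds the matching builds without touching the rest of the file.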
This example builds on the idea of a simple, flat YAML config for builds like the following, where rule-level configuration parameters are defined as top-level, period-delimited keys and can optionally be overridden by individual build definitions:

    # Set global defaults.
    parse.fasta_fields: accession strain isolate_id segment passage submitting_lab lineage date
    filter.group_by: region year month

    # Define builds for H3N2 HA and NA.
    builds:
      - name: h3n2_ha
        sequences: data/gisaid_epiflu_sequence_h3n2_ha.fasta
      - name: h3n2_ha_europe
        sequences: data/gisaid_epiflu_sequence_h3n2_ha.fasta
        filter.group_by: country year month

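As a rough illustration, the same flat config could be written in CUE so that the override behavior is resolved by CUE itself instead of by the workflow. This is a sketch under assumptions: the `_defaults` name is hypothetical, and the choice to embed the defaults in every build (so each exported build carries its resolved values) is one possible design, not necessarily how the attached config or a Nextstrain workflow handles it.

```cue
// Illustrative sketch: the flat YAML config above, expressed in CUE.
// The _defaults name is hypothetical.

// Global defaults, marked with "*" so a build can replace them.
_defaults: {
	"parse.fasta_fields": string | *"accession strain isolate_id segment passage submitting_lab lineage date"
	"filter.group_by":    string | *"region year month"
}

// Each build embeds the defaults and must provide a name and sequences.
builds: [...{
	_defaults
	name:      string
	sequences: string
}]

builds: [
	{
		name:      "h3n2_ha"
		sequences: "data/gisaid_epiflu_sequence_h3n2_ha.fasta"
	},
	{
		name:              "h3n2_ha_europe"
		sequences:         "data/gisaid_epiflu_sequence_h3n2_ha.fasta"
		"filter.group_by": "country year month"
	},
]
```

In this form the `h3n2_ha` build would export with the default `filter.group_by`, while `h3n2_ha_europe` keeps its own value, mirroring the override rule of the flat YAML.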
Copy and paste the code from `composable_flu_config.cue` into the CUE playground to see the rendered, expanded YAML that a hypothetical Nextstrain workflow could consume.