Tips from Johannes Köster, author of Snakemake, on advanced use of Snakemake
- When defining conda environments, prefer channel priority in the order `bioconda` > `conda-forge` > anything else.
- When defining dependencies for a conda env, never use version constraints beyond `major.minor` (1.2) or, rarely, `major.minor.patch` (1.2.3).
- You should never use a full version of the form `major.minor.patch-blah_blah`, like `libgcc-ng=7.2.0=hdf63c60_3`. Specifying full versions will inevitably break future Snakemake runs, because conda cannot keep every work-in-progress build (like `major.minor.patch-blah_blah`) available on its website.
- Instead of a conda env, you may containerize a mature Snakemake workflow using `snakemake --containerize > Dockerfile`. You can then run the workflow by specifying this container. The container should be compatible with both Docker and Singularity. More at https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html
- You may also use conda-based Snakemake wrappers and the related website for frequently used rules, e.g., samtools sort, bwa align, etc. Also, check out the community-supported Snakemake workflow catalog and its related website.
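The version-pinning advice above can be checked mechanically before an environment file is committed. The following is a minimal sketch, not part of Snakemake or conda: `pin_level` and `overpinned` are hypothetical helper names, and the code assumes specs written in the simple `name=version=build` form with single `=` separators (operators such as `>=` or `==` are not handled).

```python
def pin_level(spec):
    """Classify how tightly a spec like 'samtools=1.12' is pinned."""
    name, _, rest = spec.strip().partition("=")
    if not rest:
        return "unpinned"
    version, _, build = rest.partition("=")
    if build:
        return "build"  # name=version=build: the form the tip warns against
    levels = {1: "major", 2: "major.minor", 3: "major.minor.patch"}
    return levels.get(len(version.split(".")), "other")


def overpinned(specs):
    """Return the specs that include a build string."""
    return [s for s in specs if pin_level(s) == "build"]


# Example dependency list; the last entry is the over-pinned spec from the tip.
deps = ["samtools=1.12", "bwa=0.7.17", "libgcc-ng=7.2.0=hdf63c60_3"]
for dep in deps:
    print(dep, "->", pin_level(dep))
```

Anything reported as `build` is a candidate for relaxing to `major.minor`.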
- For large files which are shared across several workflows, e.g., bwa indexes, aggregated variant calls, bam files, etc., you may leverage snakemake workflow caching. https://snakemake.readthedocs.io/en/stable/executing/caching.html
- Prefer using best practices when working with and distributing your workflow. https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html
- You can also combine one or more rules from other snakemake workflows (published on version control websites). https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#using-and-combining-pre-exising-workflows
- Follow best practices while working with snakemake.
- For log files, you can redirect stdout and stderr using directives in either shell commands or script files. For script files, use language-specific log collection at the top of the script, e.g., the `sink` function in R and the `sys` module in Python:

```r
# R: send both regular output and messages (warnings, errors) to the rule's log file
log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")
```

```python
# Python: send stderr to the rule's log file
import sys
sys.stderr = open(snakemake.log[0], "w")
```
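The Python snippet above only captures stderr. As a minimal sketch of capturing both streams, the helper below uses the standard library's `contextlib` redirectors. The name `run_with_log` and the `work` function are hypothetical, and a temporary file stands in for `snakemake.log[0]`, which is only available inside a script run by Snakemake.

```python
import contextlib
import os
import sys
import tempfile


def run_with_log(log_path, func):
    """Run func() with stdout and stderr both redirected to log_path."""
    with open(log_path, "w") as log, \
            contextlib.redirect_stdout(log), contextlib.redirect_stderr(log):
        return func()


def work():
    """Hypothetical script body that writes to both streams."""
    print("starting step")                            # stdout
    print("warning: low coverage", file=sys.stderr)   # stderr
    return 0


# Inside a Snakemake script, log_path would be snakemake.log[0] instead.
log_path = os.path.join(tempfile.mkdtemp(), "example.log")
run_with_log(log_path, work)

with open(log_path) as fh:
    logged = fh.read()
print(logged)
```

Using context managers means the original streams are restored and the log file is closed even if the script body raises.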
- Snakemake can use input functions, which can be simple Python functions that take the Snakemake wildcards (as determined from the output). You can also use similar functions in the params section, including Python lambda functions. If an input function needs to return more than one file, it can return a dictionary and be wrapped in Snakemake's `unpack` function, so that the key-value pairs become named inputs.
- In the current version, 6.8.0, Snakemake will not rerun the entire workflow if, say, you add input files in the config file (e.g., more fastqs for the first rule) while the final `calls/all.vcf` is already present. You can override this behavior using `--list-input-changes`, which lists the output files whose inputs have changed so that they can be forced to rerun. In an upcoming release, Snakemake is expected to introduce `snakemake --rerun-changes` to rerun the workflow when one or more input files change.
- Snakemake has experimental support for logging via Slack. Ref.: snakemake logging and engenegr/log_handler_slack.py
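To illustrate the tip on input functions and `unpack` above, here is a minimal sketch in plain Python. The sample sheet `SAMPLES`, the function `get_fastqs`, and all file paths are hypothetical; `SimpleNamespace` stands in for the wildcards object that Snakemake would pass in, so the function can be tried outside a workflow.

```python
from types import SimpleNamespace

# Hypothetical sample sheet, standing in for values normally read from the config file.
SAMPLES = {
    "A": {"r1": "data/A_R1.fastq.gz", "r2": "data/A_R2.fastq.gz"},
    "B": {"r1": "data/B_R1.fastq.gz", "r2": "data/B_R2.fastq.gz"},
}


def get_fastqs(wildcards):
    """Return a dict of named input files for the sample matched in the output."""
    return SAMPLES[wildcards.sample]


# In a Snakefile, the function would be referenced roughly like this:
#   rule map_reads:
#       input:  unpack(get_fastqs)
#       output: "mapped/{sample}.bam"
#       params: sample_id=lambda wildcards: wildcards.sample
# so that the shell command or script can refer to {input.r1} and {input.r2}.

# Stand-in for the wildcards Snakemake derives from "mapped/A.bam".
wc = SimpleNamespace(sample="A")
print(get_fastqs(wc))
```

Returning a dictionary keeps the inputs named, which tends to be easier to read in the rule body than positional indexing.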
- Snakemake can use scatter-gather, similar to HPC array-job logic. Also, check the dna-seq-varlociraptor workflow.
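The scatter-gather idea can be sketched in plain Python: split the work into a fixed number of chunks, process each chunk independently (as an array job would), then merge the results. This is only an illustration of the logic, not Snakemake's own scatter-gather syntax; `scatter`, `process_chunk`, and `gather` are hypothetical names, and squaring numbers stands in for real per-chunk work.

```python
def scatter(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    k, r = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = k + (1 if i < r else 0)  # the first r chunks get one extra item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def process_chunk(chunk):
    """Hypothetical per-chunk work; each call is independent of the others."""
    return [x * x for x in chunk]


def gather(results):
    """Concatenate per-chunk results back into one list, preserving order."""
    return [x for part in results for x in part]


items = list(range(10))
chunks = scatter(items, 3)
merged = gather([process_chunk(c) for c in chunks])
print(chunks)
print(merged)
```

In a workflow, each chunk would typically map to its own job and output file, with a final rule depending on all of them.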