An example script to convert an Auspice tree JSON to a data frame and Newick tree for processing by downstream analyses.
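The script itself is not reproduced here. The following is a minimal sketch of the same idea, not the gist's code. It assumes the Auspice v2 JSON layout: a top-level "tree" object whose nodes carry "name", optional "children", and a "node_attrs" dict in which "div" holds cumulative divergence and other attributes are stored as {"value": ...}. Branch lengths are taken as the child's div minus the parent's div. The function name auspice_to_table_and_newick is hypothetical.

```python
import json


def auspice_to_table_and_newick(auspice):
    """Walk an Auspice v2 tree (assumed layout) and return (rows, newick).

    rows is a list of dicts, one per node in pre-order; newick is a string.
    """
    rows = []

    def walk(node, parent_div, parent_name):
        attrs = node.get("node_attrs", {})
        # Assumption: "div" is cumulative divergence; fall back to the parent's
        # value (zero-length branch) if a node has no "div".
        div = attrs.get("div", parent_div)
        branch_length = div - parent_div
        name = node["name"]
        row = {
            "name": name,
            "parent": parent_name,
            "div": div,
            "branch_length": branch_length,
            "is_tip": not node.get("children"),
        }
        # Copy attributes stored as {"value": ...} (e.g. country, num_date).
        for key, value in attrs.items():
            if isinstance(value, dict) and "value" in value:
                row[key] = value["value"]
        rows.append(row)

        children = node.get("children", [])
        # :g trims floating-point noise such as 0.030000000000000002.
        if children:
            inner = ",".join(walk(child, div, name) for child in children)
            return f"({inner}){name}:{branch_length:g}"
        return f"{name}:{branch_length:g}"

    newick = walk(auspice["tree"], 0.0, None) + ";"
    return rows, newick


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        rows, newick = auspice_to_table_and_newick(json.load(handle))
    print(newick)
```

To get an actual data frame, pass the rows to pandas.DataFrame(rows). Node names are written into the Newick string unquoted, so names containing spaces, commas, colons or parentheses would need quoting first. Very deep trees could also exceed Python's default recursion limit.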
Kill all running containers: docker kill $(docker ps -q)
Delete all stopped containers: docker rm $(docker ps -a -q)
Delete all images: docker rmi $(docker images -q)
FastQC is a program that performs some basic quality checks on fastq files. It makes nice HTML reports for a given file, but (as far as I can tell) doesn't provide a straightforward way to compare the results across files (which might represent different library preps, sequencing lanes or samples).
Here is the (really pretty hacky) solution I came up with for aggregating these stats. It assumes that you have a directory in which the report for each fastq file sits in its own subdirectory, with paths like ./library_name.L001.R1.fastqc/fastqc_data.txt. We then use regular expressions to match just the parts of each file we care about.
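import os
import re

# Percentage of sequences remaining after deduplication.
# Raw string avoids invalid-escape warnings for \t and \d; \d+ also handles
# values below 10 and exactly 100, which \d\d\.\d would miss.
re_dup = re.compile(r'Total Deduplicated Percentage\t(\d+(?:\.\d+)?)')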
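The fragment above stops after the first pattern. Below is one way to finish the job, sketched under assumptions rather than taken from the original post. It adds two more patterns for lines assumed to appear in fastqc_data.txt ("Total Sequences" and "%GC" in the Basic Statistics module), globs the directory layout described above, and prints one tab-separated row per library. The names PATTERNS, parse_report and aggregate are hypothetical.

```python
import glob
import os
import re

# Assumed line formats in fastqc_data.txt: "<label>\t<number>".
PATTERNS = {
    "total_sequences": re.compile(r"Total Sequences\t(\d+)"),
    "percent_gc": re.compile(r"%GC\t(\d+)"),
    "percent_dedup": re.compile(r"Total Deduplicated Percentage\t(\d+(?:\.\d+)?)"),
}


def parse_report(path):
    """Return {stat_name: float or None} for one fastqc_data.txt file."""
    with open(path) as handle:
        text = handle.read()
    stats = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        stats[key] = float(match.group(1)) if match else None
    return stats


def aggregate(root="."):
    """Map library name -> stats for every <library>.fastqc/fastqc_data.txt under root."""
    results = {}
    for path in sorted(glob.glob(os.path.join(root, "*.fastqc", "fastqc_data.txt"))):
        # "library_name.L001.R1.fastqc" -> "library_name.L001.R1"
        library = os.path.basename(os.path.dirname(path))[: -len(".fastqc")]
        results[library] = parse_report(path)
    return results


if __name__ == "__main__":
    columns = list(PATTERNS)
    print("library", *columns, sep="\t")
    for library, stats in aggregate().items():
        print(library, *(stats[c] for c in columns), sep="\t")
```

If your report directories are named differently, adjust the glob pattern and the suffix that is stripped. For example, unpacked FastQC archives typically end in _fastqc rather than .fastqc.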