Ash O'Farrell aofarrel

🧬
  • Genomics Institute, UC Santa Cruz
  • Santa Cruz (via Ireland's Ancient East)
@huddlej
huddlej / README.md
Last active March 2, 2024 02:45
Command line tool to convert annotated phylogenetic trees in nextstrain.org's JSON format to a tidy data frame of tree attributes

Convert Auspice tree JSON to a data frame and Newick tree

An example script to convert an Auspice tree JSON to a data frame and Newick tree for processing by downstream analyses.

Setup

Install Nextstrain.

Usage

@SirSerje
SirSerje / docker-kill.txt
Last active March 18, 2021 22:11
Remove all docker's shit
kill all running containers with docker kill $(docker ps -q)
delete all stopped containers with docker rm $(docker ps -a -q)
delete all images with docker rmi $(docker images -q)
@dwinter
dwinter / parse_fq.md
Last active June 6, 2024 18:22
Parse fastqc outputs

Fastqc is a program to perform some basic quality checks on fastq files. It makes nice html reports for a given file, but (as far as I can tell) doesn't provide a straightforward way to compare the results across files (which might represent different library preps, sequencing lanes or samples).

Here is the (really pretty hacky) solution I came up with for aggregating these stats. It all assumes that you have a directory in which the report for each fastq file sits in its own subdirectory, with a path like ./library_name.L001.R1.fastqc/fastqc_data.txt. We will then use regular expressions to match just those parts of each file we care about.
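To make the assumed directory layout concrete, here is a minimal sketch of how the report files might be discovered before any parsing happens. It is not part of the original solution: the helper name `find_reports` and the `DIR_PATTERN` regex are hypothetical, and the library/lane/read naming scheme is only inferred from the example path above.

```python
import re
from pathlib import Path

# Hypothetical pattern for subdirectory names such as
# "library_name.L001.R1.fastqc"; the lane ("L001") and read ("R1")
# fields are assumptions based on the example path in the text.
DIR_PATTERN = re.compile(
    r"^(?P<library>.+)\.(?P<lane>L\d+)\.(?P<read>R\d)\.fastqc$"
)


def find_reports(root):
    """Yield (library, lane, read, path) for each fastqc_data.txt under root.

    Subdirectories whose names do not follow the assumed naming scheme
    are skipped rather than raising an error.
    """
    for path in sorted(Path(root).glob("*/fastqc_data.txt")):
        match = DIR_PATTERN.match(path.parent.name)
        if match:
            yield match["library"], match["lane"], match["read"], path
```

Calling `list(find_reports("."))` from the directory that holds the reports would give one tuple per report file, which could then be fed to regular expressions like the ones that follow.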

import os
import re

# percentage of sequences left after deduplication
# (raw string, so \d and \. reach the regex engine without invalid-escape warnings)
re_dup = re.compile(r'Total Deduplicated Percentage\t(\d\d\.\d)')