{:db/ident       :meta/tag
 :db/valueType   :db.type/tuple
 :db/tupleAttrs  [:meta/tag-namespace :meta/tag-key :meta/tag-value] ;; all unique strings
 :db/cardinality :db.cardinality/one
 :db/unique      :db.unique/identity}
Then
Andy Thomason is a Senior Programmer at Genomics PLC. He has been writing graphics systems, games and compilers since the '70s and specialises in code performance.
This is a small experiment on the alignment of ~50bp INDELs. The query sequences are shown in `0.01.fq` below, where `seq_ori` is a 204bp sequence extracted from the human reference genome, `seq_del54` contains a 54bp deletion in the middle, `seq_del84` contains an 84bp deletion in a 120bp read, and `seq_ins40` contains a 40bp insertion in a 140bp read. These four short sequences were mapped to the human reference genome with Bowtie2, BWA-MEM, LAST, Novoalign, SNAP and Stampy, all with default settings. Non-default scoring functions were also tested for Bowtie2 (`--rdg 5,1 --rfg 5,1`), BWA-MEM (`-A2 -E1`) and LAST (`-r2 -q4`). The output of the various mappers/settings can be found in this gist. The following table gives my summary:
Mapper | Setting | -84bp | -54bp | +40bp |
---|---|---|---|---|
BBMAP | default | Yes | Yes | Yes |
Bowtie2 | default | No | No | No |
Bowtie2 | --rdg 5,1 --rfg 5,1 | as insertion | as insertion | Yes |
BWA-MEM | default | as split | Yes | Yes |
BWA-MEM | -A2 -E1 | Yes | Yes | Yes |
LAST | default | as split | as split |
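The Bowtie2 rows can be partly reasoned about from Bowtie2's documented scoring. Below is a back-of-the-envelope sketch built on three assumptions taken from the Bowtie2 manual:

- Bowtie2 runs in end-to-end mode, so a perfect alignment scores 0.
- A gap of length N costs `open + N * extend`, set by `--rdg` for read gaps and `--rfg` for reference gaps.
- The default `--score-min L,-0.6,-0.6` applies, so the score floor is -0.6 - 0.6 × read length.

The sketch ignores mismatches, seeding and how Bowtie2 chooses among candidate alignments. The 150bp length for `seq_del54` is inferred from 204 - 54 and is an assumption. `gap_fits` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: does a single gap, on its own, stay above Bowtie2's default end-to-end score floor?
# Assumptions (from the Bowtie2 manual, not from the experiment itself):
#   - end-to-end mode: a perfect alignment scores 0 (no match bonus)
#   - a gap of length N costs open + N * extend (--rdg for read gaps, --rfg for reference gaps)
#   - default --score-min L,-0.6,-0.6: minimum score = -0.6 - 0.6 * read_length
# All quantities are multiplied by 10 so bash integer arithmetic is enough.

# gap_fits READ_LEN GAP_LEN OPEN EXTEND -> prints "yes" if -penalty >= minimum score
gap_fits() {
  local read_len=$1 gap_len=$2 open=$3 extend=$4
  local penalty_x10=$(( 10 * (open + gap_len * extend) ))
  local floor_x10=$(( 6 + 6 * read_len ))   # |minimum score| * 10
  if (( penalty_x10 <= floor_x10 )); then echo yes; else echo no; fi
}

# Deletions are read gaps (--rdg), the insertion is a reference gap (--rfg);
# both default to 5,3 and both were set to 5,1 in the non-default run.
# Read lengths: seq_del84 = 120bp, seq_ins40 = 140bp (from the text),
# seq_del54 = 204 - 54 = 150bp (assumed).
for setting in "5 3" "5 1"; do
  read -r open extend <<< "$setting"
  echo "open=$open extend=$extend:" \
    "del84=$(gap_fits 120 84 "$open" "$extend")" \
    "del54=$(gap_fits 150 54 "$open" "$extend")" \
    "ins40=$(gap_fits 140 40 "$open" "$extend")"
done
```

Under these assumptions, none of the three gaps fits with the default 5,3 penalties, which matches the all-"No" default Bowtie2 row. With 5,1, the 54bp and 40bp gaps fit. The 84bp read gap still does not fit (89 > 72.6), so a clean 84bp deletion cannot pass the default floor for a 120bp read. The arithmetic alone does not explain why the 54bp event was reported as an insertion.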
Steps:
2. Open the file in BaseSpace; its URL looks like https://basespace.illumina.com/sample/9804795/files/tree/NA12878-L1_S1_L001_R1_001.fastq.gz?id=515013503 . The "id" query parameter is the unique file identifier.
3. Download the file with `wget -O filename 'https://api.basespace.illumina.com/v1pre3/files/{id}/content?access_token={token}'`, where {token} is from step 1 and {id} from step 2.

############################################################
# Novoalign
############################################################
export GENOME=/home/arq5x/cphg-home/shared/genomes/hg19/bwa/gatk/hg19_gatk.fa.novo.k14.s1.idx
export IRCHOME=/net/midtier18/vol79/cphg-quinlan2/projects/irradiated-clones
export STEPNAME=ircnovo
export QSUB="qsub -W group_list=cphg_arq5x -q arq5xlab -V -l select=1:mem=32000m:ncpus=16 -N $STEPNAME -m bea -M arq5x@virginia.edu";
echo "cd $IRCHOME; novoalign -d $GENOME -o SAM $'@RG\tID:parental\tSM:parental' -r Random \
    -f fastq/CgmW_AGTCAA_L001_R1.fastq.gz fastq/CgmW_AGTCAA_L001_R2.fastq.gz \
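The Novoalign command passes its SAM read group as `$'@RG\tID:parental\tSM:parental'`. In bash, `$'...'` (ANSI-C quoting) turns `\t` into real tab characters, which matters because SAM header fields are tab-separated. Plain single quotes keep the backslash and the `t` as literal text. Inside a double-quoted string, as in the `echo "..."` line, `$'...'` is left exactly as written. The tabs therefore appear only if the echoed text is later executed by bash; the rest of the command, including where that text is sent, is cut off here. The short demonstration below shows the three cases; the variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Compare how bash treats the same read-group text under different quoting styles.

# ANSI-C quoting: each \t becomes a real tab character.
rg_ansi=$'@RG\tID:parental\tSM:parental'
# Plain single quotes: the backslash and the t are kept literally.
rg_plain='@RG\tID:parental\tSM:parental'

# Make tabs visible by translating them to '|'.
printf '%s\n' "$rg_ansi"  | tr '\t' '|'   # @RG|ID:parental|SM:parental
printf '%s\n' "$rg_plain" | tr '\t' '|'   # @RG\tID:parental\tSM:parental

# Inside double quotes, $'...' is not expanded; the characters are kept as written.
inside_double="novoalign -o SAM $'@RG\tID:parental\tSM:parental'"
printf '%s\n' "$inside_double"
```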
#!/bin/bash
# trim.sh - generic, slightly insane paired end quality trimming script
# Vince Buffalo <vsbuffaloAAAAAA@gmail.com> (sans poly-A)
set -e
set -u
## pre-config
ADAPTERS=illumina_adapters.fa
SAMPLE_NAME=some_sample_name
IN1=in1.fastq
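The script is cut off after its configuration, so the trimming steps it actually runs are not shown. The awk function below is a hypothetical illustration of the simplest kind of quality trimming, not necessarily what trim.sh does. It removes bases from the 3' end of each read while their quality is below a cutoff. It assumes four-line FASTQ records and Phred+33 encoding; `trim_3prime` and the cutoff value are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of trim.sh: trim low-quality bases from the 3' end
# of every FASTQ record. Assumes 4-line records and Phred+33 quality encoding.
# Usage: trim_3prime MIN_QUALITY < in.fastq > out.fastq
trim_3prime() {
  local min_q=$1
  awk -v minq="$min_q" '
    BEGIN {
      # Map each printable ASCII character to its Phred+33 quality score.
      for (i = 33; i < 127; i++) phred[sprintf("%c", i)] = i - 33
    }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      # Walk back from the 3 prime end while the base quality is below the cutoff.
      n = length($0)
      while (n > 0 && phred[substr($0, n, 1)] < minq) n--
      print header
      print substr(seq, 1, n)
      print plus
      print substr($0, 1, n)
    }'
}
```

Running it separately on each mate file keeps the records in their original order and never drops a record, so paired reads stay in sync. The trade-off is that a fully trimmed read is written as an empty sequence, which some downstream tools reject.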
These notes build from several excellent sources:
and assume you're working with GATK 2.2-16. These notes also assume
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import com.sun.jersey.api.client.config.ClientConfig;
import com.sun.jersey.api.client.config.DefaultClientConfig;
import com.sun.jersey.api.json.JSONConfiguration;
import com.sun.jersey.client.urlconnection.HTTPSProperties;
...
I'm fleshing out some of these ideas here: https://github.com/lynaghk/todoFRP/tree/master/todo/angular-cljs