@anpefi
Last active August 29, 2015 14:16
To screen against custom sequences as a "contaminants database" in a FastQC analysis, provide a contaminant file with one record per line in the format header[tab]sequence. Usage: fasta2oneline.sh example.fa > contaminants.txt
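#!/bin/bash
# fasta2oneline.sh
# Convert a FASTA file to one line per record: header[tab]sequence
# (useful for building a FastQC contaminant file, for example).
# Output goes to stdout; redirect it to a file.
# Note: \t in the sed replacement is a GNU sed extension.
# Usage: fasta2oneline.sh x_example.fa > z_contaminants.txt
INPUT_FILE=${1:-/dev/stdin}  # read stdin if no file is given
# Drop blank lines, then build each record in the hold space: sequence lines
# are appended (H); a header (or the last line) swaps in the finished record,
# strips its newlines and prints it.
sed '/^$/d' "${INPUT_FILE}" | sed -n '/^>/!{H;$!b};s/$/ \t/;x;1b;s/\n//g;p'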
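The sed one-liner above is compact but hard to read, and it relies on GNU sed for `\t`. As a rough sketch of the same record-joining idea, the function below does it with awk. The name `fasta_to_oneline` is hypothetical and not part of the original gist, and unlike the sed version it writes a bare tab (no space) between header and sequence.

```shell
#!/bin/bash
# Hypothetical alternative to the sed pipeline (not from the original gist):
# join each FASTA record into "header<TAB>sequence" using awk.
fasta_to_oneline() {
    awk '
        /^$/ { next }                          # skip blank lines
        /^>/ {                                 # a header starts a new record
            if (header != "") print header "\t" seq
            header = $0; seq = ""
            next
        }
        { seq = seq $0 }                       # append wrapped sequence lines
        END { if (header != "") print header "\t" seq }   # flush last record
    ' "$1"
}

# Example call (file names taken from the usage line of the script):
#   fasta_to_oneline x_example.fa > z_contaminants.txt
```

Keeping the header and the accumulated sequence in two variables makes the flush points explicit: one at each new header and one at end of input.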
x_example.fa (example input):
>seq1
catcgatcgtacgatcgtacgtacgtagc
>seq2
acgtacgtcatgcatgatactgtagctacgtacgtacgt
>seq3
agctagtcgatcgatcgatcgatcgatcgatcgatcgtacgtacg
z_contaminants.txt (resulting output; header and sequence are separated by a tab):
>seq1 	catcgatcgtacgatcgtacgtacgtagc
>seq2 	acgtacgtcatgcatgatactgtagctacgtacgtacgt
>seq3 	agctagtcgatcgatcgatcgatcgatcgatcgatcgtacgtacg
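Once the contaminant file exists, it can be handed to FastQC through its `--contaminants` option. The snippet below is only a sketch: `reads.fastq` is a placeholder for your own read file, and `contaminants.txt` is assumed to be the redirected output of the script.

```shell
#!/bin/bash
# Sketch only: both file names are assumptions, adjust them to your data.
CONTAMINANTS=contaminants.txt   # output of fasta2oneline.sh
READS=reads.fastq               # placeholder name for the reads to analyse

if command -v fastqc >/dev/null 2>&1 && [ -f "$CONTAMINANTS" ] && [ -f "$READS" ]; then
    # Screen overrepresented sequences against the custom contaminant list
    fastqc --contaminants "$CONTAMINANTS" "$READS"
else
    echo "fastqc, $CONTAMINANTS or $READS not found; nothing was run" >&2
fi
```

The guard keeps the snippet harmless on machines where FastQC or the input files are missing.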