Tutorial: Force calling with DeepVariant v1.1.0

Using --variant_caller=vcf_candidate_importer in make_examples, we can use DeepVariant to act like a genotyper to force call candidates in the VCF file specified in --proposed_variants.

Context

Before reading this guide, read the WES Case Study and metrics:

The instructions below will be similar to: https://raw.githubusercontent.com/google/deepvariant/r1.1/scripts/run_wes_case_study_doc ker.sh

But we’ll provide new flags for you to run in force calling mode.

Machine

I got a GCE instance following this command: https://github.com/google/deepvariant/blob/r1.1/docs/deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform

I used ubuntu-1804-lts instead of 1604. (Ubuntu 16.04 should work too. I just picked one for this tutorial.)

Install necessary tools

sudo apt -y update && \
  sudo apt -y install docker.io && \
  sudo apt -y install bedtools && \
  sudo apt -y install tabix

Use run_deepvariant.py with customized args

With the files from the WES Case Study, run:

bedtools intersect -header \
  -a benchmark/HG003_GRCh38_1_22_v4.2_benchmark.vcf.gz \
  -b benchmark/HG003_GRCh38_1_22_v4.2_benchmark.bed \
  | bgzip > input/force_calling.candidates.vcf.gz
  tabix -p vcf input/force_calling.candidates.vcf.gz

Prepare a VCF file with proposed sites (that we can do force calling on):

Replace the run_deepvariant section in the WES Case Study with this:

mkdir -p output
mkdir -p output/intermediate_results_dir

BIN_VERSION="1.1.0"
MAKE_EXAMPLE_ARGS="variant_caller=vcf_candidate_importer,proposed_variants=/input/force_calling.candidates.vcf.gz,vsc_min_count_snps=0,vsc_min_count_indels=0,vsc_min_fraction_snps=0,vsc_min_fraction_indels=0"

sudo docker run \
  -v "${PWD}/input":"/input" \
  -v "${PWD}/output":"/output" \
  -v "${PWD}/reference":"/reference" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type WES \
  --ref /reference/GRCh38_no_alt_analysis_set.fasta \
  --reads /input/HG003.novaseq.wes_idt.100x.dedup.bam \
  --regions /input/idt_capture_novogene.grch38.bed \
  --output_vcf /output/HG003.output.vcf.gz \
  --output_gvcf /output/HG003.output.g.vcf.gz \
  --num_shards $(nproc) \
  --make_examples_extra_args="${MAKE_EXAMPLE_ARGS}" \
  --intermediate_results_dir /output/intermediate_results_dir

Callset is generated at output/HG003.output.vcf.gz.

Runtime

make_examples step took:

real    7m34.109s
user    446m27.685s   
sys     9m19.618s

call_variants step took:

real    0m59.444s
user    25m49.371s
sys     1m25.558s

postprocess_variants step took:

real    0m44.978s
user    0m43.506s
sys     0m8.865s

Evaluate with hap.py

Baseline (WES metrics):

Type	# TP	# FN	# FP	Recall	Precision	F1_Score
Indel	1025	28	19	0.973409	0.982126	0.977748
SNP	25005	319	169	0.987403	0.993287	0.990337

Force calling on NIST v4.2 truth:

Type	# TP	# FN	# FP	Recall	Precision	F1_Score
Indel	1026	27	7	0.974359	0.993340	0.983758
SNP	25022	302	48	0.988075	0.998085	0.993055

hap.py output:

Benchmarking Summary:
  Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
 INDEL    ALL         1053      1026        27         1051         7          0      7       0.974359          0.993340               0         0.983758                     NaN                     NaN                   1.752717                   1.833333
 INDEL   PASS         1053      1026        27         1051         7          0      7       0.974359          0.993340               0         0.983758                     NaN                     NaN                   1.752717                   1.833333
   SNP    ALL        25324     25022       302        25071        48          0     48       0.988075          0.998085               0         0.993055                2.856273                2.860508                   1.625246                   1.625773
   SNP   PASS        25324     25022       302        25071        48          0     48       0.988075          0.998085               0         0.993055                2.856273                2.860508                   1.625246                   1.625773

pichuan/force_calling_v1.1.0.md