The run_prs.py script was created to generate polygenic risk scores (PRS) in the UK Biobank (UKB) using Hail 0.2. File locations are currently hard-coded to match my own file structure and that of the Neale Lab UKB Round 2 GWAS.
NOTE: This is under active development and is generally intended for my own use and that of my collaborators. ADAPT AT YOUR OWN PERIL!!
- This script assumes that you have generated an input file using my process-sumstats and LDPred repos.
- Install cloudtools and start up a cluster using the following command, where N is the number of preemptible workers you want to start up:
cluster start CLUSTERNAME --version devel --spark 2.2.0 --preemptible-worker-boot-disk-size=10 --worker-boot-disk-size=10 -p N
- Submit the run_prs.py script to the cluster using the following command, where PHENOTYPE is the phenotype you want to score (ccarey is the name of my cluster; substitute your own CLUSTERNAME):
cluster submit ccarey run_prs.py --args "--phenotype PHENOTYPE"
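
To illustrate how a script might pick up the --phenotype value, here is a minimal sketch using Python's standard argparse module. It assumes the string given to --args reaches the script as ordinary command-line arguments. The parse_args helper and the help text are hypothetical and are not taken from run_prs.py itself.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of a --phenotype argument."
    )
    # Hypothetical definition: the real script may declare this differently.
    parser.add_argument(
        "--phenotype",
        required=True,
        help="Name of the phenotype to score.",
    )
    return parser.parse_args(argv)


# Simulate what the script would see for: --args "--phenotype PHENOTYPE"
args = parse_args(["--phenotype", "PHENOTYPE"])
print(args.phenotype)
```

Marking the argument as required makes a submission without a phenotype fail immediately with a usage message, rather than part-way through a job on a paid cluster.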
With 100 total workers (2 non-preemptible, 98 preemptible), it typically takes ~20 minutes to score the entire UK Biobank white European subsample, and costs ~$12.
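
For readers unfamiliar with what is being computed, a polygenic risk score is, in its standard form, a weighted sum of a sample's allele dosages, with per-variant weights taken from GWAS-derived effect sizes. The sketch below shows that arithmetic in plain Python on made-up data. The variant IDs, weights, genotypes, and the polygenic_score helper are all hypothetical; this is not the Hail code in run_prs.py, which performs the equivalent calculation at biobank scale.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum weight * dosage over the variants that have a weight.

    Variants genotyped in the sample but absent from the weights
    are skipped, so they contribute nothing to the score.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical per-variant effect sizes (e.g. adjusted GWAS betas).
weights = {"1:1000:A:G": 0.5, "2:2000:C:T": -0.25, "3:3000:G:A": 0.1}

# Hypothetical allele dosages (0, 1, or 2 copies of the effect allele).
samples = {
    "sample_1": {"1:1000:A:G": 2, "2:2000:C:T": 1, "3:3000:G:A": 0},
    "sample_2": {"1:1000:A:G": 0, "2:2000:C:T": 2, "9:9000:T:C": 1},
}

scores = {name: polygenic_score(g, weights) for name, g in samples.items()}
for name, score in scores.items():
    print(name, score)
```

In this toy data, sample_2 carries a variant with no weight (9:9000:T:C), which is simply ignored.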