Reproducible Genome-wide Analysis with Cloud Computing

Genome-wide association studies (GWAS) have revealed associations between human genetic variants and hundreds of human traits, including disease susceptibility. We found that differences in the quality control (QC) performed upstream of a GWAS critically affect its results. Despite this, to our knowledge no clear mechanism exists for making GWAS QC pipelines reproducible. We developed CLINK, a cloud-based interactive GWAS QC tool focused on reproducibility. Using Amazon Web Services (AWS) machine images and a DevOps model, we develop CLINK as a version-controlled virtual appliance built around the Jupyter Notebook. We demonstrate the reproducibility of CLINK using a large case/control study of primary open-angle glaucoma, a leading cause of irreversible blindness.



Brian S. Cole, PhD

Postdoctoral Researcher, University of Pennsylvania Institute for Biomedical Informatics; Associate Solutions Architect, AWS