PPM1-1229-150022
Project Information |
---|
Proposal Number: PPM1-1229-150022 |
Program Cycle: PPM 01 |
Submitting Institution Name: Sidra Medicine |
Project Status: Award Tech. Completed |
Start Date: 3/11/2016 |
Lead Investigator: Dr. Khalid Fakhro |
Project Duration: 2 Year(s) |
End Date: 2/3/2020 |
Submission Type: New |
Proposal Title: A high resolution map of structural variation in Qatari genomes and their contribution to quantitative traits and disease |
Project Summary |
---|
Proposal Description: The rapid proliferation of next-generation sequencing technologies over the past decade has enabled whole genome sequencing (WGS) to advance at a phenomenal scale, uncovering remarkable levels of genomic diversity among humans. Surprisingly, although global efforts such as the 1000 Genomes Project have identified millions of variants in multi-ethnic populations, the Arab world remains poorly represented in public databases [1-3]. We recently demonstrated that a subset of the Qatari population represents one of the most ancient and genetically diverse populations in the world [4], explaining the extensive genetic heterogeneity underlying disease in this part of the world [5-8].
As such, it is of paramount importance to generate definitive genetic variant databases for this population if the promises of precision health and personalized medicine are to be achieved in Arabs. The Qatar Genome Program (QGP) represents one striking attempt to achieve this goal. In its pilot phase, >2,500 genomes will be sequenced and analyzed in light of the extensive deep phenotyping available at the Qatar Biobank. In addition to the single- and multi-nucleotide variants called in this dataset, there will be an urgent need to continue our work on generating a comprehensive Structural Variant (SV) map for the Qatari population [9]. This will have implications not only for human health and disease (e.g. finding ‘knockout’ humans with homozygous genic deletions), but also for generating the backbone structure for a Qatari-specific reference genome in the future. This proposal goes beyond our previous work and is novel on many levels, primarily due to the technical challenges of SV prediction in such a large WGS dataset. Indeed, unlike small-variant calling, for which best practices are well established [10], there are as yet no standardized pipelines for SV calling from thousands of genomes.
Additionally, there is not yet a unified environment for the management and analysis of the big genomics datasets required to achieve this goal. While the R Project for Statistical Computing has played a fundamental role in supporting the development of methods for statistical-genetics analyses of high-throughput datasets, the processing of large WGS data often exceeds R’s capabilities and is therefore carried out through standalone programs. We identified the ROOT framework, developed by the European Organization for Nuclear Research (CERN), as a potential platform capable of managing, compressing, and analyzing such large datasets. ROOT is an object-oriented framework developed in C++, originally conceived in the high-energy particle physics community, where it is routine to store, analyze and visualize petabytes of data in an efficient, collaborative way. In ROOT, data are stored as instances of C++ classes in a hierarchical object-oriented database optimized for data analysis, and are thus highly compressed. The ROOT framework can call R functions and allows novel libraries and functions to be shared among users, akin to the philosophy underlying R’s open-source, community-driven success. Moreover, analyses in ROOT can be performed in parallel on clusters of computers or multi-core machines, even if they are in different geographical locations, which is ideal for our proposed collaboration.
Our project can therefore be summarized in the following 3 objectives:
1) In collaboration with CERN, we will implement a novel data representation (including relevant wrappers for standard read data formats) optimized for the compression and storage of 2,500 Qatari subjects’ WGS data in ROOT. We will also adapt SV prediction algorithms that use combined detection approaches to work with ROOT.
2) Through joint efforts between the CERN and Sidra data centers, SV prediction will be carried out using the ROOT distributed parallel processing environment. We will integrate the resulting SV call files to generate a comprehensive map of SVs at base-pair resolution in the Qatari population. We will be supported by the Database of Genomic Variants (dgv.tcag.ca) to host this rich dataset and serve it to the research community through a web-based browser for Qatari SVs [cite: 24174537].
3) We will run a genome-wide association analysis to investigate the contribution of SVs to Cardiovascular Disease (CVD) risk factors in the Qatari population (which have been extensively collected by the Qatar Biobank). We will then attempt to replicate SVs with highly significant associations in the TwinsUK cohort, an independent cohort of >3,000 individuals from a different genetic background and under different environmental pressures, for whom WGS and similar CVD-related traits are already available. Replicated SVs will be validated by wet lab analysis (qPCR/long-range PCR) and their breakpoints confirmed.
Altogether, we believe our project will have high impact for both basic science and translational medicine. First, we will provide a proof-of-concept for the use of the ROOT framework as a shared platform for the management and analysis of big genomics data. This could be of immense value to the community as we enter the WGS era, in which raw data will grow to unprecedented sizes and multinational collaborations will become the norm. Second, we will generate the definitive database of structural variants in Qataris, an evolutionarily ancient population sharing extensive ancestry with the rest of the Arab world. Third, we will leverage the deep phenotyping on these samples to estimate the contribution of SVs to CVD, a leading cause of death and morbidity worldwide, and to related cardio-metabolic traits. Finally, we will involve and develop trainees to contribute to a knowledge-based economy in Qatar. Thus, the whole of our study will be greater than the sum of its parts, and will have a profound impact on personalized medicine and precision health in Qatar.
Research Area Keywords: Cardiovascular disease; Copy number variation; Single nucleotide polymorphism; Qatari Genome project; Rare diseases
Research Area Keywords by PM: traits; genetic heterogeneity; heterogeneity
Research Type: Translational Research / Experimental Development