Large-scale genetic data linked to health records represent a major untapped resource for understanding the genetic mechanisms of human health and disease. Our lab pursues three general directions, each leveraging large-scale and longitudinal data to improve human health:
Most biobanks contain a substantial amount of cryptic relatedness, which can introduce bias or reduce statistical power in traditional GWAS and is often treated as a nuisance. However, these patterns of relatedness can also be used for gene mapping. Because a segment shared between two individuals is inherited from a common ancestor, rare pathogenic variants within that segment are likely to be shared as well. Comparing the distribution of shared segments among cases with that among controls can therefore point to disease-associated regions. We focus on developing and implementing shared-segment approaches for identifying novel disease genes in large-scale biobank data.
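As a rough illustration of the case/control comparison described above, the following minimal Python sketch computes, at a single genomic position, the fraction of case pairs and of control pairs that share a segment covering it. The function name `sharing_rate`, the tuple layout `(id_a, id_b, start, end)`, and the toy data are all hypothetical choices made for this example; it is not our lab's method, and a real analysis would also need to account for relatedness structure, segment-detection error, and statistical testing.

```python
def sharing_rate(segments, group, position):
    """Fraction of pairs within `group` that share a segment covering `position`.

    `segments` is an iterable of (id_a, id_b, start, end) tuples (hypothetical layout).
    """
    group = set(group)
    n_pairs = len(group) * (len(group) - 1) // 2
    if n_pairs == 0:
        return 0.0
    # Count each pair once, even if it has several overlapping segments.
    sharing_pairs = {
        frozenset((a, b))
        for a, b, start, end in segments
        if a != b and a in group and b in group and start <= position <= end
    }
    return len(sharing_pairs) / n_pairs


# Toy data, invented purely for illustration.
segments = [
    ("c1", "c2", 100, 500),
    ("c1", "c3", 200, 400),
    ("c2", "c3", 600, 900),
    ("k1", "k2", 250, 300),
    ("c1", "k1", 100, 900),  # case-control pair: ignored by both groups
]
cases = ["c1", "c2", "c3"]
controls = ["k1", "k2", "k3", "k4"]

case_rate = sharing_rate(segments, cases, 300)
control_rate = sharing_rate(segments, controls, 300)
print(f"case pairs sharing: {case_rate:.2f}, control pairs sharing: {control_rate:.2f}")
```

An excess of sharing among case pairs relative to control pairs at a position is the kind of signal a shared-segment approach would then evaluate formally, for example with a permutation test.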
Traditional epidemiological studies typically have cross-sectional measurements or measurements from study visits over a short follow-up period, whereas biobanks’ linked longitudinal health records can contain repeated measures spanning decades. Extracting the values best suited for analysis (e.g., those that are highly heritable or predictive of disease risk) is a major challenge in biobank studies. Longitudinal data can also be used to study phenotypic trajectories across the lifespan. Our goal is to create algorithms that capture the most informative features of repeated measurements and use them to further our understanding of the genetic mechanisms of disease.
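To make the idea of summarizing repeated measurements concrete, here is a minimal Python sketch that reduces one person's series to a few candidate features: the number of observations, the mean, and the least-squares slope over time. The function name `summarize_trajectory`, the choice of features, and the example values are assumptions made for illustration only; they are not the algorithms our group is developing, and real health-record data would also require handling of irregular sampling, outliers, and medication effects.

```python
from statistics import mean


def summarize_trajectory(times, values):
    """Summarize one individual's repeated measurements (hypothetical helper).

    Returns the count, the mean value, and the ordinary least-squares slope
    of value against time (None if there are fewer than two distinct times).
    """
    if len(times) != len(values) or not times:
        raise ValueError("times and values must be non-empty and of equal length")
    t_bar = mean(times)
    v_bar = mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx if sxx > 0 else None
    return {"n": len(values), "mean": v_bar, "slope": slope}


# Invented example: a lab value measured at ages 40, 50, and 60.
features = summarize_trajectory([40, 50, 60], [100, 110, 120])
print(features)
```

Each feature could then be treated as a separate phenotype and compared on criteria such as heritability or predictive value.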
Gene expression and other omics measures are the product of complex interactions between genetic regulation and environmental stimuli. Analyzing these measures longitudinally will improve our understanding of genetic effects and of responses to environmental changes during disease pathogenesis. Our lab works with existing cohorts to develop multi-omic profiling approaches for studying correlations between omics patterns and disease incidence.