Yann C Klimentidis

Associate Professor, Public Health
Assistant Professor, Genetics - GIDP
Associate Professor, BIO5 Institute
Primary Department
Contact
(520) 621-0147

Work Summary

I use human genetic data to find associations of genetic markers with complex traits and diseases, to shed light on disease pathophysiology, causal pathways, and health disparities, and to inform precision medicine.

Research Interest

Yann C. Klimentidis, PhD, is an Associate Professor in the Department of Epidemiology and Biostatistics in the Mel and Enid Zuckerman College of Public Health at the University of Arizona. His research centers on improving our understanding of the links between genetic variation, lifestyle factors, metabolic disease, and health disparities. In the past, he has used measures of genetic admixture and genomic tests of natural selection to understand the genetic basis of population differences in disease susceptibility. His most recent work examines the use of various statistical approaches for the analysis of high-dimensional genetic data to improve prediction of genetic susceptibility to type-2 diabetes. In addition, his work examines gene-by-lifestyle interactions in type-2 diabetes, as well as the causal links between metabolic traits such as dyslipidemia and type-2 diabetes.

Keywords: Genetics, Epidemiology, Cardiometabolic disease, Physical activity

Publications

de Los Campos, G., Vazquez, A. I., Fernando, R., Klimentidis, Y. C., & Sorensen, D. (2013). Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS genetics, 9(7).

Despite important advances from Genome Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared (R²) of G-BLUP reaches trait heritability asymptotically. However, under imperfect LD between markers and QTL, prediction R² based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1-b)², where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome; therefore b is close to one, inducing a small decrease in R². However, with distantly related individuals b reaches very low values, imposing a very low upper bound on prediction R². Our simulations suggest that for the analysis of data from unrelated individuals, the asymptotic upper bound on R² may be of the order of 20% of the trait heritability. We show how PA can be enhanced with the use of variable selection or differential shrinkage of estimates of marker effects.
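The abstract above describes G-BLUP as a ridge-regression type method. The following is a minimal sketch of that equivalence on simulated data, not the authors' implementation. The function names (`simulate_data`, `gblup_predict`, `ridge_predict`), the simulated genotypes, and the assumption that the heritability `h2` is known in advance are all illustrative choices made here.

```python
import numpy as np

def simulate_data(n=300, p=200, h2=0.5, seed=0):
    """Simulate standardized genotypes and a phenotype with heritability h2 (illustrative only)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=p)                  # hypothetical allele frequencies
    X = rng.binomial(2, freqs, size=(n, p)).astype(float)  # allele counts 0/1/2
    Z = (X - X.mean(axis=0)) / X.std(axis=0)               # standardize each marker
    beta = rng.normal(0.0, np.sqrt(h2 / p), size=p)        # marker effects
    y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return Z, y

def gblup_predict(Z_train, y_train, Z_test, h2):
    """G-BLUP prediction from genomic relationships, assuming h2 is known."""
    p = Z_train.shape[1]
    lam = (1.0 - h2) / h2                # residual-to-genetic variance ratio
    mu = y_train.mean()
    G_tt = Z_train @ Z_train.T / p       # relationships among training subjects
    G_st = Z_test @ Z_train.T / p        # relationships between test and training subjects
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(y_train)), y_train - mu)
    return mu + G_st @ alpha

def ridge_predict(Z_train, y_train, Z_test, h2):
    """Ridge regression on the markers with penalty p * (1 - h2) / h2."""
    p = Z_train.shape[1]
    penalty = p * (1.0 - h2) / h2
    mu = y_train.mean()
    beta = np.linalg.solve(Z_train.T @ Z_train + penalty * np.eye(p),
                           Z_train.T @ (y_train - mu))
    return mu + Z_test @ beta

if __name__ == "__main__":
    Z, y = simulate_data()
    pred_gblup = gblup_predict(Z[:250], y[:250], Z[250:], h2=0.5)
    pred_ridge = ridge_predict(Z[:250], y[:250], Z[250:], h2=0.5)
    # The two formulations give the same predictions up to numerical error.
    print(np.allclose(pred_gblup, pred_ridge))
    print(np.corrcoef(pred_gblup, y[250:])[0, 1] ** 2)  # prediction R² on held-out subjects
```

In this sketch the markers are themselves the causal loci, which corresponds to the perfect-LD case; the imperfect-LD setting studied in the paper is not reproduced here.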

Klimentidis, Y. C., Arora, A., Zhou, J., Kittles, R., & Allison, D. B. (2016). The Genetic Contribution of West-African Ancestry to Protection against Central Obesity in African-American Men but Not Women: Results from the ARIC and MESA Studies. Frontiers in genetics, 7, 89.

Over 80% of African-American (AA) women are overweight or obese. A large racial disparity between AA and European-Americans (EA) in obesity rates exists among women, but curiously not among men. Although socio-economic and/or cultural factors may partly account for this race-by-sex interaction, the potential involvement of genetic factors has not yet been investigated. Among 2814 self-identified AA in the Atherosclerosis Risk in Communities study, we estimated each individual's degree of West-African genetic ancestry using 3437 ancestry informative markers. We then tested whether sex modifies the association between West-African genetic ancestry and body mass index (BMI), waist-circumference (WC), and waist-to-hip ratio (WHR), adjusting for income and education levels, and examined associations of ancestry with the phenotypes separately in males and females. We replicated our findings in the Multi-Ethnic Study of Atherosclerosis (n = 1611 AA). In both studies, we find that West-African ancestry is negatively associated with obesity, especially central obesity, among AA men, but not among AA women (p for interaction = 4.14 × 10⁻⁵ in pooled analysis of WHR). In conclusion, our results suggest that the combination of male gender and West-African genetic ancestry is associated with protection against central adiposity, and suggest that the large racial disparity that exists among women, but not men, may be at least partly attributed to genetic factors.
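Testing whether sex modifies the association between ancestry and a phenotype, as described in the abstract above, is commonly done by adding a product term to a regression model. The sketch below illustrates that idea with ordinary least squares; it is an assumption about the general approach, not the study's analysis code. The function name `fit_interaction` and the variable names are hypothetical, and the income and education covariates and the significance test are omitted for brevity.

```python
import numpy as np

def fit_interaction(ancestry, male, outcome):
    """Least-squares fit of outcome ~ ancestry + male + ancestry * male.

    ancestry: proportion of West-African ancestry per subject (float array)
    male:     1.0 for men, 0.0 for women (float array)
    outcome:  phenotype such as waist-to-hip ratio (float array)
    """
    X = np.column_stack([
        np.ones_like(ancestry),  # intercept
        ancestry,                # ancestry effect among women (male == 0)
        male,                    # sex difference at zero ancestry
        ancestry * male,         # how much the ancestry effect differs in men
    ])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    names = ["intercept", "ancestry", "male", "ancestry_x_male"]
    return dict(zip(names, coef))
```

A negative `ancestry_x_male` coefficient alongside a near-zero `ancestry` coefficient would correspond to the pattern reported in the abstract: an inverse association in men but not in women.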

Chen, G., Liu, N., Klimentidis, Y. C., Zhu, X., Zhi, D., Wang, X., & Lou, X. (2013). A unified GMDR method for detecting gene-gene interactions in family and unrelated samples with application to nicotine dependence. Human genetics.

Gene-gene and gene-environment interactions govern a substantial portion of the variation in complex traits and diseases. Conventionally, a set of either unrelated or family samples is used to detect such interactions; even when both kinds of data are available, the unrelated and the family samples are analyzed separately, potentially leading to a loss in statistical power. In this report, to detect gene-gene interactions we propose a generalized multifactor dimensionality reduction method that unifies analyses of nuclear families and unrelated subjects within the same statistical framework. We used principal components as genetic background controls against population stratification, and when sibling data were included, within-family controls were used to correct for potential spurious association at the tested loci. Through comprehensive simulations, we demonstrate that the proposed method can remarkably increase power by pooling unrelated and offspring samples together, as compared with individual analysis strategies and Fisher's method for combining p values, while it retains a controlled type I error rate in the presence of population structure. In application to a real dataset, we detected one significant tetragenic interaction among CHRNA4, CHRNB2, BDNF, and NTRK2 associated with nicotine dependence in the Study of Addiction: Genetics and Environment sample, suggesting the biological role of these genes in nicotine dependence development.

Lebrón-Aldea, D., Dhurandhar, E. J., Pérez-Rodríguez, P., Klimentidis, Y. C., Tiwari, H. K., & Vazquez, A. I. (2015). Integrated genomic and BMI analysis for type 2 diabetes risk assessment. Frontiers in genetics, 6, 75.

Type 2 Diabetes (T2D) is a chronic disease arising from the development of insulin absence or resistance within the body, and a complex interplay of environmental and genetic factors. The incidence of T2D has increased throughout the last few decades, together with the occurrence of the obesity epidemic. The incorporation of variants identified by Genome Wide Association Studies (GWAS) into risk assessment models for T2D could aid in the identification of at-risk patients who could benefit from preventive medicine. In this study, we build several risk assessment models, evaluated with two different classification approaches (Logistic Regression and Neural Networks), to measure the effect of including genetic information in the prediction of T2D. We used data from the Original and the Offspring cohorts of the Framingham Heart Study, which provide phenotypic and genetic information for 5245 subjects (4306 controls and 939 cases). Models were built by using several covariates: gender, exposure time, cohort, body mass index (BMI), and 65 SNPs associated with T2D. We fitted Logistic Regressions and Bayesian Regularized Neural Networks and then assessed their predictive ability by using ten-fold cross-validation. We found that the inclusion of genetic information in the risk assessment models increased the predictive ability by 2% when compared to the baseline model. Furthermore, the models that included BMI at the onset of diabetes as a possible effector gave an improvement of 6% in the area under the curve (AUC) derived from the ROC analysis. The highest AUC achieved (0.75) belonged to the model that included BMI and a genetic score based on the 65 established T2D-associated SNPs. Finally, the inclusion of SNPs and BMI raised predictive ability in all models, as expected; however, the AUC results from Neural Networks and Logistic Regression did not differ significantly in their prediction accuracy.

Klimentidis, Y. C., Chougule, A., Arora, A., Frazier-Wood, A. C., & Hsu, C. (2015). Triglyceride-Increasing Alleles Associated with Protection against Type-2 Diabetes. PLoS genetics, 11(5), e1005204.

Elevated plasma triglyceride (TG) levels are an established risk factor for type-2 diabetes (T2D). However, recent studies have hinted at the possibility that genetic risk for TG may paradoxically protect against T2D. In this study, we examined the association of genetic risk for TG with incident T2D, and the interaction of baseline TG with TG genetic risk on incident T2D in 13,247 European-Americans (EA) and 3,238 African-Americans (AA) from three prospective cohort studies. A TG genetic risk score (GRS) was calculated based on 31 validated single nucleotide polymorphisms (SNPs). We considered several baseline covariates, including body-mass index (BMI) and lipid traits. Among EA and AA, we find, as expected, that baseline levels of TG are strongly positively associated with incident T2D (p < 2 × 10⁻¹⁰). However, the TG GRS is negatively associated with T2D (p = 0.013), upon adjusting for only race, in the full dataset. Upon additionally adjusting for age, sex, BMI, high-density lipoprotein cholesterol and TG, the TG GRS is significantly and negatively associated with T2D incidence (p = 7.0 × 10⁻⁸), with similar trends among both EA and AA. No single SNP appears to be driving this association. We also find a significant statistical interaction of the TG GRS with TG (p for interaction = 3.3 × 10⁻⁴), whereby the association of TG with incident T2D is strongest among those with low genetic risk for TG. Further research is needed to understand the likely pleiotropic mechanisms underlying these findings, and to clarify the causal relationship between T2D and TG.
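A genetic risk score such as the TG GRS mentioned above is typically a sum of risk-allele counts across SNPs, optionally weighted by each SNP's estimated effect size. The abstract does not say which weighting was used, so the sketch below shows the general construction under that assumption; the function name `genetic_risk_score` and the example numbers are hypothetical.

```python
import numpy as np

def genetic_risk_score(allele_counts, weights=None):
    """Sum trait-increasing allele counts across SNPs for each subject.

    allele_counts: array of shape (subjects, SNPs) with values 0, 1, or 2
    weights:       optional per-SNP effect sizes; if omitted, every SNP counts equally
    """
    counts = np.asarray(allele_counts, dtype=float)
    if weights is None:
        weights = np.ones(counts.shape[1])
    return counts @ np.asarray(weights, dtype=float)

if __name__ == "__main__":
    # Two hypothetical subjects genotyped at three hypothetical SNPs.
    counts = [[0, 1, 2],
              [2, 2, 0]]
    print(genetic_risk_score(counts))                    # unweighted allele count
    print(genetic_risk_score(counts, [0.1, 0.2, 0.3]))   # weighted by assumed effect sizes
```

The resulting score can then be entered as a single predictor, together with covariates and a score-by-TG product term, in a model of incident T2D.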