Ryan N Gutenkunst

Associate Department Head, Molecular and Cellular Biology
Associate Professor, Applied BioSciences - GIDP
Associate Professor, Applied Mathematics - GIDP
Associate Professor, Cancer Biology -
Associate Professor, Ecology and Evolutionary Biology
Associate Professor, Genetics - GIDP
Associate Professor, Molecular and Cellular Biology
Associate Professor, Public Health
Associate Professor, Statistics-GIDP
Associate Professor, BIO5 Institute
Member of the Graduate Faculty
Director, Graduate Studies
Primary Department
(520) 626-0569

Work Summary

We learn history from the genomes of humans, tumors, and other species. Our studies reveal how evolution works at the molecular level, offering fundamental insight into how humans and pathogens adapt to challenges.

Research Interest

The Gutenkunst group studies the function and evolution of the complex molecular networks that comprise life. To do so, they integrate computational population genomics, bioinformatics, and molecular evolution. They focus on developing new computational methods to extract biological insight from genomic data and applying those methods to understand population history and natural selection.


Locke, D. P., Hillier, L. W., Warren, W. C., Worley, K. C., Nazareth, L. V., Muzny, D. M., Yang, S., Wang, Z., Chinwalla, A. T., Minx, P., Mitreva, M., Cook, L., Delehaunty, K. D., Fronick, C., Schmidt, H., Fulton, L. A., Fulton, R. S., Nelson, J. O., Magrini, V., Pohl, C., et al. (2011). Comparative and demographic analysis of orang-utan genomes. Nature, 469(7331), 529-33.

'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. 
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.

Gutenkunst, R., Newlands, N., Lutcavage, M., & Edelstein-Keshet, L. (2007). Inferring resource distributions from Atlantic bluefin tuna movements: An analysis based on net displacement and length of track. Journal of Theoretical Biology, 245(2), 243-257.

PMID: 17140603

We use observed movement tracks of Atlantic bluefin tuna in the Gulf of Maine and mathematical modeling of this movement to identify possible resource patches. We infer bounds on the overall sizes and distribution of such patches, even though they are difficult to quantify by direct observation in situ. To do so, we segment individual fish tracks into intervals of distinct motion types based on the ratio of net displacement to length of track (ΔD/ΔL) over a time window Δt. To find the best segmentation, we optimize the fit of a random-walk movement model to each motion type. We compare results from two distinct movement models, biased turning and biased speed, to check the model-dependence of our inferences, and find that uncertainty in choice of movement model dominates the uncertainties of our conclusions. We find that our data are best described using two motion types: "localized" (ΔD/ΔL small) and "long-ranged" (ΔD/ΔL large). The biased turning model leads to significantly better resolution of localized movement intervals than the biased speed model. We hypothesize that localized movement corresponds to exploitation of resource patches. Comparison with visual behavior observations made during tracking suggests that many inferred intervals of localized motion do indeed correspond to feeding activity. From our analysis, we estimate that, on average, bluefin tuna in the Gulf of Maine encounter a resource patch every 2 h, that those patches have an average radius of 0.7-1.2 km, and that, overall, there are at most 5-9 such patches per 100 km² in the region studied.
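The ΔD/ΔL ratio described in this abstract can be illustrated with a minimal sketch. It assumes positions sampled at equal time intervals, a window measured in steps, and a fixed cutoff of 0.5 between motion types. The function names and the cutoff are hypothetical choices made for illustration; the paper itself selects the segmentation by optimizing the fit of a random-walk movement model, which this sketch does not attempt.

```python
import math


def straightness_ratios(track, window):
    """Return the ratio of net displacement to path length for each window.

    track  -- list of (x, y) positions sampled at equal time intervals
    window -- number of steps per window (stands in for the time window)
    """
    # Length of each step between consecutive positions.
    steps = [math.dist(track[i], track[i + 1]) for i in range(len(track) - 1)]
    ratios = []
    for start in range(len(track) - window):
        path_length = sum(steps[start:start + window])
        net_displacement = math.dist(track[start], track[start + window])
        # A window with no movement has zero path length; count it as localized.
        ratios.append(net_displacement / path_length if path_length > 0 else 0.0)
    return ratios


def classify(ratios, threshold=0.5):
    """Label each window; the 0.5 threshold is an illustrative assumption."""
    return ["localized" if r < threshold else "long-ranged" for r in ratios]


# Toy tracks (made-up coordinates, not data from the study).
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
back_and_forth = [(0, 0), (1, 0), (0, 0), (1, 0), (0, 0)]
print(classify(straightness_ratios(straight, 2)))
print(classify(straightness_ratios(back_and_forth, 2)))
```

A straight path gives a ratio near 1 and a path that doubles back on itself gives a ratio near 0, which is the contrast the two motion types rely on.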

Ragsdale, A. P., Coffman, A. J., Hsieh, P., Struck, T. J., & Gutenkunst, R. N. (2016). Triallelic Population Genomics for Inferring Correlated Fitness Effects of Same Site Nonsynonymous Mutations. Genetics, 203(1), 513-23.

The distribution of mutational effects on fitness is central to evolutionary genetics. Typical univariate distributions, however, cannot model the effects of multiple mutations at the same site, so we introduce a model in which mutations at the same site have correlated fitness effects. To infer the strength of that correlation, we developed a diffusion approximation to the triallelic frequency spectrum, which we applied to data from Drosophila melanogaster. We found a moderate positive correlation between the fitness effects of nonsynonymous mutations at the same codon, suggesting that both mutation identity and location are important for determining fitness effects in proteins. We validated our approach by comparing it to biochemical mutational scanning experiments, finding strong quantitative agreement, even between different organisms. We also found that the correlation of mutational fitness effects was not affected by protein solvent exposure or structural disorder. Together, our results suggest that the correlation of fitness effects at the same site is a previously overlooked yet fundamental property of protein evolution.

Mannakee, B. K., Balaji, U., Witkiewicz, A. K., Gutenkunst, R. N., & Knudsen, E. S. (2018). Sensitive and specific post-call filtering of genetic variants in xenograft and primary tumors. Bioinformatics.

Tumor genome sequencing offers great promise for guiding research and therapy, but spurious variant calls can arise from multiple sources. Mouse contamination can generate many spurious calls when sequencing patient-derived xenografts (PDXs). Paralogous genome sequences can also generate spurious calls when sequencing any tumor. We developed a BLAST-based algorithm, MAPEX, to identify and filter out spurious calls from both these sources.

Gravel, S., Henn, B. M., Gutenkunst, R. N., Indap, A. R., Marth, G. T., Clark, A. G., Yu, F., Gibbs, R. A., & Bustamante, C. D. (2011). Demographic history and rare allele sharing among human populations. Proceedings of the National Academy of Sciences of the United States of America, 108(29), 11983-11988.

PMID: 21730125; PMCID: PMC3142009

High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4x coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.
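For readers unfamiliar with the term, a site frequency spectrum simply counts how many variable sites carry the derived allele on 1, 2, 3, ... sampled chromosomes. The sketch below tabulates one for a single population and reports the share of segregating sites that are singletons. The function names and the toy counts are hypothetical and for illustration only; the sketch does not reproduce the demographic inference or the jackknife prediction used in the paper.

```python
def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Tabulate sfs[k] = number of sites whose derived allele is seen k times.

    derived_counts -- per-site count of chromosomes carrying the derived allele
    n_chromosomes  -- number of chromosomes sampled
    """
    sfs = [0] * (n_chromosomes + 1)
    for count in derived_counts:
        if not 0 <= count <= n_chromosomes:
            raise ValueError(f"count {count} outside 0..{n_chromosomes}")
        sfs[count] += 1
    return sfs


def singleton_fraction(sfs):
    """Share of segregating sites (0 < k < n) whose derived allele is seen once."""
    segregating = sum(sfs[1:-1])
    return sfs[1] / segregating if segregating else 0.0


# Toy example (made-up counts): six sites sampled on four chromosomes.
toy_counts = [1, 1, 2, 0, 4, 1]
toy_sfs = site_frequency_spectrum(toy_counts, 4)
print(toy_sfs)
print(singleton_fraction(toy_sfs))
```

Sites where the derived allele is absent or fixed in the sample (k = 0 or k = n) are excluded from the fraction because they are not variable within the sample.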