Ryan N Gutenkunst

Associate Department Head, Molecular and Cellular Biology
Associate Professor, Applied BioSciences - GIDP
Associate Professor, Applied Mathematics - GIDP
Associate Professor, Cancer Biology -
Associate Professor, Ecology and Evolutionary Biology
Associate Professor, Genetics - GIDP
Associate Professor, Molecular and Cellular Biology
Associate Professor, Public Health
Associate Professor, Statistics - GIDP
Associate Professor, BIO5 Institute
Member of the Graduate Faculty
Director, Graduate Studies
Primary Department
Contact
(520) 626-0569

Work Summary

We learn history from the genomes of humans, tumors, and other species. Our studies reveal how evolution works at the molecular level, offering fundamental insight into how humans and pathogens adapt to challenges.

Research Interest

The Gutenkunst group studies the function and evolution of the complex molecular networks that comprise life. To do so, they integrate computational population genomics, bioinformatics, and molecular evolution. They focus on developing new computational methods to extract biological insight from genomic data and applying those methods to understand population history and natural selection.

Publications

Hsieh, P., Hallmark, B., Watkins, J. C., Karafet, T. C., Osipova, L. P., Gutenkunst, R. N., & Hammer, M. F. (2017). Exome sequencing provides evidence of polygenic adaptation to a fat-rich animal diet in indigenous Siberian populations. Molecular Biology and Evolution, 34, 2914.
Robinson, J. D., Coffman, A. J., Hickerson, M. J., & Gutenkunst, R. N. (2014). Sampling strategies for frequency spectrum-based population genomic inference. BMC Evolutionary Biology, 14(1), 254.

Background: The allele frequency spectrum (AFS) consists of counts of the number of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likelihoods based on the joint AFS from two or more populations. We conducted a simulation study of one of these approaches, implemented in the Python module ∂a∂i, to compare parameter estimation and model selection accuracy given different sample sizes under one- and two-population models. Results: Our simulations included a variety of demographic models and two parameterizations that differed in the timing of events (divergence or size change). Using a number of SNPs reasonably obtained through next-generation sequencing approaches (10,000 - 50,000), accurate parameter estimates and model selection were possible for models with more ancient demographic events, even given relatively small numbers of sampled individuals. However, for recent events, larger numbers of individuals were required to achieve accuracy and precision in parameter estimates similar to that seen for models with older divergence or population size changes. We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models. Conclusions: Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events.
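As an illustration of the AFS definition in the abstract above, the following minimal sketch tallies a single-population spectrum from per-locus derived-allele counts. The function name, the input layout, and the toy data are assumptions made for this example; this is not the interface of the Python module used in the study.

```python
def allele_frequency_spectrum(derived_counts, n_chromosomes):
    """Tally a single-population allele frequency spectrum.

    derived_counts: one integer per SNP locus, giving how many sampled
        chromosomes carry the derived variant at that locus.
    n_chromosomes: number of sampled chromosomes.

    Returns a list afs of length n_chromosomes + 1, where afs[k] is the
    number of loci whose derived variant appears exactly k times.
    """
    afs = [0] * (n_chromosomes + 1)
    for k in derived_counts:
        if not 0 <= k <= n_chromosomes:
            raise ValueError(f"derived count {k} outside 0..{n_chromosomes}")
        afs[k] += 1
    return afs


# Hypothetical toy data: 8 SNP loci scored in a sample of 4 chromosomes.
toy_counts = [1, 1, 2, 1, 3, 2, 1, 4]
toy_afs = allele_frequency_spectrum(toy_counts, n_chromosomes=4)
print(toy_afs)  # [0, 4, 2, 1, 1]
```

Adding sampled individuals lengthens the spectrum, which is the sense in which the abstract speaks of a "larger AFS".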

Altshuler, D. L., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Collins, F. S., M., F., Donnelly, P., Egholm, M., Flicek, P., Gabriel, S. B., Gibbs, R. A., Knoppers, B. M., Lander, E. S., Lehrach, H., Mardis, E. R., McVean, G. A., Nickerson, D. A., Peltonen, L., et al. (2010). A map of human genome variation from population-scale sequencing. Nature, 467(7319), 1061-1073.

PMID: 20981092; PMCID: PMC3042601; Abstract:

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. © 2010 Macmillan Publishers Limited. All rights reserved.

Hermansen, R. A., Mannakee, B. K., Knecht, W., Liberles, D. A., & Gutenkunst, R. N. (2015). Characterizing selective pressures on the pathway for de novo biosynthesis of pyrimidines in yeast. BMC Evolutionary Biology, 15.
Colvin, J., Monine, M. I., Gutenkunst, R. N., Hlavacek, W. S., D., D., & Posner, R. G. (2010). RuleMonkey: Software for stochastic simulation of rule-based models. BMC Bioinformatics, 11.

PMID: 20673321; PMCID: PMC2921409; Abstract:

Background: The system-level dynamics of many molecular interactions, particularly protein-protein interactions, can be conveniently represented using reaction rules, which can be specified using model-specification languages, such as the BioNetGen language (BNGL). A set of rules implicitly defines a (bio)chemical reaction network. The reaction network implied by a set of rules is often very large, and as a result, generation of the network implied by rules tends to be computationally expensive. Moreover, the cost of many commonly used methods for simulating network dynamics is a function of network size. Together these factors have limited application of the rule-based modeling approach. Recently, several methods for simulating rule-based models have been developed that avoid the expensive step of network generation. The cost of these "network-free" simulation methods is independent of the number of reactions implied by rules. Software implementing such methods is now needed for the simulation and analysis of rule-based models of biochemical systems. Results: Here, we present a software tool called RuleMonkey, which implements a network-free method for simulation of rule-based models that is similar to Gillespie's method. The method is suitable for rule-based models that can be encoded in BNGL, including models with rules that have global application conditions, such as rules for intramolecular association reactions. In addition, the method is rejection free, unlike other network-free methods that introduce null events, i.e., steps in the simulation procedure that do not change the state of the reaction system being simulated. We verify that RuleMonkey produces correct simulation results, and we compare its performance against DYNSTOC, another BNGL-compliant tool for network-free simulation of rule-based models. We also compare RuleMonkey against problem-specific codes implementing network-free simulation methods. Conclusions: RuleMonkey enables the simulation of rule-based models for which the underlying reaction networks are large. It is typically faster than DYNSTOC for benchmark problems that we have examined. RuleMonkey is freely available as a stand-alone application at http://public.tgen.org/rulemonkey. It is also available as a simulation engine within GetBonNie, a web-based environment for building, analyzing and sharing rule-based models. © 2010 Colvin et al; licensee BioMed Central Ltd.
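For readers unfamiliar with Gillespie's method, which the abstract above names as the point of comparison, the sketch below applies the classic direct method to a hypothetical reversible binding system A + B <-> C. It is not RuleMonkey's network-free algorithm and does not read BNGL; the function name, rate constants, and molecule counts are assumptions chosen for illustration. Note that every iteration fires a real reaction, so there are no null events of the kind the abstract describes.

```python
import random


def gillespie_binding(n_a, n_b, n_c, k_on, k_off, t_end, seed=0):
    """Gillespie direct method for A + B <-> C (hypothetical toy system).

    Returns a list of (time, n_a, n_b, n_c) tuples, one per fired reaction,
    starting with the initial state at time 0.
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, n_a, n_b, n_c)]
    while True:
        # Propensities of the two reactions in the current state.
        a_bind = k_on * n_a * n_b
        a_unbind = k_off * n_c
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break  # no reaction can fire
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to
        # its propensity.
        fire_binding = a_unbind == 0.0 or rng.random() * a_total < a_bind
        if fire_binding:
            n_a, n_b, n_c = n_a - 1, n_b - 1, n_c + 1
        else:
            n_a, n_b, n_c = n_a + 1, n_b + 1, n_c - 1
        trajectory.append((t, n_a, n_b, n_c))
    return trajectory


# Hypothetical parameters, chosen only to produce a short trajectory.
traj = gillespie_binding(n_a=50, n_b=40, n_c=0, k_on=0.01, k_off=0.1, t_end=10.0)
print(len(traj), traj[-1])
```

This toy version lists its two reactions explicitly. A rule-based model may imply far more reactions than can be enumerated this way, which is the cost that network-free methods are designed to avoid.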