John D Kececioglu

Professor, Computer Science

Associate Professor, Applied Mathematics - GIDP

Associate Professor, Genetics - GIDP

Associate Professor, BIO5 Institute

Primary Department

Computer Science

Department Affiliations

Computer Science

BIO5 Institute

Contact

(520) 621-4526

kece@cs.arizona.edu

Work Summary

John Kececioglu's research is in applied algorithms, with an emphasis on bioinformatics and computational biology, including: multiple sequence alignment, inverse parametric alignment, sequence assembly, and genome rearrangement. Software developed by his group includes Opal, a tool for multiple sequence alignment, Facet, a tool for alignment accuracy estimation, InverseOpt, a library for inverse parametric optimization, Ipa, a tool for inverse sequence alignment, and AlignAlign, a tool for optimally aligning alignments.

Research Interest

John Kececioglu is an Associate Professor in the Department of Computer Science, and the BIO5 Institute. His research is in applied algorithms, especially for areas of bioinformatics and computational biology such as: multiple alignment, inverse alignment, sequence assembly, and genome rearrangement. Software developed by his group includes Opal, a tool for multiple sequence alignment; Facet, a tool for alignment accuracy estimation; Ipa, a tool for inverse sequence alignment; AlignAlign, a tool for optimally aligning alignments, and Ninja, a tool for evolutionary tree construction. John is a recipient of a National Science Foundation CAREER Award, serves as an Associate Editor for IEEE/ACM Transactions on Computational Biology and Bioinformatics, and is on the Editorial Board of Algorithms for Molecular Biology. He was Conference Chair for RECOMB 2009, and Program Committee Co-Chair in the area of Sequence Analysis for ISMB 2011 and BCB 2012. John served as Associate Head of the Department of Computer Science during 2012.

Publications

Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment.

Gupta, S. K., Kececioglu, J. D., & Schäffer, A. (1995). Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment.. Journal of computational biology : a journal of computational molecular cell biology, 2(3), 459-472.

PMID: 8521275;Abstract:

The MSA program, written and distributed in 1989, is one of the few existing programs that attempts to find optimal alignments of multiple protein or DNA sequences. The MSA program implements a branch-and-bound technique together with a variant of Dijkstra's shortest paths algorithm to prune the basic dynamic programming graph. We have made substantial improvements in the time and space usage of MSA. The improvements make feasible a variety of problem instances that were not feasible previously. On some runs we achieve an order of magnitude reduction in space usage and a significant multiplicative factor speedup in running time. To explain how these improvements work, we give a much more detailed description of MSA than has been previously available. In practice, MSA rarely produces a provably optimal alignment and we explain why.

Inverse sequence alignment from partial examples

Kim, E., & Kececioglu, J. (2007). Inverse sequence alignment from partial examples. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4645 LNBI, 359-370.

Abstract:

When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for biological sequences is inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the example alignments score close to optimal. We extend prior work on inverse alignment to partial examples and to an improved model based on minimizing the average error of the examples. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the recovery rate for multiple sequence alignment by up to 25%. © Springer-Verlag Berlin Heidelberg 2007.

Learning parameter-advising sets for multiple sequence alignment

DeBlasio, D. F., & Kececioglu, J. D. (2017). Learning parameter-advising sets for multiple sequence alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(5), 1028-1041.

A tool for multiple sequence alignment

Lipman, D. J., Altschul, S. F., & Kececioglu, J. D. (1989). A tool for multiple sequence alignment. Proceedings of the National Academy of Sciences of the United States of America, 86(12), 4412-4415.

PMID: 2734293;PMCID: PMC287279;Abstract:

Multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence-structure relationships. Until recently, it has been impractical to apply dynamic programming, the most widely accepted method for producing pairwise alignments, to comparisons of more than three sequences. We describe the design and application of a tool for multiple alignment of amino acid sequences that implements a new algorithm that greatly reduces the computational demands of dynamic programming. This tool is able to align in reasonable time as many as eight sequences the length of an average protein.

Computing physical maps of chromosomes with nonoverlapping probes by branch-and-cut

Christof, T., & Kececioglu, J. (1999). Computing physical maps of chromosomes with nonoverlapping probes by branch-and-cut. Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB, 115-123.

Abstract:

We introduce a new combinatorial formulation of chromosome physical-mapping by the sampling-without-replacement protocol. In this protocol, which is simple, inexpensive, and has been used to successfully map several organisms, equal-length clones are hybridized against a subset of the clones called probes, which are designed to form a maximal nonoverlapping clone-subset. The output of the protocol is the clone-probe hybridization matrix H. The problem of finding a maximum-likelihood reconstruction of the order of the probes along the chromosome in the presence of false positive and negative hybridization error is equivalent to finding the minimum number of entries of H to change to zeros so that the resulting matrix has at most 2 ones per row, and the consecutive-ones property across rows. This combinatorial problem, which we call 2-Consecutive-Ones Mapping, has a concise integer linear-programming formulation, to which we apply techniques from polyhedral combinatorics. The formulation is unique in that it does not explicitly represent the probe permutation, and in contrast to prior linear-programming approaches, the number of variables is small: in practice, linear in the number of clones. We derive a large class of facet-defining inequalities for the 2-consecutive-ones polytope that we call the augmented k-degree inequalities, and we show that the basic k-degree class can be efficiently separated using bipartite matchings. Computational results with an implementation of the resulting branch-and-cut algorithm applied to both simulated and real data from the complete genome of Aspergillus nidulans show that we can solve many problems to provable optimality and find maps of higher quality than previously possible.

John D Kececioglu