## John D Kececioglu

**Primary Department**

**Department Affiliations**

Associate Professor, Applied Mathematics - GIDP

Associate Professor, BIO5 Institute

Associate Professor, Genetics - GIDP

Professor, Computer Science

Work Summary

John Kececioglu's research is in applied algorithms, with an emphasis on bioinformatics and computational biology, including: multiple sequence alignment, inverse parametric alignment,
sequence assembly, and
genome rearrangement. Software developed by his group includes Opal, a tool for multiple sequence alignment,
Facet, a tool for alignment accuracy estimation, InverseOpt, a library for inverse parametric optimization, Ipa, a tool for inverse sequence alignment, and AlignAlign, a tool for optimally aligning alignments.

Kececioglu, J., & Gusfield, D. (1994). Reconstructing a history of recombinations from a set of sequences. Proceedings of the Annual ACM SIAM Symposium on Discrete Algorithms, 471-480.

Abstract:

One of the classic problems in computational biology is the reconstruction of evolutionary histories. A recent trend is toward increasing the explanatory power of the models by incorporating higher-order evolutionary events that more accurately reflect the full range of mutation at the molecular level. In this paper, we take a step in this direction by considering the problem of reconstructing an evolutionary history for a set of genetic sequences that have evolved by recombination. Recombination produces a new sequence by crossing two parent sequences, and is among the most important mechanisms of higher-order molecular mutation.

Christof, T., Juenger, M., Kececioglu, J., Mutzel, P., & Reinelt, G. (1997). Branch-and-cut approach to physical mapping with end-probes. Proceedings of thr Annual International Conference on Computational Molecular Biology, RECOMB, 84-92.

Abstract:

A fundamental problem in computational biology is the construction of physical maps of chromosomes from hybridization experiments between unique probes and clones of chromosome fragments in the presence of error. Alizadeh, Karp, Weisser and Zweig [AKWZ94] first considered a maximum-likelihood model of the problem that is equivalent to finding an ordering of the probes that minimizes a weighted sum of errors, and developed several effective heuristics. We show that by exploiting information about the end-probes of clones, this model can be formulated as a weighted Betweenness Problem. This affords the significant advantage of allowing the well-developed tools of integer linear-programming and branch-and-cut algorithms to be brought to bear on physical mapping, enabling us for the first time to solve small mapping instances to optimality even in the presence of high error. We also show that by combining the optimal solution of many small overlapping Betweenness Problems, one can effectively screen errors from larger instances, and solve the edited instance to optimality as a Hamming-Distance Traveling Salesman Problem. This suggests a new combined approach to physical map construction.

Ravi, R., & Kececioglu, J. D. (1998). Approximation algorithms for multiple sequence alignment under a fixed evolutionary tree. Discrete Applied Mathematics, 88(1-3), 355-366.

Abstract:

We consider the problem of multiple sequence alignment under a fixed evolutionary tree: given a tree whose leaves are labeled by sequences, find ancestral sequences to label its internal nodes so as to minimize the total length of the tree, where the length of an edge is the edit distance between the sequences labeling its endpoints. We present a new polynomial-time approximation algorithm for this problem, and analyze its performance on regular d-ary trees with d a constant. On such a tree, the algorithm finds a solution within a factor (d + 1)/(d - 1) of the minimum in Q(k^{d}T(d, n)+k^{2d}n^{2}) time, where k is the number of leaves in the tree, n is the length of the longest sequence labeling a leaf, and T(d, n) is the time to compute a Steiner point for d sequences of length at most n. (A Steiner point for a set script capital L sign of sequences is a sequence P that minimizes the sum of the edit distances from P to each sequence in script capital L sign. The time T(d, n) is O(d2^{d}n^{d}), given O(ds^{d+l})-time preprocessing for an alphabet of size s.) The approximation algorithm is conceptually simple and easy to implement, and actually applies to any metric space in which a Steiner point for any fixed-sized set can be computed in polynomial time. We also introduce a new problem, bottleneck tree-alignment, in which the objective is to label the internal nodes of the tree so as to minimize the length of the longest edge. We describe an exponential-time exact algorithm for the case of unit-cost edit operations, and show there is a simple linear-time approximation algorithm for the general case that finds a solution within a factor O(log k) of the minimum. 1998 Published by Elsevier Science B.V. All rights reserved.

Kececioglu, J. D., Lenhof, H., Mehlhorn, K., Mutzel, P., Reinert, K., & Vingron, M. (2000). A polyhedral approach to sequence alignment problems. Discrete Applied Mathematics, 104(1-3), 143-186.

Abstract:

We study two new problems in sequence alignment both from a practical and a theoretical view, using tools from combinatorial optimization to develop branch-and-cut algorithms. The generalized maximum trace formulation captures several forms of multiple sequence alignment problems in a common framework, among them the original formulation of maximum trace. The RNA sequence alignment problem captures the comparison of RNA molecules on the basis of their primary sequence and their secondary structure. Both problems have a characterization in terms of graphs which we reformulate in terms of integer linear programming. We then study the polytopes (or convex hulls of all feasible solutions) associated with the integer linear program for both problems. For each polytope we derive several classes of facet-defining inequalities and show that for some of these classes the corresponding separation problem can be solved in polynomial time. This leads to a polynomial-time algorithm for pairwise sequence alignment that is not based on dynamic programming. Moreover, for multiple sequences the branch-and-cut algorithms for both sequence alignment problems are able to solve to optimality instances that are beyond the range of present dynamic programming approaches. © 2000 Elsevier Science B.V.

Kececioglu, J., Ming, L. i., & Tromp, J. (1997). Inferring a DNA sequence from erroneous copies. Theoretical Computer Science, 185(1), 3-13.

Abstract:

We suggest a novel approach for efficiently reconstructing an original DNA sequence from erroneous copies.