Deblasio, D. F., Wheeler, T. J., & Kececioglu, J. D. (2012). Estimating the accuracy of multiple alignments and its use in parameter advising. Proceedings of RECOMB 2012 (Springer-Verlag Lecture Notes in Bioinformatics), 7262 LNBI, 45-59.
Abstract:
We develop a novel and general approach to estimating the accuracy of protein multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new problem that we call parameter advising. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. We evaluate this approach by applying it to the task of parameter advising: the problem of choosing alignment scoring parameters from a collection of parameter values to maximize the accuracy of a computed alignment. Our estimator, which we call Facet (for "feature-based accuracy estimator"), yields a parameter advisor that on the hardest benchmarks provides more than a 20% improvement in accuracy over the best default parameter choice, and outperforms the best prior approaches to selecting good alignments for parameter advising. © 2012 Springer-Verlag Berlin Heidelberg.
Taylor, E. W., Bhat, A., Nadimpalli, R. G., Zhang, W., & Kececioglu, J. (1997). HIV-1 encodes a sequence overlapping env gp41 with highly significant similarity to selenium-dependent glutathione peroxidases.. Journal of acquired immune deficiency syndromes and human retrovirology : official publication of the International Retrovirology Association, 15(5), 393-394.
Reinert, K., Lenhof, H. -., Mutzel, P., Mehlhorn, K., & Kececioglu, J. D. (1997). Branch-and-cut algorithm for multiple sequence alignment. Proceedings of thr Annual International Conference on Computational Molecular Biology, RECOMB, 241-250.
Abstract:
Multiple sequence alignment is an important problem in computational biology. We study the Maximum Trace formulation introduced by Kececioglu [Kec91]. We first phrase the problem in terms of forbidden subgraphs, which enables us to express Maximum Trace as an integer linear-programming problem, and then solve the integer linear program using methods from polyhedral combinatorics. The trace polytope is the convex hull of all feasible solutions to the Maximum Trace problem; for the case of two sequences, we give a complete characterization of this polytope. This yields a polynomial-time algorithm for a general version of pairwise sequence alignment that, perhaps suprisingly, does not use dynamic programming; this yields, for instance, a non-dynamic-programming algorithm for sequence comparison under the 0-1 metric, which gives another answer to a long-open question in the area of string algorithms [PW93]. For the multiple-sequence case, we derive several classes of facet-defining inequalities and show that for all but one class, the corresponding separation problem can be solved in polynomial time. This leads to a branch-and-cut algorithm for multiple sequence alignment, and we report on our first computational experience. It appears that a polyhedral approach to multiple sequence alignment can solve instances that are beyond present dynamic-programming approaches.
Kececioglu, J., & Deblasio, D. (2013). Accuracy estimation and parameter advising for protein multiple sequence alignment. Journal of Computational Biology, 20(4), 259-279.
PMID: 23489379;PMCID: PMC3619150;Abstract:
We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for "feature-based accuracy estimator"), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality. © Copyright 2013, Mary Ann Liebert, Inc. 2013.
Kececioglu, J., Shete, S., & Arnold, J. (2000). Reconstructing distances in physical maps of chromosomes with nonoverlapping probes. Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB, 183-192.
Abstract:
We present a new method for reconstructing the distances between probes in physical maps of chromosomes constructed by hybridizing pairs of clones under the so-called sampling-without-replacement protocol. In this protocol, which is simple, inexpensive, and has been used to successfully map several organisms, equal-length clones are hybridized against a clone-subset called the probes. The probes are chosen by a sequential process that is designed to generate a pairwise-nonoverlapping subset of the clones. We derive a likelihood function on probe spacings and orders for this protocol under a natural model of hybridization error, and describe how to reconstruct the most likely spacing for a given order under this objective using continuous optimization. The approach is tested on simulated data and real data from chromosome VI of Aspergillus nidulans. On simulated data we recover the true order and close to the true spacing; on the real data, for which the true order and spacing is unknown, we recover a probe order differing significantly from the published one. To our knowledge this is the first practical approach for computing a globally-optimal maximum-likelihood reconstruction of interprobe distances from clone-probe hybridization data.