Skip to main content
Log in

Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and areversal takes an arbitrary substring of elements, and reverses their order.

For this problem, we develop two algorithms: a greedy approximation algorithm, that finds a solution provably close to optimal inO(n 2) time and0(n) space forn-element permutations, and a branch- and-bound exact algorithm, that finds an optimal solution in0(mL(n, n)) time and0(n 2) space, wherem is the size of the branch- and-bound search tree, andL(n, n) is the time to solve a linear program ofn variables andn constraints. The greedy algorithm is the first to come within a constant factor of the optimum; it guarantees a solution that uses no more than twice the minimum number of reversals. The lower and upper bounds of the branch- and-bound algorithm are a novel application of maximum-weight matchings, shortest paths, and linear programming.

In a series of experiments, we study the performance of an implementation on random permutations, and permutations generated by random reversals. For permutations differing byk random reversals, we find that the average upper bound on reversal distance estimatesk to within one reversal fork<1/2n andn<100. For the difficult case of random permutations, we find that the average difference between the upper and lower bounds is less than three reversals forn<50. Due to the tightness of these bounds, we can solve, to optimality, problems on 30 elements in a few minutes of computer time. This approaches the scale of mitochondrial genomes.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Aigner, M., and D. B. West. Sorting by insertion of leading elements.Journal of Combinatorial Theory, Series A,45, 306–309, 1987.

    Article  MATH  MathSciNet  Google Scholar 

  2. Amato, N., M. Blum, S. Irani, and R. Rubinfeld. Reversing trains: a turn of the century sorting problem.Journal of Algorithms,10, 413–428, 1989.

    Article  MATH  MathSciNet  Google Scholar 

  3. Bafna, V., and P. A. Pevzner. Genome rearrangements and sorting by reversals.Proceedings of the 34th Symposium on Foundations of Computer Science, November 1993, pp. 148–157.

  4. Bibb, M. J., R. A. van Etten, C. T. Wright, M. W. Walberg, and D. A. Clayton. Sequence and gene organization of mouse mitochondrial DNA.Cell,26, 167–180, 1981.

    Article  Google Scholar 

  5. Dobzhansky, T.Genetics of the Evolutionary Process. Columbia University Press, New York, 1970.

    Google Scholar 

  6. Driscoll, J. R., and M. L. Furst. Computing short generator sequences.Information and Computation,72, 117–132, 1987.

    Article  MATH  MathSciNet  Google Scholar 

  7. Even, S., and O. Goldreich. The minimum-length generator sequence problem is NP-hard.Journal of Algorithms,2, 311–313, 1981.

    Article  MATH  MathSciNet  Google Scholar 

  8. Furst, M., J. Hopcroft, and E. Luks. Polynomial-time algorithms for permutation groups.Proceedings of the 21st Symposium on Foundations of Computer Science, 1980, pp. 36–41.

  9. Garey, M. R., and D. S. Johnson.Computers and Intractability: A Guide to The Theory of NP-Completeness. Freeman, New York, 1979.

    MATH  Google Scholar 

  10. Gates, W. H., and C. H. Papadimitriou. Bounds for sorting by prefix reversal.Discrete Mathematics,27, 47–57, 1979.

    Article  MathSciNet  Google Scholar 

  11. Golan, H. Personal communication, 1991.

  12. Jerrum, M. R. The complexity of finding minimum-length generator sequences.Theoretical Computer Science,36, 265–289, 1985.

    Article  MATH  MathSciNet  Google Scholar 

  13. Johnson, D. B. Finding all the elementary circuits of a directed graph.SIAM Journal on Computing,4(1), 77–84, 1975.

    Article  MATH  MathSciNet  Google Scholar 

  14. Kececioglu, J., and D. Sankoff. Exact and approximation algorithms for the inversion distance between two chromosomes.Proceedings of the 4th Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 684, Springer-Verlag, Berlin, June 1993, pp. 87–105. (An earlier version appeared as “Exact and approximation algorithms for sorting by reversals,” Technical Report 1824, Centre de recherches mathématiques, Université de Montréal, July 1992).

    Google Scholar 

  15. Kececioglu, J., and D. Sankoff. Efficient bounds for oriented chromosome-inversion distance.Proceedings of the 5th Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, vol. 807, Springer-Verlag, Berlin, June 1994, pp. 307–325.

    Google Scholar 

  16. Knuth, D. E.The Art of Computer Programming, Vol. 3. Addison-Wesley, Reading, MA, 1973.

    Google Scholar 

  17. Mannila, H. Measures of presortedness and optimal sorting algorithms,IEEE Transactions on Computers, 34, 318–325, 1985.

    Article  MATH  MathSciNet  Google Scholar 

  18. Micali, S. and V. Vazirani. Ano(√¦V¦·¦E¦) algorithm for finding maximum matchings in general graphs.Proceedings of the 21st Symposium on Foundations of Computer Science, 1980, pp. 17–27.

  19. Nadeau, J. H., and B. A. Taylor. Lengths of chromosomal segments conserved since divergence of man and mouse.Proceedings of the National Academy of Sciences of the USA,81, 814, 1984.

    Article  Google Scholar 

  20. O'Brien, S. J., ed.Genetic Maps: Locus Maps of Complex Genomes. 6th edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1993.

    Google Scholar 

  21. Palmer, J. D., B. Osorio, and W. F. Thompson. Evolutionary significance of inversions in legume chloroplast DNAs.Current Genetics,14, 65–74, 1988.

    Article  Google Scholar 

  22. Sankoff, D., G. Leduc, N. Antoine, B. Paquin, B. F. Lang, and R. Cedergren. Gene order comparisons for phylogenetic inference: evolution of the mitochondrial genome.Proceedings of the National Academy of Sciences of the USA,89, 6575–6579, 1992.

    Article  Google Scholar 

  23. Schöniger, M., and M. S. Waterman. A local algorithm for DNA sequence alignment with inversions.Bulletin of Mathematical Biology,54, 521–536, 1992.

    MATH  Google Scholar 

  24. Sessions, S. K. Chromosomes: molecular cytogenetics. InMolecular Systematics, D. M. Hillis and C. Moritz, eds., Sinauer, Sunderland, MA, 1990, pp. 156–204.

    Google Scholar 

  25. Tichy, W. F. The string-to-string correction problem with block moves.ACM Transactions on Computer Systems,2(4), 309–321, 1984.

    Article  MathSciNet  Google Scholar 

  26. Wagner, R. A. On the complexity of the extended string-to-string correction problem. InTime Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, D. Sankoft and J. B. Kruskal, eds., Addison-Wesley, Reading, MA, 1983, pp. 215–235.

    Google Scholar 

  27. Watterson, G. A., W. J. Ewens, T. E. Hall, and A. Morgan. The chromosome inversion problem.Journal of Theoretical Biology,99, 1–7, 1982.

    Article  Google Scholar 

  28. Wolstenholme, D. R., J. L. MacFarlane, R. Okimoto, D. O. Clary, and J. A. Wahieithner. Bizarre tRNAs inferred from DNA sequences of mitochondrial genomes of nematode worms.Proceedings of the National Academy of Sciences of the USA,84, 1324–1328, 1987.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

Communicated by E. W. Myers.

This research was supported by a postdoctoral fellowship from the Program in Mathematics and Molecular Biology of the University of California at Berkeley under National Science Foundation Grant DMS-8720208, and by a fellowship from the Centre de recherches mathématiques of the Université de Montréal.

This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada, and the Fonds pour la formation de chercheurs et l'aide à la recherche (Québec). The author is a fellow of the Canadian Institute for Advanced Research.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kececioglu, J., Sankoff, D. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. Algorithmica 13, 180–210 (1995). https://doi.org/10.1007/BF01188586

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF01188586

Key words

Navigation