Bioinformatics Vol. 17 no. 2 2001
Pages 149-154
© 2001 Oxford University Press
Original Paper
An information-based sequence distance and its application to whole mitochondrial genome phylogeny
1 Bioinformatics Laboratory, Computer Science Department, University of Waterloo, N2L 3G1, Canada
2 Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Received on July 19, 2000; revised on October 5, 2000; accepted on October 11, 2000
Motivation: Traditional sequence distances require an alignment and are therefore not directly applicable to whole genome phylogeny, where events such as rearrangements make full-length alignments impossible. We present a sequence distance that works on unaligned sequences, based on the information-theoretic concept of Kolmogorov complexity, together with a program to estimate this distance.
Results: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders using complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals.
Availability: The program to estimate our sequence distance is available at http://www.cs.cityu.edu.hk/~cssamk/gencomp/GenCompress1.htm. The distance matrices used to generate our phylogenies are available at http://www.math.uwaterloo.ca/~mli/distance.html.
Contact: mli{at}wh.math.uwaterloo.ca
* To whom correspondence should be addressed.
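The idea of estimating a Kolmogorov-complexity-based distance with a compressor can be illustrated with a short sketch. The abstract does not state the distance formula, so the code below is an assumption-laden stand-in rather than the authors' method: it uses the normalized compression distance (a related compression-based measure) and Python's general-purpose `zlib` in place of the specialized program referenced under Availability. The function names `compressed_size`, `ncd` and `distance_matrix` are hypothetical.

```python
import zlib
from itertools import combinations
from typing import Dict, Tuple


def compressed_size(data: bytes) -> int:
    """Bytes needed by zlib to encode `data`: a crude, computable proxy
    for Kolmogorov complexity (assumption: zlib, not a DNA compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two unaligned sequences.

    Assumption: this is a stand-in formula, not the one from the paper.
    Values near 0 mean the sequences share most of their information;
    values near 1 mean they share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: Dict[str, bytes]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances for every unordered pair of named sequences."""
    return {(a, b): ncd(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}
```

A matrix produced this way could be passed to any standard distance-based tree-building method. Note that `zlib` only finds repeats within a 32 KB window and is not tuned for DNA, so it would give only a rough estimate on real genomes.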