Bioinformatics Vol. 17 no. 2 2001
Pages 149-154
© 2001 Oxford University Press


Original Paper

An information-based sequence distance and its application to whole mitochondrial genome phylogeny

Ming Li 1,*, Jonathan H. Badger 1, Xin Chen 2, Sam Kwong 2, Paul Kearney 1 and Haoyong Zhang 1

1 Bioinformatics Laboratory, Computer Science Department, University of Waterloo, N2L 3G1, Canada
2 Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

Received on July 19, 2000; revised on October 5, 2000; accepted on October 11, 2000

Motivation: Traditional sequence distances require an alignment and are therefore not directly applicable to whole genome phylogeny, where events such as rearrangements make full-length alignments impossible. We present a sequence distance that works on unaligned sequences, based on the information-theoretic concept of Kolmogorov complexity, together with a program to estimate this distance.

Results: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders using complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals.

Availability: The program to estimate our sequence distance is available at http://www.cs.cityu.edu.hk/~cssamk/gencomp/GenCompress1.htm. The distance matrices used to generate our phylogenies are available at http://www.math.uwaterloo.ca/~mli/distance.html

Contact: mli{at}wh.math.uwaterloo.ca

* To whom correspondence should be addressed.
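
The idea of an alignment-free, compression-based distance can be illustrated with a minimal sketch. This is not the authors' program and not the distance defined in the paper: it assumes a general-purpose compressor (zlib) as a crude stand-in for Kolmogorov complexity, and it uses a related compression-based formula from the later literature (the normalized compression distance). The function names and the toy sequences are hypothetical and serve only as illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input.

    Assumption: this is only a rough, computable upper bound used in
    place of the (uncomputable) Kolmogorov complexity of `data`.
    """
    return len(zlib.compress(data, 9))


def compression_distance(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two unaligned sequences.

    Illustrative formula, not the paper's own definition:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length: int, seed: int) -> bytes:
    """Hypothetical helper: a reproducible random DNA string for the demo."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length)).encode("ascii")


if __name__ == "__main__":
    a = random_dna(2000, seed=0)
    b = random_dna(2000, seed=1)
    # A sequence should be much closer to itself than to an unrelated one.
    print("d(a, a) =", compression_distance(a, a))
    print("d(a, b) =", compression_distance(a, b))
```

No alignment is computed at any point: the only inputs are the raw sequences, which is what makes this family of distances applicable to genomes that differ by rearrangements. A general-purpose compressor such as zlib has a limited match window and no model of DNA, so its estimates on whole genomes would likely be far weaker than those of a compressor designed for DNA sequences.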




