
Bioinformatics Vol. 17 no. 10 2001
Pages 977-987
© 2001 Oxford University Press

Model-based clustering and data transformations for gene expression data

K. Y. Yeung 1,*, C. Fraley 2, A. Murua 3, A. E. Raftery 2 and W. L. Ruzzo 1

1 Computer Science and Engineering, Box 352350
2 Statistics, Box 354322, University of Washington, Seattle, WA 98195, USA
3 Insightful Corporation, 1700 Westlake Avenue North, Suite 500, Seattle, WA 98109, USA

Received on April 20, 2001; accepted on July 6, 2001

Motivation: Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. The issues of selecting a ‘good’ clustering method and determining the ‘correct’ number of clusters are reduced to model selection problems in the probability framework. Gaussian mixture models have been shown to be a powerful tool for clustering in many applications.
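The core idea of the Motivation paragraph, fitting a finite Gaussian mixture and treating the choice of the number of clusters as a model selection problem, can be sketched in a few dozen lines. The sketch below is not MCLUST. It is a minimal pure-Python EM fit of a Gaussian mixture with diagonal covariance matrices (the "diagonal model" mentioned under Availability), and it picks the number of clusters by BIC written in MCLUST's sign convention, 2 log L - m log n, where larger is better. The deterministic farthest-point initialisation, the variance floor, the convergence tolerance and all function names (`fit_diag_gmm`, `bic`, `select_k`) are assumptions made for illustration.

```python
import math


def _log_gauss_diag(x, mean, var):
    """Log density of a normal distribution with covariance diag(var)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def _e_step(data, weights, means, variances):
    """Membership probabilities z[i][c] and the mixture log-likelihood."""
    z, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + _log_gauss_diag(x, mu, var)
                for w, mu, var in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp for numerical stability
        norm = top + math.log(sum(math.exp(s - top) for s in logs))
        loglik += norm
        z.append([math.exp(s - norm) for s in logs])
    return z, loglik


def _farthest_point_init(data, k):
    """Assumed deterministic start: the first point, then repeatedly the
    point farthest from every centre chosen so far. (MCLUST itself starts
    EM from model-based hierarchical agglomeration instead.)"""
    centres = [list(data[0])]
    while len(centres) < k:
        far = max(data, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres))
        centres.append(list(far))
    return centres


def fit_diag_gmm(data, k, n_iter=200, tol=1e-6, var_floor=1e-6):
    """EM for a k-component Gaussian mixture with diagonal covariances.
    Returns (log_likelihood, weights, means, variances)."""
    n, d = len(data), len(data[0])
    means = _farthest_point_init(data, k)
    centre = [sum(p[j] for p in data) / n for j in range(d)]
    spread = [max(sum((p[j] - centre[j]) ** 2 for p in data) / n, var_floor)
              for j in range(d)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k
    z, loglik = _e_step(data, weights, means, variances)
    for _ in range(n_iter):
        # M-step: weighted proportions, means and per-dimension variances.
        for c in range(k):
            nc = sum(zi[c] for zi in z) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(zi[c] * x[j] for zi, x in zip(z, data)) / nc
                        for j in range(d)]
            variances[c] = [max(sum(zi[c] * (x[j] - means[c][j]) ** 2
                                    for zi, x in zip(z, data)) / nc, var_floor)
                            for j in range(d)]
        # E-step on the updated parameters; stop once the gain is tiny.
        z, new_loglik = _e_step(data, weights, means, variances)
        converged = new_loglik - loglik < tol
        loglik = new_loglik
        if converged:
            break
    return loglik, weights, means, variances


def bic(loglik, k, d, n):
    """BIC in MCLUST's sign convention, 2 log L - m log n (larger is
    better), with m = (k - 1) proportions + k*d means + k*d variances."""
    m = (k - 1) + 2 * k * d
    return 2 * loglik - m * math.log(n)


def select_k(data, k_values):
    """Fit each candidate number of clusters and keep the highest BIC."""
    n, d = len(data), len(data[0])
    scores = {k: bic(fit_diag_gmm(data, k)[0], k, d, n) for k in k_values}
    return max(scores, key=scores.get), scores
```

With genes as rows and conditions as columns, `select_k(data, range(1, 7))` returns the k with the highest BIC together with the whole BIC table, which is worth inspecting rather than trusting a single maximum. The diagonal model assumes that conditions are uncorrelated within a cluster; MCLUST's other covariance parameterisations relax that assumption at the cost of more parameters, which BIC then penalises.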

Results: We benchmarked the performance of model-based clustering on several synthetic and real gene expression data sets for which external evaluation criteria were available. The model-based approach has superior performance on our synthetic data sets, consistently selecting the correct model and the correct number of clusters. On real expression data, the model-based approach produced clusters of quality comparable to a leading heuristic clustering algorithm, but with the key advantage of suggesting the number of clusters and an appropriate model. To explore the validity of the Gaussian mixture assumption, we also assessed the degree to which these real gene expression data sets fit multivariate Gaussian distributions, both before and after subjecting them to commonly used data transformations. Suitably chosen transformations seem to result in reasonable fits.
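To make "before and after a transformation" concrete, the sketch below applies one commonly used transformation (log2 after adding an offset) and reports per-condition sample skewness and excess kurtosis, both of which are 0 for a normal distribution. This is only a crude marginal check chosen for illustration; it is not the multivariate goodness-of-fit assessment used in the paper, and the helper names and the default offset of 1.0 are hypothetical.

```python
import math


def log_transform(rows, offset=1.0):
    """Elementwise log2(x + offset). The offset (hypothetical default 1.0)
    keeps zero intensities finite; every x + offset must be positive."""
    return [[math.log2(v + offset) for v in row] for row in rows]


def skew_kurtosis(values):
    """Sample skewness and excess kurtosis (plain moment estimators, no
    bias correction). Both are 0 for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("constant column: skewness is undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def column_moments(rows):
    """(skewness, excess kurtosis) for each column, i.e. for each
    condition when rows are genes and columns are experiments."""
    return [skew_kurtosis(list(col)) for col in zip(*rows)]
```

Comparing `column_moments(rows)` with `column_moments(log_transform(rows))` shows whether the transformation pulls right-skewed intensities toward symmetry. Values near 0 are consistent with, but do not prove, marginal normality; a multivariate Gaussian also requires joint normality, which per-column summaries cannot detect.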

Availability: MCLUST is available at http://www.stat.washington.edu/fraley/mclust. The software for the diagonal model is under development.

Contact: kayee{at}cs.washington.edu

Supplementary information: http://www.cs.washington.edu/homes/kayee/model

* To whom all correspondence should be addressed.






