Bioinformatics Vol. 16 no. 10 2000
Pages 906-914
© 2000 Oxford University Press
Original Paper
Support vector machine classification and validation of cancer tissue samples using microarray expression data
1 Department of Computer Science, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
2 Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1TH, UK
3 Department of Molecular Biotechnology, University of Washington, Seattle, WA 98195, USA
Received on April 4, 2000; accepted on May 19, 2000
Motivation: DNA microarray experiments, which generate thousands of gene expression measurements, are being used to gather information from tissue and cell samples about gene expression differences that will be useful in diagnosing disease. We have developed a new method to analyse this kind of data using support vector machines (SVMs). The analysis consists of both classification of the tissue samples and an exploration of the data for mislabeled or questionable tissue results.
Results: We demonstrate the method in detail on samples consisting of ovarian cancer tissues, normal ovarian tissues, and other normal tissues. The dataset consists of expression experiment results for 97802 cDNAs for each tissue. As a result of computational analysis, a tissue sample is discovered and confirmed to be wrongly labeled. Upon correction of this mistake and the removal of an outlier, perfect classification of tissues is achieved, but not with high confidence. We identify and analyse a subset of genes from the ovarian dataset whose expression is highly differentiated between the types of tissues. To show robustness of the SVM method, two previously published datasets from other types of tissues or cells are analysed. The results are comparable to those previously obtained. We show that other machine learning methods also perform comparably to the SVM on many of those datasets.
Availability: The SVM software is available at http://www.cs.columbia.edu/~bgrundy/svm.
Contact: booch@cse.ucsc.edu
To whom correspondence should be addressed.
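The abstract describes two steps, classifying tissue samples with an SVM and screening the data for mislabeled samples, without giving procedural detail. The following is a minimal sketch of one way such a screen could work, not the authors' implementation. It assumes a linear soft-margin SVM trained by batch subgradient descent and a hold-one-out loop that flags any sample whose held-out prediction contradicts its recorded label. The function names, hyperparameters, and the toy two-gene data are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM via batch subgradient descent (illustrative only).

    Minimises lam/2 * |w|^2 + mean(max(0, 1 - y * (w.x + b))).
    Labels in y must be +1 or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularisation term
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge term is active for this sample
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def flag_suspect_labels(X, y, **svm_args):
    """Hold each sample out, train on the rest, and flag label disagreements."""
    flagged = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        w, b = train_linear_svm(X_train, y_train, **svm_args)
        score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
        predicted = 1 if score >= 0 else -1
        if predicted != y[i]:
            flagged.append(i)
    return flagged


# Hypothetical toy data: two "genes" per sample, +1 = tumour, -1 = normal.
# Sample 8 has a normal-like profile but is recorded as tumour.
X = [[2.0, 1.0], [2.5, 1.5], [3.0, 0.5], [1.5, 2.0],
     [-2.0, -1.0], [-2.5, -1.5], [-3.0, -0.5], [-1.5, -2.0],
     [-2.2, -1.2]]
y = [1, 1, 1, 1, -1, -1, -1, -1, 1]

suspects = flag_suspect_labels(X, y)
print("samples whose held-out prediction contradicts the label:", suspects)
```

A flagged sample is only a candidate for re-examination. As the abstract notes, the mislabeled tissue was confirmed after being discovered computationally, so a screen of this kind would be followed by independent verification.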