Bioinformatics Vol. 18 no. 1 2002
Pages 147-159
© 2002 Oxford University Press

Classifying G-protein coupled receptors with support vector machines

Rachel Karchin 1,*, Kevin Karplus 2 and David Haussler 1,3

1 Department of Computer Science
2 Department of Computer Engineering
3 Howard Hughes Medical Institute, University of California, Santa Cruz, CA 95064, USA

Received on June 20, 2001; revised on August 18, 2001; accepted on August 23, 2001

Motivation: The enormous amount of protein sequence data uncovered by genome research has increased the demand for computer software that can automate the recognition of new proteins. We discuss the relative merits of various automated methods for recognizing G-Protein Coupled Receptors (GPCRs), a superfamily of cell membrane proteins. GPCRs are found in a wide range of organisms and are central to a cellular signalling network that regulates many basic physiological processes. They are the focus of a significant amount of current pharmaceutical research because they play a key role in many diseases. However, their tertiary structures remain largely unsolved. The methods described in this paper use only primary sequence information to make their predictions. We compare a simple nearest neighbor approach (BLAST), methods based on multiple alignments generated by a statistical profile Hidden Markov Model (HMM), and methods, including Support Vector Machines (SVMs), that transform protein sequences into fixed-length feature vectors.

Results: The last of these, the SVM approach, is the most computationally expensive method, but our experiments show that, for those interested in annotation-quality classification, the results are worth the effort. In two-fold cross-validation experiments testing recognition of GPCR subfamilies that bind a specific ligand (such as a histamine molecule), the errors per sequence at the Minimum Error Point (MEP) were 13.7% for multi-class SVMs, 17.1% for our SVMtree method of hierarchical multi-class SVM classification, 25.5% for BLAST, 30% for profile HMMs, and 49% for nearest neighbor classification on feature vectors (Kernel Nearest Neighbor, kernNN). The percentage of true positives recognized before the first false positive was 65% for both SVM methods, 13% for BLAST, 5% for profile HMMs and 4% for kernNN.
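The two evaluation measures quoted above can be illustrated with a short sketch. It is not the authors' evaluation code. It assumes that the MEP is the score threshold at which false positives plus false negatives are fewest, that "errors per sequence" is that error count divided by the number of test sequences, and that higher scores mean stronger membership predictions. The function names and the toy scores are hypothetical.

```python
from typing import List, Tuple

Scored = List[Tuple[float, bool]]  # (classifier score, is a true subfamily member)


def minimum_error_point(scored: Scored) -> Tuple[float, int]:
    """Return (threshold, errors) with the fewest false positives + false negatives.

    Assumption: a sequence is predicted positive when its score >= threshold.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, is_member in ranked if is_member)
    # Threshold above every score: nothing is predicted positive,
    # so every true member counts as a false negative.
    best_threshold, best_errors = float("inf"), total_pos
    fp, fn = 0, total_pos
    for i, (score, is_member) in enumerate(ranked):
        if is_member:
            fn -= 1  # this member is now above the threshold
        else:
            fp += 1  # this non-member is now above the threshold
        # Tied scores cannot be separated by a threshold; evaluate after the tie.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if fp + fn < best_errors:
            best_threshold, best_errors = score, fp + fn
    return best_threshold, best_errors


def coverage_before_first_fp(scored: Scored) -> float:
    """Fraction of true members scoring strictly above the best-scoring non-member."""
    positives = [s for s, is_member in scored if is_member]
    negatives = [s for s, is_member in scored if not is_member]
    if not positives:
        return 0.0
    best_fp = max(negatives) if negatives else float("-inf")
    return sum(1 for s in positives if s > best_fp) / len(positives)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [(0.9, True), (0.8, True), (0.7, False),
           (0.6, True), (0.4, False), (0.2, True)]
    threshold, errors = minimum_error_point(toy)
    print("MEP threshold:", threshold)
    print("errors per sequence at MEP:", errors / len(toy))
    print("coverage before first false positive:", coverage_before_first_fp(toy))
```

Under these assumptions, the first measure rewards a classifier for its best overall trade-off between the two kinds of error, while the second measures how many true members can be accepted with no false positives at all.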

Availability: We have set up a web server for GPCR subfamily classification based on hierarchical multi-class SVMs at http://www.soe.ucsc.edu/research/compbio/gpcr-subclass. By scanning predicted peptides found in the human genome with the SVMtree server, we have identified a large number of genes that encode GPCRs. A list of our predictions for human GPCRs is available at http://www.soe.ucsc.edu/research/compbio/gpcr·hg/class·results. We also provide suggested subfamily classification for 18 sequences previously identified as unclassified Class A (rhodopsin-like) GPCRs in GPCRDB (Horn et al., Nucleic Acids Res., 26, 277–281, 1998), available at http://www.soe.ucsc.edu/research/compbio/gpcr/classA·unclassified/.

Contact: rachelk{at}soe.ucsc.edu

* To whom correspondence should be addressed.




