Bioinformatics Vol. 17 no. 90001 2001
Pages S13-S21
© 2001 Oxford University Press
SCOPE: a probabilistic model for scoring tandem mass spectra against a peptide database
Informatics Research, Celera Genomics, 45 W. Gude Drive, Rockville, MD 20850, USA
Received on February 5, 2001; revised on April 2, 2001; accepted on April 2, 2001
Proteomics, or the direct analysis of the expressed protein components of a cell, is critical to our understanding of cellular biological processes in normal and diseased tissue. A key requirement for its success is the ability to identify proteins in complex mixtures. Recent technological advances in tandem mass spectrometry have made it the method of choice for high-throughput identification of proteins. Unfortunately, the software for unambiguously identifying peptide sequences has not kept pace with the recent hardware improvements in mass spectrometry instruments. Scoring functions, which evaluate the quality of a match between an experimental spectrum and a database peptide, are critical for reliable high-throughput protein identification. Current scoring function technology relies heavily on ad hoc parameterization and manual curation by experienced mass spectrometrists. In this work, we propose a two-stage stochastic model for the observed MS/MS spectrum, given a peptide. Our model explicitly incorporates fragment ion probabilities, noisy spectra, and instrument measurement error. We describe how to compute this probability-based score efficiently, using a dynamic programming technique. A prototype implementation demonstrates the effectiveness of the model.
Contact: Vineet.Bafna@Celera.Com