Highlight
High-throughput Proteomics through Mass Spectrometry
Achievement/Results
Proteomics data are proving to be an invaluable resource for biological research. Collaborations between the Steve Briggs (Cell and Developmental Biology) and Vineet Bafna (Computer Science and Engineering) laboratories are applying proteomics data in many novel ways to elucidate signaling pathways and improve genome annotation. Sam Payne, an IGERT fellow, has spent the year on several projects to improve and use the proteomics-based mass spectrometry data from the Briggs lab.
His first project aims to improve the discovery of protein phosphorylation sites from mass spectrometry data. Protein phosphorylation is a key mediator of cellular signaling. Mass spectrometry is the vehicle for high-throughput phosphorylation discovery, capable of identifying a thousand phosphopeptides in a single experiment. Despite this potential to annotate thousands of phosphorylation sites, most mass spectrometry identification algorithms are poorly equipped to annotate phosphopeptides because phosphopeptides fragment poorly. Sam Payne's research, in press in the Journal of Proteome Research, carefully studies the effect of phosphorylation on peptide fragmentation within a mass spectrometer. The resulting characterization was modeled in a new scoring function for the Inspect software. The new software is both more sensitive and more efficient than any current algorithm: it improves phosphopeptide recovery by up to 50% and is orders of magnitude faster than current workflows. After completing this analysis, Payne applied the new algorithm to Arabidopsis mass spectrometry data from the Briggs lab and discovered over 6900 phosphorylated peptides. Future work in this area lies in phosphoproteomic analysis of plant infection, determining how pathogens attack their host and avoid or alter its defense mechanisms.
A second project is the proteogenomic annotation of Arabidopsis. The Briggs lab has amassed over 20 million MS/MS spectra from Arabidopsis. Beyond their use in the original proteomics experiments and surveys, these data can also be used to annotate the genome. Peptides found in mass spectrometry experiments are direct evidence of gene transcription and translation. Researchers often map peptides to known genes; this project instead searched the genome for as yet unrecognized protein-coding sequences. Using the Inspect software, they searched a six-frame translation of the genome, along with extensive alternative splicing predictions. They discovered over 800 new genes and expanded and corrected the gene models of another 800 existing genes.
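The six-frame translation mentioned above can be illustrated with a minimal sketch. It assumes the standard genetic code and plain string matching; the function names (`six_frame_translation`, `find_peptide`) are hypothetical and chosen for illustration only. This is not the Inspect software's method, which scores MS/MS spectra against a sequence database rather than matching peptide strings, and the sketch ignores splicing entirely.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three reading frames on each strand of a DNA sequence."""
    dna = dna.upper()
    strands = {"+": dna, "-": reverse_complement(dna)}
    return {
        f"{sign}{offset + 1}": translate(strand[offset:])
        for sign, strand in strands.items()
        for offset in range(3)
    }


def find_peptide(dna, peptide):
    """Hypothetical helper: list (frame, position) pairs where a peptide occurs.

    Reports only the first occurrence within each frame.
    """
    hits = []
    for frame, protein in six_frame_translation(dna).items():
        position = protein.find(peptide)
        if position != -1:
            hits.append((frame, position))
    return hits
```

For example, `find_peptide("ATGGCCTAA", "LGH")` reports a hit on the reverse strand, which a search restricted to annotated forward-strand genes would miss. Searching all six frames is what allows peptides to point to protein-coding sequence outside existing gene models.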
Address Goals
Proteomics data sets contain a wealth of valuable information, and the new tools should greatly enhance researchers' ability to extract it.