Assigning Putative Protein Identifications to Selected Lung Cancer Biomarkers from Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry of Blood Serum
Honorable Mention - Bioinformatics
Jonathan Llyle Lustgarten; University of Pittsburgh
Content:
We present a computational method for assigning putative protein identifications to mass spectral features, enabling selection of biomarkers (mass-to-charge values) for further experimental validation. We test the method on proteomic mass spectra derived from Surface-Enhanced Laser Desorption/Ionization Time-of-Flight analysis of blood serum samples from healthy controls and patients with lung cancer. We demonstrate that, along with the identification, the method can provide information on sequence, disease association, fragments, and other proteomic mass-to-charge peaks that may have been found for the corresponding protein using different technologies. The method could also serve as a guide for designing and prioritizing validation experiments.
Technology:
A dataset of proteomic mass spectra, consisting of 322 samples from the Lung SPORE registry, was scanned into the computer and then processed using our in-house Rule Learning program. Discriminative markers of lung cancer detected by this program were submitted to the Empirical Proteomics Ontology Knowledge Base (EPO-KB) (http://www.dbmi.pitt.edu/EPO-KB) to obtain putative identifications based on the literature.
Design:
We used rule learning to extract the mass-to-charge values that the algorithm deemed statistically significant in each of 10 runs, each using a 70/30 split of the data: 70 percent for training and the remaining 30 percent for testing. We calculated the steadiness of each discriminative biomarker as the fraction of the 10 models in which it appears. We then used EPO-KB to assign identifications to those mass-to-charge values that appeared in at least two models.
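The steadiness calculation and the "at least two models" filter can be sketched in a few lines of Python. This is an illustrative sketch only, not the in-house Rule Learning program: the function names, the random shuffling in the split, and the mass-to-charge values in the usage note are all hypothetical, and each model is assumed to be reducible to the list of mass-to-charge values it uses.

```python
import random
from collections import Counter


def split_70_30(samples, seed=0):
    """Shuffle samples and split them 70/30 (assumed random split)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def count_appearances(models):
    """Count, for each m/z value, the number of models that use it."""
    counts = Counter()
    for model in models:
        # set() so a value used several times in one model counts once
        counts.update(set(model))
    return counts


def steadiness(models):
    """Fraction of models in which each m/z value appears."""
    n_models = len(models)
    return {mz: c / n_models for mz, c in count_appearances(models).items()}


def select_for_identification(models, min_models=2):
    """m/z values appearing in at least `min_models` models, sorted."""
    counts = count_appearances(models)
    return sorted(mz for mz, c in counts.items() if c >= min_models)
```

For example, given four hypothetical models `[[11700.0, 4300.5], [11700.0], [11700.0, 8900.2], [4300.5]]`, `steadiness` assigns 0.75 to 11700.0, 0.5 to 4300.5, and 0.25 to 8900.2, and `select_for_identification` keeps only the first two values for lookup.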
Results:
The analysis selected ten mass-to-charge values, four of which appeared in six or more models. We were able to assign nine identifications using EPO-KB. The assigned putative identifications were proteins well known to appear in cancer.
Conclusion:
We have presented a system that uses rule learning to choose significant biomarkers and then assigns putative identifications from prior knowledge, to assist researchers in the selection and validation of biomarkers for lung cancer. This research is partly funded by grants P50 CA090440 07 from the National Cancer Institute and GM071951 from the National Institute of General Medical Sciences.
