Exploring science is typically characterized by a lot of puzzles, frustrations or even failures. This weblog is mainly intended to record my working, thinking and knowledge acquisitions. I expect that some reflection would refresh my mind from time to time, and motivate me to move further, and hopefully give me a better view about even changing the landscape of bioinformatics.
You are welcome to leave some comments, good or bad, but hopefully something constructive. Enjoy your surfing!
Monday, March 28, 2011
Baking a mass-spectrometry data PIE with McMC and simulated annealing: predicting protein post-translational modifications from integrated top-down and bottom-up data
Motivation: Post-translational modifications are vital to the function of proteins, but are hard to study, especially since several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they provide is often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques—especially bottom-up and top-down mass spectrometry—provides complementary information. When integrated with background knowledge this allows a human expert to interpret what modifications are present and where on a protein they are located. However, the process is arduous and for high-throughput applications needs to be automated.
Results: This article explores a data integration methodology based on Markov chain Monte Carlo and simulated annealing. Our software, the Protein Inference Engine (the PIE) applies these algorithms using a modular approach, allowing multiple types of data to be considered simultaneously and for new data types to be added as needed. Even for complicated data representing multiple modifications and several isoforms, the PIE generates accurate modification predictions, including location. When applied to experimental data collected on the L7/L12 ribosomal protein the PIE was able to make predictions consistent with manual interpretation for several different L7/L12 isoforms using a combination of bottom-up data with experimentally identified intact masses.