Protein Life: R & D experience of a bioinformatician: top-down proteomics

Monday, March 28, 2011

Baking a mass-spectrometry data PIE with McMC and simulated annealing: predicting protein post-translational modifications from integrated top-down and bottom-up data

Abstract

Motivation: Post-translational modifications are vital to the function of proteins, but are hard to study, especially since several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a great tool for investigating modified proteins, but the data they provide is often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques—especially bottom-up and top-down mass spectrometry—provides complementary information. When integrated with background knowledge this allows a human expert to interpret what modifications are present and where on a protein they are located. However, the process is arduous and for high-throughput applications needs to be automated.

Results: This article explores a data integration methodology based on Markov chain Monte Carlo and simulated annealing. Our software, the Protein Inference Engine (the PIE) applies these algorithms using a modular approach, allowing multiple types of data to be considered simultaneously and for new data types to be added as needed. Even for complicated data representing multiple modifications and several isoforms, the PIE generates accurate modification predictions, including location. When applied to experimental data collected on the L7/L12 ribosomal protein the PIE was able to make predictions consistent with manual interpretation for several different L7/L12 isoforms using a combination of bottom-up data with experimentally identified intact masses.
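The general idea of simulated annealing with Metropolis-style acceptance can be illustrated on a toy modification-placement problem. This is only a minimal sketch and not the PIE's actual implementation: the scoring function, the mass shifts in `MOD_SHIFTS`, the `site_evidence` format, the conflict penalty, and the cooling schedule are all assumptions made for illustration.

```python
import math
import random

# Approximate monoisotopic mass shifts in Da (illustrative values, assumed here).
MOD_SHIFTS = {"none": 0.0, "methyl": 14.01565, "acetyl": 42.01057, "phospho": 79.96633}


def score(state, observed_shift, site_evidence):
    """Lower is better: intact-mass error plus a penalty for each site
    that contradicts the (hypothetical) bottom-up evidence."""
    mass_error = abs(sum(MOD_SHIFTS[m] for m in state) - observed_shift)
    conflicts = sum(
        1 for i, m in enumerate(state)
        if i in site_evidence and site_evidence[i] != m
    )
    return mass_error + 10.0 * conflicts


def anneal(n_sites, observed_shift, site_evidence,
           steps=5000, t_start=50.0, t_end=0.01, seed=0):
    """Search modification assignments for n_sites candidate sites."""
    rng = random.Random(seed)
    mods = list(MOD_SHIFTS)
    state = ["none"] * n_sites
    current = score(state, observed_shift, site_evidence)
    best_state, best = list(state), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a change at one randomly chosen site.
        proposal = list(state)
        proposal[rng.randrange(n_sites)] = rng.choice(mods)
        cand = score(proposal, observed_shift, site_evidence)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand <= current or rng.random() < math.exp((current - cand) / t):
            state, current = proposal, cand
            if current < best:
                best_state, best = list(state), current
    return best_state, best
```

For example, `anneal(4, 56.02622, {2: "acetyl"})` searches four candidate sites for an assignment whose summed shift explains an intact-mass difference of about 56.03 Da while keeping site 2 acetylated. Separate scoring terms per data type loosely mirror the modular idea described in the abstract, but the real software's scoring is not reproduced here.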

Friday, December 3, 2010

MALDI MS

MALDI (matrix-assisted laser desorption/ionization), a laser-based soft ionization method, has proven to be one of the most successful ionization methods for mass spectrometric analysis of large molecules.[1,2] In addition, analysis of post-source decay (decay of proteins following ionization) in many cases allows rapid sequencing of proteins. MALDI has therefore become crucially important for protein analysis.
Its defining feature is that the sample is embedded in a chemical matrix (ca. 1000-fold molar excess) that greatly facilitates the production of intact gas-phase ions from large, nonvolatile, and thermally labile compounds such as proteins, oligonucleotides, synthetic polymers, and large inorganic compounds. The matrix plays a key role in this technique by absorbing the laser light energy and causing a small part of the target substrate to vaporize.

The most important applications of MALDI mass spectrometry are (in decreasing order of importance): peptides and proteins, synthetic polymers, oligonucleotides, oligosaccharides, lipids, inorganics.

In its current state, MALDI is primarily based on the laser desorption of solid matrix-analyte deposits [6]. The technique suffers from some disadvantages, such as low shot-to-shot reproducibility, short sample lifetime, and strong dependence on the sample preparation method.

MALDI MS-based systems have been developed to rapidly characterize microorganisms, for example by analyzing intact proteins in the 5-20 kDa range in a top-down proteomics workflow. In such experiments, proteins are identified either by a homology search of extracted sequence tags or by comparing MS/MS spectra with fragmentation patterns predicted from protein sequences in an existing proteome database.

Thursday, December 2, 2010

top-down vs. bottom-up proteomics

Bottom-Up Proteomics

In bottom-up proteomics, the analytes introduced into the mass spectrometer are peptides generated by enzymatic cleavage of one or many proteins. The proteins can first be separated by gel electrophoresis or chromatography, in which case the sample will contain only one or a few proteins. Alternatively, a complex protein mixture can first be digested to the peptide level, then separated by on-line chromatography coupled to electrospray mass spectrometry (ESI–MS). In the latter case, the digest can contain thousands to hundreds of thousands of peptides and may require separation in two or more chromatographic dimensions before MS analysis. The identity of the original protein is determined by comparing the peptide mass spectra with theoretical peptide masses calculated from a proteomic or genomic database. The bottom-up strategy offers two approaches to protein identification: peptide mass fingerprinting and tandem MS (MS–MS).
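The comparison of observed and theoretical peptide masses can be sketched in a few lines. This is a simplified illustration, assuming trypsin as the enzyme (cleavage after K or R, except before P), no missed cleavages, unmodified residues, and neutral monoisotopic masses. The function names and the 10 ppm tolerance are hypothetical choices, not part of any particular search engine.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide for the free termini


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, ppm=10.0):
    """Count observed masses that match any theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        1 for obs in observed_masses
        if any(abs(obs - theo) / theo * 1e6 <= ppm for theo in theoretical)
    )
```

In a fingerprinting search, a count like this would be computed for every database sequence and the candidates ranked. Real tools use more elaborate scoring and also allow missed cleavages and modifications.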

Top-Down Proteomics

In top-down proteomics, intact protein molecular ions generated by ESI or MALDI are introduced into the mass analyzer and subjected to gas-phase fragmentation. An obstacle to this approach is the determination of product ion masses from multiply charged product ions (1). These can vary in charge state up to that of the multiply charged protein precursor ion, which can introduce ambiguity in the interpretation of top-down MS-MS spectra. Two approaches have been used to circumvent this limitation. The first is charge state manipulation through gas-phase ion–ion interactions. The second is the use of instruments with high mass measurement accuracy (MMA), which provides an approach for large-scale characterization of proteins; both types of FTMS instruments, ICR and Orbitrap, have been used for this methodology. The molecular mass of the intact precursor protein, along with fragment ions from MS/MS experiments, enables high-confidence mapping to database protein entries as well as PTM detection.
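One reason high-resolution instruments help with the charge-state problem is that they resolve the isotopic peaks of an ion, and adjacent isotopic peaks are spaced by roughly 1.00335/z on the m/z axis. The sketch below shows that arithmetic for positive-mode protonated ions. The function names are hypothetical, and real deconvolution software fits whole isotopic envelopes rather than a single pair of peaks.

```python
PROTON = 1.007276          # mass of a proton in Da
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def charge_from_isotope_spacing(mz_first, mz_second):
    """Estimate the charge from two adjacent isotopic peaks of one ion."""
    return round(ISOTOPE_SPACING / (mz_second - mz_first))


def neutral_mass(mz, charge):
    """Convert the m/z of an [M + zH]z+ ion to the neutral mass M."""
    return charge * (mz - PROTON)
```

For instance, peaks at m/z 1001.007276 and 1001.107611 are about 0.1 apart, which points to a charge of 10 and a neutral mass of about 10000 Da.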

Advantages and limitations of bottom-up strategies: Bottom-up proteomics is the most mature and most widely used approach for protein identification and characterization. Reversed-phase HPLC provides high-resolution separations of peptide digests with solvents that are compatible with ESI. On-line nano-scale reversed-phase LC–ESI–MS–MS can be fully automated and is almost universally used for bottom-up proteomics. Commercial instruments with control software and bioinformatics tools optimized for bottom-up applications are available from several vendors. The bottom-up strategy using on-line multidimensional capillary HPLC–MS-MS has been most successful in the identification of proteins in digests derived from very complex mixtures such as cell lysates (6). Moreover, quantitative techniques have been developed using affinity tags and stable isotope labels for determination of up- and down-regulated proteins in expression proteomics (7).
There are several fundamental and practical limitations to the bottom-up strategy.
Most importantly, only a fraction of the total peptide population of a given protein is identified. Therefore, information on only a portion of the protein sequence is obtained. It is clear from genomic studies that each open reading frame can give rise to many protein isoforms, which can originate from alternative splicing products and varying types and locations of posttranslational modifications (PTMs). PTMs such as phosphorylation and glycosylation are known to be important in the regulation of protein function and cell metabolism. A consequence of the limited sequence coverage in bottom-up proteomics is loss of much information about PTMs. Moreover, PTMs are often labile in the CID process and require techniques such as neutral loss scanning to detect them.
Practical limitations are encountered when bottom-up methods are used for protein identification from very complex peptide mixtures. On-line multidimensional LC–MS-MS analyses using ion-exchange coupled to reversed-phase columns require extended run times of as long as 15 h or more. Although this can be automated, the throughput of multidimensional LC–MS-MS is quite limited. Other problems include the loss of information about low-abundance peptides in mass spectra dominated by high-abundance species. Finally, narrow chromatographic peak widths can compromise acquisition of adequate MS–MS information during elution.
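The neutral loss detection mentioned above for labile PTMs comes down to simple arithmetic: a phosphopeptide that loses phosphoric acid (H3PO4, about 97.9769 Da) during CID gives a fragment at the precursor m/z minus 97.9769/z. The following is a rough sketch of that check. The function names and the 0.02 m/z tolerance are assumptions for illustration, not any instrument vendor's scan logic.

```python
H3PO4_LOSS = 97.97690  # monoisotopic mass of phosphoric acid in Da


def neutral_loss_mz(precursor_mz, charge, loss=H3PO4_LOSS):
    """Expected m/z of the precursor after losing a neutral fragment,
    assuming the charge state is unchanged."""
    return precursor_mz - loss / charge


def has_neutral_loss(precursor_mz, charge, fragment_mzs, tol=0.02):
    """True if any fragment peak lies within tol of the expected loss peak."""
    target = neutral_loss_mz(precursor_mz, charge)
    return any(abs(mz - target) <= tol for mz in fragment_mzs)
```

A doubly charged precursor at m/z 700.0 would be expected to show the loss peak near m/z 651.01.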

Advantages and limitations of top-down strategies: The two major advantages of the top-down strategy are the potential access to the complete protein sequence and the ability to locate and characterize PTMs. In addition, the time-consuming protein digestion required for bottom-up methods is eliminated.
Top-down proteomics is a relatively young field compared to bottom-up proteomics, and currently suffers from several limitations. First, the very complex spectra generated by multiply charged proteins limit the approach to isolated proteins, or simple protein mixtures at best. Second, the favored instrumentation (FT-ICR, hybrid ion trap FT-ICR or hybrid ion trap–orbitrap) is expensive to purchase and operate. Third, the top-down approach does not work well with intact proteins larger than about 50 kDa. Fourth, the favored dissociation techniques (ECD, ETD) are low-efficiency processes requiring long ion accumulation, activation, and detection times. This limits the ability to couple top-down MS techniques with on-line separations. Fifth, the mechanisms of protein dissociation behavior are less well understood than those of peptide dissociation. If top-down approaches are to be adopted widely, a greater understanding of fragmentation of multiply charged ions is needed (1), including the influence of precursor ion charge state, the role of protein primary, secondary and tertiary structure, and the contribution of PTMs. Finally, bioinformatics tools for top-down proteomics are primitive compared to those for bottom-up proteomics.

Thermo Scientific ProSightPC, the first stand-alone software for analyzing top-down proteomics data, has been enhanced to add support for middle-down and bottom-up experiments, making it an all-around tool for identification and characterization of both intact proteins and peptides.
ProSightPC 2.0 software enables high-throughput processing of all accurate-mass MS/MS data, whether from top-down, middle-down or bottom-up experiments, including the characterization of proteins with known PTMs. ProSightPC 2.0 software uses multiple search modes to determine the exact protein sequence including modifications and alternative splicing. It is the only proteomics software that allows the user to search their tandem MS data against proteome warehouses containing the known biological complexity present in UniProt.