Protein Life: R & D experience of a bioinformatician

Exploring science is typically characterized by a lot of puzzles, frustrations and even failures. This weblog is mainly intended to record my work, thinking and knowledge acquisition. I expect that some reflection will refresh my mind from time to time, motivate me to move further, and hopefully give me a better view of the ever-changing landscape of bioinformatics. You are welcome to leave comments, good or bad, but hopefully constructive. Enjoy your surfing!


Saturday, October 29, 2011

Enhanced peptide quantification using spectral count clustering and cluster abundance

Quantification of protein expression by means of mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, spectral counting and spectral feature analysis, have been extensively investigated in a wide variety of proteomic studies.

The cornerstone of both methods is peptide identification based on a proteomic database search and subsequent estimation of peptide retention time. However, they often suffer from restrictive database search and inaccurate estimation of the liquid chromatography (LC) retention time.

Furthermore, conventional peptide identification methods based on the spectral library search algorithms such as SEQUEST or SpectraST have been found to provide neither the best match nor high-scored matches. Lastly, these methods are limited in the sense that target peptides cannot be identified unless they have been previously generated and stored into the database or spectral libraries. To overcome these limitations, we propose a novel method, namely Quantification method based on Finding the Identical Spectral set for a Homogenous peptide (Q-FISH) to estimate the peptide's abundance from its tandem mass spectrometry (MS/MS) spectra through the direct comparison of experimental spectra.

Intuitively, our Q-FISH method compares all possible pairs of experimental spectra in order to identify both known and novel proteins, significantly enhancing identification accuracy by grouping replicated spectra from the same peptide targets.
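
The pairwise-comparison idea can be sketched roughly as follows. This is an illustrative toy, not the Q-FISH implementation: the cosine similarity measure, the 0.8 threshold, the representation of a spectrum as a dictionary of binned m/z values, and all function names are assumptions made for the example.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two spectra given as {mz_bin: intensity} dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_spectra(spectra, threshold=0.8):
    """Group spectra whose pairwise similarity meets the threshold (single linkage)."""
    parent = list(range(len(spectra)))

    def find(i):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of spectra and merge the clusters of similar ones.
    for i, j in combinations(range(len(spectra)), 2):
        if cosine(spectra[i], spectra[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(spectra)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical spectra: the first two share peaks, the third does not.
spectra = [
    {100: 10.0, 200: 5.0, 300: 1.0},
    {100: 9.0, 200: 6.0, 300: 1.5},
    {150: 8.0, 250: 7.0},
]
clusters = cluster_spectra(spectra)
```

In this toy setting the size of each cluster plays the role of a spectral count for the (possibly unidentified) peptide behind it. Comparing all pairs is quadratic in the number of spectra, so a real data set of tens of thousands of spectra would likely need some form of pre-filtering, for example by precursor mass.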

Results: We applied Q-FISH to Nano-LC-MS/MS data obtained from human hepatocellular carcinoma (HCC) and normal liver tissue samples to identify differentially expressed peptides between the normal and disease samples. For a total of 44,318 spectra obtained through MS/MS analysis, Q-FISH yielded 14,747 clusters.

Among these, 5,777 clusters were identified only in the HCC sample, 6,648 clusters only in the normal tissue sample, and 2,323 clusters both in the HCC and normal tissue samples. While it will be interesting to investigate peptide clusters found in only one sample, we further examined the spectral clusters identified in both the HCC and normal samples, since our goal is to identify and assess differentially expressed peptides quantitatively.

The next step was to perform a beta-binomial test to isolate differentially expressed peptides between the HCC and normal tissue samples. This test resulted in 84 peptides with significantly differential spectral counts between the HCC and normal tissue samples.

We independently identified 50 and 95 peptides by SEQUEST, of which 24 and 56 peptides, respectively, were found to be known biomarkers for the human liver cancer. Comparing Q-FISH and SEQUEST results, we found 22 of the differentially expressed 84 peptides by Q-FISH were also identified by SEQUEST.

Remarkably, of these 22 peptides discovered both by Q-FISH and SEQUEST, 13 peptides are known for human liver cancer and the remaining 9 peptides are known to be associated with other cancers.

Conclusions: We proposed a novel statistical method, Q-FISH, for accurately identifying protein species and simultaneously quantifying the expression levels of identified peptides from mass spectrometry data. Q-FISH analysis on human HCC and liver tissue samples identified many protein biomarkers that are highly relevant to HCC.

Q-FISH can be a useful tool both for peptide identification and quantification on mass spectrometry data analysis. It may also prove to be more effective in discovering novel protein biomarkers than SEQUEST and other standard methods.

Authors: Seungmook Lee, Min-Seok Kwon, Hyoung-Joo Lee, Young-Ki Paik, Haixu Tang, Jae Lee, Taesung Park
Credits/Source: BMC Bioinformatics 2011, 12:423

Wednesday, October 5, 2011

mz5: Space- and time-efficient storage of mass spectrometry data sets

"Across a host of mass spectrometry (MS)-driven -omics fields, researchers witness the acquisition of ever increasing amounts of high throughput MS datasets and the need for their compact yet efficiently accessible storage has become clear.
The HUPO proteomics standard initiative (PSI) has defined an ontology and associated controlled vocabulary that specifies the contents of MS data files in terms of an open data format. Current implementations are the mzXML and mzML formats (mzML specification), both of which are based on an XML representation of the data. As a consequence, these formats are not particularly efficient with respect to their storage space requirements or I/O performance.
This contribution introduces mz5, an implementation of the PSI mzML ontology that is based on HDF5, an efficient, industrial strength storage backend.
Compared to the current mzXML and mzML standards, this strategy yields an average file size reduction of a factor of ~2 and increases I/O performance ~3-4 fold.
The format is implemented as part of the ProteoWizard project."
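
One contributor to the overhead of XML-based formats is that numeric peak arrays have to be embedded as text, typically base64. The snippet below is only a rough illustration of that single effect using the Python standard library. It does not read or write mzML or mz5, it does not reproduce the ~2x figure quoted above, and the peak values are made up.

```python
import base64
import struct

# Hypothetical peak list: 1000 m/z values stored as 64-bit floats.
mz_values = [400.0 + 0.5 * i for i in range(1000)]

# Raw little-endian doubles, as a binary container could store them directly.
raw = struct.pack(f"<{len(mz_values)}d", *mz_values)

# Text-safe form of the same bytes, as needed inside an XML element.
encoded = base64.b64encode(raw)

raw_size = len(raw)
encoded_size = len(encoded)
overhead = encoded_size / raw_size  # roughly 4/3 before any XML markup is added
```

Base64 alone inflates the payload by about a third. Tag markup and parsing cost come on top of that, which is the kind of cost a binary backend such as HDF5 avoids.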

Monday, August 15, 2011

Can the false-discovery rate be misleading?

"The decoy-database approach is currently the gold standard for assessing the confidence of identifications in shotgun proteomic experiments. Here we demonstrate that what might appear to be a good result under the decoy-database approach for a given false-discovery rate could be, in fact, the product of overfitting. This problem has been overlooked until now and could lead to obtaining boosted identification numbers whose reliability does not correspond to the expected false-discovery rate. To remedy this, we are introducing a modified version of the method, termed a semi-labeled decoy approach, which enables the statistical determination of an overfitted result."


Comments: Dr. Pavel Pevzner published a paper with a similar idea, if my memory serves me correctly.
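
For readers unfamiliar with the decoy-database approach, here is a minimal sketch of the basic estimate, assuming the simplest decoy/target ratio. The function names and the example scores are hypothetical, and the semi-labeled variant described in the abstract is not implemented here.

```python
def decoy_fdr(psms, threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) among PSMs scoring >= threshold.

    psms: list of (score, is_decoy) tuples.
    """
    targets = sum(1 for s, d in psms if s >= threshold and not d)
    decoys = sum(1 for s, d in psms if s >= threshold and d)
    if targets == 0:
        return 0.0
    return decoys / targets


def threshold_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr, or None."""
    for t in sorted({s for s, _ in psms}):
        if decoy_fdr(psms, t) <= max_fdr:
            return t
    return None


# Made-up peptide-spectrum matches: (search score, matched a decoy sequence?)
psms = [
    (9.0, False), (8.5, False), (8.0, False), (7.0, True),
    (6.5, False), (6.0, False), (5.0, True), (4.0, False),
]
cutoff = threshold_for_fdr(psms, max_fdr=0.25)
```

The estimate is only as trustworthy as the decoys are independent of the scoring. If a scoring model is tuned against the same decoys that are later used to estimate the error rate, the reported number can look better than it is, which is the overfitting concern raised in the abstract.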

Saturday, July 9, 2011

HPLC 2011 highlights chromatography technologies

Thermo Fisher Scientific is to highlight its expanded offering of chromatography instruments, software and consumables, including the Accucore HPLC column range, at the HPLC 2011 event in Budapest.

The Accucore HPLC column range is said to enhance laboratory workflow and efficiency by providing increased sensitivity and peak resolution in columns that are compatible with almost any instrument.

Thermo Fisher Scientific will also showcase the Easy-NLC 1000 split-free nano-flow system for advanced proteomics research at HPLC 2011.

This system is said to increase chromatographic resolution and, as a result, protein and peptide identifications, with ultra-high-pressure operation.

The company will also highlight the Velos Pro linear ion trap and the Orbitrap Velos Pro hybrid FTMS mass spectrometers.

These systems are said to provide improved quantitative performance, faster scanning, trap higher energy collision dissociation (HCD) and enhanced robustness.

Thermo Fisher Scientific will also introduce the Q Exactive high-performance benchtop quadrupole-Orbitrap LC-MS/MS, which combines quadrupole precursor selection and high-resolution accurate mass (HR/AM) Orbitrap mass analysis to deliver high-confidence quantitative and qualitative workflows.

With the HR/AM Quanfirmation capability, the Q Exactive mass spectrometer can identify, quantify and confirm more trace-level peptides and proteins in complex mixtures in one analytical run.

The Orbitrap Elite hybrid mass spectrometer is said to provide the resolution and sensitivity required to improve the determination of the molecular weights of intact proteins within laboratories, as well as enable greater proteome coverage through improved protein, PTM and peptide identification, even at low abundances.

Saturday, June 18, 2011

MSE from Waters - the ultimate technology for reproducible profiling

"Waters mass spectrometers provide a method of data acquisition - known as MS^E - that records exact mass precursor and fragment ion information from every detectable component in a sample. This method rapidly alternates between two functions: the first acquiring low-energy exact mass precursor ion spectra, the second acquiring elevated-energy exact mass fragment ion spectra. Every mass is measured, and spectra for each component aligned in retention time. This patented method records data without discrimination or pre-selection so your samples are completely catalogued in a single analysis.

When compared to Data Directed Analysis (DDA), MS^E maximizes instrument duty cycle by ensuring that exact mass precursor and fragment ion information data are obtained for the entire peak complement of a chromatogram, making it ideal for fast analysis and narrow, rapidly eluting peaks. DDA results in both a loss of data in the MS mode when MS/MS data are being acquired, and poor duty cycle. MS^E data is collected fast enough to accurately define the LC peaks for every detectable component.

MS^E is faster than traditional MS followed by MS/MS analysis, and provides data that is not readily obtained by DDA, as both MS and MS/MS data for all detectable components in the chromatogram are generated. MS^E can generate both precursor and product ions in a single analytical run thereby eliminating the need to rerun the samples to obtain further MS/MS spectra. To see MS^E in action and hear what scientists have to say about it, visit www.waters.com/MS^E."

Saturday, June 11, 2011

Bruker Announces Release of Breakthrough CaptiveSpray(TM) Nano/Capillary Electrospray Ion Source for Proteomics at ASMS 2011

At ASMS 2011, Bruker is introducing the breakthrough, proprietary CaptiveSpray electrospray ion source for nano-HPLC applications in proteomics. Using CaptiveSpray technology in many cases increases bottom-up protein identifications significantly, and CaptiveSpray is presently the best available technology for robust, reproducible protein ID or quantitative proteomics applications, with excellent, stable sensitivity over long time periods.
Unlike a traditional pulled nanospray tip, the Etch-Taper™ technology employed by CaptiveSpray ensures that the internal diameter of the spray tip remains constant, thereby reducing tip clogging, and providing excellent spray stability over the entire LC gradient and robust operation for long time periods, even with heavy proteomics sample loads. A key proprietary feature of the CaptiveSpray is its novel gas-flow focusing technology for dramatic sensitivity gains compared to normal electrospray. The CaptiveSpray source delivers nanospray sensitivity without the need for complex and time-consuming spray tip adjustments, while its innovative plug-and-play design fits all current Bruker LC-MS instruments, including the latest maXis UHR-Qq-TOF systems, solariX FTMS systems and amaZon ETD ion trap mass spectrometers.

Thursday, June 2, 2011

Thermo Fisher Scientific Introduces New High-Field Orbitrap Mass Spectrometer at ASMS 2011

Novel high-field Orbitrap technology provides exceptional resolving power of >240,000 creating new possibilities in research and discovery

DENVER, CO. June 2, 2011.

Thermo Fisher Scientific Inc., the world leader in serving science, today introduced a new milestone in Orbitrap technology, the Thermo Scientific Orbitrap Elite. The Orbitrap Elite hybrid mass spectrometer integrates Thermo Scientific's faster, more sensitive ion trap - the Thermo Scientific Velos Pro - with the company's new high-field Orbitrap and advanced signal processing technologies. The system offers outstanding resolving power of 240,000, previously available only on Fourier transform ion cyclotron resonance (FTICR) mass spectrometers, as well as a range of fragmentation techniques, helping customers explore and address the most complex challenges in proteomics, metabolomics, lipidomics and metabolism applications. The new mass spectrometer can be seen in the Thermo Scientific hospitality suite at the Hyatt Regency in Centennial Ballroom D during the 59th Annual ASMS Conference on Mass Spectrometry and Allied Topics, from June 5-9 in Denver.

"Thermo Scientific Orbitrap technology is the recognized standard for accurate mass and high-resolution measurement," said Thomas Moehring, product manager, Thermo Fisher Scientific. "With the introduction of the high-field Orbitrap and advanced signal processing technology, we created a new standard in ultrahigh resolution and accurate mass for laboratories performing comprehensive proteomics and metabolism studies."

The Orbitrap Elite embodies multiple advanced technologies including its mass analyzer geometry, unique signal processing, new ion-transfer optics that improve ion beam transmission into the Orbitrap mass analyzer and a new image current pre-amplifier. These capabilities are coupled with new Velos Pro ion trap technology - linear detection electronics, fast scanning and neutral-blocking front-end ion optics - to enhance overall system quantitative performance, speed and uptime. The sum of these unique innovations offers:

Maximum resolving power of greater than 240,000 FWHM at m/z 400
An amazing four-fold increase in scan speed for increased precision and confidence in quantitative results, and enhanced compatibility with UHPLC.
More high-quality, higher-energy collisional induced dissociation (HCD) spectra and FTMSn spectral fragmentation trees for confident structural elucidation.
Exceptional sensitivity for the detection of very low abundance proteins, peptides and metabolites
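
To put the resolving-power figure in perspective, resolving power is conventionally defined as R = (m/z) / Δ(m/z), with Δ(m/z) the peak width at half maximum (FWHM). A quick calculation under that definition (the function name is just for illustration):

```python
def fwhm_peak_width(mz, resolving_power):
    """Peak width (FWHM) implied by R = (m/z) / delta(m/z)."""
    return mz / resolving_power


# At m/z 400 and R = 240,000 the peak is under two thousandths of an m/z unit wide.
width = fwhm_peak_width(400.0, 240_000)
```

In other words, two peaks near m/z 400 that differ by a couple of thousandths of an m/z unit could in principle still be told apart at this setting.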

Published Study Validates New Protein Enrichment Approach For Low-Abundance Biomarker Detection

Hercules, CA — April 20, 2011 — University of Minnesota researchers found that Bio-Rad Laboratories' ProteoMiner protein enrichment kit enhanced identification of changes to low-abundance proteins and detection of post-translationally modified (PTM) proteins in human saliva. These findings offer promise for improving differential proteomic analyses and biomarker studies aimed at identifying disease-specific proteins and their PTM variants in various types of biological samples and fluids. The study was published in the Dec. 13, 2010, issue of the Journal of Proteome Research.

"Even when highly sensitive mass spectrometers are used to analyze complex biological samples and bodily fluids, high-abundance proteins obscure the detection of lower-abundance proteins and their post-translational modifications," said Sri Bandhakavi, who led the study at the University of Minnesota in 2010. (Bandhakavi is now a senior scientist at Bio-Rad.) "These lower-abundance proteins and PTMs are often of most interest to researchers, given their association with specific disease or physiological states."

MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

"Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses."

Comments: A number of similar methods have been published before, and I am not convinced that this one is much better than its counterparts.

Monday, March 28, 2011

Score regularization for peptide identification

Abstract

Background

Peptide identification from tandem mass spectrometry (MS/MS) data is one of the most important problems in computational proteomics. This technique relies heavily on the accurate assessment of the quality of peptide-spectrum matches (PSMs). However, current MS technology and PSM scoring algorithms are far from perfect, leading to the generation of incorrect peptide-spectrum pairs. Thus, it is critical to develop new post-processing techniques that can distinguish true identifications from false identifications effectively.

Results

In this paper, we present a consistency-based PSM re-ranking method to improve the initial identification results. This method uses one additional assumption that two peptides belonging to the same protein should be correlated to each other. We formulate an optimization problem that embraces two objectives through regularization: the smoothing consistency among scores of correlated peptides and the fitting consistency between new scores and initial scores. This optimization problem can be solved analytically. The experimental study on several real MS/MS data sets shows that this re-ranking method improves the identification performance.

Conclusions

The score regularization method can be used as a general post-processing step for improving peptide identifications. Source codes and data sets are available at: http://bioinformatics.ust.hk/SRPI.rar
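
To make the two objectives concrete, here is a small sketch of a generic graph-regularized re-scoring. The objective below is an assumption chosen for illustration, not a formula taken from the paper: it trades off closeness to the initial scores y against smoothness between peptides linked by a shared protein, with a hypothetical weight lam. The resulting linear system has a closed-form solution; simple sweeps are used here only to avoid a linear algebra dependency.

```python
def regularize_scores(initial, edges, lam=1.0, sweeps=200):
    """Smooth PSM scores over a peptide graph (illustrative assumption).

    Minimizes  sum_i (f_i - y_i)^2 + lam * sum_{(i, j) in edges} (f_i - f_j)^2
    by repeatedly applying the stationarity condition
    f_i = (y_i + lam * sum of neighbour scores) / (1 + lam * degree_i).
    """
    neighbours = {i: [] for i in range(len(initial))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    scores = list(initial)
    for _ in range(sweeps):
        for i, y in enumerate(initial):
            total = sum(scores[j] for j in neighbours[i])
            scores[i] = (y + lam * total) / (1.0 + lam * len(neighbours[i]))
    return scores


# Peptides 0 and 1 come from the same protein; peptide 2 stands alone.
new_scores = regularize_scores([1.0, 0.0, 0.5], edges=[(0, 1)])
```

With these made-up numbers the low-scoring peptide is pulled up by its high-scoring neighbour and vice versa, while the isolated peptide keeps its initial score.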


Xlink-Identifier: An Automated Data Analysis Platform for Confident Identifications of Chemically Cross-Linked Peptides Using Tandem Mass Spectrometry

Chemical cross-linking combined with mass spectrometry provides a powerful method for identifying protein−protein interactions and probing the structure of protein complexes. A number of strategies have been reported that take advantage of the high sensitivity and high resolution of modern mass spectrometers. Approaches typically include synthesis of novel cross-linking compounds, and/or isotopic labeling of the cross-linking reagent and/or protein, and label-free methods. We report Xlink-Identifier, a comprehensive data analysis platform that has been developed to support label-free analyses. It can identify interpeptide, intrapeptide, and deadend cross-links as well as underivatized peptides. The software streamlines data preprocessing, peptide scoring, and visualization and provides an overall data analysis strategy for studying protein−protein interactions and protein structure using mass spectrometry. The software has been evaluated using a custom synthesized cross-linking reagent that features an enrichment tag. Xlink-Identifier offers the potential to perform large-scale identifications of protein−protein interactions using tandem mass spectrometry.

mProphet: automated data processing and statistical validation for large-scale SRM experiments

Selected reaction monitoring (SRM) is a targeted mass spectrometric method that is increasingly used in proteomics for the detection and quantification of sets of preselected proteins at high sensitivity, reproducibility and accuracy. Currently, data from SRM measurements are mostly evaluated subjectively by manual inspection on the basis of ad hoc criteria, precluding the consistent analysis of different data sets and an objective assessment of their error rates. Here we present mProphet, a fully automated system that computes accurate error rates for the identification of targeted peptides in SRM data sets and maximizes specificity and sensitivity by combining relevant features in the data into a statistical model.

mzServer: Web-based Programmatic Access for Mass Spectrometry Data Analysis

Abstract

Continued progress towards systematic generation of large-scale and comprehensive proteomics data in the context of biomedical research will create project-level data sets of unprecedented size and ultimately overwhelm current practices for results validation that are based on distribution of native or surrogate mass spectrometry files. Moreover, the majority of proteomics studies leverage discovery-mode MS/MS analyses, rendering associated data-reduction efforts incomplete at best, and essentially ensuring future demand for re-analysis of data as new biological and technical information become available. Based on these observations, we propose to move beyond the sharing of interpreted spectra, or even the distribution of data at the individual file or project level, to a system much like that used in high-energy physics and astronomy, whereby raw data are made programmatically accessible at the site of acquisition. Towards this end we have developed a web-based server (mzServer), which exposes our common API (mzAPI) through very intuitive (RESTful) URLs and provides remote data access and analysis capabilities to the research community. Our prototype mzServer provides a model for lab-based and community-wide data access and analysis. A live instance of the mzServer can be accessed directly at: http://blais.dfci.harvard.edu/mzServer/ The data associated with this manuscript may be downloaded from the ProteomeCommons.org Tranche network using the following hash: 6g+QpUvlpxc6PM/M9t/49h0PMLwA7dTCgpwyUqfciXEyZpLun7QzPz8E+LDDJfZzBf1lGKe7t1OkXbmomzTEy70Av/kAAAAAAAAYtg== In addition, the data are available here: http://ec2-50-16-31-157.compute-1.amazonaws.com/

University Uses Thermo Fisher Scientific ICP-MS for Reliable and Efficient Sulfur Detection in Proteins

"Thermo Fisher Scientific Inc. announced that the University of Oviedo’s analytical spectrometry research group has implemented the Thermo Scientific XSERIES 2 ICP-MS to perform reliable and interference-free sulfur detection in proteins.

The analytical spectrometry research group at the University of Oviedo in Asturias, Spain aims to solve the analytical challenges encountered by science and technology. Within this framework, a small sub-group has been established focusing on the development of inductively coupled plasma-mass spectrometry (ICP-MS) based analytical methods for the quantification of biopolymers such as DNA and proteins. One of the principal issues faced by the group is the interference from gas-based polyatomics such as oxygen in the determination of sulfur when using a low resolution instrument. To eliminate these problems, the group selected the Thermo Scientific XSERIES 2 ICP-MS with collision/reaction cell technology (CRC).

Quantitative protein analysis is currently one of the most demanding applications in analytical chemistry. Mass spectrometric techniques such as electrospray ionization-mass spectrometry (ESI-MS) and matrix assisted laser desorption ionization-mass spectrometry (MALDI-MS) have traditionally played a key role in protein analysis. However, the potential of ICP-MS has recently been recognized for the determination of proteins. Although ICP-MS detection does not provide any structural information, its outstanding capability to quantify most of the elements proves valuable for accurate protein quantification. Keeping pace with the latest technological developments, the University of Oviedo’s research group has coupled the XSERIES 2 ICP-MS with a reversed-phase capillary liquid chromatography (μLC) system to facilitate precise determination of sulfur isotopes in standard proteins.

Dr. Jörg Bettmer of the University of Oviedo’s analytical spectrometry research group comments, “The Thermo Scientific XSERIES 2 ICP-MS was chosen because no other quadrupole-based system matches its capabilities in terms of accuracy, reliability and overall efficiency. The implementation of the instrument has allowed us to achieve reliable, interference-free detection of sulfur isotopes. It has enabled us to determine sulfur-containing standard proteins in an accurate and efficient manner that had not been previously possible.”

The XSERIES 2 offers outstanding productivity in a quadrupole ICP-MS for both routine and high-performance analytical applications. By using the system, laboratories can achieve their analytical objectives faster, with greater confidence and less hands-on time from the operator. The innovative ion lens design of the instrument enables simple field upgrade to collision cell technology (CCT) performance without affecting the normal (non-CCT mode) sensitivity or background. The cell is also compatible with a range of reactive gases, such as pure oxygen for interference suppression in challenging matrices."


Wednesday, January 19, 2011

Stable isotope shifted matrices enable the use of low mass ion precursor scanning for targeted metabolite identification

We describe a method to identify metabolites of exogenous proteins that eliminates endogenous background by using stable isotope labeled matrices. This technique allows selective screening of the intact therapeutic molecule and all metabolites using a modified precursor ion scan that monitors low molecular weight fragment ions produced during MS/MS. This distinct set of low mass ions differs between isotopically labeled and natural isotope containing species allowing excellent discrimination between endogenous compounds and target analytes during the precursor scanning experiments. All compounds containing amino acids that consist of naturally abundant isotopes can be selected using this scanning technique for further analysis, including metabolites of the parent molecule. The sensitivity and selectivity of this technique is discussed with specific examples of insulin derived peptides being screened from a complex matrix using a range of different validated target ions.


Friday, January 14, 2011

Comparison of Different Signal Thresholds on Data Dependent Sampling in Orbitrap and LTQ Mass Spectrometry for the Identification of Peptides and Proteins in Complex Mixtures

We evaluate the effect of ion-abundance threshold settings for data dependent acquisition on a hybrid LTQ-Orbitrap mass spectrometer, analyzing features such as the total number of spectra collected, the signal to noise ratio of the full MS scans, the spectral quality of the tandem mass spectra acquired, and the number of peptides and proteins identified from a complex mixture. We find that increasing the threshold for data dependent acquisition generally decreases the quantity but increases the quality of the spectra acquired. This is especially true when the threshold setting is set above the noise level of the full MS scan. We compare two distinct experimental configurations: one where full MS scans are acquired in the Orbitrap analyzer, while tandem MS scans are acquired in the LTQ analyzer and one where both full MS and tandem MS scans are acquired in the LTQ analyzer. We examine the number of spectra, peptides, and proteins identified under various threshold conditions, and we find that the optimal threshold setting is at or below the respective noise level of the instrument regardless of whether the full MS scan is performed in the Orbitrap or in the LTQ analyzer. When comparing the high-throughput identification performance of the two analyzers, we conclude that, used at optimal threshold levels, the LTQ and the Orbitrap identify similar numbers of peptides and proteins. The higher scan speed of the LTQ, which results in more spectra being collected, is roughly compensated by the higher mass accuracy of the Orbitrap, which results in improved database searching and peptide validation software performance.
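
The role of the threshold can be illustrated with a toy precursor selection. This is a simplified sketch: the top-N rule, the function name and the peak values are assumptions, and real instrument logic (dynamic exclusion, charge-state filtering and so on) is left out.

```python
def select_precursors(peaks, threshold, top_n=3):
    """Pick up to top_n peaks from a full MS scan whose intensity meets the threshold.

    peaks: list of (mz, intensity) tuples.
    """
    eligible = [p for p in peaks if p[1] >= threshold]
    eligible.sort(key=lambda p: p[1], reverse=True)  # most intense first
    return [mz for mz, _ in eligible[:top_n]]


# Hypothetical full MS scan.
scan = [(445.12, 1200.0), (512.30, 300.0), (623.84, 5400.0),
        (701.45, 80.0), (810.02, 950.0)]

low = select_precursors(scan, threshold=100.0)
high = select_precursors(scan, threshold=1000.0)
```

Raising the threshold from 100 to 1000 drops the weakest selected precursor, so fewer tandem spectra are triggered but each comes from a stronger signal. This mirrors the quantity-versus-quality trade-off described in the abstract.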

Friday, January 7, 2011