Thursday, December 30, 2010
Tuesday, December 28, 2010
BMC Genomics. 2010 Dec 1;11 Suppl 3:S8.
ICPD-a new peak detection algorithm for LC/MS.
Department of Electrical Engineering, University of Texas at San Antonio, Texas, USA. firstname.lastname@example.org
BACKGROUND: The identification and quantification of proteins using label-free Liquid Chromatography/Mass Spectrometry (LC/MS) play crucial roles in biological and biomedical research. Increasing evidence has shown that biomarkers are often low abundance proteins. However, LC/MS systems are subject to considerable noise and sample variability, whose statistical characteristics are still elusive, making computational identification of low abundance proteins extremely challenging. As a result, the inability to identify low abundance proteins in a proteomic study is the main bottleneck in protein biomarker discovery.
RESULTS: In this paper, we propose a new peak detection method called Information Combining Peak Detection (ICPD) for high resolution LC/MS. In LC/MS, peptides elute during a certain time period and as a result, peptide isotope patterns are registered in multiple MS scans. The key feature of the new algorithm is that the observed isotope patterns registered in multiple scans are combined together for estimating the likelihood of the peptide existence. An isotope pattern matching score based on the likelihood probability is provided and utilized for peak detection.
CONCLUSIONS: The performance of the new algorithm is evaluated based on protein standards with 48 known proteins. The evaluation shows better peak detection accuracy for low abundance proteins than other LC/MS peak detection methods.
Anal Chem. 2010 Dec 22. [Epub ahead of print]
On the Accuracy and Limits of Peptide Fragmentation Spectrum Prediction.
School of Informatics and Computing, Indiana University, Bloomington, Indiana 47408, United States.
We estimated the reproducibility of tandem mass spectra for the widely used collision-induced dissociation (CID) of peptide ions. Using the Pearson correlation coefficient as a measure of spectral similarity, we found that the within-experiment reproducibility of fragment ion intensities is very high (about 0.85). However, across different experiments and instrument types/setups, the correlation decreases by more than 15% (to about 0.70). We further investigated the accuracy of current predictors of peptide fragmentation spectra and found that they are more accurate than the ad-hoc models generally used by search engines (e.g., SEQUEST) and, surprisingly, approaching the empirical upper limit set by the average across-experiment spectral reproducibility (especially for charge +1 and charge +2 precursor ions). These results provide evidence that, in terms of accuracy of modeling, predicted peptide fragmentation spectra provide a viable alternative to spectral libraries for peptide identification, with a higher coverage of peptides and lower storage requirements. Furthermore, using five data sets of proteome digests by two different proteases, we find that PeptideART (a data-driven machine learning approach) is generally more accurate than MassAnalyzer (an approach based on a kinetic model for peptide fragmentation) in predicting fragmentation spectra but that both models are significantly more accurate than the ad-hoc models.
PMID: 21175207 [PubMed - as supplied by publisher]
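The spectral similarity measure named in the abstract, the Pearson correlation coefficient, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the two spectra have already been reduced to intensity vectors aligned on the same fragment ions, and the function name `pearson` is hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors.

    Assumption: x[i] and y[i] are intensities of the same fragment ion
    in two spectra (alignment is done elsewhere).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two spectra rank and scale their fragment intensities almost identically, which is the sense in which the abstract reports about 0.85 within an experiment and about 0.70 across experiments.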
My comments: the ad-hoc model used internally by SEQUEST is well known to be a simple one. Most prediction models can
outperform it with flying colors.
Thursday, December 23, 2010
A label-free differential quantitative mass spectrometry method for the characterization and identification of protein changes during citrus fruit development
Citrus is one of the most important and widely grown commodity fruit crops. In this study a label-free LC-MS/MS based shot-gun proteomics approach was taken to explore three main stages of citrus fruit development. These approaches were used to identify and evaluate changes occurring in juice sac cells in various metabolic pathways affecting citrus fruit development and quality.
Protein changes in citrus juice sac cells were identified and quantified using label-free shotgun methodologies. Two alternative methods, differential mass-spectrometry (dMS) and spectral counting (SC) were used to analyze protein changes occurring during earlier and late stages of fruit development. Both methods were compared in order to develop a proteomics workflow that could be used in a non-model plant lacking a sequenced genome. In order to resolve the bioinformatics limitations of EST databases from species that lack a full sequenced genome, we established iCitrus. iCitrus is a comprehensive sequence database created by merging three major sources of sequences (HarvEST:citrus, NCBI/citrus/unigenes, NCBI/citrus/proteins) and improving the annotation of existing unigenes. iCitrus provided a useful bioinformatics tool for the high-throughput identification of citrus proteins. We have identified approximately 1500 citrus proteins expressed in fruit juice sac cells and quantified the changes of their expression during fruit development. Our results showed that both dMS and SC provided significant information on protein changes, with dMS providing a higher accuracy.
Our data supports the notion of the complementary use of dMS and SC for label-free comparative proteomics, broadening the identification spectrum and strengthening the identification of trends in protein expression changes during the particular processes being compared.
Saturday, December 18, 2010
From my Chinese colleagues.
Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools choose peptide indexing to accelerate processing. However, peptide indexing requires a large amount of time and space for construction, especially for the non-specific digestion. Additionally, it is not flexible to use.
We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. It eliminates redundant candidate peptides in databases and reduces the number of corresponding peptide-spectrum matches, thereby decreasing the identification time. This algorithm is based on the property of the longest common prefix. Although enzymatic digestion poses a challenge to this property, some adjustments can be made to the algorithm to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions.
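To make the suffix array and longest common prefix idea concrete, here is a minimal sketch. It is not ABLCP itself: it ignores enzymatic digestion and protein boundaries, builds the suffix array naively by sorting, and all function names are hypothetical. It only shows how an LCP array lets a scan skip substrings that duplicate one already seen.

```python
def suffix_array(s):
    """Start positions of all suffixes of s in sorted order (naive build)."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_array(s, sa):
    """lcp[i] = length of the common prefix of suffixes sa[i-1] and sa[i]
    (Kasai's algorithm); lcp[0] is 0 by convention."""
    n = len(s)
    rank = [0] * n
    for i, pos in enumerate(sa):
        rank[pos] = i
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp

def distinct_substrings(s, k):
    """Each distinct length-k substring of s, emitted exactly once.

    A suffix whose LCP with its predecessor is >= k starts with a
    length-k substring that was already emitted, so it is skipped.
    """
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    out = []
    for i, pos in enumerate(sa):
        if pos + k <= len(s) and lcp[i] < k:
            out.append(s[pos:pos + k])
    return out
```

For the string "banana" and k = 2, the scan emits "an", "ba" and "na" once each, even though "an" and "na" each occur twice in the string.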
Monday, December 13, 2010
"Cytoscape is an open source bioinformatics software platform for visualizing molecular interaction networks and integrating with gene expression profiles and other state data. Additional features are available as plugins. Plugins are available for network and molecular profiling analyses, new layouts, additional file format support and connection with databases and searching in large networks. Plugins may be developed using the Cytoscape open Java software architecture by anyone and plugin community development is encouraged"
Dense subgraphs of Protein-Protein Interaction (PPI) graphs are assumed to be potential functional modules and play an important role in inferring the functional behavior of proteins. The increasing amount of available PPI data calls for a fast, accurate approach to biological complex identification. Consequently, a number of models and algorithms exist for identifying functional modules. This paper describes a new graph theoretic clustering algorithm that detects densely connected regions in a large PPI graph. The method is based on finding bounded diameter subgraphs around a seed node. The algorithm has the advantage of being very simple and efficient when compared with other graph clustering methods. This algorithm is tested on the yeast PPI graph and the results are compared with MCL, Core-Attachment, and MCODE algorithms.
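The abstract does not give the algorithm's details, so the following is only a rough sketch of what "a bounded diameter subgraph around a seed node" can mean: a breadth-first search that keeps every node within a fixed number of hops of the seed. The function name, the adjacency-dict representation and the `radius` parameter are assumptions, and the paper's actual density criteria are not modeled.

```python
from collections import deque

def ball_around_seed(adj, seed, radius):
    """Nodes within `radius` hops of `seed` in an undirected graph.

    adj: dict mapping node -> iterable of neighbouring nodes.
    The subgraph induced by the result has diameter at most 2 * radius,
    because any two of its nodes are joined by a path through the seed
    that stays inside the ball.
    """
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)
```

On a path graph A-B-C-D, a radius of 1 around B returns A, B and C, while the same radius around A returns only A and B.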
From the author of PEAKS (de facto standard of de novo sequencing).
De novo sequencing is an important task in proteomics to identify novel peptide sequences. Traditionally, only one MS/MS spectrum is used for the sequencing of a peptide; however, the use of multiple spectra of the same peptide with different types of fragmentation has the potential to significantly increase the accuracy and practicality of de novo sequencing. Research into the use of multiple spectra is in a nascent stage. We propose a general framework to combine the two different types of MS/MS data. Experiments demonstrate that our method significantly improves the de novo sequencing of existing software.
Friday, December 10, 2010
A separation in which the mobile phase composition remains constant throughout the procedure is termed isocratic (meaning constant composition). The word was coined by Csaba Horvath who was one of the pioneers of HPLC.
The mobile phase composition does not have to remain constant. A separation in which the mobile phase composition is changed during the separation process is described as a gradient elution. One example is a gradient starting at 10% methanol and ending at 90% methanol after 20 minutes. The two components of the mobile phase are typically termed "A" and "B"; A is the "weak" solvent which allows the solute to elute only slowly, while B is the "strong" solvent which rapidly elutes the solutes from the column. Solvent A is often water, while B is an organic solvent miscible with water, such as acetonitrile, methanol, THF, or isopropanol.
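The 10% to 90% methanol example can be written as a small function. This sketch assumes the gradient is linear and that the final composition is held after the programmed time; neither is stated in the text, and the name `percent_b` is hypothetical.

```python
def percent_b(t_min, start_pct=10.0, end_pct=90.0, duration_min=20.0):
    """Percentage of strong solvent B at time t_min (minutes).

    Assumptions: linear ramp from start_pct to end_pct over duration_min,
    start_pct before t = 0, end_pct held after the ramp ends.
    """
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    fraction = min(max(t_min / duration_min, 0.0), 1.0)
    return start_pct + (end_pct - start_pct) * fraction
```

Under these assumptions the mobile phase is 50% methanol at the 10-minute midpoint.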
In isocratic elution, peak width increases with retention time linearly according to the equation for N, the number of theoretical plates. This leads to the disadvantage that late-eluting peaks get very flat and broad. Their shape and width may keep them from being recognized as peaks.
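The linear relationship follows from the usual plate-count definition, assuming W denotes the baseline peak width, t_R the retention time, and N is roughly the same for every peak on a given column:

```latex
N = 16\left(\frac{t_R}{W}\right)^2
\quad\Longrightarrow\quad
W = \frac{4\,t_R}{\sqrt{N}}
```

With N fixed, a peak that elutes at twice the retention time is twice as wide.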
Gradient elution decreases the retention of the later-eluting components so that they elute faster, giving narrower (and taller) peaks for most components. This also improves the peak shape for tailed peaks, as the increasing concentration of the organic eluent pushes the tailing part of a peak forward. This also increases the peak height (the peak looks "sharper"), which is important in trace analysis. The gradient program may include sudden "step" increases in the percentage of the organic component, or different slopes at different times – all according to the desire for optimum separation in minimum time.
In isocratic elution, the selectivity does not change if the column dimensions (length and inner diameter) change – that is, the peaks elute in the same order. In gradient elution, the elution order may change as the dimensions or flow rate change.
The driving force in reversed phase chromatography originates in the high order of the water structure. The role of the organic component of the mobile phase is to reduce this high order and thus reduce the retarding strength of the aqueous component.
Reversed-phase chromatography (RPC) has a non-polar stationary phase and an aqueous, moderately polar mobile phase. The name "reversed phase" has a historical background. In the 1970s most liquid chromatography was done on non-modified silica or alumina with a hydrophilic surface chemistry and a stronger affinity for polar compounds - hence it was considered "normal". The introduction of alkyl chains bonded covalently to the support surface reversed the elution order. Now in RPC, polar compounds are eluted first while non-polar compounds are retained - hence "reversed phase". All of the mathematical and experimental considerations used in other chromatographic methods apply (e.g., separation resolution improves with column length, in proportion to its square root). Today, reversed-phase column chromatography accounts for the vast majority of analyses performed in liquid chromatography.
Field Asymmetric Ion Mobility (FAIMS) - Mass Spectrometry
Field Asymmetric Ion Mobility Spectrometry (FAIMS) is a high-speed, gas-phase ion separation technique. When interfaced to a mass spectrometer, the FAIMS chip provides an additional separation stage, making it suitable for applications ranging from drug development to proteomics.
The FAIMS ion filter is orthogonal to both LC and MS, so it has the potential to separate analytes that are difficult to distinguish using only LC-MS. In some cases, the FAIMS stage can replace the LC and associated sample preparation steps.
Thursday, December 9, 2010
We evaluate the effect of ion-abundance threshold settings for data dependent acquisition on a hybrid
LTQ-Orbitrap mass spectrometer, analyzing features such as the total number of spectra collected,
the signal to noise ratio of the full MS scans, the spectral quality of the tandem mass spectra acquired,
and the number of peptides and proteins identified from a complex mixture. We find that increasing
the threshold for data dependent acquisition generally decreases the quantity but increases the quality
of the spectra acquired. This is especially true when the threshold setting is set above the noise level
of the full MS scan. We compare two distinct experimental configurations: one where full MS scans
are acquired in the Orbitrap analyzer, while tandem MS scans are acquired in the LTQ analyzer and
one where both full MS and tandem MS scans are acquired in the LTQ analyzer. We examine the
number of spectra, peptides, and proteins identified under various threshold conditions, and we find
that the optimal threshold setting is at or below the respective noise level of the instrument regardless
of whether the full MS scan is performed in the Orbitrap or in the LTQ analyzer. When comparing
the high-throughput identification performance of the two analyzers, we conclude that, used at
optimal threshold levels, the LTQ and the Orbitrap identify similar numbers of peptides and proteins.
The higher scan speed of the LTQ, which results in more spectra being collected, is roughly
compensated by the higher mass accuracy of the Orbitrap, which results in improved database
searching and peptide validation software performance.
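The role of the ion-abundance threshold can be illustrated with a toy precursor-selection step. This is a simplified sketch, not the instrument's acquisition logic: the function name, the `top_n` limit and the (m/z, intensity) tuple format are all assumptions, and features such as dynamic exclusion are left out.

```python
def select_precursors(peaks, threshold, top_n=5):
    """Pick precursors for tandem MS from one full MS scan.

    peaks: list of (mz, intensity) tuples from the full scan.
    Only peaks at or above `threshold` are eligible; of those, the
    `top_n` most intense are returned, most intense first.
    """
    eligible = [p for p in peaks if p[1] >= threshold]
    eligible.sort(key=lambda p: p[1], reverse=True)
    return eligible[:top_n]
```

Raising `threshold` in this sketch shrinks the list of selected precursors, which mirrors the abstract's observation that a higher threshold yields fewer spectra.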