Protein Life: R & D experience of a bioinformatician

Exploring science typically involves many puzzles, frustrations and even failures. This weblog is mainly intended to record my work, thinking and knowledge acquisition. I expect that some reflection will refresh my mind from time to time, motivate me to move further, and hopefully give me a better view of the ever-changing landscape of bioinformatics. You are welcome to leave comments, good or bad, but hopefully constructive. Enjoy your surfing!

Saturday, October 29, 2011

Enhanced peptide quantification using spectral count clustering and cluster abundance

Quantification of protein expression by means of mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, spectral counting and spectral feature analysis, have been extensively investigated in a wide variety of proteomic studies.

The cornerstone of both methods is peptide identification based on a proteomic database search and subsequent estimation of peptide retention time. However, they often suffer from restrictive database search and inaccurate estimation of the liquid chromatography (LC) retention time.

Furthermore, conventional peptide identification methods based on the spectral library search algorithms such as SEQUEST or SpectraST have been found to provide neither the best match nor high-scored matches. Lastly, these methods are limited in the sense that target peptides cannot be identified unless they have been previously generated and stored into the database or spectral libraries.

To overcome these limitations, we propose a novel method, namely Quantification method based on Finding the Identical Spectral set for a Homogenous peptide (Q-FISH) to estimate the peptide's abundance from its tandem mass spectrometry (MS/MS) spectra through the direct comparison of experimental spectra.

Intuitively, our Q-FISH method compares all possible pairs of experimental spectra in order to identify both known and novel proteins, significantly enhancing identification accuracy by grouping replicated spectra from the same peptide targets.
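To make the idea of comparing all pairs of experimental spectra more concrete, here is a minimal sketch in Python. It is not the Q-FISH implementation: the abstract does not say which similarity measure or grouping rule the authors use, so the 1 Da binning, the cosine similarity, the 0.8 threshold and the single-linkage grouping below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_spectra(spectra, threshold=0.8):
    """Group spectra whose pairwise similarity reaches the threshold.

    Uses union-find, so two spectra end up in the same cluster if they are
    linked through any chain of similar pairs (single linkage).
    """
    binned = [bin_spectrum(peaks) for peaks in spectra]
    parent = list(range(len(spectra)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Compare every pair of spectra, as the text describes.
    for i, j in combinations(range(len(spectra)), 2):
        if cosine_similarity(binned[i], binned[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(spectra)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data: two near-identical spectra and one unrelated spectrum.
toy_spectra = [
    [(100.1, 10.0), (200.2, 5.0)],
    [(100.3, 9.0), (200.4, 6.0)],
    [(300.0, 7.0), (400.0, 3.0)],
]
toy_clusters = cluster_spectra(toy_spectra)
```

The size of each resulting cluster plays the role of a spectral count for that peptide. Note that comparing all pairs scales quadratically with the number of spectra, so a real data set of tens of thousands of spectra would need some pre-filtering, for example by precursor mass.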

Results: We applied Q-FISH to Nano-LC-MS/MS data obtained from human hepatocellular carcinoma (HCC) and normal liver tissue samples to identify differentially expressed peptides between the normal and disease samples. For a total of 44,318 spectra obtained through MS/MS analysis, Q-FISH yielded 14,747 clusters.

Among these, 5,777 clusters were identified only in the HCC sample, 6,648 clusters only in the normal tissue sample, and 2,323 clusters in both the HCC and normal tissue samples. While it will be interesting to investigate peptide clusters found in only one sample, we further examined the spectral clusters identified in both the HCC and normal samples, since our goal is to identify and assess differentially expressed peptides quantitatively.

The next step was to perform a beta-binomial test to isolate differentially expressed peptides between the HCC and normal tissue samples. This test resulted in 84 peptides with significantly differential spectral counts between the HCC and normal tissue samples.
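As a rough illustration of what a beta-binomial test on spectral counts looks at, the sketch below computes a two-sided tail probability for one cluster: given n spectra in the cluster, how surprising is it that k of them come from the HCC sample? The abstract does not describe how the authors parameterized or fitted their test, so the shape parameters alpha and beta are simply passed in as assumed values, and the function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """Probability of k successes in n trials under a beta-binomial model."""
    log_choose = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    )
    log_pmf = log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    return math.exp(log_pmf)


def two_sided_p_value(k, n, alpha, beta):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = beta_binomial_pmf(k, n, alpha, beta)
    # Small relative tolerance so ties are not lost to floating-point noise.
    cutoff = observed * (1.0 + 1e-9)
    total = sum(
        p
        for p in (beta_binomial_pmf(i, n, alpha, beta) for i in range(n + 1))
        if p <= cutoff
    )
    return min(1.0, total)


# Hypothetical cluster: 10 spectra in total, none of them from the HCC sample,
# with assumed shape parameters alpha = beta = 2 (no bias toward either sample).
example_p = two_sided_p_value(0, 10, 2.0, 2.0)
```

Compared with a plain binomial test, the extra dispersion of the beta-binomial model allows for sample-to-sample variability, which is why it is a common choice for count data such as spectral counts.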

We independently identified 50 and 95 peptides by SEQUEST, of which 24 and 56 peptides, respectively, were found to be known biomarkers for human liver cancer. Comparing the Q-FISH and SEQUEST results, we found that 22 of the 84 differentially expressed peptides found by Q-FISH were also identified by SEQUEST.

Remarkably, of these 22 peptides discovered both by Q-FISH and SEQUEST, 13 peptides are known for human liver cancer and the remaining 9 peptides are known to be associated with other cancers.

Conclusions: We proposed a novel statistical method, Q-FISH, for accurately identifying protein species and simultaneously quantifying the expression levels of identified peptides from mass spectrometry data. Q-FISH analysis on human HCC and liver tissue samples identified many protein biomarkers that are highly relevant to HCC.

Q-FISH can be a useful tool both for peptide identification and quantification on mass spectrometry data analysis. It may also prove to be more effective in discovering novel protein biomarkers than SEQUEST and other standard methods.

Authors: Seungmook Lee, Min-Seok Kwon, Hyoung-Joo Lee, Young-Ki Paik, Haixu Tang, Jae Lee, Taesung Park
Credits/Source: BMC Bioinformatics 2011, 12:423

Friday, October 21, 2011

Researchers generate first complete 3-D structures of bacterial chromosome

A team of researchers at the University of Massachusetts Medical School, Harvard Medical School, Stanford University and the Prince Felipe Research Centre in Spain have deciphered the complete three-dimensional structure of the bacterium Caulobacter cresc ...

Source: University of Massachusetts Medical School

Monday, October 17, 2011

A QuantuMDx Leap for Handheld DNA Sequencing

"October 17, 2011 | MONTREAL – Speaking for the first time in his life as a commercial consultant rather than a public servant, Sir John Burn, a highly respected clinical geneticist in the United Kingdom, provided the first glimpse at a nanowire technology for rapid DNA genotyping that could eventually mature into the world’s first handheld DNA sequencer.

Burn previewed a potentially disruptive genome diagnostic technology in a presentation on the closing day of the International Congress of Human Genetics in Montreal last weekend.

One day a week, Burn, professor of clinical genetics at Newcastle University, serves as medical director for QuantuMDx (QMDx), a British start-up co-founded by molecular biologist Jonathan O’Halloran and healthcare executive Elaine Warburton."

Drug2Gene

Drug2Gene is a free, knowledge-integrated data repository that unifies a number of popular public resources to provide structured and organized information on identified and reported relations between genes/proteins and drugs/compounds. Users can manage relations interactively: they can flag, comment on and update relations, and import new drug-gene relations that are valuable for a specific project. Gene orthology and similarity information is matched to certain relationship entries to assist the prediction of new, unreported drug-gene associations. You can also go to the original source of a reported relation to explore more facts, details and evidence.


Saturday, October 8, 2011

Gamers succeed where scientists fail

"Gamers have solved the structure of a retrovirus enzyme whose configuration had stumped scientists for more than a decade. The gamers achieved their discovery by playing Foldit, an online game that allows players to collaborate and compete in predicting the structure of protein molecules."


Wednesday, October 5, 2011

mz5: Space- and time-efficient storage of mass spectrometry data sets

"Across a host of mass spectrometry (MS)-driven -omics fields, researchers witness the acquisition of ever increasing amounts of high throughput MS datasets and the need for their compact yet efficiently accessible storage has become clear.
The HUPO proteomics standard initiative (PSI) has defined an ontology and associated controlled vocabulary that specifies the contents of MS data files in terms of an open data format. Current implementations are the mzXML and mzML formats (mzML specification), both of which are based on an XML representation of the data. As a consequence, these formats are not particularly efficient with respect to their storage space requirements or I/O performance.
This contribution introduces mz5, an implementation of the PSI mzML ontology that is based on HDF5, an efficient, industrial strength storage backend.
Compared to the current mzXML and mzML standards, this strategy yields an average file size reduction of a factor of ~2 and increases I/O performance ~3-4 fold.
The format is implemented as part of the ProteoWizard project."
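One reason XML-based formats take more space is that mzML and mzXML carry their numeric peak arrays as base64-encoded text inside the XML, whereas a binary container can store the raw bytes directly. The Python sketch below only illustrates that single source of overhead. It does not use mz5 or HDF5 and says nothing about the performance figures quoted above; the helper name and the toy m/z values are made up for the example.

```python
import base64
import struct


def encode_as_xml_text(values):
    """Pack doubles as little-endian bytes, then base64-encode them as text.

    Returns both the raw bytes and the text an XML-based format would embed.
    """
    raw = struct.pack(f"<{len(values)}d", *values)
    text = base64.b64encode(raw).decode("ascii")
    return raw, text


# Toy m/z array of 1000 double-precision values.
mz_values = [100.0 + 0.5 * i for i in range(1000)]
raw_bytes, xml_text = encode_as_xml_text(mz_values)

raw_size = len(raw_bytes)    # 8 bytes per value
text_size = len(xml_text)    # base64 emits 4 characters per 3 input bytes
overhead = text_size / raw_size

# Decoding the text gives back exactly the original values.
decoded = list(struct.unpack(f"<{len(mz_values)}d", base64.b64decode(xml_text)))
```

Base64 alone inflates the array by roughly a third before any XML tags are added, and a reader has to decode the text before it can use the numbers.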