Tuesday, November 30, 2010

A new proof of Darwinian Evolution

Motivation: The article presents results of the listing of the quantity of amino acids, dipeptides and tripeptides for all proteins available in the UNIPROT–TREMBL database and the listing for selected species and enzymes. UNIPROT–TREMBL contains protein sequences associated with computationally generated annotations and large-scale functional characterization. Due to the distinct metabolic pathways of amino acid syntheses and their physicochemical properties, the quantities of subpeptides in proteins vary. We have proved that the distribution of amino acids, dipeptides and tripeptides is statistical which confirms that the evolutionary biodiversity development model is subject to the theory of independent events. It seems interesting that certain short peptide combinations occur relatively rarely or even not at all. First, it confirms the Darwinian theory of evolution and second, it opens up opportunities for designing pharmaceuticals among rarely represented short peptide combinations. Furthermore, an innovative approach to the mass analysis of bioinformatic data is presented.

read the full article.
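
To get a feel for the kind of listing the abstract describes, here is a minimal Python sketch of my own, not the authors' pipeline. It counts overlapping amino acids, dipeptides or tripeptides in a set of protein sequences and compares the counts with what you would expect if residues were independent events. The function names and the toy sequences are assumptions for illustration; a real run would read sequences from a UniProt FASTA file.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_kmers(sequences, k):
    """Count every overlapping subpeptide of length k across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def expected_under_independence(sequences, k):
    """Expected k-mer counts if each residue were drawn independently
    from the observed single amino acid frequencies."""
    single = count_kmers(sequences, 1)
    total_residues = sum(single.values())
    total_kmers = sum(max(len(seq) - k + 1, 0) for seq in sequences)
    expected = {}
    for combo in product(AMINO_ACIDS, repeat=k):
        probability = 1.0
        for aa in combo:
            probability *= single[aa] / total_residues
        expected["".join(combo)] = probability * total_kmers
    return expected


def absent_kmers(sequences, k):
    """List the standard k-mers that never occur in the sequences."""
    observed = count_kmers(sequences, k)
    all_kmers = ("".join(combo) for combo in product(AMINO_ACIDS, repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in observed)


if __name__ == "__main__":
    toy = ["ACAC", "CA"]  # toy input, not real proteins
    print(count_kmers(toy, 2))
    print(expected_under_independence(toy, 2)["AC"])
    print(len(absent_kmers(toy, 2)))
```

Dipeptides whose observed count falls far below the expected count are the "rarely represented" combinations the abstract refers to.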

Saturday, November 20, 2010

Improving gene annotation using peptide mass spectrometry

Tanner S, Shen Z, Ng J, Florea L, Guigó R, Briggs SP, Bafna V. Genome Research, 2007, 17(2)

Annotation of protein-coding genes is a key goal of genome sequencing projects. In spite of tremendous recent advances in computational gene finding, comprehensive annotation remains a challenge. Peptide mass spectrometry is a powerful tool for researching the dynamic proteome and suggests an attractive approach to discover and validate protein-coding genes. We present algorithms to construct and efficiently search spectra against a genomic database, with no prior knowledge of encoded proteins. By searching a corpus of 18.5 million tandem mass spectra (MS/MS) from human proteomic samples, we validate 39,000 exons and 11,000 introns at the level of translation. We present translation-level evidence for novel or extended exons in 16 genes, confirm translation of 224 hypothetical proteins, and discover or confirm over 40 alternative splicing events. Polymorphisms are efficiently encoded in our database, allowing us to observe variant alleles for 308 coding SNPs. Finally, we demonstrate the use of mass spectrometry to improve automated gene prediction, adding 800 correct exons to our predictions using a simple rescoring strategy. Our results demonstrate that proteomic profiling should play a role in any genome sequencing project.
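
The core idea of searching peptides against a genome "with no prior knowledge of encoded proteins" can be illustrated with a six-frame translation. The Python sketch below is my own simplification, not the algorithm from the paper. It translates a DNA string in all six reading frames, cuts each translation into tryptic peptides, and reports which frames contain a given peptide by plain string matching. It does not score spectra, handle splicing or encode SNPs, and all function names are assumptions.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map (strand, offset) to the protein translation of that frame."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def tryptic_peptides(protein, min_length=6):
    """Split at stop codons, then cleave after every K or R.
    Simplification: the 'no cleavage before proline' rule is ignored."""
    peptides = []
    for orf in protein.split("*"):
        current = ""
        for aa in orf:
            current += aa
            if aa in "KR":
                if len(current) >= min_length:
                    peptides.append(current)
                current = ""
        if len(current) >= min_length:
            peptides.append(current)
    return peptides


def find_peptide(dna, peptide):
    """Return the frames whose translation contains the peptide."""
    return [
        frame
        for frame, protein in six_frame_translations(dna).items()
        if peptide in protein
    ]


if __name__ == "__main__":
    toy_dna = "ATGGCCAAGTAA"  # toy sequence, not a real gene
    print(six_frame_translations(toy_dna))
    print(find_peptide(toy_dna, "MAK"))
```

A peptide identified by MS/MS that maps to a frame with no annotated gene is the kind of translation-level evidence the abstract talks about.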

Friday, November 19, 2010

Training: Real-Time PCR Seminar Announcement

Key Steps to Generating High Quality Real-time PCR (RT-qPCR) data that meets the MIQE Guidelines

Speaker: Sean Taylor, PhD, Bio-Rad Laboratories

Date: Monday, Nov 22nd, 2010
Time: 11:00AM-12:00PM
Location: University of Toronto, Donnelly Center, 2nd Floor, Red Seminar Room

In an effort to assist the scientific community to produce consistent, high quality data from RT-qPCR experiments, the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines were recently published. The ultimate goal of MIQE is to establish clear guidelines for the information required to publish RT-qPCR data, to allow reviewers and editors to measure the technical quality of submitted manuscripts against an established yardstick, and to facilitate easier replication of experiments described in published studies. This talk focuses on how to apply the MIQE guidelines to design a solid experimental approach for RT-qPCR.

Tuesday, November 16, 2010

SILAC: Stable Isotope Labeling with Amino acids in Cell culture

SILAC is a simple and straightforward approach for in vivo incorporation of a label into proteins for MS-based quantitative proteomics. SILAC relies on metabolic incorporation of a given "light" or "heavy" form of the amino acid into proteins. The method relies on the incorporation of amino acids with substituted stable isotopic nuclei (e.g. deuterium, 13C, 15N). Thus in an experiment, two cell populations are grown in culture media that are identical except that one of them contains a "light" and the other a "heavy" form of a particular amino acid (e.g. 12C and 13C labeled L-lysine, respectively). When the labeled analog of an amino acid is supplied to cells in culture instead of the natural amino acid, it is incorporated into all synthesized proteins. After a number of cell divisions, each instance of this particular amino acid will be replaced by its isotope labeled analog. Since there is hardly any chemical difference between the labeled amino acid and the natural amino acid isotopes, the cells behave exactly like the control cell population grown in the presence of normal amino acid. It is efficient and reproducible as the incorporation of the isotope is 100%. We anticipate that SILAC will come into use as a routine technology in all areas of cell biology.

The above is from http://silac.org/index_html
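
To make the "light" versus "heavy" idea concrete, here is a small Python sketch of my own (not from silac.org) that computes the mass shift introduced by a labeled amino acid and where the heavy partner of a peptide peak should appear in the spectrum. It assumes a lysine in which all six carbons are 13C, uses monoisotopic masses, and assumes full incorporation; the function names are hypothetical.

```python
# Monoisotopic mass differences between heavy and light isotopes, in daltons.
DELTA_13C = 13.00335484 - 12.0
DELTA_15N = 15.00010890 - 14.00307400


def label_mass_shift(n_13c=0, n_15n=0):
    """Mass added to one residue by replacing n_13c carbons with 13C
    and n_15n nitrogens with 15N."""
    return n_13c * DELTA_13C + n_15n * DELTA_15N


def heavy_mz(light_mz, charge, n_labeled_residues, shift_per_residue):
    """Expected m/z of the heavy peptide given the m/z of its light partner."""
    return light_mz + n_labeled_residues * shift_per_residue / charge


def silac_ratio(heavy_intensity, light_intensity):
    """Relative abundance of heavy versus light for one peptide pair."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    lys6 = label_mass_shift(n_13c=6)  # assumed label: 13C6-lysine
    print(round(lys6, 4))
    # A doubly charged peptide with one lysine, light peak at m/z 500.0:
    print(round(heavy_mz(500.0, 2, 1, lys6), 4))
    print(silac_ratio(200.0, 100.0))
```

Because the two forms differ only in mass, the light and heavy peaks of the same peptide appear as a pair a few m/z units apart, and the ratio of their intensities gives the relative protein abundance between the two cell populations.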

Bioinformatics outsourcing

Eagle Genomics has just published a white paper that gives an analysis of the current state of outsourcing in the bioinformatics world, and lists 10 important considerations that every R&D IT manager should take into account when thinking about outsourcing a project.

Outsourcing has long been the holy grail of companies trying to make cost savings and increase efficiencies, and never more so than in the current economic turbulence that is sweeping the globe.

Ten steps to successfully outsourcing industrial bioinformatics

With reducing R&D budgets, revenue streams under threat from near-expired drug patents, and general loss of consumer confidence leading to reduced sales, every organisation in the biotech world from corporate to academic is faced with making difficult decisions about the future structure and purpose of R&D teams. Outsourcing is vital to the ongoing ability of bioinformatics teams to effectively support R&D activities within their organisations.

More: http://www.eaglegenomics.com/files/register.html

Thursday, November 11, 2010

UCSD becoming the center of biomedical computing

Researchers at the University of California, San Diego School of Medicine, led by Lucila Ohno-Machado, MD, PhD, chief of the Division of Biomedical Informatics in the Department of Medicine, have received two federal grants totaling more than $25 million to develop new ways to gather, analyze, use and share vast, ever-increasing amounts of biomedical information.

The first grant for $16.7 million over five years will create a national center for biomedical computing called iDASH. The center will be charged with developing novel algorithms, open-source tools and computational infrastructure and services so that scientists nationwide can share anonymized data essential to large-scale studies and medical progress. Funding comes from the National Heart, Lung and Blood Institute, the National Human Genome Research Institute, the National Library of Medicine, the National Institute of General Medical Sciences and the common fund from the Office of the Director of the National Institutes of Health. 

For more, http://ucsdnews.ucsd.edu/newsrel/health/10-22MajorBiomedical.asp

Wednesday, November 10, 2010

Tuesday, November 9, 2010

MedWorm: an RSS-based utility

"MedWorm is a medical RSS feed provider as well as a search engine built on data collected from RSS feeds. RSS stands for Really Simple Syndication and it is a technology used to simply publish and gather details of the very latest information on the internet. You can read more about RSS here.

MedWorm collects updates from over 6000 authoritative data sources (growing each day) via RSS feeds. From the data collected, MedWorm provides new outgoing RSS feeds on various medical categories that you can subscribe to, via the free MedWorm online service, or another RSS reader of your choice, such as Bloglines, Newsgator, Google Reader or FeedDemon.

The best way to get a feel for the information that MedWorm can provide is to have a browse through the various categories on the menu above. New categories are being added all the time and we are happy to receive requests for new ones if you can't find the category of your choice."

Its web address is http://www.medworm.com/.

Legal Disclaimer: This weblog is not affiliated with MedWorm in any way.

Monday, November 8, 2010

Robert Edgar and his software tools

An amazing guy with very inspiring stories about conducting research and development on his own. After reviewing his articles and reading about his personal experience as a freelance bioinformatician, I feel much better and more motivated.

Though the meaning of "drive5" remains a mystery, check out the website http://www.drive5.com/. Read his blog at http://robertedgar.wordpress.com/2010/05/04/an-unemployed-gentleman-scholar/ and you may come to admire him as I do.

Next Generation BLAST

Copied from my LinkedIn message board.


1000x faster BLASTX?

What would a 100 to 1000x speedup on BLASTX mean to your research project? No special hardware. No supercomputer. No joke. One US-based genome center is already using the speedup to enable their huge metagenomics project. The question is, who else has a project that could benefit from this ability? I'd be thankful for your ideas.

Saturday, November 6, 2010


"MaxQuant is a suite of algorithms for analysis of high-resolution mass spectrometry (e.g. Orbitrap and FT) data. It can be used for protein identification for non-labeled samples and identification and quantification for SILAC-labeled samples. MaxQuant includes all steps needed in a computational proteomic platform except that it uses the Mascot search algorithm for peptide identification. In brief, raw data acquired on Orbitrap is processed using MaxQuant’s Quant module". The processed data is then searched with Mascot. Nature Protocols features a step-by-step tutorial on how to configure and use this powerful software. http://www.natureprotocols.com/2009/04/16/a_practical_guide_to_the_maxqu.php

The program runs pretty slowly on a PC if the MS dataset is large. Some procedures are awkward. For instance, if you change the FASTA files, the Quant step must be rerun from scratch, which takes a lot of time. Also, not every lab has access to the Mascot search engine. The good news is that the upcoming release of MaxQuant will have its own search engine.

It is designed for SILAC-labeled MS data. It can be forced to analyze unlabeled data, but the results are generally poor.

MSQuant is a sibling of MaxQuant, from the same core developers. Unfortunately, in my personal experience it is not fully maintained and is much harder to use.