Protein Life: R & D experience of a bioinformatician

Exploring science is typically characterized by a lot of puzzles, frustrations and even failures. This weblog is mainly intended to record my work, thinking and knowledge acquisition. I expect that some reflection will refresh my mind from time to time, motivate me to move further, and hopefully give me a better view of the ever-changing landscape of bioinformatics. You are welcome to leave comments, good or bad, but hopefully constructive. Enjoy your surfing!

Tuesday, November 30, 2010

A new proof of Darwinian Evolution

Motivation: The article presents results of the listing of the quantity of amino acids, dipeptides and tripeptides for all proteins available in the UNIPROT–TREMBL database and the listing for selected species and enzymes. UNIPROT–TREMBL contains protein sequences associated with computationally generated annotations and large-scale functional characterization. Due to the distinct metabolic pathways of amino acid syntheses and their physicochemical properties, the quantities of subpeptides in proteins vary. We have proved that the distribution of amino acids, dipeptides and tripeptides is statistical which confirms that the evolutionary biodiversity development model is subject to the theory of independent events. It seems interesting that certain short peptide combinations occur relatively rarely or even not at all. First, it confirms the Darwinian theory of evolution and second, it opens up opportunities for designing pharmaceuticals among rarely represented short peptide combinations. Furthermore, an innovative approach to the mass analysis of bioinformatic data is presented.

read the full article.
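To make the counting described in the abstract concrete, here is a minimal sketch of tallying amino acids, dipeptides and tripeptides over a set of protein sequences, and of comparing an observed count with what independent residues would give. It is my own illustration, not the authors' pipeline: the function names and the toy sequences are assumptions, and a real run would stream sequences from a UniProt FASTA file.

```python
from collections import Counter


def count_subpeptides(sequences, k):
    """Count every overlapping window of length k across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def independence_ratio(kmer, kmer_counts, aa_counts):
    """Observed count of kmer divided by the count expected if residues
    were drawn independently with their overall frequencies."""
    total_aa = sum(aa_counts.values())
    expected = sum(kmer_counts.values())  # number of windows of this length
    for aa in kmer:
        expected *= aa_counts[aa] / total_aa
    return kmer_counts[kmer] / expected if expected else float("nan")


# Toy input (hypothetical sequences, for illustration only).
toy = ["MKKA", "KKM"]
aa_counts = count_subpeptides(toy, 1)
dipeptides = count_subpeptides(toy, 2)
ratio = independence_ratio("KK", dipeptides, aa_counts)
```

A ratio well below 1 would flag a dipeptide or tripeptide that occurs less often than independence predicts, which is the kind of rarely represented combination the abstract mentions.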

Thursday, November 25, 2010

Selected Videos on Next Generation Sequencing

Saturday, November 20, 2010

Improving gene annotation using peptide mass spectrometry

Tanner S, Shen Z, Ng J, Florea L, Guigó R, Briggs SP, Bafna V. Genome Research, 2007, 17(2)

Annotation of protein-coding genes is a key goal of genome sequencing projects. In spite of tremendous recent advances in computational gene finding, comprehensive annotation remains a challenge. Peptide mass spectrometry is a powerful tool for researching the dynamic proteome and suggests an attractive approach to discover and validate protein-coding genes. We present algorithms to construct and efficiently search spectra against a genomic database, with no prior knowledge of encoded proteins. By searching a corpus of 18.5 million tandem mass spectra (MS/MS) from human proteomic samples, we validate 39,000 exons and 11,000 introns at the level of translation. We present translation-level evidence for novel or extended exons in 16 genes, confirm translation of 224 hypothetical proteins, and discover or confirm over 40 alternative splicing events. Polymorphisms are efficiently encoded in our database, allowing us to observe variant alleles for 308 coding SNPs. Finally, we demonstrate the use of mass spectrometry to improve automated gene prediction, adding 800 correct exons to our predictions using a simple rescoring strategy. Our results demonstrate that proteomic profiling should play a role in any genome sequencing project.
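The abstract does not say how the genomic database is built, so the sketch below uses plain six-frame translation as a stand-in to show what "searching with no prior knowledge of encoded proteins" can mean in practice: every reading frame of a DNA stretch becomes a candidate peptide source. This is an assumption on my part, not the authors' algorithm, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCC")
```

In a real search, each translated frame would be split at stop codons ('*') and digested in silico before being matched against spectra. Six-frame translation alone cannot represent peptides that span splice junctions, which the paper does handle.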

Friday, November 19, 2010

Training: Real-Time PCR Seminar Announcement

Key Steps to Generating High Quality Real-time PCR (RT-qPCR) data that meets the MIQE Guidelines

Speaker: Sean Taylor, PhD, Bio-Rad Laboratories

Date: Monday, Nov 22nd, 2010
Time: 11:00AM-12:00PM
Location: University of Toronto, Donnelly Center, 2nd Floor, Red Seminar Room

In an effort to assist the scientific community to produce consistent, high quality data from RT-qPCR experiments, the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines were recently published. The ultimate goal of MIQE is to establish clear guidelines for the information required to publish RT-qPCR data, to allow reviewers and editors to measure the technical quality of submitted manuscripts against an established yardstick, and to facilitate easier replication of experiments described in published studies. This talk focuses on how to apply the MIQE guidelines to design a solid experimental approach for RT-qPCR.

Tuesday, November 16, 2010

SILAC: Stable Isotope Labeling with Amino acids in Cell culture

SILAC is a simple and straightforward approach for in vivo incorporation of a label into proteins for MS-based quantitative proteomics. SILAC relies on metabolic incorporation of a given "light" or "heavy" form of the amino acid into proteins. The method relies on the incorporation of amino acids with substituted stable isotopic nuclei (e.g. deuterium, 13C, 15N). Thus in an experiment, two cell populations are grown in culture media that are identical except that one of them contains a "light" and the other a "heavy" form of a particular amino acid (e.g. 12C and 13C labeled L-lysine, respectively). When the labeled analog of an amino acid is supplied to cells in culture instead of the natural amino acid, it is incorporated into all newly synthesized proteins. After a number of cell divisions, each instance of this particular amino acid will be replaced by its isotope labeled analog. Since there is hardly any chemical difference between the labeled amino acid and the natural amino acid isotopes, the cells behave exactly like the control cell population grown in the presence of normal amino acid. It is efficient and reproducible as the incorporation of the isotope is 100%. We anticipate that SILAC will come into use as a routine technology in all areas of cell biology.

The above is from http://silac.org/index_html
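As a rough worked example of why the light and heavy forms can be told apart by MS, the sketch below computes the m/z spacing between a light peptide and its heavy partner. It assumes the heavy lysine has all six carbons replaced by 13C (the excerpt only says "13C labeled L-lysine"), and the function name is my own.

```python
# Mass difference between one 13C and one 12C atom, in daltons.
C13_MINUS_C12 = 1.0033548


def silac_mz_shift(n_labeled_residues, charge, carbons_per_residue=6):
    """m/z spacing between the light and heavy peptide peaks.

    carbons_per_residue=6 assumes fully 13C-labeled lysine (an assumption,
    not stated in the excerpt above).
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    mass_shift = n_labeled_residues * carbons_per_residue * C13_MINUS_C12
    return mass_shift / charge


# One labeled lysine in a doubly charged peptide.
shift = silac_mz_shift(n_labeled_residues=1, charge=2)
```

Under these assumptions a peptide with one lysine shifts by about 6.02 Da in mass, so a doubly charged ion appears as a pair of peaks roughly 3.01 m/z apart, and the ratio of their intensities gives the relative abundance between the two cell populations.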

Bioinformatics outsourcing

Eagle Genomics has just published a white paper that gives an analysis of the current state of outsourcing in the bioinformatics world, and lists 10 important considerations that every R&D IT manager should take into account when thinking about outsourcing a project.

Outsourcing has long been the holy grail of companies trying to make cost savings and increase efficiencies, and never more so than in the current economic turbulence that is sweeping the globe.

Ten steps to successfully outsourcing industrial bioinformatics

With reducing R&D budgets, revenue streams under threat from near-expired drug patents, and general loss of consumer confidence leading to reduced sales, every organisation in the biotech world from corporate to academic is faced with making difficult decisions about the future structure and purpose of R&D teams. Outsourcing is vital to the ongoing ability of bioinformatics teams to effectively support R&D activities within their organisations.

More: http://www.eaglegenomics.com/files/register.html

Thursday, November 11, 2010

UCSD becoming the center of biomedical computing

Researchers at the University of California, San Diego School of Medicine, led by Lucila Ohno-Machado, MD, PhD, chief of the Division of Biomedical Informatics in the Department of Medicine, have received two federal grants totaling more than $25 million to develop new ways to gather, analyze, use and share vast, ever-increasing amounts of biomedical information.

The first grant for $16.7 million over five years will create a national center for biomedical computing called iDASH. The center will be charged with developing novel algorithms, open-source tools and computational infrastructure and services so that scientists nationwide can share anonymized data essential to large-scale studies and medical progress. Funding comes from the National Heart, Lung and Blood Institute, the National Human Genome Research Institute, the National Library of Medicine, the National Institute of General Medical Sciences and the common fund from the Office of the Director of the National Institutes of Health.

For more, http://ucsdnews.ucsd.edu/newsrel/health/10-22MajorBiomedical.asp

Wednesday, November 10, 2010

Mad Scientist

Tuesday, November 9, 2010

MedWorm: an RSS-based utility

"MedWorm is a medical RSS feed provider as well as a search engine built on data collected from RSS feeds. RSS stands for Really Simple Syndication and it is a technology used to simply publish and gather details of the very latest information on the internet. You can read more about RSS here.

MedWorm collects updates from over 6000 authoritative data sources (growing each day) via RSS feeds. From the data collected, MedWorm provides new outgoing RSS feeds on various medical categories that you can subscribe to, via the free MedWorm online service, or another RSS reader of your choice, such as Bloglines, Newsgator, Google Reader or FeedDemon.

The best way to get a feel for the information that MedWorm can provide is to have a browse through the various categories on the menu above. New categories are being added all the time and we are happy to receive requests for new ones if you can't find the category of your choice."

Its web address is http://www.medworm.com/.

Legal Disclaimer: This weblog is not affiliated with MedWorm in any way.

Monday, November 8, 2010

Robert Edgar and his software tools

An amazing guy with very inspiring stories about conducting research and development on his own. I feel much better and more motivated after getting to know him through his articles and his personal experience as a freelance bioinformatician.

Though the meaning of "drive5" remains a mystery, check out the website http://www.drive5.com/. Read his blog at http://robertedgar.wordpress.com/2010/05/04/an-unemployed-gentleman-scholar/; you may come to admire him as I do.

Next Generation Blast

Copied from my LinkedIn message board.

=================================

1000x faster BLASTX?

What would a 100 to 1000x speed up on BLASTX mean to your research project. No special hardware. No supercomputer. No Joke. One US-based genome center is already using the speed up to enable their huge metagenomics project. The question is, who else has a project that could benefit from this ability? I'd be thankful for your ideas.

Martin Gollery • If it has the same sensitivity and selectivity as BLASTX, then it could mean a great deal. Can you contact me at marty.gollery@gmail.com to tell more about it?

12 days ago

sucheta Tripathy • I will also be interested to know more about this particular BlastX.

Thanks

Sucheta

12 days ago

Jian Liu • is that something like Pattern Hunter using spaced seeds?

11 days ago

Manish Gupta • Great ...................
Using jumping index algo?

11 days ago

Martin Asser Hansen • Usearch provides that kind of speedup

7 days ago

John Clouston • sorry for the delay in answering questions... here are 3 answers.

Martin G - sensitivity and accuracy is tunable, similarity to BLASTX is a must for any product that looks to replace the 'gold standard'.

Jian - I don't know if the strategy is like Pattern Hunter or not actually as I have not used that product. The search technology is based on a prior nucleotide space search that was developed a few years ago called SLIM Search. It is a hash-based approach but with novel strategies to those that are more commonly used in NGS space.

Martin H - USearch is not advertizing 1000x speed up for BLASTX protein search. Actually the author of Usearch says on the website "Currently, translated search (like BLASTX and TBLASTX) is not supported, but I'm working on it" So maybe you have access to a beta version? Please let us know what you know about USearch and protein space.

Disclaimer - when I started this conversation 4 months ago I was employed by Real Time Genomics, the inventors of the RTG mapx product that gives the 100x to 1000x speed up on BLASTX. The product is in use at the Genome Center at Washington University with their Human Microbiome Project (HMP). I am no longer officially representing Real Time Genomics but I will keep answering your questions on this topic. If you contact Real Time Genomics directly, let them know how you heard about the application.

7 days ago

Robert Edgar • I am the author of usearch and ublast. The group may be interested to know that I am now beta-testing a new variant of the ublast algorithm with the following new features: 1. translated search, 2. gapped alignments, 3. chaining of alignments across frame-shifts, 4. ORF identification and target coverage criteria. These features are designed with gene calling in next-generation meta-genomic reads, but can also be useful in other applications. You are welcome to contact me for more information at robert@drive5.com or visit http://drive5.com/usearch for more information and to sign up for updates via email.