Protein Life: R & D experience of a bioinformatician: clustering


Friday, December 9, 2011

KABOOM! A new suffix array based algorithm for clustering expression data

Abstract

Motivation: Second-generation sequencing technology has reinvigorated research using expression data, and clustering such data remains a significant challenge, with much larger datasets and with different error profiles. Algorithms that rely on all-versus-all comparison of sequences are not practical for large datasets.

Results: We introduce a new filter for string similarity which has the potential to eliminate the need for all-versus-all comparison in clustering of expression data and other similar tasks. Our filter is based on multiple long exact matches between the two strings, with the additional constraint that these matches must be sufficiently far apart. We give details of its efficient implementation using modified suffix arrays. We demonstrate its efficiency by presenting our new expression clustering tool, wcd-express, which uses this heuristic. We compare it to other current tools and show that it is very competitive both with respect to quality and run time.
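The filter idea in the Results paragraph can be illustrated with a minimal sketch. This is not the wcd-express implementation: it swaps the modified suffix arrays for a plain set of k-mers, and the function names and the parameters `k`, `min_matches`, and `min_gap` are hypothetical values chosen only for illustration. Two sequences pass the filter if they share enough exact matches of length `k` whose start positions in the first sequence are sufficiently far apart.

```python
def shared_kmer_positions(a: str, b: str, k: int) -> list[int]:
    """Start positions in `a` of length-k substrings that also occur in `b`."""
    # A set lookup stands in for the suffix array used by the real tool.
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return [i for i in range(len(a) - k + 1) if a[i:i + k] in kmers_b]


def passes_filter(a: str, b: str, k: int = 8,
                  min_matches: int = 2, min_gap: int = 16) -> bool:
    """Return True if `a` and `b` share at least `min_matches` exact k-mers
    whose start positions in `a` are at least `min_gap` apart.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    count = 0
    last = None
    # Positions are visited in increasing order; taking the earliest eligible
    # match each time maximizes the number of well-separated matches.
    for pos in shared_kmer_positions(a, b, k):
        if last is None or pos - last >= min_gap:
            count += 1
            last = pos
    return count >= min_matches


if __name__ == "__main__":
    a = "ACGTACGTAA" + "T" * 20 + "GGCCGGCCAA"
    b = "ACGTACGTAA" + "CCCCC" + "GGCCGGCCAA"
    print(passes_filter(a, b))
```

Only pairs that pass such a filter would need a full similarity comparison, which is how a filter of this kind can avoid all-versus-all comparison.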

Availability: Source code and binaries are available under the GPL at http://code.google.com/p/wcdest. Runs on Linux and Mac OS X.

Integrating spatial fuzzy clustering with level set methods for automated medical image segmentation

The performance of level set segmentation depends on appropriate initialization and optimal configuration of the controlling parameters, both of which require substantial manual intervention. A new fuzzy level set algorithm is proposed in this paper to facilitate medical image segmentation. It is able to evolve directly from the initial segmentation produced by spatial fuzzy clustering. The controlling parameters of level set evolution are also estimated from the results of fuzzy clustering. Moreover, the fuzzy level set algorithm is enhanced with locally regularized evolution. Such improvements facilitate level set manipulation and lead to more robust segmentation. Performance evaluation of the proposed algorithm was carried out on medical images from different modalities. The results confirm its effectiveness for medical image segmentation.
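To make the clustering step concrete, here is a minimal sketch of plain fuzzy c-means on a list of pixel intensities. It is an assumption-laden simplification: it omits the spatial term and the level set evolution described in the abstract, and the function names, the fuzzifier `m`, and the iteration count are hypothetical choices rather than the paper's settings.

```python
def _memberships(values, centers, m):
    """Fuzzy membership of each value in each cluster (each row sums to 1)."""
    rows = []
    for x in values:
        # Small epsilon avoids division by zero when a value equals a center.
        dists = [abs(x - c) + 1e-12 for c in centers]
        inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(inv)
        rows.append([w / total for w in inv])
    return rows


def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=50):
    """Plain fuzzy c-means on 1-D intensities (no spatial information)."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centers evenly over the intensity range.
    centers = [lo + (hi - lo) * (j + 0.5) / n_clusters
               for j in range(n_clusters)]
    for _ in range(n_iter):
        u = _memberships(values, centers, m)
        # Each center is the mean of the values weighted by membership ** m.
        centers = [
            sum((row[j] ** m) * x for row, x in zip(u, values))
            / sum(row[j] ** m for row in u)
            for j in range(n_clusters)
        ]
    return centers, _memberships(values, centers, m)


if __name__ == "__main__":
    pixels = [0.10, 0.12, 0.09, 0.90, 0.88, 0.93]
    centers, u = fuzzy_c_means(pixels)
    print(centers)
```

The resulting membership values are the kind of soft segmentation from which an initial contour could be derived, for example by thresholding; how the paper does this exactly is not stated in the abstract.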

Monday, December 13, 2010

Identification of functional modules in a PPI network by bounded diameter clustering.

Dense subgraphs of Protein-Protein Interaction (PPI) graphs are assumed to be potential functional modules and play an important role in inferring the functional behavior of proteins. The increasing amount of available PPI data calls for a fast, accurate approach to identifying biological complexes, and a variety of models and algorithms have been proposed for identifying functional modules. This paper describes a new graph theoretic clustering algorithm that detects densely connected regions in a large PPI graph. The method is based on finding bounded diameter subgraphs around a seed node. The algorithm has the advantage of being very simple and efficient when compared with other graph clustering methods. The algorithm is tested on the yeast PPI graph and the results are compared with those of the MCL, Core-Attachment, and MCODE algorithms.
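As a rough illustration of growing a bounded diameter subgraph around a seed node, the sketch below collects every node within a fixed number of hops of the seed using breadth-first search. The induced subgraph then has diameter at most twice that radius. This is a simplified stand-in, not the paper's algorithm; the function names, the `radius` parameter, and the toy graph are all hypothetical.

```python
from collections import deque


def ball_around_seed(adj, seed, radius):
    """Nodes within `radius` hops of `seed`, found by breadth-first search.

    Any two returned nodes are connected through the seed by a path of at
    most 2 * radius edges, so the induced subgraph has bounded diameter.
    """
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def density(adj, nodes):
    """Fraction of possible edges present among `nodes` (undirected graph)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each undirected edge is seen from both endpoints, hence the division.
    edges = sum(1 for u in nodes for v in adj.get(u, ()) if v in nodes) / 2
    return edges / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Toy interaction graph: a triangle A-B-C with a tail C-D-E.
    adj = {
        "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
        "D": {"C", "E"}, "E": {"D"},
    }
    module = ball_around_seed(adj, "A", 1)
    print(sorted(module), density(adj, module))
```

A density score like this is one plausible way to rank candidate modules from different seeds; the abstract does not say which criterion the authors use.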
