Seminar

Peter Langfelder

Correlation Network Methods for Supervised and Unsupervised Analysis of Gene Expression

Correlation network methods are an increasingly popular class of methods for statistical analysis of high-dimensional data such as microarray gene expression and brain imaging data. I will describe several recent theoretical and methodological results in the area of Weighted Gene Co-expression Network Analysis: (1) a data reduction scheme that exploits the cluster structure of gene expression data and allows one to study relationships among clusters (modules) of co-expressed genes that may form biological pathways; (2) a differential analysis framework for analyzing commonalities and differences between gene expression data sets, in particular for finding clusters of genes present in all studied data sets and studying the preservation of their relationships; (3) novel approaches to handling the large data sets that arise in gene expression studies, allowing the clustering and module analysis of unlimited numbers of genes; and (4) the Dynamic Tree Cut, a new approach to branch cutting in hierarchical clustering dendrograms that identifies clusters from branch shape rather than absolute height. Compared with cutting at a fixed height, the Dynamic Tree Cut is more flexible, can identify nested clusters, and can optionally combine the advantages of hierarchical clustering with Partitioning Around Medoids (PAM) to improve the cluster assignment of outliers. Along with the methods, I will present R software packages that allow R users to perform all main steps of network analysis quickly and conveniently on various types of high-dimensional data. I will illustrate the new methods with several applications to recent empirical data sets.

Seminar Date: October 22, 2008