
Steve Horvath
Professor of Biostatistics & Human Genetics
Tel: (310) 825-9299
Fax: (810) 277-7453
Office: 21-254A CHS & 4357A Gonda
Department of Biostatistics
Los Angeles, CA 90095-1772
E-mail: SHorvath@mednet.ucla.edu
Education
B.S. Mathematics and Physics (1989) Technical University of Berlin
Ph.D. in Mathematics (1995) University of North Carolina, Chapel Hill
Doctor of Science in Biostatistics (2000) Harvard School of Public Health
Research & Interests
I head the Array Data Analysis Group (ADAG) at UCLA, which specializes in the analysis of DNA and tissue microarray data and comprises faculty and students from the departments of Human Genetics and Biostatistics and the Bioinformatics Program. ADAG has three missions: education, data analysis, and research.
Please see www.genetics.ucla.edu/labs/horvath
Below I highlight some research efforts of my group.
Family-based Allelic Association Tests for Finding Complex Disease Genes
Family-based allelic association tests (FBAT) are used to determine whether genetic markers are associated with disease occurrence. Family-based tests are attractive because they are robust to population admixture effects. Many complex genetic diseases, e.g., Alzheimer's disease, have a late age of onset, so it can be difficult to obtain genetic information from the patient's parents. We developed the sibship disequilibrium test, which uses discordant sibships, and we collaborated with Profs. Laird and Xu to develop and implement the FBAT method and software (www.biostat.harvard.edu/~fbat/default.html). The FBAT method provides haplotype tests for family-based studies that are efficient and robust to population admixture, phenotype distribution specification, and ascertainment based on phenotypes. It can handle missing parental genotypes and/or missing phase in both offspring and parents. It yields either haplotype-specific (univariate) tests or multi-haplotype (global) tests.
Tissue Microarray Data: Random Forest Clustering
We have been excited about the potential of tissue microarray data for cancer genetics. Tissue microarrays are a new high-throughput tool for the study of protein expression patterns in tissues and are increasingly used to evaluate the diagnostic and prognostic importance of tumor biomarkers. The lack of appropriate statistical methodology has inspired us to develop and apply suitable data analysis methods. Since it is standard practice in the tumor marker community to use cut-off values for tumor marker expression values, we realized the value of using tree- and forest-based prediction methods for these data. In particular, we have focused on the use of random forest dissimilarities for tumor class discovery and have studied the theoretical properties of a random forest dissimilarity (www.genetics.ucla.edu/labs/horvath/RFclustering/RFclustering.htm). The random forest dissimilarity weighs the contribution of each covariate in a natural way: the more related a covariate is to other covariates, the more it affects the definition of the dissimilarity. Dependent markers may correspond to disease pathways, which drive the clinical outcomes of interest.
Systems biology: weighted gene co-expression networks
High-throughput approaches for analyzing the expressed genome provide an unprecedented opportunity to enhance our understanding of human disease. Identifying disease-associated genes that predict patient survival or that may be therapeutically targeted remains a significant challenge. A relatively new approach to analyzing complex microarray data involves the application of graphical network models to identify topological relationships between genes. By elucidating the higher-level organizational pattern of gene co-expression networks that regulate cellular phenotype, this approach has the potential to identify key disease genes. We have worked on gene co-expression network methods that can be used to explore the system-level functionality of genes.
The gene network construction is conceptually straightforward: nodes represent genes, and nodes are connected if the corresponding genes are significantly co-expressed across appropriately chosen tissue samples. In reality, it is tricky to define the connections between the nodes in such networks. An important question is whether it is biologically meaningful to encode gene co-expression using binary information (connected=1, unconnected=0). We have introduced a general framework for 'soft' thresholding that assigns a connection weight to each gene pair. A technical report and an R tutorial can be found here: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/
More details can be found here.
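The difference between binary and 'soft' thresholding can be illustrated with a minimal Python sketch. It is not the group's software (the R tutorial linked above is the reference), and several things in it are assumptions made for illustration: the weight is taken to be the absolute Pearson correlation raised to a power beta, which is one common choice of soft-thresholding function; the gene names, expression values, cut-off tau, and beta are made up; and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists of expression values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def hard_adjacency(expr, tau):
    """Binary network: a gene pair is connected (1) if |cor| >= tau, else 0."""
    return {
        (g, h): 1 if abs(pearson(expr[g], expr[h])) >= tau else 0
        for g, h in combinations(sorted(expr), 2)
    }


def soft_adjacency(expr, beta):
    """Weighted network: each gene pair gets the weight |cor| ** beta.

    The power function is an assumed choice of soft-thresholding function.
    """
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g, h in combinations(sorted(expr), 2)
    }


# Made-up expression profiles: one list of values per gene, one value per sample.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
}

hard = hard_adjacency(expr, tau=0.85)   # illustrative cut-off
soft = soft_adjacency(expr, beta=6)     # illustrative power

for pair in hard:
    print(pair, hard[pair], round(soft[pair], 3))
```

In this toy data, geneA and geneC have a correlation of -0.8, just under the illustrative cut-off of 0.85. The binary network treats them as unconnected, whereas the weighted network keeps a connection weight of about 0.26, so a small change in the cut-off does not flip the pair between connected and unconnected.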
Courses
Chemistry 260 Winter 2002-3 Lecture: Statistical Methods for Microarray Data Analysis
(Microsoft PowerPoint required to view this file!)
Human Genetics 236 Winter 2001 Advanced Human Genetics: Statistical Genetics and Human Disease Genes
(Microsoft PowerPoint required to view this file!)
Biostat 250B Winter 2001-05 Linear Statistical Models
Biostat M278 Winter 2002, 04 Statistical Analysis of DNA Microarray Data
Biostat 402B Spring 2002 Biostatistical Consulting
