A Temporal Hidden Markov Regression Model for the Analysis of Gene Regulatory Networks
The profusion of genomic data through genome sequencing and
gene expression microarray technology has facilitated statistical
research into the gene interactions that regulate a biological
process. Current methods generally follow a two-stage procedure:
clustering gene expression measurements, then searching for regulatory
"switches", typically short, conserved sequence patterns (motifs) in
the DNA sequence adjacent to the genes. This process often yields
misleading conclusions, as incorrect cluster selection may cause
important regulatory motifs to be missed or many false discoveries to be made.
Treating cluster memberships as known, rather than estimated,
introduces bias into the analysis and prevents uncertainty about the
cluster parameters from being taken into account. Further, the available
data are under-utilized, as the sequence information is ignored for
purposes of expression clustering and vice versa. We address these issues
by combining gene clustering and motif discovery in a unified framework.
Specifically, we propose a novel hierarchical hidden Markov regression model for
determining gene regulatory networks from genomic sequence and temporally
collected gene expression microarray data. The statistical challenge is
to simultaneously determine the groupings of genes and subsets of motifs
involved in their regulation, when the groupings may vary over time, and
a large number of potential regulators are available. We devise a hybrid
Monte Carlo methodology to estimate parameters under two classes of latent
structure, one arising from the unobservable state identity of genes,
and the other from the unknown set of covariates influencing the response
within a state, leading to simultaneous variable selection (for motifs) and
clustering (for genes). The methodology is illustrated on a yeast cell
cycle dataset, where it determines an optimal set of motifs that
discriminates between groups of genes while simultaneously identifying
the most significant gene clusters.
Seminar Date:
March 14, 2007
3:30 pm