Gray Box Models for Gene Networks, Use of Promoters


One of the challenges of genomics research is to infer the full set of gene regulatory pathways and how they are altered in patients with disease. As the quantity and variety of genomic data increase, molecular biology is shifting from a hypothesis-driven model to a data-driven one. This shift makes it possible to infer regulatory pathways from genome-wide experimental results, including microarray gene expression data and complete gene DNA sequences, which contain regulatory elements as well as protein-coding regions.

Gene networks are graphical models of regulatory pathways that show causal relationships between the activation or inhibition of one gene and the activation or inhibition of another. The availability of massive amounts of genome-wide experimental data makes it possible to imagine disentangling gene networks by retaining only those connections between genes that are consistent with the data. Yet learning the whole network without any prior assumption about its architecture is not computationally tractable. The number of human genes is presently estimated at 30,000. If we define the state of a network as the current on or off values of all 30,000 genes, there are 2^30000 possible states, which is roughly 10^9000. So even under the simplification that genes are either on or off (which is false, because they show graded levels of activity), the number of possible states exceeds even the estimated number of particles in the known universe (10^80). It is therefore important to simplify the learning problem as much as possible. One approach that we are pursuing is to constrain the architecture of the network with as much knowledge as possible from already studied pathways and regulatory mechanisms. We call this a gray box model, as opposed to a black box model.
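The size estimate above is easy to check numerically. The short Python sketch below is only an illustration of the arithmetic; the function name and constants are ours, and the gene count is the 30,000 figure quoted in the text.

```python
import math


def log10_binary_states(n_genes: int) -> float:
    """Return log10 of the number of on/off configurations of n_genes genes.

    Each gene is treated as a binary switch, so there are 2**n_genes states
    and log10(2**n_genes) = n_genes * log10(2).
    """
    return n_genes * math.log10(2)


N_GENES = 30_000           # gene count quoted in the text
LOG10_PARTICLES = 80       # particle-count estimate quoted in the text (10^80)

exponent = log10_binary_states(N_GENES)
print(f"2^{N_GENES} is about 10^{exponent:.0f}")
print("Exceeds the particle estimate:", exponent > LOG10_PARTICLES)
```

The exponent comes out slightly above 9000, which is consistent with the rounded figure given in the text.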

In collaboration with Royal Holloway, University of London, we have devised a new method that combines promoter sequence analysis with microarray gene expression data. In a simplified view of gene regulation, genes encoded in the DNA are transcribed into mRNA and then translated into proteins. Some proteins are transcription factors, which regulate the transcription of other genes by binding to their promoter regions. Genes that are co-regulated may have promoter regions with similar structure, and they are usually co-expressed. Our technologies use special SVM kernels that exploit the synergy between similarity in promoter structure and gene co-expression to hypothesize new regulatory mechanisms.
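To make the idea of combining the two sources of evidence concrete, here is a minimal Python sketch. It is not the kernel used in the method described above, whose details are not given here. As stand-ins, it assumes a normalized k-mer spectrum kernel for promoter similarity and Pearson correlation for co-expression, and multiplies them so that a gene pair scores highly only when both agree. All function names, sequences, and expression values are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count all overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Normalized k-mer spectrum kernel (cosine similarity of k-mer counts)."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)

    def dot(x: Counter, y: Counter) -> int:
        return sum(x[m] * y[m] for m in x if m in y)

    norm = math.sqrt(dot(ca, ca) * dot(cb, cb))
    return dot(ca, cb) / norm if norm else 0.0


def expression_kernel(x: list, y: list) -> float:
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def combined_kernel(gene_a: dict, gene_b: dict, k: int = 3) -> float:
    """Product of the promoter kernel and the expression kernel.

    The elementwise product of two positive semidefinite kernels is itself
    positive semidefinite, so the result is still a valid kernel.
    """
    return (spectrum_kernel(gene_a["promoter"], gene_b["promoter"], k)
            * expression_kernel(gene_a["expression"], gene_b["expression"]))


# Toy, made-up data: short "promoters" and four-array expression profiles.
genes = {
    "geneA": {"promoter": "TATAAAGGCGTATAAA", "expression": [1.0, 2.0, 3.0, 4.0]},
    "geneB": {"promoter": "TATAAAGGCGTTTAAA", "expression": [1.1, 2.1, 2.9, 4.2]},
    "geneC": {"promoter": "GCGCGCGCATGCGCGC", "expression": [4.0, 1.0, 3.5, 0.5]},
}

names = sorted(genes)
gram = [[combined_kernel(genes[a], genes[b]) for b in names] for a in names]
for name, row in zip(names, gram):
    print(name, [round(v, 3) for v in row])
```

A Gram matrix built this way could be handed to any SVM implementation that accepts a precomputed kernel. The product is only one possible design choice; a weighted sum of the two kernels would also give a valid kernel, but it would reward gene pairs that agree on just one of the two criteria.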