PLoS One. 2009 Dec 1;4(12):e8155. Open access article.

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0008155


"A Biophysical Model for Analysis of Transcription Factor Interaction and Binding Site Arrangement from Genome-Wide Binding Data".

Xin He 1, Chieh-Chun Chen 2, Feng Hong 3, Fang Fang 4, Saurabh Sinha 1, Huck-Hui Ng 4, Sheng Zhong 1,2,3*

1 Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America,
2 Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America,
3 Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America,
4 Gene Regulation Laboratory, Genome Institute of Singapore, Singapore, Singapore

* E-mail: szhong@illinois.edu

Funding: Funding was provided by the National Science Foundation (http://www.nsf.gov/) DBI 08-45823 (to SZ) and by an NSF Career Grant DBI-0746303 (to SS). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.


Background:

How transcription factors (TFs) interact with cis-regulatory sequences and with each other is a fundamental but poorly understood aspect of gene regulation.

Methodology/Principal Findings:

We present a computational method to address this question that relies on established biophysical principles. This method, STAP (sequence to affinity prediction), takes into account all combinations and configurations of strong and weak binding sites. It analyzes large-scale TF-DNA binding data to discover cooperative interactions among TFs, infer sequence rules of interaction, and predict TF target genes in new conditions for which no TF-DNA binding data are available. STAP differs from other statistical approaches for analyzing cis-regulatory sequences in two respects: it is built on physical principles, and it treats DNA binding data as a quantitative representation of binding strength. Applying this method to the ChIP-seq data of 12 TFs in mouse embryonic stem (ES) cells, we found that the strength of TF-DNA binding could be significantly modulated by cooperative interactions among TFs with adjacent binding sites. However, further analysis of five putatively interacting TF pairs suggests that such interactions may be relatively insensitive to the distance and orientation of binding sites. When we tested a set of putative Nanog motifs, STAP showed that a novel Nanog motif explained the ChIP-seq data better than previously published ones; we then verified the new motif experimentally. A series of comparisons showed that STAP has more predictive power than several state-of-the-art methods for cis-regulatory sequence analysis. We took advantage of this power to study the evolution of TF-target relationships in Drosophila. By learning TF-DNA interaction models from the ChIP-chip data of D. melanogaster (Mel) and applying them to the genome of D. pseudoobscura (Pse), we found that only about half of the sequences strongly bound by TFs in Mel have high binding affinities in Pse. We show that prediction of functional TF targets from ChIP-chip data can be improved by using the conservation of STAP-predicted affinities as an additional filter.
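The abstract says STAP treats binding data as a quantitative measure of binding strength and uses that data to detect cooperativity. One way to picture this is a minimal Python sketch that fits model parameters to measured binding strengths by least squares and checks whether allowing an interaction term wAB > 1 reduces the error. Several pieces are hypothetical stand-ins, since the abstract does not say how STAP estimates its parameters: the squared-error objective, the brute-force grid search, the two-factor (A/B) site representation, the function names, and the toy data. The toy data are generated from the model itself, not from any experiment.

```python
from itertools import product


def predicted_affinity(sites, qA, qB, wAB):
    """Expected number of bound A molecules on one sequence, summing over
    all occupancy configurations; sites is an ordered tuple of "A"/"B"."""
    q = {"A": qA, "B": qB}
    z = total = 0.0
    for config in product((0, 1), repeat=len(sites)):
        bound = [f for flag, f in zip(config, sites) if flag]
        weight = 1.0
        for i, f in enumerate(bound):
            weight *= q[f]
            # Assumption: consecutive bound A-B (or B-A) molecules interact.
            if i > 0 and {f, bound[i - 1]} == {"A", "B"}:
                weight *= wAB
        z += weight
        total += weight * bound.count("A")
    return total / z


def sse(seqs, signal, qA, qB, wAB):
    """Sum of squared errors between predicted affinities and measured signal."""
    return sum((predicted_affinity(s, qA, qB, wAB) - y) ** 2
               for s, y in zip(seqs, signal))


def grid_fit(seqs, signal, grid, fit_w=True):
    """Brute-force least squares over the grid; wAB is fixed at 1
    (no cooperativity) unless fit_w is True. Returns (sse, qA, qB, wAB)."""
    w_values = grid if fit_w else [1.0]
    return min((sse(seqs, signal, qA, qB, wAB), qA, qB, wAB)
               for qA in grid for qB in grid for wAB in w_values)


# Synthetic toy data generated from the model itself (qA=1, qB=0.5, wAB=4).
seqs = [("A",), ("A", "A"), ("A", "B"), ("B", "A", "B"), ("A", "B", "A")]
signal = [predicted_affinity(s, 1.0, 0.5, 4.0) for s in seqs]
grid = [0.25, 0.5, 1.0, 2.0, 4.0]

coop = grid_fit(seqs, signal, grid)                # wAB free
indep = grid_fit(seqs, signal, grid, fit_w=False)  # wAB fixed at 1
print("cooperative fit:", coop)
print("independent fit:", indep)
```

The toy signal was generated with wAB = 4. The fit that includes wAB therefore recovers the generating parameters with zero error, and the fit with wAB fixed at 1 cannot. On real data, a comparison like this would call for a proper optimizer and some guard against overfitting.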

Conclusions/Significance:

STAP is an effective method to analyze binding site arrangements, TF cooperativity, and TF target genes from genome-wide TF-DNA binding data.




Figure 1. Model of cooperative DNA binding.

The sequence contains three binding sites: two for factor A and one for factor B. All eight configurations of this sequence, in terms of binding-site occupancy, are shown. An arrow connecting two adjacent bound molecules indicates a cooperative interaction. For each configuration, the first column gives the weight (i.e., the unnormalized probability), and the second column gives the number of bound molecules of A. The parameters in the weight terms are qA (qB), the strength of factor A (B) binding to DNA, and wAB, the strength of the interaction between A and B. The binding affinity of A to this sequence is the average of the second column, weighted by the first column.
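The weighting scheme in the Figure 1 caption can be made concrete with a minimal Python sketch. It enumerates all 2^n occupancy configurations of a sequence and multiplies the binding strength q of each bound factor. It applies the interaction term wAB between adjacent bound A and B molecules, then returns the weight-averaged number of bound A molecules. Several details are assumptions rather than STAP's actual implementation: the function names, the site order (A, A, B), and the parameter values are hypothetical, and any two consecutive bound molecules are treated as adjacent, regardless of distance or orientation.

```python
from itertools import product


def config_weight(config, factors, q, w):
    """Unnormalized weight of one occupancy configuration.

    config  -- tuple of 0/1 flags, one per site (1 = site bound)
    factors -- factor bound by each site, in sequence order
    q       -- binding strength per factor, e.g. {"A": qA, "B": qB}
    w       -- interaction strength keyed by frozenset of the two factors
    """
    weight = 1.0
    prev = None  # factor at the previous bound site, if any
    for flag, factor in zip(config, factors):
        if not flag:
            continue
        weight *= q[factor]
        if prev is not None:
            # Assumption: consecutive bound molecules count as adjacent;
            # factor pairs missing from w do not interact (term = 1).
            weight *= w.get(frozenset((prev, factor)), 1.0)
        prev = factor
    return weight


def affinity(factors, target, q, w):
    """Expected number of bound `target` molecules: the average over all
    2**n configurations of the bound count, weighted by config_weight."""
    z = 0.0      # partition function: sum of all weights
    total = 0.0  # sum of weight * (number of bound target molecules)
    for config in product((0, 1), repeat=len(factors)):
        weight = config_weight(config, factors, q, w)
        n_bound = sum(1 for flag, f in zip(config, factors) if flag and f == target)
        z += weight
        total += weight * n_bound
    return total / z


# Hypothetical example: sites ordered A, A, B; qA = 2, qB = 1, wAB = 5.
sites = ("A", "A", "B")
q = {"A": 2.0, "B": 1.0}
w = {frozenset(("A", "B")): 5.0}
print(affinity(sites, "A", q, w))  # → 1.44
```

With no interaction terms (an empty `w`, equivalent to wAB = 1), the sites are independent. The result then reduces to the sum of q/(1 + q) over the A sites (4/3 here), which is a convenient sanity check.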


