Bioinformatics Advance Access published online on June 30, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn335
Automated mapping of large-scale chromatin structure in ENCODE
1Division of Mathematical Sciences, SPMS, Nanyang Technological University, Singapore, 2Center for Computational Molecular Biology, Division of Applied Mathematics, Brown University, Providence, RI, USA, 3Division of Medical Genetics, University of Washington, Seattle, WA, USA, 4Department of Genome Sciences, University of Washington, Seattle, WA, USA and 5Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
*To whom correspondence should be addressed. Charles E. Lawrence, E-mail: Charles_Lawrence{at}brown.edu
Abstract
Motivation: A recently developed DNase I assay has given us our first genome-wide view of chromatin structure. In addition to cataloging DNase I hypersensitive sites, these data allow us to characterize overall features of chromatin accessibility more completely. We employed a Bayesian hierarchical change point model (CPM), a generalization of a hidden Markov model (HMM), to characterize tiled microarray DNase I sensitivity data available from the ENCODE project.
Results: Our analysis shows that the accessibility of chromatin to cleavage by DNase I is well described by a four-state model of local segments, with each state described by a continuous mixture of Gaussian variables. The CPM produces a better fit to the observed data than the HMM. The large posterior probability for the four-state CPM suggests that the data fall naturally into four classes of regions, which we call major and minor DNase I hypersensitive sites (DHSs), regions of intermediate sensitivity, and insensitive regions. These classes agree well with a model of chromatin in which local disruptions (DHSs) are concentrated within larger domains of intermediate sensitivity, the accessibility islands. The CPM assigns 92% of the bases within the ENCODE regions to the insensitive regions. The 5.8% of the bases that are in regions of intermediate sensitivity are clearly enriched in functional elements, including genes and activating histone modifications, while the remaining 2.2% of the bases in hypersensitive regions are very strongly enriched in these elements.
Availability: The CPM software is available upon request from the authors.
Contact: jstam{at}stamlab.org; wnoble{at}u.washington.edu; Charles_Lawrence{at}brown.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Dr. Jonathan Wren
Received on May 15, 2008; revised on June 28, 2008; accepted on June 28, 2008
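As a rough illustration of what segmenting a sensitivity signal into change-point-delimited regions involves, the sketch below fits a piecewise-constant signal by least squares using dynamic programming. It is not the Bayesian hierarchical CPM described in the abstract: it has no priors, no posterior over the number of states, and no Gaussian mixture per state. The function name `segment_means` and its arguments are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def segment_means(signal: Sequence[float], n_segments: int) -> List[int]:
    """Least-squares piecewise-constant segmentation (illustrative only).

    Returns the start indices of segments 2..n_segments, i.e. the change
    points. This is a simple non-Bayesian stand-in, not the paper's CPM.
    """
    n = len(signal)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")

    # Prefix sums of x and x^2 give each segment's squared error in O(1).
    s1 = [0.0]
    s2 = [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i: int, j: int) -> float:
        """Sum of squared deviations of signal[i:j] from its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimum cost of splitting signal[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    # back[k][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    # Walk the back-pointers from the end to recover the change points.
    change_points = []
    j = n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        change_points.append(j)
    return change_points[::-1]
```

For a toy signal such as `[0.0] * 5 + [5.0] * 3 + [1.0] * 4`, calling `segment_means(signal, 3)` returns `[5, 8]`, the positions where the level changes. The cubic-time loop is adequate for short toy signals only; tiled microarray data at ENCODE scale would need a more efficient or probabilistic formulation.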