Chromatin, Chromosomes, and Epigenetics
Learn how DNA is packaged into chromatin, how histone modifications and DNA methylation regulate gene expression, how chromosomes are organized in 3D, and the computational tools for analyzing the epigenome.
Introduction
The human genome contains about 3.2 billion base pairs of DNA per haploid set. Stretched end to end, the DNA from a single cell would be roughly two meters long — yet it must fit inside a nucleus only 5–10 µm in diameter. This requires extraordinary compaction, achieved through a hierarchy of packaging structures that do far more than simply compress the DNA. The packaging itself carries information: which regions of the genome are accessible for transcription, which are silenced, and how the three-dimensional architecture of chromosomes influences gene regulation.
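The two-meter figure can be checked with a few lines of arithmetic. The sketch below assumes B-form DNA with a rise of about 0.34 nm per base pair (a value not stated in the lesson) and takes the upper end of the nucleus diameter range given above.

```python
# Rough contour length of the DNA in one diploid human cell.
BP_PER_HAPLOID_GENOME = 3.2e9  # from the text
RISE_PER_BP_NM = 0.34          # assumption: B-form DNA, ~0.34 nm per base pair


def dna_length_m(base_pairs: float, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Contour length in meters of DNA with the given number of base pairs."""
    return base_pairs * rise_nm * 1e-9


diploid_length_m = dna_length_m(2 * BP_PER_HAPLOID_GENOME)
nucleus_diameter_m = 10e-6  # upper end of the 5-10 micrometer range
ratio = diploid_length_m / nucleus_diameter_m

print(f"Diploid DNA length: {diploid_length_m:.2f} m")
print(f"DNA length / nucleus diameter: {ratio:,.0f}")
```

Under these assumptions the DNA is a little over two meters long, several hundred thousand times the diameter of the nucleus that contains it.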
This lesson explores chromatin structure from nucleosomes to whole chromosomes, the epigenetic modifications that regulate it, and the computational tools used to map and analyze the epigenome.
Eukaryotic DNA Is Packaged into Chromosomes
Each eukaryotic chromosome is a single continuous DNA molecule complexed with proteins to form chromatin. Human cells have 23 pairs of chromosomes (22 autosomal pairs plus the sex chromosomes), with each chromosome carrying a defined set of genes arranged in a linear order. The complete set of chromosomes visible during mitosis is called the karyotype.
Each chromosome must contain three essential DNA elements:
- Centromere: the attachment point for the mitotic spindle, ensuring accurate chromosome segregation during cell division
- Telomeres: repetitive sequences (TTAGGG in humans) at both ends that protect against degradation and end-to-end fusion
- Origins of replication: multiple sites (tens of thousands across the human genome, so hundreds to thousands per chromosome) where DNA replication initiates
The nucleotide sequence of the human genome reveals that genes are unevenly distributed. Some chromosomes are gene-rich (chromosome 19 has the highest gene density) while others are gene-poor (chromosome 18). Within chromosomes, gene-rich regions tend to be GC-rich, replicate early in S phase, and occupy the interior of the nucleus, while gene-poor regions are AT-rich, replicate late, and are found near the nuclear periphery.
Nucleosomes: The Basic Unit of Chromatin
The first level of DNA compaction is the nucleosome. A nucleosome core particle consists of 147 base pairs of DNA wrapped 1.65 turns around an octamer of histone proteins: two copies each of histones H2A, H2B, H3, and H4. Nucleosomes are connected by stretches of linker DNA (typically 20–60 bp), with linker histone H1 binding at the entry/exit point of the DNA to help stabilize higher-order folding.
Histones are among the most evolutionarily conserved proteins known. Histone H4, for example, differs by only 2 amino acids between humans and peas — organisms that diverged over a billion years ago. This extreme conservation reflects the critical structural constraints imposed by the need to package DNA while maintaining access for replication, transcription, and repair.
The nucleosome compacts DNA approximately 6-fold. Subsequent levels of folding (the 30-nm fiber, whose structure remains debated, chromatin loops, and scaffold attachment) bring the overall linear compaction to several hundred-fold or more in interphase chromatin and to roughly 10,000-fold in mitotic chromosomes.
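The roughly 6-fold figure for the nucleosome level can be reproduced from the numbers above. This is a back-of-the-envelope sketch: the 0.34 nm rise per base pair and the 11 nm nucleosome diameter are assumed values, not taken from the lesson, and the function names are illustrative.

```python
CORE_BP = 147                  # DNA in the nucleosome core particle (from the text)
RISE_PER_BP_NM = 0.34          # assumption: B-form DNA
NUCLEOSOME_DIAMETER_NM = 11.0  # assumption: approximate width of one nucleosome


def repeat_length_bp(linker_bp: int) -> int:
    """Core DNA plus one linker: the DNA consumed per nucleosome."""
    return CORE_BP + linker_bp


def linear_compaction(linker_bp: int) -> float:
    """Extended DNA length per nucleosome divided by the nucleosome width."""
    extended_nm = repeat_length_bp(linker_bp) * RISE_PER_BP_NM
    return extended_nm / NUCLEOSOME_DIAMETER_NM


def nucleosome_count(genome_bp: float, linker_bp: int) -> float:
    """Approximate number of nucleosomes needed to package genome_bp."""
    return genome_bp / repeat_length_bp(linker_bp)


for linker in (20, 40, 60):  # linker range given in the text
    print(
        f"linker {linker} bp: repeat {repeat_length_bp(linker)} bp, "
        f"compaction {linear_compaction(linker):.1f}x, "
        f"{nucleosome_count(6.4e9, linker):.2e} nucleosomes per diploid genome"
    )
```

With linkers of 20 to 60 bp the estimate falls between about 5- and 6.5-fold, and a diploid genome needs on the order of 30 million nucleosomes.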
Dynamic Nucleosomes and Chromatin Remodeling
Nucleosomes are not static barriers. They have a dynamic structure and are frequently repositioned, evicted, or restructured by ATP-dependent chromatin remodeling complexes. Four major families of remodelers exist (SWI/SNF, ISWI, CHD, and INO80), each using the energy of ATP hydrolysis to slide nucleosomes along DNA, eject them, or exchange histone variants.
This dynamic remodeling is essential for gene regulation: a gene buried under tightly packed nucleosomes cannot be transcribed until remodelers clear the promoter region. Conversely, remodelers can close down a promoter by repositioning nucleosomes over it.
Histone variants (e.g., H2A.Z, H3.3, CENP-A) can replace standard histones in specific nucleosomes, altering the stability and properties of the nucleosome. H3.3 is enriched at actively transcribed genes, while CENP-A marks centromeric chromatin and is essential for kinetochore assembly.
Euchromatin and Heterochromatin
Chromatin exists in two broad functional states:
| State | Appearance | Transcription | Typical locations |
|---|---|---|---|
| Euchromatin | Open, decondensed | Active or poised | Gene-rich regions |
| Heterochromatin | Compact, condensed | Silenced | Centromeres, telomeres, repetitive regions |
Constitutive heterochromatin remains condensed in all cell types at all times. It is enriched in repetitive sequences (satellite DNA, transposable elements) and is marked by the histone modification H3K9me3 and the chromatin protein HP1. Centromeres and pericentromeric regions are constitutive heterochromatin.
Facultative heterochromatin can switch between open and closed states depending on cell type and developmental stage. The most dramatic example is X-chromosome inactivation in female mammals, where one entire X chromosome is silenced as facultative heterochromatin (the Barr body), coated by the long non-coding RNA Xist and marked by H3K27me3.
Histone Modifications: The Epigenetic Code
Histones have flexible N-terminal tails that protrude from the nucleosome and can be covalently modified. These modifications alter chromatin structure and recruit specific effector proteins without changing the DNA sequence — this is the basis of epigenetics.
Key histone modifications and their functions:
| Modification | Residue | Effect |
|---|---|---|
| H3K4me3 | H3 lysine 4, trimethyl | Marks active promoters |
| H3K4me1 | H3 lysine 4, monomethyl | Marks enhancers |
| H3K27ac | H3 lysine 27, acetyl | Marks active enhancers and promoters |
| H3K36me3 | H3 lysine 36, trimethyl | Marks actively transcribed gene bodies |
| H3K27me3 | H3 lysine 27, trimethyl | Marks Polycomb-repressed genes |
| H3K9me3 | H3 lysine 9, trimethyl | Marks constitutive heterochromatin |
| H4K20me3 | H4 lysine 20, trimethyl | Marks heterochromatin, DNA damage response |
These marks are placed by writers (histone methyltransferases, acetyltransferases, kinases), removed by erasers (demethylases, deacetylases, phosphatases), and interpreted by readers (proteins with recognition domains: bromodomains bind acetyl-lysine, chromodomains and Tudor domains bind methyl-lysine, 14-3-3 proteins bind phospho-serine).
The histone code hypothesis proposes that specific combinations of histone modifications form a code that is read by effector proteins to dictate downstream chromatin states. While the concept of a strict “code” has been debated, it is clear that histone modifications work combinatorially to define chromatin states and regulate gene expression.
DNA Methylation
In mammals, a second layer of epigenetic information is provided by DNA methylation: the addition of a methyl group to carbon 5 of cytosine (producing 5-methylcytosine), primarily at CpG dinucleotides, catalyzed by DNA methyltransferases (DNMT1, DNMT3A, DNMT3B).
DNA methylation at gene promoters is associated with transcriptional silencing. It attracts methyl-CpG-binding proteins (MeCP2, MBD1-4) that recruit histone deacetylases and other repressive chromatin modifiers, creating a self-reinforcing silenced state.
CpG islands — regions of high CpG density found at ~60% of human gene promoters — are normally unmethylated. When they become methylated (as in cancer, imprinting, or X-inactivation), the associated gene is stably repressed.
Epigenetic marks are inherited through cell division: after replication, DNMT1 (the maintenance methyltransferase) recognizes hemimethylated CpG sites and methylates the new strand, preserving the methylation pattern. Similarly, some histone modifications are propagated during replication, though the mechanisms are less well understood. This epigenetic inheritance allows cells to maintain their differentiated identity across divisions without any change to the DNA sequence.
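The logic of maintenance methylation can be sketched as a toy simulation. The model below is a deliberate simplification: each CpG site is a single boolean, the `fidelity` parameter and function names are hypothetical, and real DNMT1 activity involves cofactors not modeled here.

```python
import random


def replicate(parent_pattern):
    """Semi-conservative replication: the template strand keeps its marks,
    the newly synthesized strand starts fully unmethylated."""
    template = list(parent_pattern)
    nascent = [False] * len(parent_pattern)
    return template, nascent


def maintain(template, nascent, fidelity=1.0, rng=None):
    """DNMT1-like step: methylate the nascent strand at hemimethylated sites.

    fidelity is the (hypothetical) probability that a hemimethylated
    site is restored; 1.0 means perfect copying.
    """
    rng = rng or random.Random(0)
    restored = []
    for old, new in zip(template, nascent):
        hemimethylated = old and not new
        if hemimethylated and rng.random() < fidelity:
            restored.append(True)
        else:
            restored.append(new)
    return restored


pattern = [True, True, False, True, False, False, True]  # methylated CpG sites
template, nascent = replicate(pattern)
daughter = maintain(template, nascent)
print("Pattern preserved:", daughter == pattern)
```

Unmethylated sites stay unmethylated because the enzyme acts only where the template strand carries a mark, which is why the pattern, not just the overall methylation level, is inherited.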
Despite their stability, epigenetic marks can be reprogrammed — during early embryonic development, germ cell development, and experimentally through introduction of the Yamanaka factors (Oct4, Sox2, Klf4, c-Myc) to generate induced pluripotent stem cells (iPSCs).
The Global Structure of Chromosomes
At higher levels of organization, chromosomes display distinctive structural features that influence gene regulation.
Lampbrush chromosomes — found in amphibian oocytes — display large loops of decondensed chromatin extending from a compact central axis. These loops correspond to actively transcribed genes and demonstrate that chromatin decondensation is required for transcription.
Polytene chromosomes — found in Drosophila salivary glands and other tissues — form when chromosomes replicate without cell division, producing up to 1,024 parallel copies held in register. The resulting giant chromosomes show a characteristic banding pattern that provides a rough map of gene distribution and chromatin states. Dark bands correspond to more compacted, AT-rich regions, while light interbands correspond to less compacted, gene-rich regions.
In the interphase nucleus, individual chromosomes do not mix freely. Instead, each chromosome occupies a discrete chromosome territory. Gene-rich chromosomes tend to be located in the nuclear interior, while gene-poor chromosomes are found near the nuclear periphery and the nuclear lamina (a meshwork of lamin proteins lining the inner nuclear membrane). Genes that associate with the lamina (in lamina-associated domains, LADs) are generally transcriptionally silent.
DNA loop extrusion — driven by the motor protein cohesin and delimited by the insulator protein CTCF — organizes chromatin into loops that bring enhancers into contact with their target promoters while insulating them from neighboring genes. Loop extrusion is a major mechanism for establishing the three-dimensional architecture of the genome.
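A one-dimensional toy model illustrates how CTCF sites delimit extruded loops. It is only a sketch under strong assumptions: CTCF sites are treated as perfect barriers regardless of orientation, cohesin never unloads, and all coordinates and function names are made up for illustration.

```python
def extrude_loop(load_site, ctcf_sites):
    """Return (left, right) loop anchors for cohesin loaded at load_site.

    Extrusion proceeds in both directions until the nearest CTCF site on
    each side; None means no barrier exists on that side.
    """
    left = max((s for s in ctcf_sites if s <= load_site), default=None)
    right = min((s for s in ctcf_sites if s >= load_site), default=None)
    return left, right


def in_same_loop(pos_a, pos_b, loop):
    """True if both positions lie between the two loop anchors."""
    left, right = loop
    if left is None or right is None:
        return False
    return left <= pos_a <= right and left <= pos_b <= right


ctcf_sites = [100_000, 450_000, 900_000]  # hypothetical coordinates (bp)
loop = extrude_loop(300_000, ctcf_sites)
enhancer, promoter, neighbor_gene = 150_000, 400_000, 600_000

print("Loop anchors:", loop)
print("Enhancer and promoter share a loop:", in_same_loop(enhancer, promoter, loop))
print("Enhancer and neighbor share a loop:", in_same_loop(enhancer, neighbor_gene, loop))
```

In this model the enhancer can reach the promoter inside the same loop but is insulated from the gene beyond the CTCF site at 450 kb.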
ChIP-seq: Mapping Histone Modifications
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the standard technique for mapping histone modifications and transcription factor binding sites genome-wide. The procedure involves cross-linking proteins to DNA, fragmenting the chromatin, immunoprecipitating with an antibody against the modification of interest, and sequencing the enriched DNA fragments. The result is a genome-wide map of where a particular histone mark or protein is located.
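The core computation behind a ChIP-seq signal track is a comparison of immunoprecipitated reads against an input control. The sketch below computes a depth-normalized fold enrichment per genomic bin; the counts, library sizes, pseudocount, and threshold are invented for illustration, and dedicated peak callers use statistical models rather than a fixed cutoff.

```python
def fold_enrichment(chip_counts, input_counts, chip_depth, input_depth,
                    pseudocount=1.0):
    """Per-bin ChIP / input ratio after scaling the input to the ChIP depth."""
    if len(chip_counts) != len(input_counts):
        raise ValueError("ChIP and input must cover the same bins")
    scale = chip_depth / input_depth
    return [
        (chip + pseudocount) / (ctrl * scale + pseudocount)
        for chip, ctrl in zip(chip_counts, input_counts)
    ]


def enriched_bins(enrichment, threshold=4.0):
    """Indices of bins whose fold enrichment reaches the threshold."""
    return [i for i, value in enumerate(enrichment) if value >= threshold]


# Hypothetical read counts in six consecutive bins.
chip = [5, 8, 120, 95, 6, 4]
control = [10, 12, 11, 9, 10, 8]
fe = fold_enrichment(chip, control, chip_depth=20_000_000, input_depth=10_000_000)

print([round(value, 2) for value in fe])
print("Enriched bins:", enriched_bins(fe))
```

The pseudocount keeps bins with no input reads from producing a division by zero and damps ratios in low-coverage regions.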
ATAC-seq (Assay for Transposase-Accessible Chromatin) and DNase-seq map chromatin accessibility — regions where the DNA is not tightly wrapped in nucleosomes and is therefore accessible to regulatory proteins. Open chromatin regions correspond to active promoters, enhancers, and other regulatory elements.
Nucleosome positioning analysis uses MNase-seq (micrococcal nuclease digestion followed by sequencing) to map the precise locations of nucleosomes across the genome. Well-positioned nucleosomes at gene promoters regulate access to the transcription start site.
Chromatin State Modeling
The combination of histone modifications at any genomic location defines a chromatin state. Computational tools segment the genome into distinct states:
ChromHMM (a multivariate hidden Markov model) and Segway (a dynamic Bayesian network) integrate multiple ChIP-seq datasets and assign a chromatin state (e.g., “active promoter,” “strong enhancer,” “Polycomb-repressed,” “heterochromatin”) to every position in the genome. The number of states is chosen by the analyst; in practice the human genome is commonly described with approximately 15–25 distinct chromatin states, each associated with specific functional roles.
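To make the hidden Markov idea concrete, here is a minimal Viterbi decoder for a two-state toy model. It is not ChromHMM's implementation: each bin emits a single categorical observation instead of a vector of mark presence calls, and every probability below is invented for illustration rather than learned from data.

```python
import math

STATES = ("active", "repressed")
START = {"active": 0.5, "repressed": 0.5}
# States tend to persist across neighboring bins.
TRANS = {
    "active": {"active": 0.9, "repressed": 0.1},
    "repressed": {"active": 0.1, "repressed": 0.9},
}
# Probability of observing each mark call in a bin, given the hidden state.
EMIT = {
    "active": {"H3K4me3": 0.70, "H3K27me3": 0.05, "none": 0.25},
    "repressed": {"H3K4me3": 0.05, "H3K27me3": 0.70, "none": 0.25},
}


def viterbi(observations):
    """Most probable hidden state path, computed in log space."""
    if not observations:
        return []
    first = observations[0]
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][first]) for s in STATES}]
    backpointers = []
    for obs in observations[1:]:
        prev = scores[-1]
        column, pointer = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            column[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][obs])
            pointer[s] = best
        scores.append(column)
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


bins = ["H3K4me3", "H3K4me3", "none", "H3K4me3",
        "H3K27me3", "H3K27me3", "none", "H3K27me3"]
print(viterbi(bins))
```

Bins with no signal inherit the state of their neighbors because switching states is penalized by the transition probabilities; this smoothing is what lets such models segment noisy data into contiguous domains.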
Hi-C and 3D Genome Organization
Hi-C is a chromosome conformation capture method that maps all chromatin interactions genome-wide, revealing the three-dimensional organization of chromosomes. Hi-C data shows that chromosomes are organized into:
- Compartments — megabase-scale regions that are either active (A compartments, euchromatic) or inactive (B compartments, heterochromatic)
- Topologically associating domains (TADs) — sub-megabase regions (~100 kb–1 Mb) within which chromatin interactions are frequent, separated by boundaries where interactions drop sharply
- Loops — specific long-range contacts, often between promoters and enhancers, mediated by CTCF and cohesin
TAD boundaries act as insulators, preventing enhancers in one TAD from activating genes in a neighboring TAD. Disruption of TAD boundaries — by deletion or mutation of CTCF binding sites — can cause developmental disorders and cancer by allowing enhancers to mis-regulate genes.
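One common way to locate TAD boundaries in a Hi-C contact matrix is an insulation-style score: for each position, average the contacts that cross it within a small window, and look for local minima. The sketch below is a simplified version of that idea on a synthetic matrix; the contact values, window size, and function names are illustrative assumptions.

```python
def toy_contact_matrix(tad_sizes, inside=10.0, outside=1.0):
    """Symmetric matrix with high contacts within each TAD, low between TADs."""
    labels = [tad for tad, size in enumerate(tad_sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]


def insulation_scores(matrix, window=2):
    """Mean contact frequency crossing each boundary position.

    Position i is the boundary between bin i-1 and bin i; the score averages
    contacts between the `window` bins upstream and the `window` bins downstream.
    """
    n = len(matrix)
    scores = {}
    for i in range(window, n - window + 1):
        total = 0.0
        for a in range(i - window, i):
            for b in range(i, i + window):
                total += matrix[a][b]
        scores[i] = total / (window * window)
    return scores


matrix = toy_contact_matrix([4, 4])  # two TADs of four bins each
scores = insulation_scores(matrix, window=2)
boundary = min(scores, key=scores.get)

print(scores)
print("Strongest boundary before bin", boundary)
```

The score dips where few contacts span the position, which in this synthetic example is exactly the junction between the two domains.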
Integrative Epigenomic Analysis
Large-scale consortia have generated comprehensive epigenomic maps:
- ENCODE (Encyclopedia of DNA Elements) has mapped histone modifications, transcription factor binding, chromatin accessibility, and DNA methylation across hundreds of human cell types
- Roadmap Epigenomics characterized the epigenomes of 111 human tissues and cell types using ChIP-seq, DNase-seq, and bisulfite sequencing
- IHEC (International Human Epigenome Consortium) coordinates epigenome mapping across multiple countries
These datasets enable integrative analysis: combining histone marks with gene expression, DNA methylation, chromatin accessibility, and 3D conformation to build comprehensive models of gene regulation in health and disease. Epigenome-wide association studies (EWAS) identify DNA methylation changes associated with diseases, analogous to GWAS for genetic variants.
Single-cell epigenomics (scATAC-seq, single-cell bisulfite sequencing) now allows chromatin accessibility and methylation to be profiled in individual cells, revealing epigenetic heterogeneity within seemingly uniform cell populations and enabling the construction of cell-type-specific regulatory maps.
let promoter = "GCGCGCTAGCGCGCGCTAGCGCGCGC"
let cg_count = Seq.kmer_count(promoter, 2)
print("Dinucleotide counts in promoter region:")
print(cg_count)
CpG dinucleotide frequency is a key indicator of CpG islands. Promoter-associated CpG islands show elevated CG dinucleotide counts relative to the genomic background, where CpG is typically suppressed due to deamination of methylcytosine.
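Raw CG counts depend on sequence length and base composition, so CpG islands are usually assessed with an observed/expected ratio. The Python sketch below computes it for the promoter sequence above; the thresholds mentioned afterwards are the widely used Gardiner-Garden and Frommer criteria, and the function name is illustrative.

```python
def cpg_stats(seq):
    """GC fraction and CpG observed/expected ratio for a DNA sequence.

    Expected CpG count = (number of C * number of G) / length.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / n if n else 0.0
    expected = c_count * g_count / n if n else 0.0
    obs_exp = cpg_count / expected if expected else 0.0
    return {"length": n, "gc_fraction": gc_fraction,
            "cpg": cpg_count, "obs_exp": obs_exp}


promoter = "GCGCGCTAGCGCGCGCTAGCGCGCGC"
stats = cpg_stats(promoter)
print(stats)
```

A region is conventionally called a CpG island when it is at least 200 bp long, has a GC fraction of at least 0.5, and an observed/expected ratio of at least 0.6. The toy sequence is far too short for the length criterion, so only the two composition measures are meaningful here.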
let marks = [["H3K4me3", "H3K27ac", "H3K36me3", "H3K27me3", "H3K9me3"], ["active_promoter", 0.95, 0.88, 0.12, 0.05, 0.02], ["strong_enhancer", 0.15, 0.92, 0.08, 0.03, 0.01], ["transcribed_body", 0.10, 0.20, 0.85, 0.04, 0.03], ["polycomb_repressed", 0.02, 0.03, 0.05, 0.91, 0.08], ["heterochromatin", 0.01, 0.01, 0.02, 0.10, 0.93]]
print("Histone modification patterns across chromatin states:")
let heatmap = Viz.heatmap(marks)
print(heatmap)
Each chromatin state has a distinctive combination of histone marks. Active promoters are enriched for H3K4me3 and H3K27ac, while heterochromatin is dominated by H3K9me3. This combinatorial logic is what ChromHMM and Segway learn from ChIP-seq data.
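Given reference profiles like those in the heatmap, an unknown region can be assigned to the closest state. The sketch below uses a nearest-profile rule with Euclidean distance, which is a much simpler stand-in for the probabilistic models that ChromHMM and Segway fit; the profile values are the illustrative numbers from the table above.

```python
import math

MARKS = ("H3K4me3", "H3K27ac", "H3K36me3", "H3K27me3", "H3K9me3")

# Mean signal of each mark in each state (illustrative values from the lesson).
STATE_PROFILES = {
    "active_promoter":    (0.95, 0.88, 0.12, 0.05, 0.02),
    "strong_enhancer":    (0.15, 0.92, 0.08, 0.03, 0.01),
    "transcribed_body":   (0.10, 0.20, 0.85, 0.04, 0.03),
    "polycomb_repressed": (0.02, 0.03, 0.05, 0.91, 0.08),
    "heterochromatin":    (0.01, 0.01, 0.02, 0.10, 0.93),
}


def classify(signal):
    """Name of the state whose profile is closest to the observed signal."""
    if len(signal) != len(MARKS):
        raise ValueError(f"expected {len(MARKS)} mark values")
    return min(STATE_PROFILES,
               key=lambda name: math.dist(signal, STATE_PROFILES[name]))


print(classify((0.93, 0.85, 0.10, 0.04, 0.01)))
print(classify((0.12, 0.90, 0.10, 0.02, 0.02)))
```

The second call shows why combinations matter: H3K27ac is high in both promoters and enhancers, and only the accompanying H3K4me3 level separates the two.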
let state_probs = [0.25, 0.15, 0.30, 0.10, 0.08, 0.05, 0.04, 0.03]
let diversity = Stats.shannon(state_probs)
print("Chromatin state diversity (Shannon entropy): " + diversity)
Shannon entropy quantifies the diversity of chromatin states in a genomic region. Higher entropy indicates a more heterogeneous chromatin landscape, as found in gene-dense, actively regulated regions. Lower entropy indicates uniform chromatin, typical of large heterochromatic domains.
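For reference, Shannon entropy is H = -Σ p_i log2(p_i). The Python sketch below spells out the calculation; it reports entropy in bits, which is an assumption, since the logarithm base used by `Stats.shannon` is not stated in the lesson.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; inputs are normalized to sum to 1."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    entropy = 0.0
    for p in probs:
        if p > 0:  # zero-probability states contribute nothing
            q = p / total
            entropy -= q * math.log2(q)
    return entropy


state_probs = [0.25, 0.15, 0.30, 0.10, 0.08, 0.05, 0.04, 0.03]
print(f"Entropy: {shannon_entropy(state_probs):.3f} bits")
print(f"Maximum for 8 states: {shannon_entropy([1 / 8] * 8):.3f} bits")
```

With eight states the entropy ranges from 0 bits (one state covers the whole region) to 3 bits (all states equally frequent), which gives a scale for interpreting any observed value.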
Exercise: Detect a CpG Island
Use dinucleotide counting to identify which sequence has CpG island characteristics. A CpG island will have an elevated count of CG dinucleotides compared to a non-island region where CpG is suppressed by methylation-driven deamination.
let seq_a = "CGCGCGATCGCGCCGCGATCGCGCG"
let seq_b = "ATAGATGATATTAATGATAAAGATAT"
let counts_a = Seq.kmer_count(seq_a, 2)
let counts_b = Seq.kmer_count(seq_b, 2)
print("Sequence A dinucleotides:")
print(counts_a)
print("Sequence B dinucleotides:")
print(counts_b)
let answer = "Sequence A"
print(answer)
Exercise: Interpret Histone Modification Patterns
Given histone mark signals for an unknown genomic region, determine its chromatin state. Active promoters show high H3K4me3 and H3K27ac but low repressive marks.
let marks = [["H3K4me3", "H3K27ac", "H3K36me3", "H3K27me3", "H3K9me3"], ["unknown_region", 0.93, 0.85, 0.10, 0.04, 0.01]]
print("Unknown region histone marks:")
let heatmap = Viz.heatmap(marks)
print(heatmap)
let answer = "active_promoter"
print(answer)
Exercise: Analyze Chromatin State Diversity
Compare the Shannon entropy of chromatin state distributions from two cell types. A stem cell with many active regulatory states will show higher entropy than a terminally differentiated cell with large uniform chromatin domains.
let stem_cell = [0.15, 0.14, 0.13, 0.12, 0.12, 0.11, 0.12, 0.11]
let differentiated = [0.03, 0.02, 0.05, 0.02, 0.03, 0.02, 0.03, 0.80]
let entropy_stem = Stats.shannon(stem_cell)
let entropy_diff = Stats.shannon(differentiated)
print("Stem cell entropy: " + entropy_stem)
print("Differentiated cell entropy: " + entropy_diff)
let answer = "Stem cell"
print(answer)
Summary
In this lesson you covered chromatin structure, epigenetics, and their computational analysis:
- Eukaryotic DNA is packaged into chromosomes with centromeres, telomeres, and replication origins; genes are unevenly distributed
- Nucleosomes (147 bp around a histone octamer) compact DNA ~6-fold; higher-order folding achieves roughly 10,000-fold compaction in mitotic chromosomes
- Chromatin remodeling complexes dynamically reposition nucleosomes; histone variants mark specific functional regions
- Euchromatin (open, active) and heterochromatin (condensed, silenced) represent two major chromatin states
- Histone modifications (H3K4me3, H3K27me3, H3K27ac, H3K9me3, etc.) form a combinatorial epigenetic code that is deposited by writers, removed by erasers, and interpreted by readers
- DNA methylation at CpG sites silences genes; CpG islands at promoters are normally unmethylated
- Epigenetic marks are inherited through cell division; they can be reprogrammed during development or by Yamanaka factors
- Chromosome territories organize the nucleus; lampbrush and polytene chromosomes reveal chromatin structure
- Loop extrusion by cohesin and CTCF establishes 3D chromatin architecture
- ChIP-seq maps histone modifications; ATAC-seq/DNase-seq maps accessibility; MNase-seq maps nucleosome positions
- ChromHMM/Segway define chromatin states from combined histone mark data
- Hi-C reveals 3D genome organization: compartments, TADs, and loops
- ENCODE and Roadmap Epigenomics provide comprehensive epigenomic maps across human cell types
- Single-cell epigenomics (scATAC-seq) reveals cell-to-cell epigenetic heterogeneity