Epigenetic, Post-Transcriptional, and Evolutionary Controls

Intermediate Molecular Biology ~40 min

Explore how cells maintain gene expression patterns through epigenetic inheritance, regulate mRNAs via miRNAs and RNA-binding proteins, and how genomes evolve new regulatory programs — with the computational tools for each.

Introduction

Transcription factor binding determines which genes are turned on or off at any given moment, but cells also need mechanisms to remember their expression patterns through cell division, fine-tune gene output after transcription, and evolve new regulatory programs over evolutionary time. This lesson covers three interconnected layers of gene expression control: epigenetic inheritance (how chromatin and methylation states are maintained through cell division), post-transcriptional regulation (how mRNA stability, translation, and protein degradation fine-tune protein output), and genome evolution (how duplications, transposable elements, and horizontal gene transfer create new genes and regulatory circuits).

DNA Methylation Patterns Can Be Inherited

DNA methylation at CpG dinucleotides is one of the most stable epigenetic marks. In vertebrates, the enzyme DNMT1 (DNA methyltransferase 1) is recruited to the replication fork and copies the methylation pattern from the parental strand to the newly synthesized daughter strand. Because CpG is a palindromic dinucleotide, the parental methylation pattern serves as a template: wherever the parental strand is methylated, DNMT1 methylates the corresponding cytosine on the new strand.

This maintenance methylation ensures that methylation patterns — and the gene silencing they encode — are faithfully transmitted through cell division. De novo methyltransferases (DNMT3A and DNMT3B) establish new methylation patterns during development, while TET enzymes (TET1, TET2, TET3) oxidize 5-methylcytosine to 5-hydroxymethylcytosine and further intermediates, promoting active demethylation.
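
The template logic of maintenance methylation can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of DNMT1 biochemistry: the function name `maintenance_methylation` and the coordinate convention (0-based positions on the parental strand) are invented for illustration.

```python
def maintenance_methylation(top_strand, methylated_top):
    """Copy CpG methylation from the parental (top) strand to the new strand.

    top_strand     -- parental strand, written 5'->3'
    methylated_top -- 0-based indices of methylated cytosines on that strand
    Returns the top-strand coordinates whose paired cytosine on the newly
    synthesized (bottom) strand would be methylated.
    """
    new_strand_marks = set()
    for i in sorted(methylated_top):
        # Only cytosines in a CpG context are maintained. Because CpG is
        # palindromic, the G at i+1 pairs with a C that also sits in a CpG.
        if top_strand[i:i + 2] == "CG":
            new_strand_marks.add(i + 1)
    return new_strand_marks


parent = "ACGTTCGACCGA"
parent_marks = {1, 5, 8}  # index 8 is a C followed by C, so not a CpG
print(maintenance_methylation(parent, parent_marks))
```

In this toy run the methylated cytosine at index 8 is not copied, because it is not followed by a G and therefore has no symmetric partner on the opposite strand.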

CpG Islands Are Associated with Many Genes in Mammals

CpG islands, regions of ~500–2,000 bp with high CpG density and GC content, are found at approximately 60% of human gene promoters, including virtually all housekeeping genes. In normal cells, most promoter CpG islands are unmethylated and associated with active or poised chromatin, regardless of whether the associated gene is currently expressed.

When CpG islands become aberrantly methylated (as commonly occurs in cancer), the associated gene is stably silenced. This is a frequent mechanism for inactivating tumor suppressor genes in cancer — an alternative to mutational inactivation.
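
"High CpG density and GC content" is usually made concrete with two numbers: the GC fraction and the observed/expected CpG ratio. The Python sketch below computes both. The function name `cpg_island_metrics` is an assumption made for this example, and the thresholds mentioned afterwards are the commonly quoted Gardiner-Garden and Frommer criteria rather than anything defined in this lesson.

```python
def cpg_island_metrics(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently along the sequence
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


island_like = "GCGCCCGGCGCGCCGCGCCCCGCGCGCCG"
gc, ratio = cpg_island_metrics(island_like)
print(f"GC fraction: {gc:.2f}, CpG obs/exp: {ratio:.2f}")
```

A window is typically called a CpG island when it is at least 200 bp long, has a GC fraction above 0.5, and has an observed/expected ratio above 0.6. The short sequence here is only long enough to show the arithmetic.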

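// Toy 3' UTR and the miR-21-5p seed (nucleotides 2-8 of the mature miRNA)
let utr_3prime = "AUGCCUGCACUGUGCCUGUAAAGCUUAUUGCAC"
let seed_miR21 = "AGCUUAU"
// A target site in an mRNA is the reverse complement of the seed (AUAAGCU)
print("3' UTR k-mer composition (7-mers):")
print(Seq.kmer_count(utr_3prime, 7))
print("miR-21 seed region: " + seed_miR21)
// CpG islands are enriched for the CG dinucleotide
print("CpG dinucleotides in a CpG island promoter:")
let cpg_island = "GCGCCCGGCGCGCCGCGCCCCGCGCGCCG"
print(Seq.kmer_count(cpg_island, 2))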

Chromatin States Can Be Stably Inherited

Beyond DNA methylation, histone modifications can also be inherited through cell division. The mechanisms are less well understood than for DNA methylation, but involve:

  • Recycling of parental histones: during replication, parental histones (with their modifications) are distributed to both daughter chromosomes
  • Read-write mechanisms: enzymes that “read” a histone mark on a parental nucleosome and “write” the same mark on nearby new nucleosomes (e.g., Polycomb repressive complex 2 binds H3K27me3 and also catalyzes new H3K27me3)
  • Phase separation: some chromatin states may be maintained by liquid-liquid phase separation that concentrates specific factors

Large-scale chromatin structures, such as the inactive X chromosome (Barr body) or pericentric heterochromatin, are propagated through many cell divisions by a combination of DNA methylation, histone H3K9me3, HP1 protein spreading, and non-coding RNA scaffolding (Xist for X inactivation).

Reprogramming of Epigenetic Marks

Although epigenetic marks are stable during normal development, they can be reprogrammed under specific conditions:

  • During germ cell development, nearly all DNA methylation is erased and re-established, resetting the epigenetic slate for the next generation (with important exceptions at imprinted genes)
  • After fertilization, the paternal genome is rapidly and actively demethylated (through TET3-mediated oxidation of 5-methylcytosine), followed by gradual, largely replication-dependent demethylation of the maternal genome
  • Somatic cell nuclear transfer (cloning) and iPSC reprogramming (Yamanaka factors) demonstrate that differentiated epigenetic states can be reversed, though often incompletely

Incomplete reprogramming during cloning is thought to explain why cloned animals often have developmental abnormalities: residual epigenetic marks from the donor cell persist and interfere with normal development.

Bisulfite-seq and Methylation Analysis

Bisulfite sequencing is the gold standard for measuring DNA methylation at single-base resolution. Bisulfite treatment deaminates unmethylated cytosines to uracil (read as thymine after PCR), while methylated cytosines are protected. Comparing bisulfite-converted reads with the unconverted reference sequence reveals the methylation status of each covered cytosine: a C that still reads as C was methylated, and a C that reads as T was not.
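
The conversion and read-back logic can be sketched in Python as follows. This is a single-strand toy model with no alignment, no sequencing errors, and no incomplete conversion. The function names `bisulfite_convert` and `call_methylation` are invented for this example and do not correspond to any Bismark command.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment plus PCR on one DNA strand.

    Unmethylated C is read as T; C at an index in `methylated` stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """Return {position: is_methylated} for every reference cytosine."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True    # protected from conversion, so methylated
        elif read_base == "T":
            calls[i] = False   # converted, so unmethylated
        # any other base is treated as an error and left uncalled
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated={1})
print(read)
print(call_methylation(reference, read))
```

Real pipelines also have to handle the reverse strand (where the signal appears as G to A) and the reduced sequence complexity during alignment, which is the part Bismark takes care of.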

Computational analysis tools include:

  • Bismark — aligns bisulfite-converted reads to the genome, handling the reduced-complexity sequence
  • DMR detection (differentially methylated regions) — tools like methylKit, DSS, and dmrseq identify regions where methylation differs between conditions (e.g., tumor vs. normal)
  • Chromatin state annotation across cell types, integrating methylation with histone marks (using ChromHMM or Segway)

Epigenetic Clock Models

Epigenetic clocks are mathematical models that predict age from DNA methylation patterns. The Horvath clock (2013) uses methylation levels at 353 CpG sites to predict chronological age with remarkable accuracy across tissues. More recent clocks aim to capture biological age as well: deviations between predicted and chronological age correlate with disease risk, mortality, and cellular aging.

Epigenetic clocks are now used as biomarkers for aging, disease, and the effects of interventions in clinical trials.
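
At their core, clocks of this kind are weighted sums of methylation levels. The Python sketch below shows only that structure. The CpG identifiers, weights, and intercept are hypothetical values invented for illustration and are not the published Horvath coefficients. The published clock also applies a transformation to age that is omitted here.

```python
def predict_age(betas, weights, intercept):
    """Linear clock: intercept plus a weighted sum of beta values (0 to 1)."""
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())


# Hypothetical CpG identifiers and weights, invented for illustration only.
toy_weights = {"cpg_site_1": 40.0, "cpg_site_2": -20.0}
toy_intercept = 30.0

# Methylation levels (beta values) measured in one hypothetical sample.
sample = {"cpg_site_1": 0.5, "cpg_site_2": 0.25}
print(predict_age(sample, toy_weights, toy_intercept))
```

In practice the weights are fitted by penalized regression on large training cohorts, which is how a clock ends up using a few hundred informative CpG sites out of many thousands measured.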

Multi-Omics Data Integration

Understanding gene regulation increasingly requires integrating data across multiple epigenomic layers simultaneously:

  • DNA methylation (bisulfite-seq)
  • Histone modifications (ChIP-seq)
  • Chromatin accessibility (ATAC-seq)
  • 3D genome organization (Hi-C)
  • Gene expression (RNA-seq)
  • Protein levels (proteomics)

Computational frameworks for multi-omics integration (MOFA+, Seurat WNN, LIGER) combine these layers to reveal regulatory relationships that are invisible from any single data type alone.

Regulation of mRNA Stability and Degradation

After an mRNA has been produced, processed, and exported to the cytoplasm, its half-life determines how long it remains available for translation. mRNA half-lives vary enormously:

mRNA type | Half-life | Examples
Unstable | 15–30 minutes | Cytokines (TNF, IL-6), proto-oncogenes (MYC, FOS)
Moderate | 4–8 hours | Most housekeeping genes
Stable | >24 hours | Globin mRNAs, ribosomal protein mRNAs

mRNA stability is controlled largely by sequence elements in the 3′ UTR and by the length of the poly(A) tail:

  • AU-rich elements (AREs) recruit destabilizing factors that promote deadenylation and decay
  • miRNA binding sites target mRNAs for degradation or translational repression
  • Poly(A) tail shortening (deadenylation) is often the rate-limiting step in mRNA decay, followed by decapping and 5′→3′ exonuclease degradation (Xrn1) or 3′→5′ exosome degradation
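
To get a feel for what these half-lives mean, the sketch below assumes simple first-order (exponential) decay with no new transcription, so the fraction remaining after time t is 0.5 raised to t divided by the half-life. The function names and the three representative half-lives are choices made for this example.

```python
import math


def fraction_remaining(t_hours, half_life_hours):
    """Fraction of an mRNA pool left after t hours of first-order decay."""
    return 0.5 ** (t_hours / half_life_hours)


def decay_rate(half_life_hours):
    """First-order decay constant k (per hour), where k = ln(2) / half-life."""
    return math.log(2) / half_life_hours


# Representative half-lives in hours, one per class in the table above.
examples = {"unstable (30 min)": 0.5, "moderate (6 h)": 6.0, "stable (24 h)": 24.0}
for label, half_life in examples.items():
    left = fraction_remaining(2.0, half_life)
    print(f"{label}: {left:.1%} left after 2 h, k = {decay_rate(half_life):.3f}/h")
```

Under this model an unstable transcript is almost gone two hours after transcription stops, while a stable one has barely changed, which is why short half-lives suit genes that must respond quickly to signals.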

RNA Interference and MicroRNAs

MicroRNAs (miRNAs) are small (~22-nucleotide) non-coding RNAs that regulate gene expression through the RNA interference (RNAi) pathway:

  1. A primary miRNA transcript (pri-miRNA) is processed in the nucleus by Drosha to a ~70 nt precursor (pre-miRNA)
  2. The pre-miRNA is exported to the cytoplasm and processed by Dicer into a ~22 nt double-stranded RNA
  3. One strand (the guide strand) is loaded into the RISC (RNA-Induced Silencing Complex), which contains an Argonaute protein
  4. The miRNA guides RISC to partially complementary sites in the 3′ UTR of target mRNAs; pairing is anchored by the seed region (nucleotides 2–7 of the miRNA, often extended to nucleotide 8)
  5. Depending on complementarity, the target is either cleaved (near-perfect match, common in plants) or translationally repressed and destabilized (partial match, common in animals)
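
Step 4 can be sketched as a simple string search: take the seed, reverse-complement it (the miRNA pairs with its target in antiparallel orientation), and look for that site in the 3′ UTR. The Python below is a minimal illustration. The function names are invented here, the miR-21-5p sequence is quoted as an assumption that should be checked against miRBase, and the UTR is a made-up toy sequence.

```python
COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna):
    """Reverse complement of an RNA string, returned 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_match_sites(mirna, utr, start=2, end=8):
    """0-based UTR positions that pair with miRNA nucleotides start..end.

    start and end are 1-based and inclusive, following the usual seed numbering.
    """
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


mir21 = "UAGCUUAUCAGACUGAUGUUGA"  # assumed mature miR-21-5p sequence
toy_utr = "GGCAUAAGCUACC"         # hypothetical UTR containing one seed match
print(reverse_complement(mir21[1:8]))
print(seed_match_sites(mir21, toy_utr))
```

Tools such as TargetScan start from exactly this kind of seed match and then score conservation and sequence context, since a bare 7-mer match occurs by chance in many UTRs.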

The human genome encodes over 2,000 miRNAs, and each miRNA can target hundreds of different mRNAs. Together, miRNAs regulate an estimated 60% of human protein-coding genes, providing a pervasive layer of post-transcriptional fine-tuning.

Small interfering RNAs (siRNAs) use the same RISC pathway but are processed from long double-stranded RNA, often of exogenous origin (e.g., viral RNA). Synthetic siRNAs are widely used as research tools and are being developed as therapeutics (patisiran, the first FDA-approved RNAi drug, treats hereditary transthyretin amyloidosis).

Translational Regulation

Translation can be regulated rapidly and reversibly:

  • Iron response elements (IREs) in the 5′ UTR of ferritin mRNA are bound by iron-regulatory proteins (IRPs) when iron is low, blocking ribosome scanning and preventing translation; when iron is abundant, IRPs release the IRE and ferritin is translated
  • Upstream open reading frames (uORFs) in the 5′ UTR divert ribosomes away from the main coding sequence, reducing translation efficiency (this mechanism regulates the stress-response transcription factor ATF4)
  • Phosphorylation of eIF2α globally reduces translation initiation during stress (the integrated stress response)
  • mTOR signaling promotes cap-dependent translation by phosphorylating 4E-BP, releasing the cap-binding factor eIF4E
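
Of the mechanisms above, uORFs are the easiest to look for computationally, because they are defined by sequence alone. The Python sketch below scans a 5′ UTR for AUG codons followed by an in-frame stop codon within the UTR. The function name `find_uorfs` and the toy sequence are assumptions for illustration. Real annotation would also consider non-AUG starts, Kozak context, and uORFs that overlap the main coding sequence.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(five_prime_utr):
    """Return (start, end) index pairs of AUG...stop ORFs inside a 5' UTR.

    Indices are 0-based; `end` is exclusive and includes the stop codon.
    """
    utr = five_prime_utr.upper().replace("T", "U")
    uorfs = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "AUG":
            continue
        # Walk codon by codon in the frame set by this AUG.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


toy_utr = "GGAUGCCCUAAGG"  # hypothetical 5' UTR with one short uORF
print(find_uorfs(toy_utr))
```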

RNA Localization and Storage

In some cell types, mRNAs are actively transported to specific subcellular locations before translation:

  • In developing Drosophila embryos, bicoid mRNA is localized to the anterior pole, where its translation establishes the head-to-tail axis
  • In neurons, specific mRNAs are transported along axons and dendrites for local translation at synapses, enabling rapid, spatially restricted responses to stimulation
  • P-bodies and stress granules are cytoplasmic RNA-protein condensates that serve as sites of mRNA storage, degradation, or translational repression

mRNA localization is mediated by zip codes — sequence elements (usually in the 3′ UTR) recognized by RNA-binding proteins that connect the mRNA to molecular motor proteins for directed transport.

miRNA Target Prediction and Small RNA Analysis

Computational tools predict miRNA-mRNA regulatory relationships:

  • TargetScan — predicts miRNA targets based on seed region complementarity, conservation, and site context (one of the most widely used target prediction tools)
  • miRDB — uses machine learning to predict functional miRNA targets
  • Small RNA-seq analysis pipelines process sequencing data from small RNA libraries, identifying and quantifying miRNAs, piRNAs, and other small RNA species

CLIP-seq and RNA-Binding Protein Analysis

CLIP-seq (cross-linking and immunoprecipitation followed by sequencing) maps the binding sites of RNA-binding proteins (RBPs) transcriptome-wide:

  • eCLIP (enhanced CLIP) — applied at scale by the ENCODE project, provides high-resolution maps of RBP-RNA interactions with size-matched input controls for statistical rigor
  • CLIP data reveal which mRNAs are regulated by a given RBP and where in the transcript the protein binds (5′ UTR, coding region, 3′ UTR, introns)

RNA Stability Measurement and Circular RNA

New technologies measure RNA dynamics directly:

  • SLAM-seq (thiol-linked alkylation for the metabolic sequencing of RNA) — incorporates 4-thiouridine into newly transcribed RNA; chemical conversion and sequencing distinguish new from old transcripts, measuring synthesis and degradation rates transcriptome-wide
  • TimeLapse-seq — similar approach using chemical recoding of metabolically labeled RNA
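
In both methods the labeled uridines end up being read as cytosines, so new transcripts show T-to-C mismatches against the reference. The Python sketch below shows that classification step in its simplest form. The function names and the one-conversion threshold are assumptions made for this example. Real analyses model sequencing errors, SNPs, and incomplete labeling statistically instead of using a hard cutoff.

```python
def tc_conversions(reference, read):
    """Count positions where the reference has T and the read has C."""
    return sum(1 for ref, obs in zip(reference, read)
               if ref == "T" and obs == "C")


def classify_read(reference, read, min_conversions=1):
    """Label a read 'new' if it carries enough T>C conversions, else 'old'."""
    if tc_conversions(reference, read) >= min_conversions:
        return "new"
    return "old"


reference = "ATTGTC"
reads = ["ATTGTC", "ACTGCC", "ATTGCC"]  # toy reads aligned to the reference
labels = [classify_read(reference, r) for r in reads]
print(labels)
print("fraction new:", labels.count("new") / len(labels))
```

Tracking how the fraction of new reads for a gene changes over a labeling time course is what allows synthesis and degradation rates to be estimated.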

Circular RNAs (circRNAs) are a class of non-coding RNAs formed by back-splicing, where a downstream splice donor joins to an upstream splice acceptor. CircRNAs are resistant to exonuclease degradation (because they lack free ends) and can function as miRNA sponges, sequestering miRNAs and de-repressing their targets. Computational detection of circRNAs from RNA-seq data requires specialized tools that identify back-splice junction reads (e.g., CIRCexplorer, CIRI).

Genome Changes Underlying Evolution

Genomes evolve through several mechanisms that can create new genes and regulatory programs:

Gene duplications are a primary source of new genes. After a gene is duplicated, the two copies are initially redundant. This frees one copy (or both) from selective constraint, allowing it to accumulate mutations that may lead to:

  • Neofunctionalization — one copy acquires a new function
  • Subfunctionalization — the ancestral functions are partitioned between the two copies
  • Pseudogenization — one copy accumulates inactivating mutations and becomes a pseudogene (the most common outcome)

Horizontal gene transfer (HGT) moves genes between unrelated organisms, bypassing normal vertical inheritance. HGT is rampant in prokaryotes (mediated by conjugation, transformation, and transduction) and has also occurred in eukaryotes (e.g., genes from bacterial endosymbionts transferred to the nuclear genome).

Transposable elements are major agents of genome evolution. TE insertions can:

  • Disrupt genes (causing loss-of-function mutations)
  • Donate new regulatory elements (enhancers, promoters, insulators derived from TE sequences)
  • Create new exons (exonization of TE sequences)
  • Drive genome expansion (the ~45% of the human genome derived from TEs)

Genome Evolution Analysis Tools

Computational methods for studying genome evolution include:

  • Synteny and collinearity analysis — compares gene order between species to identify conserved blocks and rearrangements (SynMap, MCScanX)
  • Whole-genome duplication (WGD) detection — identifies ancient polyploidy events from patterns of duplicated gene blocks (e.g., two rounds of WGD in early vertebrate evolution)
  • Gene family expansion and contraction — CAFE (Computational Analysis of gene Family Evolution) uses birth-death models to identify gene families that have expanded or contracted along specific branches of a phylogeny
  • Pan-genome analysis — characterizes the total gene content of a species (core genes shared by all individuals + dispensable genes present in some), important for both bacterial and eukaryotic genomics
  • Lateral gene transfer detection — identifies genes in a genome whose phylogenetic history is incongruent with the species tree, suggesting HGT
  • Genome-wide positive selection scans — tools like SweeD and SweepFinder detect signatures of selective sweeps from population genomic data
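
The core and dispensable split used in pan-genome analysis reduces to set operations once genes have been grouped into families. The Python sketch below assumes that grouping has already been done. The function name `pan_genome`, the strain names, and the gene labels are hypothetical.

```python
def pan_genome(gene_sets):
    """Split gene content into pan, core, and dispensable sets.

    gene_sets maps each genome name to the set of gene families it carries.
    """
    genomes = list(gene_sets.values())
    pan = set().union(*genomes)                             # found in any genome
    core = set.intersection(*genomes) if genomes else set() # found in every genome
    return pan, core, pan - core


# Hypothetical gene-family content of three strains.
strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneC", "geneD"},
}
pan, core, dispensable = pan_genome(strains)
print("pan:", sorted(pan))
print("core:", sorted(core))
print("dispensable:", sorted(dispensable))
```

The hard part in practice is the step this sketch skips: deciding which genes in different genomes belong to the same family.
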

let dscam_isoforms = [0.15, 0.28, 0.07, 0.32, 0.10, 0.08]
let globin_isoforms = [0.92, 0.05, 0.03]
let dscam_diversity = Stats.shannon(dscam_isoforms)
let globin_diversity = Stats.shannon(globin_isoforms)
print("DSCAM splicing diversity (Shannon): " + dscam_diversity)
print("Globin splicing diversity (Shannon): " + globin_diversity)
print("DSCAM isoform distribution:")
print(Viz.bar(["Ex4a", "Ex4b", "Ex4c", "Ex6a", "Ex6b", "Ex6c"], dscam_isoforms))

Alternative splicing dramatically expands proteome diversity from a limited gene set. The Drosophila DSCAM gene can potentially produce more than 38,000 isoforms through combinatorial exon choice. Shannon entropy quantifies this diversity: a high value indicates many isoforms at similar frequencies (complex regulation), while a low value indicates dominance of a single isoform (simple regulation). Visualizing isoform frequencies reveals which splice variants predominate in a given tissue or condition.
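
For reference, the entropy being computed is H = −Σ p·log(p) over the isoform frequencies p. The Python sketch below writes it out explicitly. The function name `shannon_entropy` is chosen for this example, and the default of log base 2 (entropy in bits) is an assumption, since the lesson does not state which base the built-in Stats.shannon uses.

```python
import math


def shannon_entropy(freqs, base=2.0):
    """Shannon entropy of a frequency distribution (zero entries are skipped)."""
    return -sum(p * math.log(p) / math.log(base) for p in freqs if p > 0)


dscam_isoforms = [0.15, 0.28, 0.07, 0.32, 0.10, 0.08]
globin_isoforms = [0.92, 0.05, 0.03]
print("DSCAM entropy (bits):", round(shannon_entropy(dscam_isoforms), 3))
print("Globin entropy (bits):", round(shannon_entropy(globin_isoforms), 3))
# Upper bound for six isoforms: all equally frequent
print("Maximum for 6 isoforms:", round(math.log2(6), 3))
```

Comparing a measured entropy with its maximum, log of the number of isoforms, gives a sense of how close a gene is to using all of its isoforms equally.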

Exercise: Identify miRNA Seed Region Targets

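A miRNA has the 7-nt seed sequence UAGCAGU (nucleotides 2–8; labeled miR-199a in this exercise). Because the miRNA pairs with its target in antiparallel orientation, the site to look for in the mRNA is the reverse complement of the seed, ACUGCUA. Scan the 3′ UTR sequence for this seed match by counting 7-mers, and determine whether the UTR is a potential target of the miRNA.

let utr = "AUGCUUACUGCUAGGCUAUACUGCUACCGAUU"
let seed = "UAGCAGU"
// Target site = reverse complement of the seed, written 5'→3'.
// (A plain complement, AUCGUCA, would ignore the antiparallel pairing.)
let seed_match = "ACUGCUA"
print("miR-199a seed: " + seed)
print("Expected target site: " + seed_match)
print("3' UTR 7-mer composition:")
print(Seq.kmer_count(utr, 7))
// Does this UTR contain the seed match site? (ACUGCUA occurs twice)
let answer = "Target confirmed"
print(answer)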

Exercise: Measure Splicing Diversity Across Tissues

Two tissues express different splicing programs for the same gene. Calculate the Shannon entropy of each tissue’s isoform distribution to determine which tissue has greater splicing diversity. Visualize both distributions.

let brain_isoforms = [0.22, 0.25, 0.18, 0.20, 0.15]
let muscle_isoforms = [0.82, 0.08, 0.05, 0.03, 0.02]
let labels = ["Iso-A", "Iso-B", "Iso-C", "Iso-D", "Iso-E"]
let brain_h = Stats.shannon(brain_isoforms)
let muscle_h = Stats.shannon(muscle_isoforms)
print("Brain splicing entropy: " + brain_h)
print("Muscle splicing entropy: " + muscle_h)
print("Brain isoform distribution:")
print(Viz.bar(labels, brain_isoforms))
print("Muscle isoform distribution:")
print(Viz.bar(labels, muscle_isoforms))
// Which tissue has greater splicing diversity?
let answer = "Brain"
print(answer)

Exercise: Compare Post-Transcriptional Regulation Mechanisms

Three genes are regulated by different post-transcriptional mechanisms. Analyze their 3′ UTR sequences for miRNA seed sites (7-mers) and AU-rich elements (count AU dinucleotides). Determine which gene is most heavily regulated at the post-transcriptional level.

let utr_a = "GCGCCGAUGCGCGGCCGCGAUCGGCUGAU"
let utr_b = "AUGCCUGCACUGUGCCUGCCGCUUAUGCAC"
let utr_c = "AUUUAUUUAUUUACUGCUAAUUUAUUUAUU"
print("Gene A dinucleotides:")
print(Seq.kmer_count(utr_a, 2))
print("Gene B dinucleotides:")
print(Seq.kmer_count(utr_b, 2))
print("Gene C dinucleotides:")
print(Seq.kmer_count(utr_c, 2))
// Which gene has the most AU-rich elements and potential miRNA sites?
let answer = "Gene C"
print(answer)

Summary

In this lesson you covered epigenetic, post-transcriptional, and evolutionary controls:

  • DNA methylation at CpG sites is inherited through cell division via DNMT1 maintenance methylation
  • CpG islands mark ~60% of human gene promoters; aberrant methylation silences tumor suppressors in cancer
  • Chromatin states are inherited through histone recycling, read-write mechanisms, and phase separation
  • Epigenetic reprogramming occurs in germ cells and after fertilization; incomplete reprogramming limits cloning efficiency
  • Bisulfite sequencing maps methylation at single-base resolution; DMR detection identifies differentially methylated regions
  • Epigenetic clocks (Horvath clock) predict biological age from methylation patterns
  • Multi-omics integration combines methylation, histone marks, accessibility, 3D structure, and expression data
  • mRNA stability varies from minutes to days, controlled by AREs, miRNAs, and poly(A) tail length
  • MicroRNAs (~2,000 in humans) regulate ~60% of genes via RISC-mediated mRNA degradation or translational repression
  • Translational regulation through IREs, uORFs, eIF2α phosphorylation, and mTOR signaling provides rapid control
  • RNA localization (zip codes, molecular motors) enables spatially restricted translation in neurons and embryos
  • miRNA target prediction (TargetScan, miRDB) and CLIP-seq (eCLIP) map RNA-protein interactions
  • SLAM-seq measures RNA dynamics; circular RNAs resist degradation and can sponge miRNAs
  • Gene duplications create new genes through neofunctionalization, subfunctionalization, or pseudogenization
  • Horizontal gene transfer and transposable elements reshape genomes and donate new regulatory elements
  • CAFE analyzes gene family expansion/contraction; pan-genome analysis characterizes species-level gene content variation
  • Positive selection scans (SweeD, SweepFinder) detect adaptive evolution from population genomic data

References

  1. Alberts B, Johnson A, Lewis J, Morgan D, Raff M, Roberts K, Walter P. Molecular Biology of the Cell, 7th ed. New York: W.W. Norton; 2022. Chapter 7: Control of Gene Expression.
  2. Lee JT, Strauss WM, Dausman JA, Jaenisch R. A 450 kb transgene displays properties of the mammalian X-inactivation center. Cell. 1996;86(1):83–94.
  3. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116(2):281–297.
  4. Agarwal V, Bell GW, Nam JW, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. eLife. 2015;4:e05005.
  5. Kozomara A, Birgaoanu M, Griffiths-Jones S. miRBase: from microRNA sequences to function. Nucleic Acids Res. 2019;47(D1):D155–D162. https://www.mirbase.org/
  6. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14(10):R115.
  7. Licatalosi DD, Darnell RB. RNA processing and its regulation: global insights into biological networks. Nat Rev Genet. 2010;11(1):75–87.
  8. Nielsen R, Bustamante C, Clark AG, et al. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 2005;3(6):e170.
