DNA Repair and Recombination
Learn how cells detect and fix DNA damage through multiple repair pathways, how homologous recombination shuffles genetic information, how transposable elements reshape genomes, and the computational methods for analyzing these processes.
Introduction
DNA is under constant assault. Every human cell experiences tens of thousands of DNA lesions per day from spontaneous chemical damage, reactive oxygen species, and environmental agents. Without efficient repair, genetic information would degrade rapidly. Cells have evolved a sophisticated arsenal of repair pathways, each specialized for different types of damage. Beyond repair, recombination provides a mechanism for shuffling genetic information during meiosis and for rescuing stalled replication forks. And transposable elements — genomic parasites that copy and move within genomes — have been a major force in shaping genome architecture over evolutionary time.
This lesson covers DNA repair pathways, homologous recombination, transposition, and the computational tools used to analyze DNA damage, mutational signatures, and structural variation.
DNA Damage Is Constant and Diverse
Without DNA repair, spontaneous damage would rapidly change DNA sequences. Each human cell experiences roughly 10,000–100,000 lesions per day from multiple sources:
- Depurination (~5,000/day) — loss of a purine base, leaving an abasic site
- Deamination (~100–500/day) — cytosine deaminates to uracil (mispairs with A); 5-methylcytosine deaminates to thymine (a particularly insidious mutation because T is a normal base)
- Oxidative damage — reactive oxygen species from metabolism modify bases (8-oxoguanine is the most common, mispairs with A)
- UV radiation — creates pyrimidine dimers (covalent links between adjacent thymines or cytosines)
- Ionizing radiation and chemicals — cause double-strand breaks, base modifications, and cross-links
The chemistry of DNA bases facilitates damage detection: most forms of damage distort the double helix or create abnormal base pairs, providing physical signals that repair enzymes can recognize. The double-stranded nature of DNA is key — the undamaged strand provides the template for accurate repair.
DNA Damage Can Be Removed by More Than One Pathway
Cells use multiple, partially overlapping repair pathways to handle different types of damage:
Base Excision Repair (BER)
BER handles small, non-helix-distorting base modifications (oxidation, deamination, alkylation). A DNA glycosylase specific for the damaged base flips it out of the helix and cleaves the base-sugar bond, creating an abasic (AP) site. An AP endonuclease nicks the backbone, and DNA polymerase fills the gap using the opposite strand as template. DNA ligase seals the nick.
Each of the ~10 known glycosylases recognizes a specific type of damage, ensuring broad coverage against diverse lesion types.
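The four BER steps can be traced on a toy duplex. This is a minimal Python sketch, not a biochemical model: the function name `base_excision_repair`, the choice of `U` as the only recognized lesion, and the string layout of the two strands are all assumptions made for illustration.

```python
# Toy model of base excision repair (BER) on a string duplex.
# Assumption: "U" (uracil from cytosine deamination) is the only lesion
# recognized, standing in for one damage-specific glycosylase.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_excision_repair(damaged: str, opposite: str, lesion: str = "U") -> str:
    """Return `damaged` with every lesion base replaced from the opposite strand.

    `opposite[i]` is the base paired with `damaged[i]` (a hypothetical layout
    chosen to keep the indexing simple).
    """
    if len(damaged) != len(opposite):
        raise ValueError("strands must be the same length")

    strand = list(damaged)
    for i, base in enumerate(strand):
        if base != lesion:
            continue
        strand[i] = None                     # glycosylase removes the base: AP site
        # AP endonuclease nicks the backbone here (no string equivalent)
        strand[i] = COMPLEMENT[opposite[i]]  # polymerase fills in from the template
        # DNA ligase seals the nick (no string equivalent)
    return "".join(strand)


original = "ATGCGT"
opposite = "".join(COMPLEMENT[b] for b in original)  # paired strand
damaged = "ATGUGT"                                   # C at index 3 deaminated to U
repaired = base_excision_repair(damaged, opposite)
print(repaired)
```

The repair is accurate only because the opposite strand still carries the original information, which is the point made earlier about the double helix.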
Nucleotide Excision Repair (NER)
NER handles bulky, helix-distorting lesions like UV-induced thymine dimers and chemical adducts. The damage is recognized by its distortion of the helix, two incisions are made flanking the lesion (one on each side), and a ~24–32 nucleotide patch is excised and resynthesized.
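The excise-and-resynthesize logic can be sketched the same way. In this hedged Python example the lesion is marked in lowercase, the patch is 30 nucleotides (inside the range quoted above), and the incisions are placed symmetrically around the lesion. The symmetric placement and the function name `nucleotide_excision_repair` are simplifying assumptions, not a description of the real incision geometry.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nucleotide_excision_repair(damaged, opposite, lesion_start, lesion_len=2, patch=30):
    """Excise a patch around a bulky lesion and resynthesize it.

    Returns the repaired strand and the (left, right) incision coordinates.
    `opposite[i]` is the base paired with position i of the damaged strand.
    """
    center = lesion_start + lesion_len // 2
    left = max(0, center - patch // 2)        # incision on one side of the lesion
    right = min(len(damaged), left + patch)   # incision on the other side
    # Resynthesis: copy the excised interval from the undamaged opposite strand.
    patch_seq = "".join(COMPLEMENT[b] for b in opposite[left:right])
    return damaged[:left] + patch_seq + damaged[right:], (left, right)


original = "ACGGATCCGTAGCTAGGCTTACGATCGGATCCAGTCAGGA"
opposite = "".join(COMPLEMENT[b] for b in original)
i = original.index("TT")
damaged = original[:i] + "tt" + original[i + 2:]  # lowercase marks the dimer
repaired, (left, right) = nucleotide_excision_repair(damaged, opposite, i)
print(left, right, repaired == original)
```

Unlike BER, which replaces a single base, the whole interval between the two incisions is discarded, including undamaged bases.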
NER is coupled to transcription: the RNA polymerase stalls at a lesion on the transcribed strand, triggering transcription-coupled NER (TC-NER), which repairs the template strand of active genes faster than the rest of the genome. This ensures that the cell’s most important DNA — the genes being actively expressed — is repaired first.
Defects in NER cause xeroderma pigmentosum (XP) — extreme sensitivity to sunlight and a 1,000-fold increase in skin cancer risk.
Mismatch Repair (MMR)
MMR corrects replication errors that escape the polymerase’s proofreading (discussed in the previous lesson). Loss of MMR causes Lynch syndrome and a mutator phenotype with ~100–1,000-fold elevated mutation rates.
Special Translesion Polymerases
When damage is so severe that the replicative polymerase cannot proceed, cells deploy translesion DNA polymerases — specialized enzymes that can synthesize past damaged bases. These polymerases (Pol η, Pol ι, Pol κ, Pol ζ) have relaxed active sites that accommodate damaged templates but are inherently error-prone. They are used only in emergencies, as a last resort to prevent lethal replication fork collapse.
Pol η (eta) is particularly important: it accurately copies past thymine dimers by inserting the correct AA across from TT dimers. Humans with defective Pol η develop a variant form of xeroderma pigmentosum (XP-V).
Double-Strand Break Repair
Double-strand breaks (DSBs) are the most dangerous form of DNA damage — both strands of the helix are severed, and without accurate repair, chromosomes can be lost or rearranged. DSBs are efficiently repaired by two major pathways:
Homologous Recombination (HR)
HR uses an intact homologous sequence (usually the sister chromatid in S/G2 phase) as a template for accurate, error-free repair. The broken ends are resected to generate 3′ single-stranded tails, which are coated by RPA (single-strand binding protein) and then loaded with RecA (in bacteria) or Rad51 (in eukaryotes). The Rad51/RecA filament searches for homologous sequences and catalyzes strand invasion — the damaged strand physically invades the intact duplex and base-pairs with the complementary strand. DNA synthesis using the intact strand as template, followed by resolution of the recombination intermediates, restores the original sequence.
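A string-level sketch makes the template-copying idea concrete. It is deliberately simplified: an exact substring search stands in for the Rad51 homology search, the probe length `k` is a hypothetical parameter, and resection, strand invasion, and resolution are collapsed into one copy step.

```python
def homologous_repair(left_arm, right_arm, sister, k=8):
    """Fill a double-strand break by copying from an intact sister sequence.

    left_arm / right_arm: sequence remaining on each side of the break.
    sister: intact homologous sequence used as the template.
    k: number of bases at each broken end used as a homology probe
       (hypothetical parameter for this sketch).
    """
    left_probe = left_arm[-k:]    # end of the left arm searches for its match
    right_probe = right_arm[:k]   # start of the right arm does the same

    i = sister.find(left_probe)
    if i == -1:
        raise ValueError("no homology found for the left end")
    j = sister.find(right_probe, i + k)
    if j == -1:
        raise ValueError("no homology found for the right end")

    # Template-directed synthesis: copy whatever lies between the two probes.
    missing = sister[i + k:j]
    return left_arm + missing + right_arm


sister = "ATGGCTAGCAAAGACTTCACCGAGGTTCAGCTGAAG"
left_arm, right_arm = sister[:14], sister[20:]  # six bases lost at the break
repaired = homologous_repair(left_arm, right_arm, sister)
print(repaired == sister)
```

Because the lost bases are recovered from the template rather than guessed, the result matches the sister sequence exactly, which is why HR is described as error-free.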
HR can also rescue broken replication forks: when a fork encounters a nick or a lesion, the fork can collapse into a DSB, which HR repairs by rebuilding the fork.
Cells carefully regulate the choice between HR and the second major DSB repair pathway, non-homologous end joining (NHEJ). HR is favored in S and G2 phases (when a sister chromatid is available), while NHEJ predominates in G1. The key regulatory step is end resection: the protein 53BP1 inhibits resection (favoring NHEJ), while BRCA1 promotes resection (favoring HR). Loss of BRCA1 or BRCA2, both essential for HR, dramatically increases the risk of breast and ovarian cancer.
Non-Homologous End Joining (NHEJ)
NHEJ directly ligates the broken ends without a homologous template. The Ku70/Ku80 heterodimer binds to the broken ends, recruits DNA-PKcs (DNA-dependent protein kinase catalytic subunit), and the ends are processed and ligated by DNA ligase IV. NHEJ is fast and available throughout the cell cycle but may delete or insert a few nucleotides at the junction, making it error-prone.
NHEJ is the predominant DSB repair pathway in mammalian cells and is also the mechanism exploited by CRISPR-Cas9 gene editing: when Cas9 creates a DSB at a targeted site, NHEJ repair introduces small insertions or deletions (indels) that disrupt the target gene.
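The link between an NHEJ indel and gene disruption is simple arithmetic: an indel whose length is not a multiple of three shifts the reading frame. The sketch below assumes an SpCas9-style target (a 20-nucleotide protospacer followed by an NGG PAM, with the cut three bases upstream of the PAM). The function names and the toy target sequence are hypothetical.

```python
def cas9_cut_site(target, protospacer):
    """Return the index where the target is cut, or None if no valid site exists."""
    i = target.find(protospacer)
    if i == -1:
        return None
    pam_start = i + len(protospacer)
    pam = target[pam_start:pam_start + 3]
    if len(pam) != 3 or pam[1:] != "GG":   # require an NGG PAM
        return None
    return pam_start - 3                   # cut three bases upstream of the PAM


def nhej_join(target, cut, deleted_left=0, deleted_right=0, inserted=""):
    """Rejoin the two ends, optionally losing or gaining bases at the junction."""
    return target[:cut - deleted_left] + inserted + target[cut + deleted_right:]


def is_frameshift(original, edited):
    """True when the length change is not a multiple of three."""
    return (len(edited) - len(original)) % 3 != 0


target = "ATGGCTAGCAAAGACTTCACCGAGTGGAAGCTT"
protospacer = target[4:24]                 # 20 nt, followed by the PAM "TGG"
cut = cas9_cut_site(target, protospacer)

one_bp_deletion = nhej_join(target, cut, deleted_left=1)
three_bp_deletion = nhej_join(target, cut, deleted_left=2, deleted_right=1)
print(cut, is_frameshift(target, one_bp_deletion), is_frameshift(target, three_bp_deletion))
```

A one-base deletion shifts the frame, while a three-base deletion removes a codon's worth of sequence but keeps the frame, so not every NHEJ outcome knocks out the gene.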
DNA Damage Delays the Cell Cycle
When DNA damage is detected, cells activate the DNA damage response (DDR) — a signaling network centered on the kinases ATM (activated by DSBs) and ATR (activated by single-stranded DNA at stalled forks). The DDR:
- Halts the cell cycle at checkpoints (G1/S, intra-S, G2/M) to allow time for repair
- Activates the appropriate repair pathways
- If damage is irreparable, triggers apoptosis (programmed cell death) or senescence (permanent cell cycle arrest)
The tumor suppressor p53 is a central effector of the DDR. It is stabilized by phosphorylation from ATM/Chk2 and activates genes for cell cycle arrest (p21), DNA repair, and apoptosis (Bax, PUMA). Loss of p53 function — the most common genetic alteration in human cancer — disables this checkpoint, allowing damaged cells to continue dividing and accumulating mutations.
Homologous Recombination in Meiosis
Beyond repair, HR plays an essential role in meiosis. During meiosis I, the enzyme Spo11 introduces programmed DSBs across the genome. These breaks are repaired by recombination with the homologous chromosome (not the sister chromatid), generating crossovers that:
- Physically link homologous chromosomes (as chiasmata) for proper segregation at meiosis I
- Shuffle genetic information between maternal and paternal chromosomes
- Create new allele combinations in gametes, increasing genetic diversity
Recombination hotspots — genomic regions with elevated crossover rates — are not randomly distributed. In humans, most hotspots are determined by the binding of the zinc-finger protein PRDM9, which recognizes specific DNA motifs and recruits the recombination machinery.
Transposable Elements
Transposable elements (TEs) are DNA sequences that can move from one genomic location to another, acting as molecular parasites of the genome. They use two major mechanisms:
DNA transposons (cut-and-paste): the transposase enzyme excises the element from its original position and inserts it elsewhere. Most DNA transposons in the human genome are now inactive fossils.
Retrotransposons (copy-and-paste): the element is transcribed to RNA, reverse-transcribed to DNA, and inserted at a new site, increasing copy number. There are two subtypes:
- LTR retrotransposons — resemble retroviruses (indeed, retroviruses may have evolved from them). Include human endogenous retroviruses (HERVs).
- Non-LTR retrotransposons — include LINEs (Long Interspersed Nuclear Elements, especially LINE-1/L1) and SINEs (Short Interspersed Nuclear Elements, especially Alu). These are by far the most abundant TEs in the human genome: L1 elements alone comprise ~17% of the genome, and Alu elements number over 1 million copies.
A large fraction of the human genome is composed of non-LTR (nonretroviral) retrotransposons: L1 and Alu elements together account for nearly 30% of the genome. Some L1 elements remain active and continue to transpose, occasionally causing genetic disease by inserting into genes.
Different transposable elements predominate in different organisms: while L1 and Alu dominate in humans, DNA transposons are more active in many plants and insects, and different TE families characterize the genomes of maize, Drosophila, zebrafish, and other model organisms.
Genome sequences reveal the approximate times at which TEs moved: because TE copies accumulate mutations after insertion, the degree of divergence from the consensus TE sequence serves as a molecular clock, estimating when each insertion occurred.
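The molecular-clock estimate can be written as a short calculation. With p the fraction of sites that differ between a TE copy and its consensus, a Jukes-Cantor correction gives the divergence K = -(3/4) ln(1 - 4p/3), and the insertion age is roughly K / r for a neutral substitution rate r. The sketch below is hedged: the default rate of 2.2e-9 substitutions per site per year is a placeholder assumption, the sequences are toy data, and the copy is assumed to be already aligned to the consensus without gaps.

```python
import math


def p_distance(copy, consensus):
    """Fraction of aligned positions at which the copy differs from the consensus."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(copy, consensus) if a != b)
    return diffs / len(consensus)


def jukes_cantor(p):
    """Correct an observed difference for multiple hits at the same site."""
    if p >= 0.75:
        raise ValueError("divergence is saturated; the correction is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_years(copy, consensus, rate_per_site_per_year=2.2e-9):
    """Estimate time since insertion; the rate is an assumed placeholder."""
    return jukes_cantor(p_distance(copy, consensus)) / rate_per_site_per_year


consensus = "GGCCGGGCGCGGTGGCTCAC"   # toy consensus, 20 nt
copy = "GGCTGGGCGCGGCGGCTCAC"        # same sequence with two substitutions
age = insertion_age_years(copy, consensus)
print(f"{age / 1e6:.1f} million years")
```

The divergence is divided by r rather than 2r because only the inserted copy accumulates changes relative to the consensus, which approximates the sequence at the time of insertion.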
Conservative site-specific recombination uses specialized enzymes (recombinases like Cre, Flp, and phage integrases) to catalyze recombination between specific short DNA sequences. Unlike homologous recombination, it does not require extensive sequence similarity. This mechanism is used by bacteriophages to integrate into and excise from host chromosomes, and it has been harnessed as a powerful tool in genetic engineering (Cre-lox and Flp-FRT systems).
Despite their parasitic nature, TEs have been a creative evolutionary force: TE insertions can donate new regulatory elements (enhancers, promoters), contribute new exons through exonization, and mediate chromosomal rearrangements through recombination between dispersed copies.
Computational Analysis of DNA Damage and Repair
Damage-seq and repair-seq map the genomic locations of specific types of DNA damage and repair events at nucleotide resolution, revealing how damage distribution and repair efficiency vary across the genome.
Mutational signature analysis decomposes the spectrum of somatic mutations in a cancer genome into component signatures, each linked to a specific mutagenic process. The COSMIC database catalogs on the order of 100 reference signatures across three classes: single-base substitution (SBS), double-base substitution (DBS), and small insertion/deletion (ID) signatures. Tumors with defective HR (BRCA1/BRCA2 loss) show a distinctive signature (SBS3) that can predict sensitivity to PARP inhibitors, an example of computational genomics directly guiding clinical treatment.
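Before any decomposition, each substitution is usually binned into one of 96 channels: six substitution classes written from the pyrimidine of the mutated base pair, times four possible 5′ neighbors, times four possible 3′ neighbors. The Python sketch below shows only that binning step. The function names and the toy mutation list are assumptions, and the decomposition itself (commonly non-negative matrix factorization) is not shown.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def sbs_channel(trinucleotide, alt):
    """Label a substitution by its pyrimidine-centered trinucleotide context."""
    ref = trinucleotide[1]
    if ref == alt:
        raise ValueError("alt must differ from the reference base")
    if ref in "AG":  # purine reference: describe the event on the other strand
        trinucleotide = reverse_complement(trinucleotide)
        alt = COMPLEMENT[alt]
        ref = trinucleotide[1]
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"


def all_channels():
    """Enumerate the 96 channel labels."""
    classes = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
    return [f"{five}[{c}]{three}"
            for c in classes
            for five, three in product("ACGT", repeat=2)]


# Toy input: (reference trinucleotide, alternate base at the middle position)
mutations = [("TCA", "T"), ("TGA", "A"), ("ACG", "T"), ("CGT", "A")]
catalog = Counter(sbs_channel(context, alt) for context, alt in mutations)
print(catalog)
```

Folding purine-reference events onto the opposite strand is what reduces 12 possible substitutions to 6 classes, so `TGA` with G>A and `TCA` with C>T land in the same channel.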
Recombination, Structural Variants, and Copy Number Variation
Errors in recombination and replication can produce structural variants — large-scale changes in genome architecture:
- Deletions and duplications from unequal crossing over between misaligned repeats
- Inversions from recombination between inverted repeats
- Translocations from recombination between repeats on different chromosomes
Structural variant calling from sequencing data uses tools like Manta, DELLY, and LUMPY, which detect SVs from discordant read pairs, split reads, and read-depth changes.
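The discordant-read-pair signal can be illustrated without any sequencing library. The sketch below is not how Manta, DELLY, or LUMPY are implemented. It assumes a hypothetical dictionary layout for each read pair, an expected forward-reverse ("FR") orientation, and a simple cutoff of three scaled median absolute deviations around the median insert size.

```python
from statistics import median


def flag_discordant(pairs, n_dev=3.0):
    """Return (index, candidate_type) for read pairs that disagree with the reference."""
    usable = [p["insert_size"] for p in pairs
              if p["chrom1"] == p["chrom2"] and p["orientation"] == "FR"]
    center = median(usable)
    # Median absolute deviation, scaled to approximate a standard deviation.
    spread = 1.4826 * median([abs(s - center) for s in usable])
    spread = max(spread, 1.0)  # avoid a zero-width window on tiny inputs

    flagged = []
    for idx, p in enumerate(pairs):
        if p["chrom1"] != p["chrom2"]:
            flagged.append((idx, "translocation"))   # mates on different chromosomes
        elif p["orientation"] != "FR":
            flagged.append((idx, "inversion"))       # unexpected mate orientation
        elif p["insert_size"] > center + n_dev * spread:
            flagged.append((idx, "deletion"))        # mates map too far apart
        elif p["insert_size"] < center - n_dev * spread:
            flagged.append((idx, "insertion"))       # mates map too close together
    return flagged


def pair(size, chrom2="chr1", orientation="FR"):
    return {"chrom1": "chr1", "chrom2": chrom2,
            "insert_size": size, "orientation": orientation}


pairs = [pair(300), pair(310), pair(295), pair(305), pair(290),
         pair(2500),                    # spans a candidate deletion
         pair(300, orientation="FF"),   # candidate inversion
         pair(0, chrom2="chr5")]        # candidate translocation
print(flag_discordant(pairs))
```

Real callers combine this evidence with split reads and read depth and require several supporting pairs before reporting a variant.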
Copy number variation (CNV) analysis identifies regions where the number of copies of a DNA segment varies between individuals. CNVs range from kilobases to megabases and can encompass entire genes, contributing to phenotypic variation and disease susceptibility. CNV detection uses read-depth analysis, comparative genomic hybridization (array CGH), or SNP arrays.
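Read-depth CNV detection reduces to a ratio: depth in a sample bin divided by depth in a matched control bin, normalized so that the typical bin corresponds to two copies. The sketch below is a minimal illustration under stated assumptions: fixed-size bins, nonzero control depth, a diploid baseline, and most bins being copy-neutral so that the median ratio is a fair baseline. The function names are hypothetical.

```python
from statistics import median


def copy_number_from_depth(sample_depth, control_depth, ploidy=2):
    """Estimate copy number per bin from sample and control read depth."""
    ratios = [s / c for s, c in zip(sample_depth, control_depth)]
    baseline = median(ratios)  # assumes most bins are copy-neutral
    return [ploidy * r / baseline for r in ratios]


def call_segments(copy_numbers):
    """Merge adjacent bins with the same rounded copy number into segments."""
    segments = []
    for i, cn in enumerate(copy_numbers):
        state = round(cn)
        if segments and segments[-1][2] == state:
            start, _, _ = segments[-1]
            segments[-1] = (start, i, state)   # extend the current segment
        else:
            segments.append((i, i, state))     # start a new segment
    return segments


sample = [100, 100, 100, 50, 50, 100, 150, 100]
control = [100] * 8
copies = copy_number_from_depth(sample, control)
print(call_segments(copies))
```

Each tuple is (first bin, last bin, copy number), so the half-depth bins form a one-copy loss and the bin at 1.5 times the baseline depth appears as a three-copy gain.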
Recombination hotspot detection from population data uses patterns of linkage disequilibrium (LD) — the non-random association of alleles at nearby loci. Regions where LD breaks down sharply correspond to recombination hotspots. Tools like LDhat and LDhelmet estimate fine-scale recombination rate maps from population sequencing data.
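Linkage disequilibrium between two biallelic loci is commonly summarized by D = p_AB - p_A p_B and by r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The sketch below computes both from phased haplotypes coded as 0/1. It illustrates the statistic only and is not the coalescent-based inference performed by LDhat or LDhelmet; the function name and toy haplotypes are assumptions.

```python
def linkage_disequilibrium(haplotypes):
    """Compute D and r^2 for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2), each coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n                    # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n                    # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if tuple(h) == (1, 1)) / n  # both alleles together
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")          # undefined if a locus is fixed
    return d, r2


linked = [(1, 1)] * 5 + [(0, 0)] * 5                 # alleles always travel together
shuffled = [(1, 1), (1, 0), (0, 1), (0, 0)] * 3      # all combinations equally common

print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(shuffled))
```

An r² near 1 means little historical recombination between the loci, while a sharp drop toward 0 over a short physical distance is the pattern expected across a hotspot.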
TE identification and classification uses RepeatMasker (with Repbase/Dfam libraries) to annotate TE-derived sequences. TE age estimation compares each copy to the family consensus, with more divergent copies representing older insertions. TE insertion polymorphism detection from long-read sequencing reveals recent and active transposition events. Mobilome analysis characterizes the full complement of active mobile elements in a genome.
Counting DNA Damage with Hamming Distance
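We can quantify the extent of DNA damage by comparing an undamaged reference sequence to a damaged copy of the same length. The Hamming distance counts the positions where the two sequences differ, so each counted position represents a base substitution; insertions and deletions change the sequence length and are not captured by this measure.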
let undamaged = "ATGGCTAGCAAAGACTTCACCGAG"
let uv_damaged = "ATGGCTAGTAAAGACTTCACTGAG"
let oxidative = "ATGGCTAGCAAAGAATTCACCGAG"
print("UV damage (C→T at pyrimidine dimers):")
print(" Mutations: " + Seq.hamming(undamaged, uv_damaged))
print("Oxidative damage (8-oxoG mispairing):")
print(" Mutations: " + Seq.hamming(undamaged, oxidative))
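For readers who want to see what a Hamming-distance routine computes, here is a plain-Python sketch applied to the UV example. That `Seq.hamming` works this way internally is an assumption, and the helper name `substitutions` is hypothetical. Listing the differing positions, rather than only counting them, also shows which base changes occurred.

```python
def substitutions(reference, sample):
    """Return (position, reference_base, sample_base) for each differing position."""
    if len(reference) != len(sample):
        raise ValueError("Hamming distance requires sequences of equal length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def hamming(reference, sample):
    """Number of positions at which the two sequences differ."""
    return len(substitutions(reference, sample))


undamaged = "ATGGCTAGCAAAGACTTCACCGAG"
uv_damaged = "ATGGCTAGTAAAGACTTCACTGAG"

print(hamming(undamaged, uv_damaged))
print(substitutions(undamaged, uv_damaged))
```

Positions are zero-based, so the two C>T changes are reported at indices 8 and 20.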
Finding Recombination Breakpoints by Local Alignment
During homologous recombination, strand exchange occurs at regions of sequence similarity. Local alignment can identify the segment where two otherwise divergent sequences share homology, which marks a candidate region for the recombination breakpoint.
let allele_a = "CCTTGAGGATGGCTAGCAAAGAC"
let allele_b = "TTTACAATGGCTAGCGGGACCTT"
let result = Align.local(allele_a, allele_b)
print("Local alignment — shared homology region:")
print(result.alignment)
print("Score: " + result.score)
The high-scoring local alignment identifies the shared core sequence where strand invasion and crossover could occur.
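Local alignment is classically computed with the Smith-Waterman dynamic-programming recurrence, in which every cell is floored at zero so that an alignment can start anywhere. The sketch below is a minimal Python version using a linear gap penalty. The scoring values (match +1, mismatch -2, gap -2) are assumptions chosen for this example, and whether `Align.local` uses this algorithm or these scores is not stated in the lesson.

```python
def smith_waterman(a, b, match=1, mismatch=-2, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill: each cell is the best score of an alignment ending at (i, j).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: walk back from the best cell until the score drops to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


allele_a = "CCTTGAGGATGGCTAGCAAAGAC"
allele_b = "TTTACAATGGCTAGCGGGACCTT"
score, aligned_a, aligned_b = smith_waterman(allele_a, allele_b)
print(score)
print(aligned_a)
print(aligned_b)
```

With these penalties the result is the nine-base shared core. Milder mismatch and gap penalties would let the alignment extend into the flanking sequence, so the reported region depends on the scoring scheme.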
Comparing Repair Pathway Efficiency
Different repair pathways handle very different damage loads. The bar chart below uses illustrative order-of-magnitude values, not measured rates, to compare their relative contributions to genome maintenance.
let pathways = ["BER", "NER", "MMR", "HR", "NHEJ"]
let lesions_per_day = [10000, 2000, 500, 50, 50]
print("Estimated lesions repaired per cell per day:")
print(Viz.bar(pathways, lesions_per_day))
BER handles the largest volume of damage (depurination, oxidation), while HR and NHEJ repair fewer but more dangerous double-strand breaks.
The same alignment logic extends to structural variants: aligning an original sequence against a version containing a transposon-like insertion reveals the gap introduced by the TE, a simple illustration of how alignment-based methods detect structural variants.
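The insertion case can be sketched without a full aligner. When the only difference between two sequences is one contiguous insertion, matching the shared prefix and the shared suffix locates it, and the unmatched middle is the inserted element. This Python sketch is a simplification rather than a general alignment method, and the function name and the toy "TE" sequence are hypothetical.

```python
def find_insertion(original, modified):
    """Locate a single contiguous insertion in `modified` relative to `original`.

    Returns (position, inserted_sequence, gapped_original), or None if the
    difference is not one simple insertion.
    """
    if len(modified) <= len(original):
        return None

    # Length of the shared prefix.
    p = 0
    while p < len(original) and original[p] == modified[p]:
        p += 1

    # Length of the shared suffix, not overlapping the prefix in `original`.
    s = 0
    while s < len(original) - p and original[-1 - s] == modified[-1 - s]:
        s += 1

    if p + s < len(original):
        return None  # some original bases are unaccounted for

    ins_len = len(modified) - len(original)
    inserted = modified[p:p + ins_len]
    gapped = original[:p] + "-" * ins_len + original[p:]  # original shown with the gap
    return p, inserted, gapped


original = "ATGGCTAGCAAAGACTTCACCGAG"
te = "TTAACCCTAGGGTTAA"                      # toy transposon-like element
modified = original[:12] + te + original[12:]

position, inserted, gapped = find_insertion(original, modified)
print(position, inserted)
print(gapped)
print(modified)
```

Printing the gapped original above the modified sequence reproduces what an aligner would show: a run of gap characters opposite the inserted element.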
Exercise: Count Mutations from DNA Damage
A cell’s DNA has been exposed to a mutagen. Use Seq.hamming() to count the number of mutations introduced by the damage. Compare the undamaged reference to three damaged samples from different mutagens and print the total number of mutations across all three.
let reference = "ATGGCTAGCAAAGACTTCACCGAG"
let uv_damage = "ATGGCTAGTAAAGACTTCACTGAG"
let oxidative = "ATGGCTAGCAAAGAATTCACCGAG"
let alkylation = "ATGGCTAGCAAAGACTTCACCGGG"
let uv_count = Seq.hamming(reference, uv_damage)
let ox_count = Seq.hamming(reference, oxidative)
let alk_count = Seq.hamming(reference, alkylation)
print("UV mutations: " + uv_count)
print("Oxidative mutations: " + ox_count)
print("Alkylation mutations: " + alk_count)
let total = uv_count + ox_count + alk_count
print("Total mutations: " + total)
print(total)
Exercise: Find a Recombination Breakpoint
Two alleles share a region of homology where recombination can occur. Use Align.local() to find the shared core sequence (the breakpoint region) and report the alignment score.
let allele_1 = "CCTTGAGGATGGCTAGCAAAGAC"
let allele_2 = "TTTACAATGGCTAGCGGGACCTT"
let result = Align.local(allele_1, allele_2)
print("Recombination breakpoint region:")
print(result.alignment)
print("Score: " + result.score)
let core = "ATGGCTAGC"
print(core)
Exercise: Compare Repair Pathway Capacity
Different repair pathways handle vastly different numbers of lesions per day. Use Viz.bar() to visualize the repair capacity of four pathways and identify which pathway handles the most damage.
let pathways = ["BER", "NER", "MMR", "HR/NHEJ"]
let capacity = [10000, 2000, 500, 100]
print("Lesions repaired per cell per day:")
print(Viz.bar(pathways, capacity))
let busiest = "BER"
print(busiest)
Summary
In this lesson you covered DNA repair, recombination, and transposition:
- 10,000–100,000 DNA lesions per day from spontaneous damage, UV, oxidation, and chemicals
- Multiple repair pathways: BER (small base damage), NER (bulky lesions), MMR (replication errors), HR (accurate DSB repair), NHEJ (fast but error-prone DSB repair)
- Transcription-coupled NER prioritizes repair of actively transcribed genes
- Translesion polymerases bypass damage in emergencies but are error-prone
- HR uses RecA/Rad51 for strand invasion; regulated by 53BP1/BRCA1 balance; can rescue stalled forks
- NHEJ uses Ku70/Ku80 and DNA-PKcs; exploited by CRISPR-Cas9 for gene editing
- The DNA damage response (ATM/ATR → p53) halts the cell cycle, activates repair, or triggers apoptosis
- Meiotic recombination creates crossovers at hotspots determined by PRDM9, generating genetic diversity
- Transposable elements — DNA transposons (cut-and-paste) and retrotransposons (copy-and-paste) — comprise ~45% of the human genome
- Different TE families predominate in different organisms; TE age is estimated from sequence divergence
- Site-specific recombination (Cre-lox, Flp-FRT) enables precise genetic engineering
- Mutational signature analysis links mutation patterns to mutagenic processes and predicts drug sensitivity
- Structural variant callers (Manta, DELLY, LUMPY) detect deletions, duplications, inversions, and translocations
- CNV analysis and LD-based recombination mapping complete the toolkit for analyzing genome dynamics