
Pathogens and Infection

Intermediate Cell Biology ~40 min

Explore how pathogens infect cells, how the microbiome shapes health, and the computational tools for metagenomics, pathogen genomics, and viral surveillance.

Introduction

Humans exist in a world teeming with microorganisms. Every surface of the body — skin, gut, lungs, mouth — is colonized by a complex community of bacteria, archaea, fungi, and viruses that together constitute the human microbiome. Most of these microbes are harmless or beneficial, but a small fraction have evolved the molecular weaponry to breach host defenses, commandeer cellular machinery, and cause disease. These are the pathogens: bacteria, viruses, fungi, parasites, and even misfolded proteins that exploit the very cell biology we have studied throughout this course.

Understanding infection requires thinking at the molecular level. How does a bacterium adhere to an intestinal epithelial cell? How does a virus hijack the ribosome? How does an intracellular pathogen hide from the immune system inside the very cell tasked with destroying it? The answers to these questions draw on nearly every topic in cell biology — membrane receptors, endocytosis, cytoskeletal dynamics, gene expression, and protein trafficking.

This lesson explores the biology of pathogens and infection across three major themes. First, we examine the human microbiome and the computational tools used to study it. Second, we investigate the cell biology of infection — the molecular strategies pathogens use to enter cells, survive inside them, and spread. Third, we focus on viruses in detail, covering their genome diversity, replication strategies, and the bioinformatics platforms that track viral evolution in real time.

23.1 — Introduction to Pathogens and the Human Microbiome

The Human Body Hosts a Vast Community of Microorganisms

The human body harbors roughly 38 trillion bacterial cells — approximately equal to the number of human cells. The vast majority reside in the large intestine, where a dense microbial ecosystem ferments dietary fiber, synthesizes vitamins (K, B12, folate), trains the immune system, and protects against pathogenic colonization through competitive exclusion. This community, collectively termed the microbiome, varies dramatically by body site:

| Body site | Dominant bacterial phyla | Relative diversity | Key functions |
|---|---|---|---|
| Large intestine | Firmicutes, Bacteroidetes | Very high (~1,000 species) | Fiber fermentation, vitamin synthesis, immune training |
| Skin | Actinobacteria, Firmicutes, Proteobacteria | High | Barrier defense, antimicrobial lipid production |
| Oral cavity | Firmicutes, Bacteroidetes, Proteobacteria | High | Biofilm formation, nitrate reduction |
| Vaginal tract | Lactobacillus (Firmicutes) | Low | Acid production inhibits pathogens |
| Lung | Low biomass; Firmicutes, Bacteroidetes | Low | Mucosal immune priming |

The composition of the microbiome is shaped by diet, geography, age, antibiotic use, and mode of birth (vaginal vs. cesarean). Disruption of this community — dysbiosis — is associated with conditions ranging from inflammatory bowel disease and obesity to neuropsychiatric disorders, underscoring the intimate relationship between microbial and human health.

Pathogens Interact with the Host in Diverse Ways

Pathogens are organisms (or agents) that cause disease. They include bacteria (Mycobacterium tuberculosis, Salmonella, Vibrio cholerae), viruses (influenza, HIV, SARS-CoV-2), fungi (Candida, Aspergillus), protozoan parasites (Plasmodium, Trypanosoma), helminths (tapeworms, roundworms), and even misfolded proteins (prions). The relationship between microbe and host spans a continuum from mutualism (both benefit) through commensalism (microbe benefits, host unharmed) to parasitism (microbe benefits, host harmed). Many organisms that are normally commensal can become opportunistic pathogens when the host is immunocompromised — Candida albicans, for example, is a harmless resident of the gut and skin that causes life-threatening systemic infections in patients undergoing chemotherapy or organ transplantation.

Pathogens Exploit Host Cell Biology at Every Stage of Infection

The infection cycle can be decomposed into discrete stages, each requiring specific molecular interactions with host cells:

  1. Attachment — surface molecules called adhesins bind host cell receptors (e.g., Helicobacter pylori uses BabA to bind the Lewis B antigen on gastric epithelial cells)
  2. Entry — some pathogens actively induce their own uptake into host cells
  3. Immune evasion — strategies to avoid or suppress the immune response
  4. Nutrient acquisition — scavenging iron, amino acids, and carbon sources from the host
  5. Replication — exploiting host resources for growth and reproduction
  6. Spread — exiting the cell and disseminating to new cells or new hosts

At every stage, pathogens co-opt normal cellular processes — receptor signaling, membrane trafficking, cytoskeletal remodeling, and transcription — turning the cell’s own machinery against it.

Pathogens Evolve Rapidly

Pathogens evolve far faster than their hosts. Bacteria can divide every 20 minutes, and RNA viruses such as HIV produce roughly one mutation per genome per replication cycle. This rapid evolution enables pathogens to evade immune responses, develop drug resistance, jump between host species, and adapt to new ecological niches. Horizontal gene transfer — the movement of genes between organisms via plasmids, transposons, and bacteriophages — further accelerates bacterial evolution, allowing resistance genes and virulence factors to spread through microbial populations in a matter of days.
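To put the 20-minute division time in perspective, the arithmetic can be sketched in a few lines of Python. This is an idealized calculation that assumes unlimited nutrients and perfectly synchronous doubling, which real populations never sustain for long; the function name and parameters are illustrative.

```python
def cells_after(hours, doubling_minutes=20.0, start=1):
    """Idealized exponential growth: the population doubles every `doubling_minutes`."""
    generations = int(hours * 60 // doubling_minutes)  # completed doublings
    return start * 2 ** generations


# One founding cell, 20-minute doubling time, no resource limits (an assumption).
for hours in (1, 4, 8):
    print(f"{hours} h: {cells_after(hours):,} cells")
```

Eight hours is enough for 24 doublings, so a single cell could in principle give rise to more than 16 million descendants, each division being another opportunity for mutation.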

Bioinformatics: Metagenomics and Microbiome Analysis

Studying the microbiome requires culture-independent approaches, because the majority of microbial species cannot be grown in the laboratory. Two sequencing strategies dominate:

16S rRNA gene sequencing targets the 16S ribosomal RNA gene, which is present in all bacteria and archaea and contains both conserved regions (for primer binding) and variable regions (for taxonomic classification). Amplicon sequences are processed by pipelines such as QIIME2, DADA2, and mothur, which perform quality filtering, denoising (inferring exact amplicon sequence variants, or ASVs), taxonomic assignment, and diversity analysis.

Shotgun metagenomics sequences all DNA in a sample, providing not only taxonomic profiles but also functional gene content. Tools such as MetaPhlAn classify reads by mapping to clade-specific marker genes, HUMAnN reconstructs metabolic pathway abundances, and Kraken2 performs rapid k-mer-based taxonomic classification against reference databases.
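The idea behind k-mer-based classification can be illustrated with a toy example. The sketch below is a deliberately simplified majority vote over exact k-mers, not Kraken2's actual algorithm (which uses minimizers and lowest-common-ancestor assignment). The reference sequences, taxon names, and function names are all made up for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping substring of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in set(kmers(seq, k)):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify(read, index, k):
    """Vote with k-mers that are unique to a single taxon."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, ())
        if len(taxa) == 1:
            votes[next(iter(taxa))] += 1
    return votes.most_common(1)[0][0] if votes else "unclassified"


# Hypothetical reference fragments (not real genomes).
references = {
    "taxon_A": "ATGGCGTACGTTAGC",
    "taxon_B": "ATGCCCTTAGGATCA",
}
index = build_index(references, k=5)
print(classify("GCGTACGTT", index, k=5))
print(classify("TTTTTTTTT", index, k=5))
```

Real classifiers index billions of k-mers from thousands of genomes, but the principle is the same: a read is assigned to the taxon whose reference shares the most informative k-mers with it.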

Microbiome studies rely heavily on diversity metrics. Alpha diversity measures the richness and evenness within a single sample (Shannon index, observed species count), while beta diversity quantifies how community composition differs between samples (Bray-Curtis dissimilarity, UniFrac distance). Ordination methods such as PCoA (principal coordinates analysis) visualize beta diversity patterns across experimental groups.
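Both kinds of metric reduce to short formulas. The Shannon index is H = -Σ p_i ln(p_i), where p_i is the relative abundance of taxon i, and Bray-Curtis dissimilarity is Σ|a_i - b_i| / (Σ a_i + Σ b_i) for two count vectors over the same taxa. A minimal Python sketch follows; the sample counts are invented, and real pipelines typically rarefy or normalize counts first.

```python
import math


def shannon(counts):
    """Shannon index (natural log) from raw taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with taxa in the same order."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


# Hypothetical counts for four taxa in two samples.
even_sample = [25, 25, 25, 25]
skewed_sample = [85, 10, 3, 2]

print("Shannon (even):  ", round(shannon(even_sample), 3))
print("Shannon (skewed):", round(shannon(skewed_sample), 3))
print("Bray-Curtis:     ", round(bray_curtis(even_sample, skewed_sample), 3))
```

A perfectly even community of four taxa reaches the maximum possible Shannon value, ln 4, while a community dominated by one taxon scores lower. Bray-Curtis ranges from 0 (identical samples) to 1 (no shared taxa).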

Beyond community profiling, metagenome-assembled genome (MAG) reconstruction bins metagenomic contigs into draft genomes of individual organisms, enabling the study of uncultured species at the genome level. Microbiome association studies use tools like MaAsLin2 and LEfSe to identify taxa or functional pathways differentially abundant between disease and control groups. Major microbiome databases include the Human Microbiome Project (HMP), the Earth Microbiome Project (EMP), and MGnify (formerly EBI Metagenomics).

Viral metagenomics (viromics) extends these approaches to the viral component of the microbiome, revealing that the human gut alone harbors an estimated 10^12 bacteriophages that profoundly influence bacterial community dynamics through predation and horizontal gene transfer.

23.2 — Cell Biology of Infection

Pathogens Have Evolved Specific Mechanisms for Interacting with Their Host

The molecular details of host-pathogen interaction are remarkably sophisticated. Pathogenic bacteria often encode specialized secretion systems — molecular syringes that inject virulence proteins (called effectors) directly into host cells. The type III secretion system (T3SS), found in Salmonella, Shigella, Yersinia, and many other Gram-negative bacteria, resembles a molecular needle and syringe: it assembles a hollow pilus that punctures the host cell membrane and delivers effector proteins that manipulate host signaling, cytoskeletal dynamics, and membrane trafficking.

| Secretion system | Structure | Example pathogen | Function |
|---|---|---|---|
| Type III (T3SS) | Needle/syringe complex | Salmonella, Shigella | Inject effectors to trigger invasion, block immune signaling |
| Type IV (T4SS) | Pilus-like conjugation apparatus | Helicobacter pylori, Legionella | Inject effectors; also transfers DNA |
| Type VI (T6SS) | Contractile phage-tail-like | Vibrio cholerae, Pseudomonas | Kill competing bacteria; inject toxins into host cells |

Pathogens Exploit Host Cell Machinery for Cell Entry

Pathogens that replicate inside cells must first cross the plasma membrane. Two major entry strategies exist:

Trigger mechanism: The pathogen injects effectors (via T3SS) that activate host Rho-family GTPases (Rac1, Cdc42), triggering massive actin polymerization and membrane ruffling. The resulting macropinocytosis engulfs the bacterium. Salmonella enterica is the classic example — its SopE effector acts as a guanine nucleotide exchange factor (GEF) for Rac1 and Cdc42.

Zipper mechanism: Surface proteins on the pathogen bind host cell adhesion receptors (integrins, cadherins), activating local actin assembly that gradually wraps the membrane around the bacterium. Listeria monocytogenes uses its surface protein internalin A (InlA) to bind E-cadherin on intestinal epithelial cells, triggering a tight, zipper-like phagocytic cup.

Viruses use analogous strategies: they bind specific host surface receptors and enter by receptor-mediated endocytosis, membrane fusion, or macropinocytosis, as detailed in section 23.3.

Intracellular Pathogens Have Mechanisms for Surviving Inside Host Cells

Once inside a host cell, pathogens face a lethal threat: the phagolysosome, an acidic compartment armed with hydrolytic enzymes and reactive oxygen species. Intracellular pathogens have evolved three distinct survival strategies:

  1. Escape from the phagosome: Listeria monocytogenes secretes listeriolysin O (LLO), a pore-forming toxin activated by the low pH of the phagosome, which ruptures the phagosomal membrane and releases the bacterium into the cytoplasm
  2. Prevent phagosome-lysosome fusion: Mycobacterium tuberculosis arrests phagosome maturation by secreting effectors that block Rab conversion (Rab5 → Rab7), preventing recruitment of lysosomal markers and acid hydrolases
  3. Create a specialized replication vacuole: Legionella pneumophila uses its T4SS to inject hundreds of effectors that redirect ER-derived vesicles to its vacuole, creating a ribosome-studded compartment that mimics the ER and supports bacterial replication

Viruses Exploit Host Cell Machinery for Their Replication

Unlike most bacteria, viruses are obligate intracellular parasites with no metabolic machinery of their own. They depend entirely on the host cell for energy production, amino acid synthesis, and the ribosomes needed for protein synthesis. A typical viral replication cycle proceeds as follows:

  1. Attachment and entry — binding to a host receptor and internalization
  2. Uncoating — release of the viral genome from the capsid
  3. Genome replication — copying the viral nucleic acid (using viral or host polymerases)
  4. Gene expression — transcription and translation of viral proteins
  5. Assembly — packaging of genomes into new viral particles
  6. Release — exit from the cell by budding (enveloped viruses) or lysis (non-enveloped viruses)

Many viruses actively suppress host gene expression to commandeer the translational machinery. Poliovirus, for example, cleaves the host translation initiation factor eIF4G, shutting down cap-dependent host mRNA translation while the virus’s own mRNA is translated via an internal ribosome entry site (IRES).

Viruses and Bacteria Use the Host Cytoskeleton for Intracellular Movement

Once inside the cytoplasm, several pathogens exploit the host actin cytoskeleton for propulsion. Listeria monocytogenes expresses the surface protein ActA, which recruits the host Arp2/3 complex to nucleate actin polymerization on one pole of the bacterium. The resulting “actin comet tail” propels Listeria through the cytoplasm at speeds of up to 1.5 μm per second, pushing it into membrane protrusions that are engulfed by neighboring cells. This allows cell-to-cell spread without the bacterium ever entering the extracellular space. Shigella uses a similar mechanism via its IcsA protein. Vaccinia virus also harnesses actin-based motility for cell-to-cell spread, showing that bacteria and viruses have converged on the same strategy independently.

Viral Pathogens Can Persist in the Host as Proviruses

Some viruses integrate their genome into the host cell’s chromosomal DNA. The integrated copy of the viral genome is called a provirus, and it can support a long-lived latent infection. The retroviruses (including HIV) integrate as an obligatory part of their replication cycle: the enzyme reverse transcriptase converts the viral RNA genome into double-stranded DNA, which is then inserted into the host genome by the viral integrase. Once integrated, the proviral DNA is replicated along with the host chromosome and can persist silently for years, evading immune detection.

Herpesviruses achieve latency without integration: they maintain their genomes as circular episomes in the nucleus, expressing only a minimal set of latency genes that prevent immune recognition. Periodically, the virus reactivates from latency and enters a productive lytic cycle, producing new viral particles and causing recurrent disease (e.g., cold sores from HSV-1, shingles from varicella-zoster virus).

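As viral genomes replicate and accumulate mutations, they diverge into distinct lineages. The example below builds a neighbor-joining tree from a small matrix of pairwise distances between five SARS-CoV-2 variants:

// Variant names, in the same order as the rows of the distance matrix
let labels = '["Wuhan-1", "Alpha", "Delta", "Omicron-BA1", "Omicron-BA5"]'
// 5 x 5 symmetric distance matrix, flattened row by row
let distances = '[0, 12, 25, 45, 50, 12, 0, 18, 40, 45, 25, 18, 0, 35, 38, 45, 40, 35, 0, 8, 50, 45, 38, 8, 0]'
// Build the neighbor-joining tree and print it
let tree = Phylo.nj(labels, distances)
print("SARS-CoV-2 variant phylogeny:")
print(tree)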

Reverse transcriptase is error-prone — it lacks proofreading activity, introducing roughly one mutation per 10,000 nucleotides copied. This high error rate generates the enormous genetic diversity that allows HIV to evade immune responses and develop drug resistance.
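The consequence of this error rate is easy to work out. The sketch below takes the rate quoted above (1 error per 10,000 nucleotides) and assumes an HIV-1 genome of roughly 9,700 nucleotides. It also assumes errors are independent and uniform across sites, which is a simplification.

```python
def expected_mutations(genome_length, error_rate):
    """Mean number of errors introduced in one copy of the genome."""
    return genome_length * error_rate


def prob_error_free(genome_length, error_rate):
    """Probability that a single copy contains no errors (independent sites)."""
    return (1 - error_rate) ** genome_length


GENOME_LENGTH = 9_700   # approximate HIV-1 genome size in nucleotides (assumption)
ERROR_RATE = 1e-4       # errors per nucleotide copied, from the text above

print("Expected mutations per genome copy:",
      round(expected_mutations(GENOME_LENGTH, ERROR_RATE), 2))
print("Probability of an error-free copy: ",
      round(prob_error_free(GENOME_LENGTH, ERROR_RATE), 2))
```

Under these assumptions each reverse-transcribed genome carries about one new mutation on average, and fewer than half of the copies are error-free.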

Prions Are Infectious Proteins

Prions represent the most unusual class of infectious agents. They contain no nucleic acid — they are misfolded forms of a normal host protein called PrPC (cellular prion protein). The misfolded form, PrPSc (scrapie prion protein), acts as a template that converts normal PrPC molecules into the pathological conformation. PrPSc is rich in β-sheet structure (compared to the predominantly α-helical PrPC) and aggregates into insoluble amyloid fibrils that are resistant to proteases, heat, and standard sterilization methods.

Prion diseases — including Creutzfeldt-Jakob disease (CJD) in humans, bovine spongiform encephalopathy (BSE, or “mad cow disease”) in cattle, and scrapie in sheep — are invariably fatal neurodegenerative disorders. They can be sporadic (spontaneous misfolding), inherited (mutations in the PRNP gene that destabilize PrPC), or acquired by infection (transmission of PrPSc).

Bioinformatics: Pathogen Genomics

Whole-genome sequencing has transformed our ability to identify, track, and characterize pathogens.

Pathogen genome assembly and annotation use short-read (Illumina) or long-read (Oxford Nanopore, PacBio) platforms to reconstruct complete genomes from clinical isolates. Automated annotation pipelines (Prokka, PGAP) identify open reading frames, rRNAs, tRNAs, and functional elements.

Antimicrobial resistance (AMR) gene detection is critical for guiding treatment. Tools such as AMRFinderPlus (NCBI), the Comprehensive Antibiotic Resistance Database (CARD), and ResFinder search pathogen genomes against curated databases of known resistance genes, mutations, and regulatory elements. These tools detect resistance determinants such as β-lactamases (blaCTX-M, blaNDM), methicillin-resistance genes (mecA), and fluoroquinolone-resistance mutations in DNA gyrase.

Virulence factor identification uses databases like the Virulence Factor Database (VFDB) to catalog genes encoding toxins, adhesins, secretion systems, iron acquisition systems, and immune evasion factors.

Pathogen phylogenomics and molecular epidemiology reconstruct the evolutionary relationships among isolates from an outbreak, enabling epidemiologists to determine whether cases are linked by transmission. Transmission chain reconstruction from whole-genome sequencing data compares single nucleotide polymorphisms (SNPs) between isolates: closely related genomes (differing by only a few SNPs) are likely connected by recent transmission events, while more divergent genomes represent independent introductions.

Viral quasispecies analysis and variant calling recognizes that viral populations within a single host are not homogeneous but rather exist as clouds of closely related variants (quasispecies). Specialized variant callers (LoFreq, iVar) detect low-frequency mutations that may represent emerging drug resistance or immune escape variants.
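At its simplest, low-frequency variant detection means counting the bases observed at each position and reporting non-reference bases above a frequency and depth cutoff. The sketch below does only that. It is not how LoFreq or iVar work internally (real callers also model base quality and strand bias). The thresholds, pileup counts, and function name are hypothetical.

```python
def minor_variants(pileup, reference, min_freq=0.02, min_depth=100):
    """Report (position, ref_base, alt_base, frequency) for well-covered minor alleles.

    `pileup` maps a 0-based position to a dict of base -> read count.
    """
    calls = []
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust a frequency estimate
        ref_base = reference[pos]
        for base, n in sorted(counts.items()):
            freq = n / depth
            if base != ref_base and freq >= min_freq:
                calls.append((pos, ref_base, base, round(freq, 4)))
    return calls


# Hypothetical reference and read counts at four positions.
reference = "ACGT"
pileup = {
    0: {"A": 1000},
    1: {"C": 950, "T": 50},   # 5% minor allele: reported
    2: {"G": 995, "A": 5},    # 0.5% minor allele: below the frequency cutoff
    3: {"T": 40, "C": 10},    # depth 50: below the depth cutoff
}
print(minor_variants(pileup, reference))
```

Choosing the frequency cutoff is a trade-off: too low and sequencing errors are reported as variants, too high and a resistance mutation that is just beginning to spread goes unnoticed.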

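Let’s compare two isolates from a hypothetical outbreak to see how genomic surveillance detects transmission-linked cases:

// Aligned genome fragments from two hypothetical isolates (made-up sequences)
let isolate_a = "ATGACCGTTAAGCTGGATCCGTTAGCAATG"
let isolate_b = "ATGACCGTTAAGCTGGATCCGTTGGCAATG"
// Count the positions at which the two isolates differ
print("Isolate A vs isolate B: " + Seq.hamming(isolate_a, isolate_b) + " SNP difference(s)")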

A single SNP difference between two isolates is consistent with direct transmission or a very recent common ancestor. In a real outbreak investigation, epidemiologists combine this genomic distance with contact tracing data and timing to reconstruct the likely chain of transmission.
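The same pairwise comparison extends to a whole set of isolates: compute every pairwise SNP distance and flag the pairs that fall under a cutoff. The Python sketch below assumes pre-aligned sequences of equal length. The sequences, patient labels, and the cutoff of 2 SNPs are invented for illustration, since real thresholds depend on the pathogen and its mutation rate.

```python
from itertools import combinations


def snp_distance(a, b):
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def linked_pairs(isolates, max_snps=2):
    """Return (name1, name2, distance) for pairs within the SNP cutoff."""
    pairs = []
    for (name1, seq1), (name2, seq2) in combinations(isolates.items(), 2):
        distance = snp_distance(seq1, seq2)
        if distance <= max_snps:
            pairs.append((name1, name2, distance))
    return pairs


# Hypothetical aligned fragments from three patients.
isolates = {
    "patient_1": "ATGACCGTTAAGCTGGATCC",
    "patient_2": "ATGACCGTTAAGCTGGATCT",
    "patient_3": "ATGTCCGATAAGGTGGCTCC",
}
print(linked_pairs(isolates))
```

Here patients 1 and 2 differ by a single SNP and are flagged as a possible transmission pair, while patient 3 is several SNPs away from both and more plausibly represents a separate introduction.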

23.3 — Molecular Biology of Viral Infections

Viral Genome Diversity

Viruses display extraordinary diversity in the nature of their genetic material. The Baltimore classification organizes viruses into seven groups based on their genome type and replication strategy:

| Baltimore class | Genome type | Examples | Replication strategy |
|---|---|---|---|
| I | dsDNA | Herpesvirus, adenovirus, poxvirus | Host or viral DNA polymerase |
| II | ssDNA | Parvovirus, AAV | Converted to dsDNA, then replicated |
| III | dsRNA | Rotavirus, reovirus | Viral RNA-dependent RNA polymerase (RdRp) |
| IV | +ssRNA | SARS-CoV-2, poliovirus, Zika | Directly translated; replicated by viral RdRp |
| V | −ssRNA | Influenza, Ebola, rabies | Transcribed to +sense by viral RdRp before translation |
| VI | +ssRNA-RT | HIV, HTLV | Reverse-transcribed to dsDNA; integrated into host genome |
| VII | dsDNA-RT | Hepatitis B | Replicates via an RNA intermediate and reverse transcriptase |

This classification highlights a striking fact: while all cellular life uses dsDNA as its genetic material, viruses have explored virtually every possible nucleic acid configuration. RNA viruses, in particular, pose special challenges because most RNA-dependent RNA polymerases lack proofreading (coronaviruses, which encode a proofreading exonuclease, are a notable exception), leading to mutation rates 100–1,000 times higher than those of DNA-based organisms.
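The Baltimore scheme is effectively a small decision tree over three properties of the genome: nucleic acid type, strandedness, and whether replication involves reverse transcription (plus sense, for single-stranded RNA). A minimal Python sketch of that logic is shown below; the function name and argument conventions are illustrative choices, not part of any standard library.

```python
def baltimore_class(nucleic_acid, strands, sense=None, reverse_transcribing=False):
    """Return the Baltimore class (I-VII) for a genome description.

    nucleic_acid: "DNA" or "RNA"; strands: 1 or 2;
    sense: "+" or "-" (only consulted for single-stranded RNA).
    """
    na = nucleic_acid.upper()
    if na == "DNA" and strands == 2:
        return "VII" if reverse_transcribing else "I"
    if na == "DNA" and strands == 1:
        return "II"
    if na == "RNA" and strands == 2:
        return "III"
    if na == "RNA" and strands == 1:
        if reverse_transcribing:
            return "VI"
        if sense == "+":
            return "IV"
        if sense == "-":
            return "V"
    raise ValueError("genome description does not match a Baltimore class")


print("SARS-CoV-2: ", baltimore_class("RNA", 1, sense="+"))
print("Influenza:  ", baltimore_class("RNA", 1, sense="-"))
print("HIV:        ", baltimore_class("RNA", 1, sense="+", reverse_transcribing=True))
print("Hepatitis B:", baltimore_class("DNA", 2, reverse_transcribing=True))
```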

Viral Entry and Uncoating

Viral infection begins with attachment to a specific host cell receptor. The identity of this receptor determines viral tropism — which cell types and host species the virus can infect:

  • HIV binds CD4 and a coreceptor (CCR5 or CXCR4) on T helper cells and macrophages
  • SARS-CoV-2 binds ACE2 (angiotensin-converting enzyme 2) on respiratory epithelial cells
  • Influenza binds sialic acid residues on upper airway epithelial cells
  • Epstein-Barr virus binds complement receptor CR2 (CD21) on B cells

After attachment, viruses enter cells by several mechanisms: receptor-mediated endocytosis (influenza, adenovirus), direct membrane fusion at the plasma membrane (HIV), or macropinocytosis (Ebola). Once inside, uncoating — the release of the viral genome from its protein capsid — is triggered by the low pH of the endosome (for viruses entering by endocytosis) or by conformational changes induced by receptor binding.

Viral Genome Replication Strategies

Each Baltimore class uses a distinct replication strategy:

DNA viruses (classes I and II) generally replicate in the nucleus, where they can access host DNA replication machinery. Large dsDNA viruses (herpesviruses, poxviruses) encode their own DNA polymerases and many accessory factors, making them relatively independent of host replication machinery. Poxviruses are exceptional in replicating entirely in the cytoplasm.

Positive-sense RNA viruses (class IV, e.g., SARS-CoV-2) carry genomes that can be directly translated by host ribosomes upon entry. The viral RNA serves as mRNA, producing a polyprotein that is processed by viral proteases into the RNA-dependent RNA polymerase (RdRp) and structural proteins. The RdRp then copies the +RNA genome through a −RNA intermediate.

Negative-sense RNA viruses (class V, e.g., influenza, Ebola) carry genomes that are complementary to mRNA and cannot be directly translated. These viruses must package their own RdRp within the viral particle so that, upon entry, the enzyme immediately transcribes the −RNA genome into +sense mRNA for translation.

Retroviruses (class VI, e.g., HIV) convert their RNA genome to dsDNA using reverse transcriptase, and the resulting DNA is integrated into the host chromosome by integrase. The integrated provirus is then transcribed by host RNA polymerase II, producing both mRNAs for viral protein synthesis and full-length genomic RNAs for packaging into new viral particles.

Viral Assembly and Release

After genome replication and protein synthesis, new viral particles are assembled. Non-enveloped viruses (adenovirus, poliovirus) assemble their protein capsids in the cytoplasm or nucleus and are released when the host cell lyses. Enveloped viruses (HIV, influenza, SARS-CoV-2) acquire their lipid envelope by budding through host cell membranes — typically the plasma membrane, although some bud through the ER or Golgi. During budding, viral glycoproteins embedded in the membrane are incorporated into the envelope, providing the attachment machinery for the next round of infection.

Some viruses have evolved elaborate release mechanisms. Influenza neuraminidase cleaves sialic acid residues on the host cell surface, preventing newly released virions from sticking back to the cell that produced them. This enzyme is the target of the antiviral drugs oseltamivir (Tamiflu) and zanamivir (Relenza).

Emerging Viruses and Pandemic Preparedness

Most emerging infectious diseases are caused by viruses that jump from animal reservoirs to humans (zoonotic spillover). HIV originated from simian immunodeficiency virus (SIV) in chimpanzees, SARS-CoV-2 likely originated in bats (possibly through an intermediate host), and influenza pandemics arise when avian or swine influenza strains reassort with human strains.

Factors driving emergence include habitat destruction, wildlife trade, intensive animal farming, international travel, and climate change. Pandemic preparedness now relies heavily on genomic surveillance: rapid sequencing of novel pathogens, real-time sharing of genome data, and computational analysis of mutations that might affect transmissibility, virulence, or immune evasion.

Let’s first compare within-host diversity for HIV and influenza using the Shannon index, then examine three variants of a viral surface protein gene to see how mutations accumulate:

// Variant frequencies within a single host for each virus
let hiv_diversity = '[35, 28, 22, 15, 10, 8, 5, 3, 2, 1]'
let flu_diversity = '[40, 30, 15, 10, 5]'
print("HIV quasispecies diversity (variant frequencies):")
print("Shannon index: " + Stats.shannon(hiv_diversity))
print("Influenza diversity:")
print("Shannon index: " + Stats.shannon(flu_diversity))
// Ancestral sequence and three descendant variants of a surface protein gene fragment
let ancestor = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTG"
let variant1 = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTG"
let variant2 = "ATGTTTGTTTTCCTTGTTTTATTGCCACAAGTCTCTAGTCAGTGTG"
let variant3 = "ATGTTTGTTTTTCTTATTTTATTGCCACAAGTCTCTAGTCAGTGAG"
// Hamming distance counts the positions that differ from the ancestor
print("Mutations from ancestor:")
print("Variant 1: " + Seq.hamming(ancestor, variant1) + " mutations")
print("Variant 2: " + Seq.hamming(ancestor, variant2) + " mutations")
print("Variant 3: " + Seq.hamming(ancestor, variant3) + " mutations")

Even two or three nucleotide changes in a surface protein gene can alter receptor binding, immune recognition, or transmissibility — the molecular basis for why new variants of concern emerge during viral pandemics.
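Whether a nucleotide change matters often depends on whether it alters the encoded amino acid. The Python sketch below translates the sequences from the example above codon by codon using the standard genetic code and lists each changed codon with its reference and variant amino acid. It assumes the fragment starts in frame at its first nucleotide; the function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_changes(ref, alt):
    """List (codon_number, ref_amino_acid, alt_amino_acid) for each changed codon."""
    changes = []
    usable = min(len(ref), len(alt)) // 3 * 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        ref_codon, alt_codon = ref[i:i + 3], alt[i:i + 3]
        if ref_codon != alt_codon:
            changes.append((i // 3 + 1, CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]))
    return changes


ancestor = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTG"
variant2 = "ATGTTTGTTTTCCTTGTTTTATTGCCACAAGTCTCTAGTCAGTGTG"
variant3 = "ATGTTTGTTTTTCTTATTTTATTGCCACAAGTCTCTAGTCAGTGAG"

for name, seq in (("Variant 2", variant2), ("Variant 3", variant3)):
    for codon, ref_aa, alt_aa in codon_changes(ancestor, seq):
        kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
        print(f"{name}: codon {codon} {ref_aa} -> {alt_aa} ({kind})")
```

In variant 2, one of the two changes is synonymous (codon 4 still encodes phenylalanine), while the other replaces leucine with glutamine at codon 10. In variant 3, the change in codon 15 creates a stop codon (`*`), which would truncate the protein.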

Bioinformatics: Viral Bioinformatics

The rapid evolution of viruses makes computational analysis indispensable for public health.

Viral genome databases serve as central repositories. NCBI Virus provides curated viral genome sequences and metadata. GISAID (Global Initiative on Sharing All Influenza Data, originally "Avian Influenza Data") hosts the world’s largest collection of SARS-CoV-2 and influenza genomes, with over 16 million SARS-CoV-2 sequences shared during the COVID-19 pandemic. ViralZone (ExPASy) provides curated information on viral families, replication cycles, and host ranges.

Viral phylogenetics and molecular clock analysis reconstruct the evolutionary history of viral lineages. Tools such as BEAST (Bayesian Evolutionary Analysis by Sampling Trees) estimate divergence times using a molecular clock — the assumption that mutations accumulate at a roughly constant rate over time. Nextstrain combines phylogenetic analysis with geographic and temporal metadata to produce real-time visualizations of viral evolution and spread, informing variant classification and public health response.
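The molecular clock assumption can be illustrated with a root-to-tip regression: if mutations accumulate at a constant rate, a sample's genetic divergence from the root should increase linearly with its sampling date. The slope estimates the substitution rate, and the x-intercept estimates the date of the common ancestor. The Python sketch below uses ordinary least squares on invented data. It is far simpler than the Bayesian inference BEAST performs and is meant only to show the idea.

```python
def fit_clock(dates, divergences):
    """Least-squares fit of divergence against sampling date.

    Returns (rate, tmrca): substitutions per site per year and the
    date at which the fitted line crosses zero divergence.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate


# Hypothetical sampling dates (decimal years) and root-to-tip divergences
# (substitutions per site); these numbers are made up for illustration.
dates = [2020.0, 2020.5, 2021.0, 2021.5]
divergences = [0.0002, 0.0006, 0.0010, 0.0014]

rate, tmrca = fit_clock(dates, divergences)
print(f"Estimated rate: {rate:.2e} substitutions/site/year")
print(f"Estimated date of common ancestor: {tmrca:.2f}")
```

Real data are noisier and lineages are not independent samples, which is why tools such as BEAST fit the clock jointly with the tree and report credible intervals rather than a single point estimate.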

Viral mutation tracking and variant surveillance monitor the emergence of mutations in key viral genes. During the SARS-CoV-2 pandemic, surveillance systems tracked mutations in the spike protein that affected ACE2 binding affinity, antibody neutralization, and vaccine efficacy. Variant lineages (Alpha, Delta, Omicron) were defined by characteristic constellations of mutations identified through global genomic surveillance.

Viral protein structure prediction has been accelerated by tools like AlphaFold and RoseTTAFold, which predict the three-dimensional structures of viral proteins from sequence alone. These structures reveal receptor-binding interfaces, antibody epitopes, and potential drug-binding pockets.

Vaccine antigen design (reverse vaccinology) uses computational analysis of pathogen genomes to identify promising vaccine candidates without culturing the pathogen. First applied to Neisseria meningitidis serogroup B, this approach scans the genome for surface-exposed proteins, predicts their immunogenicity, and selects candidates for experimental validation. Modern reverse vaccinology incorporates structural modeling and epitope prediction to design antigens that elicit broadly protective immune responses.

Host-pathogen protein interaction analysis maps the physical and functional interactions between viral and host proteins using proteomics (affinity purification-mass spectrometry, yeast two-hybrid) and computational prediction. These interaction maps identify the cellular pathways hijacked by the virus and highlight potential therapeutic targets.

Exercises

Exercise: Build a Pathogen Strain Tree

Phylogenetic analysis of pathogen genomes reveals transmission patterns and the emergence of new variants. Build a neighbor-joining tree from genetic distances between influenza strains:

let labels = '["H3N2_2018", "H3N2_2019", "H3N2_2020", "H1N1_2019", "H1N1_2020"]'
let dist = '[0, 8, 15, 45, 48, 8, 0, 6, 42, 44, 15, 6, 0, 40, 42, 45, 42, 40, 0, 10, 48, 44, 42, 10, 0]'
let tree = Phylo.nj(labels, dist)
print("Influenza strain phylogeny:")
print(tree)
// Which two strains are most closely related?
let answer = "H3N2_2019 and H3N2_2020"
print(answer)

Exercise: Simulate Antibiotic Resistance Evolution

Antibiotic resistance can spread rapidly through bacterial populations under selective pressure. Simulate how a resistant allele starting at 1% frequency changes over generations with and without antibiotic selection:

let no_selection = PopGen.wright_fisher(200, 0.01, 50, 42)
let selection = PopGen.wright_fisher(200, 0.10, 50, 42)
print("Without antibiotics (neutral drift):")
print(no_selection)
print("With antibiotic selection pressure:")
print(selection)
// Does resistance frequency increase or decrease under selection?
let answer = "increases"
print(answer)
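For readers who want to see what such a simulation does internally, here is a minimal haploid Wright-Fisher model in Python. It is an independent sketch, not the implementation behind `PopGen.wright_fisher`, whose parameters and internals are not documented in this lesson. The `selection` and `seed` arguments are hypothetical names chosen for this example.

```python
import random


def wright_fisher(pop_size, init_freq, generations, selection=0.0, seed=None):
    """Simulate the frequency of a resistant allele in a haploid population.

    Each generation, selection shifts the expected frequency (resistant cells
    have relative fitness 1 + selection), then drift is applied by sampling
    `pop_size` offspring at random.
    """
    rng = random.Random(seed)
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        weighted = freq * (1 + selection)
        expected = weighted / (weighted + (1 - freq))
        count = sum(1 for _ in range(pop_size) if rng.random() < expected)
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory


neutral = wright_fisher(200, 0.01, 50, selection=0.0, seed=42)
selected = wright_fisher(200, 0.01, 50, selection=0.5, seed=42)
print("Final frequency without antibiotics:", neutral[-1])
print("Final frequency with antibiotics:   ", selected[-1])
```

Without selection the allele frequency wanders by chance alone, and a rare allele is often lost. With a fitness advantage the expected frequency rises every generation, although a rare allele can still be lost to drift before selection takes hold. Running several seeds makes both behaviours visible.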

Exercise: Measure Viral Quasispecies Diversity

RNA viruses like HIV exist as diverse populations (quasispecies) within a single host. Higher diversity helps the virus evade immune responses. Compare diversity at different stages of infection:

let early = '[90, 5, 3, 2]'
let chronic = '[25, 20, 18, 15, 10, 7, 5]'
let treated = '[85, 10, 3, 2]'
print("Early infection diversity:")
print("Shannon: " + Stats.shannon(early))
print("Chronic infection diversity:")
print("Shannon: " + Stats.shannon(chronic))
print("After antiviral treatment:")
print("Shannon: " + Stats.shannon(treated))
// Which stage has the highest viral diversity?
let answer = "chronic"
print(answer)


Summary

In this lesson you explored the biology of pathogens and infection:

  • The human microbiome comprises ~38 trillion bacteria, primarily in the gut, with essential roles in digestion, vitamin synthesis, and immune training; disruption (dysbiosis) is linked to diverse diseases
  • Metagenomics (16S rRNA sequencing, shotgun metagenomics) enables culture-independent microbiome analysis using tools such as QIIME2, MetaPhlAn, Kraken2, and HUMAnN; diversity is quantified by alpha (within-sample) and beta (between-sample) metrics
  • Pathogens exploit host cell biology at every stage of infection — attachment (adhesins), entry (trigger/zipper mechanisms), immune evasion, replication, and spread
  • Bacterial secretion systems (type III, IV, VI) inject effector proteins that manipulate host cell signaling, cytoskeletal dynamics, and membrane trafficking
  • Intracellular pathogens survive inside host cells by escaping the phagosome (Listeria), blocking phagosome-lysosome fusion (M. tuberculosis), or creating specialized replication vacuoles (Legionella)
  • Viruses are obligate intracellular parasites that hijack host ribosomes, membranes, and metabolic pathways; some use the host cytoskeleton for intracellular movement
  • Some viruses persist for the lifetime of the host: retroviruses such as HIV integrate into chromosomal DNA as proviruses, while herpesviruses maintain latent episomes in the nucleus
  • Prions are misfolded proteins (PrPSc) that template conversion of normal PrPC, causing fatal neurodegenerative diseases
  • Viral genome diversity spans seven Baltimore classes (dsDNA, ssDNA, dsRNA, +ssRNA, −ssRNA, retroviruses, pararetroviruses), each with distinct replication strategies
  • Viral entry depends on receptor binding (determining tropism), followed by endocytosis, membrane fusion, or macropinocytosis; uncoating releases the genome
  • Viral assembly and release occur by lysis (non-enveloped) or budding through host membranes (enveloped viruses)
  • Emerging viruses arise through zoonotic spillover; pandemic preparedness depends on rapid genomic sequencing, real-time data sharing, and computational surveillance
  • Pathogen genomics tools detect antimicrobial resistance genes (AMRFinderPlus, CARD, ResFinder), identify virulence factors (VFDB), and reconstruct transmission chains from whole-genome sequencing data
  • Viral bioinformatics platforms (GISAID, Nextstrain, BEAST) track viral evolution in real time, enabling molecular clock analysis, variant surveillance, and reverse vaccinology for vaccine design

References

  1. Alberts B, Johnson A, Lewis J, Morgan D, Raff M, Roberts K, Walter P. Molecular Biology of the Cell, 7th ed. New York: W.W. Norton; 2022. Chapter 23: Pathogens and Infection.
  2. Elbe S, Buckland-Merrett G. Data, disease and diplomacy: GISAID's innovative contribution to global health. Glob Chall. 2017;1(1):33–46. https://www.gisaid.org/
  3. Hadfield J, Megill C, Bell SM, et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34(23):4121–4123. https://nextstrain.org/
  4. Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 2012;29(8):1969–1973.
  5. Rappuoli R, Bottomley MJ, D'Oro U, Finco O, De Gregorio E. Reverse vaccinology 2.0: human immunology instructs vaccine antigen design. J Exp Med. 2016;213(4):469–481.
  6. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–2069.
  7. Jia B, Raphenya AR, Alcock B, et al. CARD 2017: expansion and model-centric curation of the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res. 2017;45(D1):D566–D573. https://card.mcmaster.ca/
  8. Brister JR, Ako-Adjei D, Bao Y, Blinkova O. NCBI Viral Genomes Resource. Nucleic Acids Res. 2015;43(Database issue):D571–D577.
