Cancer as a Genetic Disease
Understand how mutations in oncogenes and tumor suppressors drive cancer development through clonal evolution — from the multi-hit model to cancer genomics.
Introduction
Cancer is not a single disease but a collection of more than 200 distinct disorders united by a common principle: the uncontrolled growth of cells that have escaped normal regulatory constraints. In the United States alone, roughly 1.9 million new cancer cases are diagnosed each year, and cancer remains the second leading cause of death worldwide. Yet the past five decades of research have revealed that cancer is, at its core, a genetic disease — it arises from the accumulation of mutations in genes that control cell proliferation, survival, differentiation, and tissue organization.
Understanding cancer at the molecular level has transformed both diagnosis and treatment. Where oncologists once classified tumors solely by tissue of origin and histological appearance, they now routinely sequence tumor genomes to identify the specific mutations driving each patient’s cancer. This lesson covers the genetic basis of cancer development, the classes of genes that go wrong, the evolutionary process by which tumors progress from mild abnormalities to lethal malignancies, the external agents that cause cancer-critical mutations, and the experimental and computational methods used to identify the genes at the heart of the disease.
20.1 — Cancer as a Microevolutionary Process
Cancer Cells Proliferate, Invade, and Metastasize
Cancer cells are defined by two fundamental properties: they proliferate in defiance of normal controls, and they invade surrounding tissues. A tumor that remains confined to its tissue of origin is called benign; one that invades and spreads is malignant. The most dangerous property of malignant cells is their ability to metastasize — to detach from the primary tumor, travel through blood or lymph, and establish secondary tumors (metastases) at distant sites. It is metastasis, not the primary tumor, that causes most cancer deaths.
Cancer cells carry defects in the regulatory circuits governing normal cell proliferation and homeostasis. The hallmark capabilities that distinguish cancer cells from their normal counterparts include self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, evasion of programmed cell death, unlimited replicative potential, sustained angiogenesis, and the ability to invade and metastasize. More recently, deregulated cellular energetics, genome instability, tumor-promoting inflammation, and immune evasion have been added to this list.
Most Cancers Develop from a Single Abnormal Cell
A critical insight is that most cancers are clonal — they develop from a single cell that has acquired an initial growth advantage. Evidence for monoclonality comes from several lines of observation. In women heterozygous for X-linked markers, all cells within a given tumor express the same X chromosome, indicating they descended from one progenitor. Similarly, in B-cell lymphomas, all tumor cells produce the same antibody with the same antigen-binding specificity, reflecting their origin from a single B lymphocyte. The rare cancers caused by specific chromosomal translocations (such as the Philadelphia chromosome in chronic myeloid leukemia) display the identical translocation breakpoint in every tumor cell.
Cancer-Critical Mutations Cluster in a Few Types of Pathways
Although a tumor genome may contain thousands of mutations, the mutations that actually drive cancer — the driver mutations — cluster in a surprisingly small number of regulatory pathways. These include the Ras–MAPK signaling pathway, the PI3K–Akt–mTOR growth pathway, the p53 DNA damage response, the Rb cell-cycle control pathway, the Wnt signaling pathway, the Notch pathway, and pathways governing apoptosis, telomere maintenance, and chromatin remodeling. The remaining mutations are passenger mutations — neutral hitchhikers that accumulated during clonal expansion but do not confer a growth advantage. Typically, a solid tumor contains 3–7 driver mutations among hundreds to thousands of passengers.
Oncogenes and Tumor Suppressors
Cancer-critical genes fall into two main classes, proto-oncogenes and tumor suppressor genes, distinguished by whether cancer arises from too much activity of the gene product or too little.
Proto-oncogenes encode proteins that promote cell growth and division. When a proto-oncogene acquires a gain-of-function mutation, it becomes an oncogene that drives proliferation even in the absence of normal stimulatory signals. A single mutant allele is sufficient (the mutation acts dominantly). Oncogene activation can occur by point mutation, gene amplification, chromosomal translocation, or regulatory mutation.
Tumor suppressor genes encode proteins that restrain cell growth, promote apoptosis, or maintain genomic integrity. Cancer requires loss-of-function mutations in both alleles (the mutation is recessive at the cellular level) — a principle known as Knudson’s two-hit hypothesis, first proposed based on studies of retinoblastoma. Loss of the first allele is often by point mutation; loss of the second can occur through deletion, mitotic recombination, or epigenetic silencing.
| Gene class | Mutation type | Effect | Examples |
|---|---|---|---|
| Oncogene (gain-of-function) | Point mutation, amplification, translocation | Constitutive activation of growth signaling | RAS, MYC, EGFR, BCR-ABL, BRAF, HER2 |
| Tumor suppressor (loss-of-function) | Deletion, nonsense mutation, epigenetic silencing | Removal of growth restraints or genome maintenance | TP53, RB, APC, BRCA1/2, PTEN, CDKN2A |
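The difference between needing one hit and needing two can be made concrete with a little arithmetic. The Python sketch below (Python is used for illustration rather than the lesson's own scripting language) compares the expected number of cells that lose both alleles of a tumor suppressor in a sporadic case and in a carrier who inherits one defective allele. The cell count and the per-allele inactivation probability are hypothetical round numbers, and the two alleles are assumed to be hit independently.

```python
def expected_null_cells(n_cells, p_hit, inherited_first_hit):
    """Expected number of cells that have lost both alleles of a tumor suppressor.

    p_hit is a hypothetical probability that one allele is inactivated in one
    cell; the two alleles are assumed to be hit independently.
    """
    if inherited_first_hit:
        # Germline carrier: every cell already lacks one allele,
        # so a single somatic hit completes the loss.
        return n_cells * p_hit
    # Sporadic case: both alleles must be hit somatically in the same cell.
    return n_cells * p_hit ** 2


n_cells = 10**7  # hypothetical number of susceptible cells
p_hit = 1e-6     # hypothetical per-allele inactivation probability

sporadic = expected_null_cells(n_cells, p_hit, inherited_first_hit=False)
hereditary = expected_null_cells(n_cells, p_hit, inherited_first_hit=True)
print(f"sporadic:   {sporadic:.1e} expected double-hit cells")
print(f"hereditary: {hereditary:.1e} expected double-hit cells")
```

Under these assumed numbers the inherited first hit raises the expected count by a factor of 1/p_hit, which is the intuition behind the strong cancer predisposition seen in carriers.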
Cancers Develop Gradually Through Clonal Evolution
Cancers develop gradually from increasingly aberrant cells over a period of years to decades. This process, called tumor progression, involves successive rounds of mutation and natural selection that parallel Darwinian evolution on a cellular scale. An initial mutation gives one cell a slight growth advantage; that cell’s progeny expand to form a clone; within that clone, a second mutation arises that confers an additional advantage; and so on. Each round of mutation and selection drives the population toward greater malignancy.
This evolutionary framework explains several features of cancer: the long latency between carcinogen exposure and tumor appearance, the dramatic increase in cancer incidence with age (more time allows more mutations to accumulate), and the progressive acquisition of increasingly aggressive properties.
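One classical way to formalize the age dependence is the multistage (Armitage-Doll) model, in which a cancer that needs k rate-limiting events has an incidence rising roughly as age raised to the power k − 1. The Python sketch below computes relative incidence under that assumption. The reference age and the values of k are illustrative choices, not estimates for any particular cancer.

```python
def relative_incidence(age_years, k_hits, reference_age=40.0):
    """Incidence at age_years relative to reference_age under a multistage model.

    Assumes incidence is proportional to age ** (k_hits - 1). The
    proportionality constant cancels in the ratio, so no absolute
    rates are needed.
    """
    return (age_years / reference_age) ** (k_hits - 1)


for k in (2, 4, 6):
    ratio = relative_incidence(80, k)
    print(f"k = {k}: incidence at age 80 is {ratio:.0f}x the incidence at age 40")
```

The more rate-limiting events a cancer requires, the more steeply its incidence climbs with age, which matches the qualitative pattern described above.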
Genetic Instability in Human Cancer Cells
Human cancer cells are genetically unstable. Normal cells have robust DNA repair and checkpoint mechanisms that keep the mutation rate low — roughly one nucleotide substitution per cell division in normal somatic cells. Cancer cells, however, often acquire mutations in genes that maintain genome integrity (such as DNA repair genes and mitotic checkpoint genes), leading to dramatically elevated mutation rates. This genetic instability accelerates tumor evolution by increasing the supply of new mutations on which selection can act.
Genetic instability manifests in two main forms: chromosomal instability (CIN), characterized by gains, losses, and rearrangements of whole chromosomes or large chromosomal segments, seen in most solid tumors; and microsatellite instability (MSI), characterized by expansions or contractions of short tandem repeat sequences due to defects in the DNA mismatch repair system (MLH1, MSH2, MSH6, PMS2).
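As a minimal sketch of how MSI status might be assigned, the Python function below compares repeat lengths at a panel of microsatellite markers in tumor and matched normal DNA and reports the fraction of markers that changed. The marker names follow a commonly used five-marker panel, but the repeat lengths are invented, each marker is reduced to a single length for simplicity, and the 30% cutoff for MSI-high is treated here as an assumption rather than a clinical rule.

```python
def classify_msi(normal, tumor, high_cutoff=0.3):
    """Label a tumor MSI-high, MSI-low, or MSS from repeat lengths at shared markers.

    normal and tumor map marker name -> repeat length (simplified to one
    length per marker). high_cutoff is an assumed threshold.
    """
    markers = sorted(set(normal) & set(tumor))
    if not markers:
        raise ValueError("no markers in common")
    # A marker is unstable if its repeat length differs between tumor and normal.
    unstable = [m for m in markers if normal[m] != tumor[m]]
    fraction = len(unstable) / len(markers)
    if fraction >= high_cutoff:
        return "MSI-high"
    return "MSI-low" if unstable else "MSS"


# Hypothetical repeat lengths for illustration only.
normal_dna = {"BAT25": 25, "BAT26": 26, "D2S123": 20, "D5S346": 18, "D17S250": 22}
tumor_dna = {"BAT25": 21, "BAT26": 22, "D2S123": 20, "D5S346": 15, "D17S250": 22}

print(classify_msi(normal_dna, tumor_dna))
```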
Cancer Depends on Defective Control of Cell Death and Differentiation
Cancerous growth often depends on defective control of cell death, cell differentiation, or both. Apoptosis normally eliminates cells with oncogenic mutations, DNA damage, or improper survival signals. Cancer cells frequently disable this safeguard by overexpressing anti-apoptotic proteins (such as Bcl-2), losing pro-apoptotic regulators (such as p53), or upregulating survival signaling pathways. Similarly, many cancers arise from cells that fail to undergo terminal differentiation — instead of exiting the cell cycle and specializing, they remain in a proliferative, stem-like state.
Cancer Cells Evade, Subvert, and Exploit the Immune System
Cancer cells are usually altered in their responsiveness to other cells and to the extracellular matrix — they lose contact inhibition, grow without attachment to a substratum, and become insensitive to signals from neighboring cells that would normally restrain their proliferation.
Furthermore, cancer cells evade, subvert, and exploit the immune system. The immune system normally surveils tissues for abnormal cells and eliminates them — a process called immunosurveillance. Cancer cells escape this surveillance through multiple mechanisms: downregulating MHC class I molecules to avoid recognition by cytotoxic T cells, expressing immune checkpoint ligands such as PD-L1 to inhibit T-cell activity, secreting immunosuppressive cytokines (TGF-β, IL-10), and recruiting regulatory T cells and myeloid-derived suppressor cells that dampen anti-tumor immune responses. The tumor, in effect, creates a local immunosuppressive microenvironment.
Tumor Heterogeneity and the Microenvironment
Cancers become more and more heterogeneous as they progress. Because tumor cells continue to mutate and diverge, different regions of the same tumor — and certainly different metastases — harbor distinct sets of mutations. This intratumoral heterogeneity has profound clinical implications: a drug that targets one subclone may leave resistant subclones to expand, and a single biopsy may miss critical mutations present elsewhere.
The tumor microenvironment influences cancer development profoundly. Tumors are not simply masses of cancer cells — they are complex ecosystems containing fibroblasts, endothelial cells, immune cells, and extracellular matrix. Cancer-associated fibroblasts (CAFs) remodel the extracellular matrix and secrete growth factors. Tumor-associated macrophages can suppress immune attack and promote angiogenesis. Chronic inflammation in the microenvironment generates reactive oxygen species that accelerate mutagenesis. The crosstalk between cancer cells and their stromal neighbors critically shapes tumor behavior.
Metastasis Requires Survival in a Foreign Environment
Cancer cells must survive and proliferate in a foreign environment to metastasize. Metastasis is an inefficient process: millions of tumor cells may enter the bloodstream, but only a tiny fraction establish secondary tumors. Successful metastasis requires a cell to detach from the primary tumor, invade through the basement membrane, enter the vasculature (intravasation), survive in the circulation, exit at a distant site (extravasation), and colonize a foreign tissue. Each step poses a selective barrier. Metastatic cells often show an epithelial-to-mesenchymal transition (EMT) — they lose cell-cell adhesion (downregulating E-cadherin) and acquire migratory, invasive properties.
The Colorectal Cancer Paradigm
Colorectal cancers evolve slowly through a succession of visible changes and provide the best-studied example of multi-step tumor progression. The adenoma-carcinoma sequence describes the stepwise transformation from normal colonic epithelium to metastatic carcinoma over a period of 10–20 years:
Normal epithelium → small adenoma (polyp) → large adenoma → carcinoma in situ → invasive carcinoma → metastasis
A few key genetic lesions are common to many colorectal cancers, and the steps of tumor progression can often be correlated with specific mutations:
| Stage | Genetic event | Consequence |
|---|---|---|
| Normal → small adenoma | Loss of APC (both alleles) | Constitutive Wnt signaling; excess proliferation |
| Small adenoma → large adenoma | Activating mutation in KRAS | Constitutive Ras–MAPK signaling |
| Large adenoma → carcinoma | Loss of TP53 | Loss of DNA damage checkpoint and apoptosis |
| Carcinoma → metastasis | Loss of SMAD4 and other late events | Loss of TGF-β growth inhibition; invasive capacity |
Some colorectal cancers have defects in DNA mismatch repair rather than following the classic APC → KRAS → TP53 sequence. These tumors display microsatellite instability (MSI-high), accumulate mutations at an accelerated rate, and follow a different genetic pathway to malignancy. Hereditary nonpolyposis colorectal cancer (Lynch syndrome) results from inherited mutations in mismatch repair genes (MLH1, MSH2), predisposing carriers to this alternative route.
Each case of cancer is characterized by its own array of genetic lesions. While certain driver mutations are common across many tumors of the same type, no two cancers are genetically identical. This molecular individuality is the basis of precision oncology — the tailoring of treatment to the specific mutational profile of each patient’s tumor.
Consider a classic oncogenic mutation. The KRAS proto-oncogene is activated by point mutations that lock the Ras protein in its GTP-bound (active) state. One of the most common is a G→T change at codon 12 that converts glycine (GGT) to valine (GTT), known as G12V. This single amino acid change prevents the GTPase-activating protein (GAP) from stimulating GTP hydrolysis, so KRAS stays in its active conformation and constitutively drives proliferative signaling through the MAPK cascade.
Point mutations like this one are what is counted when a tumor’s overall mutation rate is measured. The snippet below summarizes a set of example somatic mutation rates (mutations per megabase) across ten tumor samples:
let mutation_rates = '[2.1, 3.5, 1.8, 4.2, 2.9, 3.1, 5.0, 2.4, 1.5, 3.8]'
print("Somatic mutation rates (mutations/Mb) across tumor samples:")
print(Stats.describe(mutation_rates))
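The codon 12 change described above can be reproduced in a few lines of Python. This is a small illustrative sketch, not part of the lesson's own tooling: the codon table is deliberately partial (only the codons needed here, taken from the standard genetic code), and the helper name `mutate_codon` is made up for this example.

```python
# Partial codon table: only the codons needed for this example
# (values follow the standard genetic code).
CODONS = {"GGT": "Gly", "GTT": "Val", "GAT": "Asp", "TGT": "Cys"}


def mutate_codon(codon, position, new_base):
    """Return the codon with the base at 0-based position replaced by new_base."""
    if len(codon) != 3 or position not in (0, 1, 2):
        raise ValueError("codon must have 3 bases and position must be 0, 1, or 2")
    if new_base not in ("A", "C", "G", "T"):
        raise ValueError("new_base must be one of A, C, G, T")
    return codon[:position] + new_base + codon[position + 1:]


wild_type = "GGT"                        # KRAS codon 12, glycine
mutant = mutate_codon(wild_type, 1, "T")  # G->T at the second base
print(f"codon 12: {wild_type} ({CODONS[wild_type]}) -> {mutant} ({CODONS[mutant]})")
```

Changing the same base to A instead of T gives GAT (aspartate), and changing the first base to T gives TGT (cysteine), so a single codon can yield several different activating substitutions.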
Cancer Genomics and Computational Oncology
The sequencing revolution has transformed our ability to study cancer genomes at scale. Several key databases and computational approaches underpin modern cancer genomics:
Cancer genomic databases: The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have together sequenced tens of thousands of tumors across more than 50 cancer types, creating comprehensive catalogs of somatic mutations, copy number alterations, structural variants, and gene expression changes. cBioPortal provides an interactive web interface for exploring these datasets. The Catalogue of Somatic Mutations in Cancer (COSMIC) curates known somatic mutations across all cancer types.
Somatic mutation calling: Identifying mutations present in tumor DNA but absent from the matched normal tissue requires specialized algorithms. Mutect2 (part of GATK), Strelka2, and VarScan compare tumor and normal sequencing data to call single-nucleotide variants and small insertions/deletions, while filtering out germline variants and sequencing artifacts.
Driver gene and driver mutation identification: Among thousands of somatic mutations per tumor, computational methods distinguish the handful of drivers from the mass of passengers. MutSig identifies genes mutated more frequently than expected given the background mutation rate and gene characteristics. IntOGen and OncodriveFML use functional impact scores to find genes enriched in high-impact mutations.
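The core idea behind frequency-based driver detection can be sketched as a one-sided Poisson test: given a background mutation rate, how surprising is the number of mutations observed in a gene across a cohort? The Python code below is a deliberately simplified stand-in, not the model used by MutSig or any other named tool, which correct for additional covariates. The gene length, cohort size, and background rate are hypothetical.

```python
import math


def poisson_tail(observed, expected, n_terms=200):
    """Approximate P(X >= observed) for X ~ Poisson(expected), with expected > 0.

    Sums the upper tail directly (in log space) so that very small
    probabilities are not lost to rounding.
    """
    return sum(
        math.exp(-expected + i * math.log(expected) - math.lgamma(i + 1))
        for i in range(observed, observed + n_terms)
    )


def gene_enrichment_p(observed, gene_length_bp, n_tumors, background_rate_per_bp):
    """P-value for seeing at least `observed` mutations in a gene by chance alone."""
    expected = gene_length_bp * n_tumors * background_rate_per_bp
    return poisson_tail(observed, expected)


# Hypothetical cohort: 200 tumors, background of 2 mutations/Mb (2e-6 per bp),
# and a 1,500 bp coding sequence, giving 0.6 expected mutations.
p_recurrent = gene_enrichment_p(25, 1500, 200, 2e-6)
p_background = gene_enrichment_p(1, 1500, 200, 2e-6)
print(f"25 mutations observed: p = {p_recurrent:.2e}")
print(f" 1 mutation observed:  p = {p_background:.2f}")
```

A gene hit 25 times where fewer than one mutation is expected is a strong driver candidate, while a single mutation is entirely compatible with passenger background.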
Tumor mutational burden and microsatellite instability: Tumor mutational burden (TMB) — the number of somatic mutations per megabase — varies from fewer than 1/Mb in pediatric cancers to over 100/Mb in hypermutated tumors. MSI analysis detects instability at microsatellite loci indicative of mismatch repair deficiency. Both TMB and MSI-high status are biomarkers for response to immune checkpoint inhibitors.
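The TMB definition above is simple enough to write down directly. The Python sketch below divides a somatic mutation count by the size of the sequenced region in megabases. The function name and the example numbers (450 mutations over a 30 Mb region) are hypothetical.

```python
def tumor_mutational_burden(n_somatic_mutations, callable_bases):
    """Somatic mutations per megabase of adequately sequenced (callable) DNA."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_somatic_mutations / (callable_bases / 1_000_000)


# Hypothetical exome: 450 somatic mutations across 30 Mb of callable sequence.
tmb = tumor_mutational_burden(450, 30_000_000)
print(f"TMB = {tmb:.1f} mutations/Mb")
```

The denominator matters as much as the numerator: the same mutation count yields very different TMB values for a small gene panel and for a whole exome.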
Copy number alteration analysis: Many cancer-critical genes are altered not by point mutation but by amplification or deletion. GISTIC (Genomic Identification of Significant Targets in Cancer) identifies recurrently amplified or deleted genomic regions across many tumor samples. FACETS estimates allele-specific copy number from tumor sequencing data, accounting for tumor purity and ploidy.
Tumor clonality and subclonal architecture: Tools such as PyClone and SciClone use variant allele frequencies to infer the clonal structure of a tumor — distinguishing founder (clonal) mutations present in all cancer cells from subclonal mutations present in only a fraction. Reconstructing the evolutionary tree of a tumor’s subclones is essential for understanding drug resistance and predicting relapse.
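The link between variant allele frequency and clonality can be illustrated with a commonly used point estimate of the cancer cell fraction (CCF). The Python sketch below is not the statistical model used by PyClone or SciClone, which cluster mutations probabilistically. It assumes the tumor purity, the local copy number, and the number of mutated copies per cancer cell (multiplicity) are already known, and all example values are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumor_copy_number=2, multiplicity=1,
                         normal_copy_number=2):
    """Point estimate of the fraction of cancer cells carrying a mutation.

    Rearranges  VAF = purity * multiplicity * CCF / total_alleles,
    where total_alleles is the average number of copies of the locus
    per cell in the (tumor + normal) sample.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    total_alleles = purity * tumor_copy_number + (1 - purity) * normal_copy_number
    return vaf * total_alleles / (purity * multiplicity)


# Hypothetical sample with 80% tumor purity and a diploid locus.
clonal = cancer_cell_fraction(vaf=0.40, purity=0.8)
subclonal = cancer_cell_fraction(vaf=0.10, purity=0.8)
print(f"VAF 0.40 -> CCF {clonal:.2f}")
print(f"VAF 0.10 -> CCF {subclonal:.2f}")
```

A CCF near 1 suggests a founder mutation present in every cancer cell, whereas a CCF well below 1 points to a later subclone.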
Cancer gene fusion detection: Chromosomal rearrangements can create oncogenic fusion genes (such as BCR-ABL in CML or EML4-ALK in lung cancer). STAR-Fusion and Arriba detect gene fusions from RNA-seq data by identifying chimeric reads that span fusion junctions.
Mutational signature decomposition: Different mutagenic processes leave characteristic patterns of nucleotide substitutions in the genome. SigProfiler and MutationalPatterns decompose the spectrum of mutations in a tumor into contributions from known mutational signatures — such as UV radiation (C→T at dipyrimidine sites), tobacco smoke (C→A transversions), APOBEC activity (C→T and C→G in TCN context), defective mismatch repair, or defective homologous recombination (BRCA signature). Identifying active signatures reveals the mutagenic history of a tumor and can guide treatment decisions.
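The first step of any signature analysis is to tabulate the substitution spectrum. The Python sketch below collapses each single-base substitution onto its pyrimidine reference base, so that G→T on one strand is counted as C→A, giving six substitution classes. It covers only this counting step: real signature tools additionally use the flanking bases and a matrix decomposition, neither of which is attempted here. The mutation list is invented for illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the substitution as 'C>X' or 'T>X', using the pyrimidine strand."""
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the equivalent change on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(mutations):
    """Count substitution classes for an iterable of (ref, alt) pairs."""
    return Counter(substitution_class(ref, alt) for ref, alt in mutations)


# Hypothetical somatic calls as (reference base, alternate base).
calls = [("G", "T"), ("C", "A"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "G")]
spectrum = substitution_spectrum(calls)
for cls in sorted(spectrum):
    print(cls, spectrum[cls])
```

A spectrum dominated by C>A would be consistent with the tobacco-associated pattern mentioned above, and one dominated by C>T at dipyrimidines with UV exposure, although a confident assignment needs the full sequence context.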
The tumor suppressor TP53 shows how concentrated cancer mutations can be. TP53 is the most frequently mutated gene in human cancers, altered in approximately 50% of all tumors, and the R175H hotspot mutation in the p53 DNA-binding domain is one of the most common TP53 mutations across all cancer types. Most cancer-associated mutations cluster in the DNA-binding domain and are missense mutations that disable p53’s ability to bind its target sequences, abolishing its function as a transcription factor for cell-cycle arrest and apoptosis genes.
Genomic summaries such as mutation burden are often plotted against clinical outcomes. The snippet below draws a scatter plot of mutation burden against median survival for five example data points:
let data = '[{"x": 1.2, "y": 48, "label": "Low burden"}, {"x": 2.5, "y": 36, "label": "Moderate"}, {"x": 5.0, "y": 24, "label": "High"}, {"x": 8.5, "y": 18, "label": "Very high"}, {"x": 15, "y": 12, "label": "Hypermutated"}]'
let plot = Viz.scatter(data, '{"title": "Mutation Burden vs Median Survival", "x_label": "Mutations/Mb", "y_label": "Survival (months)", "color": "#EF4444"}')
print(plot)
20.2 — Preventable Causes of Cancer
Carcinogens and DNA Damage
Many, but not all, cancer-causing agents damage DNA. Chemical and physical agents that cause cancer are called carcinogens. The connection between environmental exposure and cancer was first recognized in 1775 when the surgeon Percivall Pott observed that chimney sweeps developed scrotal cancer from chronic soot exposure. Today we know that carcinogens work primarily by causing mutations in cancer-critical genes.
Major classes of carcinogens include:
| Carcinogen | Source | Mechanism | Associated cancers |
|---|---|---|---|
| Polycyclic aromatic hydrocarbons | Tobacco smoke, grilled meat | Form DNA adducts; cause G→T transversions | Lung, bladder, head and neck |
| Ultraviolet radiation | Sunlight | Causes pyrimidine dimers; C→T transitions at dipyrimidines | Melanoma, squamous cell carcinoma |
| Aflatoxin B1 | Aspergillus mold contamination of food | Forms DNA adducts; causes G→T transversion in TP53 codon 249 | Hepatocellular carcinoma |
| Ionizing radiation | X-rays, nuclear fallout | Double-strand breaks, chromosomal rearrangements | Leukemia, thyroid cancer |
| Asbestos | Industrial exposure | Chronic inflammation, oxidative damage | Mesothelioma |
Not all carcinogens are mutagens. Some act as tumor promoters — they do not damage DNA directly but stimulate the proliferation of cells that already carry mutations, thereby expanding the pool of pre-malignant cells and increasing the probability of further mutations. Chronic inflammation, hormonal stimulation, and certain dietary factors can act as tumor promoters.
Tumor Viruses
Tumor viruses cause approximately 15–20% of human cancers worldwide. They promote cancer through several mechanisms: by carrying viral oncogenes, by integrating near and activating cellular proto-oncogenes, or by producing proteins that inactivate tumor suppressors.
| Virus | Type | Cancer | Mechanism |
|---|---|---|---|
| HPV (Human papillomavirus) | DNA virus | Cervical, oropharyngeal | E6 degrades p53; E7 inactivates Rb |
| HBV/HCV (Hepatitis B/C) | DNA/RNA virus | Hepatocellular carcinoma | Chronic inflammation; HBx protein; viral integration |
| EBV (Epstein-Barr virus) | DNA virus | Burkitt lymphoma, nasopharyngeal carcinoma | Latent membrane proteins activate NF-κB and survival signals |
| HTLV-1 | Retrovirus | Adult T-cell leukemia | Tax protein activates NF-κB; viral integration |
| HHV-8 (Kaposi sarcoma herpesvirus) | DNA virus | Kaposi sarcoma | Viral homologs of cyclin D and Bcl-2 |
Importantly, viruses do not cause cancer on their own — additional somatic mutations are required. The virus provides one or more “hits” in the multi-step progression, but the full complement of driver mutations must still accumulate.
Epidemiology and Risk Factor Identification
The epidemiology of cancer and the identification of risk factors have been instrumental in cancer prevention. Epidemiological studies comparing cancer rates across populations, occupations, and lifestyles have identified major modifiable risk factors: tobacco use (responsible for ~30% of cancer deaths), diet and obesity, alcohol consumption, infectious agents, and UV exposure. The observation that Japanese immigrants to the United States acquire American cancer rates within one to two generations demonstrated that environmental factors, rather than genetic predisposition alone, are the dominant determinants of cancer risk for most common cancers.
Cancer Epidemiology Informatics
Computational approaches play an increasing role in identifying and quantifying cancer risk factors:
Cancer epidemiology databases: The Surveillance, Epidemiology, and End Results (SEER) program of the US National Cancer Institute tracks cancer incidence, survival, and mortality data for the US population. The Global Burden of Disease (GBD) project provides global estimates of cancer burden by country, age, and risk factor.
Risk factor association analysis: Mendelian randomization uses genetic variants as instrumental variables to test causal relationships between risk factors and cancer. For example, genetic variants that affect alcohol metabolism have been used to establish the causal effect of alcohol consumption on esophageal cancer risk, free from the confounding that plagues observational epidemiological studies.
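With a single genetic variant as the instrument, the simplest Mendelian randomization estimate is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The Python sketch below computes that ratio and a first-order approximation of its standard error, which ignores uncertainty in the variant-exposure effect. All effect sizes are hypothetical and are not taken from any study of alcohol and esophageal cancer.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument causal estimate and its approximate standard error.

    beta_exposure: effect of the variant on the exposure (per allele)
    beta_outcome:  effect of the variant on the outcome (e.g. log odds per allele)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known exactly.
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error


# Hypothetical per-allele effects.
estimate, se = wald_ratio(beta_exposure=0.5, beta_outcome=0.2, se_outcome=0.05)
print(f"causal estimate = {estimate:.2f} (SE {se:.2f}) log odds per unit of exposure")
```

The estimate is only interpretable as causal if the variant affects the outcome solely through the exposure, which is the central assumption of the method.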
Viral integration site detection: In virus-associated cancers, identifying where the viral genome has integrated into the host genome reveals which cellular genes are disrupted or activated. Computational tools analyze whole-genome sequencing data to detect chimeric reads spanning virus-host junctions, mapping integration sites genome-wide and identifying recurrently targeted loci.
20.3 — Finding the Cancer-Critical Genes
Identifying Gain-of-Function and Loss-of-Function Mutations
Different methods are used to identify gain-of-function and loss-of-function cancer-critical mutations. Oncogenes can be identified by their ability to transform normal cells in culture (gain-of-function assays), while tumor suppressors are typically identified through loss-of-heterozygosity analysis, linkage studies in cancer-prone families, or genome-wide screening for recurrently deleted regions in tumors.
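Loss-of-heterozygosity analysis can be sketched as a comparison of allele fractions at SNPs that are heterozygous in normal tissue. In the Python example below, a SNP is flagged when its allele fraction is near 0.5 in the normal sample but strongly skewed in the tumor. The SNP identifiers, the allele fractions, and both thresholds are hypothetical choices made for illustration.

```python
def loh_sites(normal_af, tumor_af, het_window=0.15, loh_shift=0.3):
    """Return SNP ids that are heterozygous in normal DNA but skewed in tumor DNA.

    normal_af and tumor_af map SNP id -> fraction of reads carrying one allele.
    het_window and loh_shift are assumed thresholds, not calibrated values.
    """
    flagged = []
    for snp, normal_fraction in normal_af.items():
        tumor_fraction = tumor_af.get(snp)
        if tumor_fraction is None:
            continue  # SNP not covered in the tumor sample
        is_heterozygous = abs(normal_fraction - 0.5) <= het_window
        is_skewed = abs(tumor_fraction - 0.5) >= loh_shift
        if is_heterozygous and is_skewed:
            flagged.append(snp)
    return flagged


# Hypothetical allele fractions along one chromosome arm.
normal_sample = {"snp1": 0.48, "snp2": 0.52, "snp3": 0.99, "snp4": 0.50}
tumor_sample = {"snp1": 0.92, "snp2": 0.10, "snp3": 0.98, "snp4": 0.47}

print(loh_sites(normal_sample, tumor_sample))
```

A run of adjacent flagged SNPs marks a region where one parental copy has been lost, which is where a tumor suppressor gene may lie.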
The Discovery of RAS
The oncogene RAS was discovered using a DNA transfection assay. In 1982, researchers showed that DNA extracted from a human bladder carcinoma cell line, when introduced into normal mouse fibroblasts, could transform them — causing them to proliferate uncontrollably and form tumors in nude mice. The transforming gene was isolated and identified as a mutant version of the HRAS gene carrying a single point mutation (G12V). This landmark experiment provided the first direct demonstration that a point mutation in a human gene could cause cancer. The Ras family (HRAS, KRAS, NRAS) is now known to be the most frequently mutated oncogene family in human cancers, with KRAS mutations found in ~25% of all tumors.
Tumor Virus Insertions Activate Oncogenes
Rare tumor virus insertions can activate oncogenes. When retroviruses integrate into the host genome, they occasionally insert near a proto-oncogene. The strong viral promoter and enhancer sequences can drive overexpression of the adjacent cellular gene, converting it into an oncogene. This mechanism, called insertional mutagenesis, was used to identify many important oncogenes, including MYC (discovered through avian leukosis virus insertion) and ERBB2/HER2 (identified through a related approach). Although retroviral insertional mutagenesis is rare in human cancer, it remains a powerful experimental tool for oncogene discovery.
Genome-Wide Analysis Reveals the Full Catalog of Mutations
Genome-wide analyses reveal the full catalog of mutations in a cancer cell. With the advent of next-generation sequencing, it became possible to sequence entire cancer genomes and identify every somatic mutation. Projects like TCGA and ICGC have performed this analysis systematically across dozens of cancer types, revealing that the mutational landscapes vary enormously — from fewer than 10 somatic mutations in some pediatric cancers to over 100,000 in hypermutated tumors with mismatch repair deficiency. The central challenge is distinguishing the small number of functionally important driver mutations from the large background of neutral passengers.
The Tumor Suppressor p53
Many cancers have an altered p53 gene. The TP53 gene encodes the p53 protein, often called the “guardian of the genome” because of its central role in responding to cellular stress. When DNA is damaged, p53 is stabilized and activates the transcription of genes that cause cell-cycle arrest (p21), DNA repair, or apoptosis (PUMA, Noxa). Loss of p53 function allows cells with damaged DNA to continue dividing and accumulating mutations. TP53 is mutated in approximately 50% of all human cancers — more than any other single gene. Most p53 mutations are missense mutations in the DNA-binding domain that abolish transcriptional activity while producing a stable, often dominant-negative protein.
Inherited Cancer Syndromes
Inherited cancer syndromes help identify cancer-critical genes. Individuals who inherit one defective copy of a tumor suppressor gene are predisposed to cancer because only one additional somatic mutation (the “second hit”) is needed to eliminate the gene’s function. Study of these hereditary cancer syndromes has identified many of the most important tumor suppressor genes:
| Syndrome | Gene | Cancer predisposition |
|---|---|---|
| Retinoblastoma | RB1 | Childhood retinoblastoma |
| Li-Fraumeni syndrome | TP53 | Sarcomas, breast cancer, brain tumors, leukemia |
| Familial adenomatous polyposis | APC | Colorectal cancer |
| Lynch syndrome (HNPCC) | MLH1, MSH2, MSH6, PMS2 | Colorectal, endometrial cancer |
| Hereditary breast/ovarian cancer | BRCA1, BRCA2 | Breast, ovarian cancer |
These syndromes confirmed that tumor suppressor loss is a rate-limiting step in tumor development and led directly to genetic testing programs that identify at-risk individuals before cancer develops.
Cancer Gene Discovery and Prioritization
Modern computational methods accelerate the discovery and characterization of cancer-critical genes:
Pan-cancer analysis workflows integrate mutation, copy number, expression, and methylation data across all tumor types to identify genes and pathways recurrently altered in cancer regardless of tissue of origin. These analyses have revealed core cancer pathways — p53/cell cycle, Ras/MAPK, PI3K/mTOR, Wnt, Myc, Notch, chromatin remodeling, and RNA splicing — that are disrupted in the majority of cancers.
Cancer gene census (COSMIC CGC) and OncoKB annotation: The COSMIC Cancer Gene Census is a curated catalog of genes with mutations causally implicated in cancer. OncoKB provides clinical annotation of cancer gene alterations, linking specific mutations to their biological effects and therapeutic implications (e.g., BRAF V600E is annotated as a level 1 therapeutic target in melanoma).
Pathway-level mutation analysis: Tools such as PARADIGM and resources like Pathway Commons assess the collective impact of mutations on signaling pathways rather than individual genes. A pathway may be disrupted in nearly all tumors of a given type, even though different tumors carry mutations in different genes within that pathway.
Synthetic lethality prediction and analysis: Two genes are synthetically lethal if loss of either alone is tolerable but loss of both is fatal. This principle has been exploited therapeutically: cancers with BRCA1/2 mutations (defective homologous recombination) are exquisitely sensitive to PARP inhibitors because PARP provides a backup DNA repair pathway. Computational screens using genetic interaction data, CRISPR screening results, and tumor genomic profiles identify new synthetic lethal relationships that could be targeted therapeutically.
Pharmacogenomics databases: The Genomics of Drug Sensitivity in Cancer (GDSC), the Cancer Cell Line Encyclopedia (CCLE), and the Cancer Dependency Map (DepMap) provide large-scale data linking the genetic features of cancer cell lines to their sensitivity to hundreds of drugs and genetic perturbations. DepMap, in particular, uses genome-wide CRISPR knockout screens to identify genes essential for the survival of each cancer cell line, revealing cancer-specific dependencies that are candidates for therapeutic targeting.
Expression data offer a complementary view of tumors. The snippet below applies principal component analysis (PCA) to example expression values for 8 tumor samples with 4 features each, reducing them to 2 components so that the two subtypes in the data separate:
// 8 tumor samples, 4 gene expression features each
let expr_data = '[5.2, 8.1, 1.2, 0.5, 4.8, 7.9, 1.4, 0.3, 1.1, 0.8, 6.5, 7.2, 0.9, 1.0, 6.8, 7.5, 5.5, 8.3, 1.0, 0.4, 1.3, 0.7, 6.2, 7.0, 4.9, 7.8, 1.3, 0.6, 1.0, 0.9, 6.6, 7.3]'
let pca = ML.pca(expr_data, 4, 2)
print("PCA of tumor gene expression (2 subtypes visible):")
print(pca)
The tumor suppressors BRCA1 and BRCA2 show that shared function need not imply shared sequence. Both are involved in homologous recombination repair of double-strand breaks, and mutations in either predispose to breast and ovarian cancer, yet they are structurally distinct proteins that arose independently. They show little sequence similarity: they are not homologs but rather convergently recruited components of the same repair pathway. BRCA1 functions as a ubiquitin ligase and scaffold, while BRCA2 directly loads RAD51 recombinase onto single-stranded DNA at break sites.
We can also examine the physicochemical properties of key cancer proteins. The snippet below compares a short fragment from the p53 DNA-binding domain with the N-terminal segment of KRAS, which includes the phosphate-binding loop of the GTP-binding site:
let p53_dna_binding = "MCNSSCMGGMNRRPILTIITLEDSSG"
let ras_gtp_binding = "MTEYKLVVVGAGGVGKSALTIQLIQ"
print("p53 DNA-binding domain (tumor suppressor):")
print(Struct.protein_props(p53_dna_binding))
print("KRAS GTP-binding domain (oncogene):")
print(Struct.protein_props(ras_gtp_binding))
The contrast in amino acid composition between these two fragments is in keeping with the different roles of the proteins: the p53 fragment contains more polar and charged residues, the kind that mediate sequence-specific contacts with DNA, while the KRAS fragment is enriched in small and hydrophobic residues of the kind that form the compact nucleotide-binding pocket. Short fragments give only a rough indication, so the comparison should not be read as a property of the full domains.
Exercises
Exercise: Analyze Tumor Mutation Burden
Tumor mutation burden (TMB) varies widely between cancer types. High TMB correlates with response to immunotherapy. Analyze mutation rates across different cancers:
let melanoma = '[15.2, 18.5, 12.3, 20.1, 16.8, 14.5, 19.2, 17.0]'
let lung = '[8.5, 10.2, 7.8, 12.1, 9.5, 11.0, 8.2, 10.8]'
let pediatric = '[0.5, 0.8, 0.3, 1.1, 0.6, 0.4, 0.7, 0.9]'
print("Melanoma TMB:")
print(Stats.describe(melanoma))
print("Lung cancer TMB:")
print(Stats.describe(lung))
print("Pediatric cancer TMB:")
print(Stats.describe(pediatric))
// Which cancer type has the highest mutation burden?
let answer = "melanoma"
print(answer)
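For comparison, the same summary can be sketched with Python's standard `statistics` module. This is an illustration rather than a description of what `Stats.describe` computes; the values are copied from the exercise, and the unit (mutations per megabase) is an assumption.

```python
from statistics import mean, median, stdev

# Values copied from the exercise; the unit is assumed to be mutations per megabase.
tmb = {
    "melanoma": [15.2, 18.5, 12.3, 20.1, 16.8, 14.5, 19.2, 17.0],
    "lung": [8.5, 10.2, 7.8, 12.1, 9.5, 11.0, 8.2, 10.8],
    "pediatric": [0.5, 0.8, 0.3, 1.1, 0.6, 0.4, 0.7, 0.9],
}

# Mean, median, and sample standard deviation for each cancer type.
summary = {
    name: {"mean": mean(values), "median": median(values), "sd": stdev(values)}
    for name, values in tmb.items()
}

# Rank cancer types from highest to lowest mean burden.
ranked = sorted(summary, key=lambda name: summary[name]["mean"], reverse=True)

for name in ranked:
    s = summary[name]
    print(f"{name:10s} mean={s['mean']:.2f} median={s['median']:.2f} sd={s['sd']:.2f}")
```

With these numbers the mean burden in melanoma is roughly 25 times that of the pediatric group, which makes the ranking robust to the choice of mean or median.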
Exercise: Visualize Cancer Hallmarks
The hallmarks of cancer describe distinct capabilities acquired during tumor development. Not all hallmarks are equally prominent in every cancer type. Compare two cancer profiles:
let breast = '[{"label": "Proliferative signaling", "value": 90}, {"label": "Evading apoptosis", "value": 75}, {"label": "Angiogenesis", "value": 60}, {"label": "Invasion", "value": 40}, {"label": "Immune evasion", "value": 50}]'
let melanoma = '[{"label": "Proliferative signaling", "value": 85}, {"label": "Evading apoptosis", "value": 60}, {"label": "Angiogenesis", "value": 45}, {"label": "Invasion", "value": 70}, {"label": "Immune evasion", "value": 80}]'
print(Viz.bar(breast, '{"title": "Breast Cancer Hallmarks (%)", "color": "#EC4899"}'))
print(Viz.bar(melanoma, '{"title": "Melanoma Hallmarks (%)", "color": "#6B7280"}'))
// Which hallmark is most prominent in breast cancer?
let answer = "proliferative signaling"
print(answer)
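Beyond reading the two charts side by side, the profiles can be compared numerically. The Python sketch below assumes the scores are the illustrative percentages from the exercise, not measured data, and simply ranks hallmarks by how much the two profiles differ.

```python
# Illustrative scores copied from the exercise; they are not measured data.
breast = {
    "Proliferative signaling": 90,
    "Evading apoptosis": 75,
    "Angiogenesis": 60,
    "Invasion": 40,
    "Immune evasion": 50,
}
melanoma = {
    "Proliferative signaling": 85,
    "Evading apoptosis": 60,
    "Angiogenesis": 45,
    "Invasion": 70,
    "Immune evasion": 80,
}

# Difference per hallmark (melanoma minus breast), largest absolute gap first.
diffs = sorted(
    ((label, melanoma[label] - breast[label]) for label in breast),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for label, delta in diffs:
    print(f"{label:25s} {delta:+d}")

# The most prominent hallmark within a single profile.
top_breast = max(breast, key=breast.get)
print("Most prominent in breast profile:", top_breast)
```

In these example profiles, invasion and immune evasion account for the largest gaps, while proliferative signaling is high in both.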
Exercise: Tumor Subtype Classification
Gene expression profiling can classify tumors into molecular subtypes. Use PCA to reduce expression data and identify clusters:
// 6 tumors, 3 expression features: [ER, HER2, Ki67]
let data = '[8.5, 0.5, 2.0, 8.2, 0.3, 1.8, 1.0, 9.0, 7.5, 0.8, 8.8, 7.0, 8.0, 0.4, 2.2, 1.2, 8.5, 6.8]'
let result = ML.pca(data, 3, 2)
print("PCA of tumor expression profiles:")
print(result)
// How many distinct clusters are visible?
let answer = "2"
print(answer)
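To see why two clusters appear, the first principal component can be computed by hand. The plain-Python sketch below is an illustration, not the implementation behind `ML.pca`: it assumes the flattened list is row-major (three values per tumor, as the exercise comment states) and substitutes power iteration for a full eigendecomposition, so it recovers only the first component. Names such as `pc1` and `groups` are introduced here for illustration.

```python
# Six tumors x three features [ER, HER2, Ki67], as laid out in the exercise.
flat = [8.5, 0.5, 2.0, 8.2, 0.3, 1.8, 1.0, 9.0, 7.5,
        0.8, 8.8, 7.0, 8.0, 0.4, 2.2, 1.2, 8.5, 6.8]
n_features = 3
rows = [flat[i:i + n_features] for i in range(0, len(flat), n_features)]
n = len(rows)

# Center each feature on its mean.
means = [sum(r[j] for r in rows) / n for j in range(n_features)]
centered = [[r[j] - means[j] for j in range(n_features)] for r in rows]

# Sample covariance matrix (features x features).
cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(n_features)]
       for a in range(n_features)]

# Power iteration: repeated multiplication by the covariance matrix
# converges to its leading eigenvector, which is PC1.
pc1 = [1.0, 0.0, 0.0]
for _ in range(200):
    nxt = [sum(cov[a][b] * pc1[b] for b in range(n_features)) for a in range(n_features)]
    norm = sum(x * x for x in nxt) ** 0.5
    pc1 = [x / norm for x in nxt]

# Variance along PC1 as a fraction of the total variance.
lam = sum(pc1[a] * cov[a][b] * pc1[b] for a in range(n_features) for b in range(n_features))
explained = lam / sum(cov[j][j] for j in range(n_features))

# Project each tumor onto PC1; the sign of the score separates the two groups.
scores = [sum(c[j] * pc1[j] for j in range(n_features)) for c in centered]
groups = ["A" if s > 0 else "B" for s in scores]

print("PC1 loadings [ER, HER2, Ki67]:", [round(x, 2) for x in pc1])
print("Variance explained by PC1:", round(explained, 3))
print("Group per tumor:", groups)
```

The loadings have opposite signs for ER versus HER2 and Ki67, so PC1 contrasts ER-high tumors (1, 2, and 5) with HER2-high, Ki67-high tumors (3, 4, and 6). Nearly all of the variance lies along this single axis, which is why the two clusters are easy to see.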
Summary
In this lesson you covered the genetic basis of cancer and the methods used to identify cancer-critical genes:
- Cancer is a genetic disease driven by the accumulation of mutations in genes controlling cell proliferation, survival, differentiation, and tissue organization
- Most cancers are clonal, arising from a single cell that acquired an initial growth advantage
- Driver mutations cluster in a small number of key pathways (Ras–MAPK, PI3K–mTOR, p53, Rb, Wnt, Notch) among a large background of neutral passenger mutations
- Oncogenes (gain-of-function: RAS, MYC, EGFR, BRAF) and tumor suppressors (loss-of-function: TP53, RB, APC, BRCA1/2) are equally important in cancer development
- Tumor progression is an evolutionary process of successive rounds of mutation and natural selection, typically spanning years to decades
- Genetic instability (chromosomal instability or microsatellite instability) accelerates tumor evolution by increasing mutation rates
- Cancer cells subvert cell death, differentiation, immune surveillance, and cell-cell communication to sustain malignant growth
- Intratumoral heterogeneity and the tumor microenvironment (fibroblasts, immune cells, extracellular matrix) profoundly influence cancer behavior and treatment response
- Metastasis requires cancer cells to navigate multiple selective barriers to survive and proliferate in foreign tissue environments
- Colorectal cancer exemplifies multi-step progression, typically in the order APC loss → KRAS activation → SMAD4 loss → TP53 loss, with MSI-high tumors following an alternative mismatch-repair-deficient pathway
- Carcinogens (tobacco smoke, UV radiation, aflatoxin) cause cancer primarily by damaging DNA, while tumor promoters stimulate proliferation of pre-malignant cells
- Tumor viruses (HPV, HBV/HCV, EBV, HTLV-1) contribute to ~15–20% of cancers by inactivating tumor suppressors, activating oncogenes, or driving chronic inflammation
- Cancer epidemiology has identified major modifiable risk factors; computational methods like Mendelian randomization strengthen causal inference
- RAS was discovered by DNA transfection, MYC by retroviral insertional mutagenesis, and tumor suppressors by studying hereditary cancer syndromes (retinoblastoma, Li-Fraumeni, familial adenomatous polyposis, Lynch syndrome, hereditary breast/ovarian cancer)
- Cancer genomics tools — Mutect2 (variant calling), MutSig/IntOGen (driver identification), GISTIC/FACETS (copy number), PyClone/SciClone (clonality), STAR-Fusion/Arriba (fusions), SigProfiler (mutational signatures) — provide a comprehensive computational toolkit for characterizing tumor genomes
- Cancer gene resources (COSMIC CGC, OncoKB) and pharmacogenomics databases (GDSC, CCLE, DepMap) link genetic alterations to biological function and therapeutic vulnerability
References
- Alberts B, Heald R, Johnson A, Morgan D, Raff M, Roberts K, Walter P. Molecular Biology of the Cell, 7th ed. New York: W.W. Norton; 2022. Chapter 20: Cancer.
- Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100(1):57–70.
- Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646–674.
- Vogelstein B, Papadopoulos N, Velculescu VE, et al. Cancer genome landscapes. Science. 2013;339(6127):1546–1558.
- Tate JG, Bamford S, Jubb HC, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 2019;47(D1):D941–D947. https://cancer.sanger.ac.uk/cosmic
- Chakravarty D, Gao J, Phillips SM, et al. OncoKB: a precision oncology knowledge base. JCO Precis Oncol. 2017;1:1–16. https://www.oncokb.org/
- Ghandi M, Huang FW, Jané-Valbuena J, et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature. 2019;569(7757):503–508. https://depmap.org/
- Yang W, Soares J, Greninger P, et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013;41(D1):D955–D961.