How Proteins Work
Understand how proteins bind ligands, catalyze reactions, and are regulated by phosphorylation, allostery, and GTP switches — plus the computational tools for predicting protein function and interaction networks.
Introduction
The previous lesson explored how proteins are built and how they fold. This lesson asks the next question: how do proteins actually work? The answer, in nearly every case, begins with binding — a protein’s function depends on its ability to grab hold of specific molecules with exquisite selectivity. From this simple foundation, proteins achieve an astonishing range of functions: catalyzing reactions, transducing signals, generating mechanical force, and assembling into machines far more complex than any individual molecule.
We will also cover the computational tools used to predict protein function, map interaction networks, and identify drug targets — essential methods in modern biology and drug discovery.
All Proteins Bind to Other Molecules
Every protein functions by binding to one or more partner molecules, called ligands. A ligand can be a small molecule (a substrate, drug, or metabolite), another protein, a nucleic acid, or a lipid. Binding occurs at a specific binding site on the protein surface, where the shape and chemical character of the site are complementary to the ligand.
The surface conformation of a protein determines its chemistry. The binding site presents a precise arrangement of hydrogen-bond donors and acceptors, hydrophobic patches, and charged groups that match the ligand. The interaction is often well described by the induced-fit model: both protein and ligand adjust their conformations upon binding, optimizing the interface.
The strength of binding is quantified by the equilibrium dissociation constant, Kd:
| Kd | Binding strength | Examples |
|---|---|---|
| 10⁻¹² M (picomolar) | Extremely tight | Antibody-antigen, biotin-streptavidin |
| 10⁻⁹ M (nanomolar) | Very strong | Hormone-receptor |
| 10⁻⁶ M (micromolar) | Moderate | Typical enzyme-substrate |
| 10⁻³ M (millimolar) | Weak, transient | Metabolite interactions |
A small Kd means tight binding (the complex is stable); a large Kd means weak binding. The Kd equals the ligand concentration at which half the binding sites are occupied.
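The half-occupancy statement above follows from the equilibrium expression for simple 1:1 binding, where the fraction of occupied sites is [L] / (Kd + [L]). The Python sketch below illustrates this under stated assumptions: a single class of independent sites, [L] taken as the free ligand concentration, and a hypothetical nanomolar Kd chosen only for illustration.

```python
def fraction_bound(ligand_conc_molar: float, kd_molar: float) -> float:
    """Fraction of binding sites occupied at equilibrium for 1:1 binding.

    Assumes independent, identical sites and that ligand_conc_molar is the
    free (unbound) ligand concentration.
    """
    if ligand_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_molar / (kd_molar + ligand_conc_molar)


# Hypothetical nanomolar binder, used only for illustration.
kd = 1e-9
for multiple in (0.1, 1, 10, 100):
    occupancy = fraction_bound(multiple * kd, kd)
    print(f"[L] = {multiple:>5} x Kd -> {occupancy:.1%} of sites occupied")
```

Because occupancy depends only on the ratio [L]/Kd, a ligand must be present at roughly ten times its Kd to occupy about 90% of the sites.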
Protein-Protein Interactions
Proteins bind to other proteins through several types of interfaces:
- Large, flat interfaces (>1,500 Ų) — characteristic of stable complexes like hemoglobin’s α-β interface
- Small, modular interfaces — mediated by interaction domains (SH2, SH3, PDZ, WW) that recognize short peptide motifs
- Coiled-coil interfaces — two or more α helices wind around each other, as in transcription factors like leucine zippers
Antibodies provide a spectacular example of binding specificity. Each antibody has a unique antigen-binding site formed by six hypervariable loops (complementarity-determining regions, CDRs) that can be shaped to recognize virtually any molecular surface — from small drug molecules to large protein antigens — with extraordinary affinity (Kd values in the picomolar to nanomolar range).
Sequence comparison between family members highlights critical ligand-binding sites. Residues that are conserved across a protein family are often those that directly contact the ligand or are essential for maintaining the binding site’s geometry. This principle — conservation implies functional importance — is one of the most powerful tools in bioinformatics.
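As a minimal sketch of the conservation idea, the Python snippet below marks alignment columns that are identical across sequences. It uses the two serine protease fragments from this lesson's active-site exercise and assumes they are already aligned with no gaps. Real analyses would use many family members and a proper multiple sequence alignment, so treat this as an illustration rather than a method.

```python
def conserved_columns(aligned_seqs):
    """Return 0-based indices of columns where every sequence has the same residue."""
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [
        index
        for index, column in enumerate(zip(*aligned_seqs))
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Fragments taken from the trypsin/chymotrypsin exercise in this lesson.
fragments = ["DSCQGDSGGPV", "SSCMGDSGGPL"]
positions = conserved_columns(fragments)

# Show conserved residues and mask variable positions with dots.
mask = "".join(
    fragments[0][i] if i in positions else "." for i in range(len(fragments[0]))
)
print(mask)
```

The invariant GDSGGP stretch contains the catalytic serine, while the variable positions flank it.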
Enzymes: Powerful and Specific Catalysts
Enzymes are the most functionally diverse proteins. They accelerate chemical reactions by factors of 10⁶ to 10¹⁴ while maintaining extraordinary specificity for their substrates.
Substrate binding is the first step in enzyme catalysis. The substrate enters the active site, a pocket or cleft on the enzyme surface with a shape complementary to the substrate. Binding positions the substrate’s reactive groups precisely relative to the enzyme’s catalytic residues.
Enzymes speed reactions by selectively stabilizing the transition state, the highest-energy configuration that reactants must pass through on the way to products. By lowering the activation energy (without changing the overall ΔG of the reaction), enzymes make reactions kinetically accessible that would otherwise be prohibitively slow.
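To connect rate enhancement to activation energy, transition state theory gives a rate constant proportional to exp(−ΔG‡/RT). The Python sketch below converts a fold enhancement into the corresponding reduction in activation free energy. It assumes the catalyzed and uncatalyzed reactions share the same pre-exponential factor and a temperature of 298.15 K, which is a simplification.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def barrier_reduction_kj(rate_enhancement: float, temperature_k: float = 298.15) -> float:
    """Reduction in activation free energy (kJ/mol) that yields a given fold speed-up."""
    if rate_enhancement <= 0:
        raise ValueError("rate enhancement must be positive")
    return GAS_CONSTANT * temperature_k * math.log(rate_enhancement) / 1000.0


def fold_enhancement(barrier_reduction_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Fold speed-up produced by lowering the barrier by the given amount."""
    return math.exp(barrier_reduction_kj_per_mol * 1000.0 / (GAS_CONSTANT * temperature_k))


for enhancement in (1e6, 1e14):
    print(f"{enhancement:.0e}-fold faster <-> barrier lowered by "
          f"{barrier_reduction_kj(enhancement):.1f} kJ/mol")
```

Under these assumptions, the full range of enzymatic rate enhancements corresponds to stabilizing the transition state by a few tens of kJ/mol, comparable to the energy of a handful of hydrogen bonds.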
Enzymes employ several catalytic strategies simultaneously:
- Proximity and orientation — reactants are brought together in the correct geometry
- Acid-base catalysis — amino acid side chains donate or accept protons at critical moments
- Covalent catalysis — a transient covalent bond forms between enzyme and substrate
- Metal ion catalysis — metal cofactors (Zn²⁺, Mg²⁺, Fe²⁺) stabilize charges or activate water
Lysozyme illustrates these principles. It cleaves the glycosidic bond in bacterial cell wall polysaccharides by distorting the sugar ring into a high-energy conformation resembling the transition state, stabilized by a glutamic acid that donates a proton and an aspartate that stabilizes the resulting oxocarbenium ion.
The catalytic triad of serine proteases (Ser-His-Asp) is one of the best-studied active site motifs. We can examine the physical properties of these key residues computationally:
let triad = "SHD"
let props = Struct.protein_props(triad)
print("Catalytic triad (Ser-His-Asp) properties:")
print(props)
A larger fragment of the active site reveals how the surrounding residues contribute to the enzyme’s overall character:
let chymotrypsin_site = "GDSGGPLVCKK"
let site_props = Struct.protein_props(chymotrypsin_site)
print("Chymotrypsin active site region:")
print(site_props)
Exercise: Active Site Analysis
Trypsin and chymotrypsin are serine proteases with similar catalytic mechanisms but different substrate preferences. Trypsin cleaves after positively charged residues (Lys, Arg), while chymotrypsin prefers large hydrophobic residues. Compare the physical properties of their substrate-binding pockets to see how sequence differences shape specificity.
let trypsin_pocket = "DSCQGDSGGPV"
let chymo_pocket = "SSCMGDSGGPL"
let tryp_props = Struct.protein_props(trypsin_pocket)
let chymo_props = Struct.protein_props(chymo_pocket)
print("Trypsin binding pocket:")
print(tryp_props)
print("Chymotrypsin binding pocket:")
print(chymo_props)
print("Chymotrypsin pocket is more hydrophobic")
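What the lesson's protein property function reports is not specified here, so the following is an independent Python sketch of one way to make the comparison concrete. It computes the mean Kyte-Doolittle hydropathy (GRAVY) and a crude net charge for the two pocket fragments. The charge estimate is an assumption that counts only Lys/Arg as +1 and Asp/Glu as −1, ignoring histidine, termini, and local pKa shifts.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean hydropathy of a peptide (grand average of hydropathy)."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


def crude_net_charge(sequence: str) -> int:
    """Very rough charge near neutral pH: +1 per K/R, -1 per D/E."""
    sequence = sequence.upper()
    positive = sum(sequence.count(residue) for residue in "KR")
    negative = sum(sequence.count(residue) for residue in "DE")
    return positive - negative


for name, pocket in (("Trypsin", "DSCQGDSGGPV"), ("Chymotrypsin", "SSCMGDSGGPL")):
    print(f"{name}: GRAVY = {gravy(pocket):.2f}, crude charge = {crude_net_charge(pocket)}")
```

The extra aspartate in the trypsin fragment makes it more negatively charged, consistent with a preference for Lys and Arg side chains, while the chymotrypsin fragment scores as more hydrophobic.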
Many enzymes require tightly bound small molecules called cofactors or coenzymes to function. These add chemical capabilities beyond what amino acid side chains alone can provide — for example, heme (with its iron center) enables cytochrome enzymes to transfer electrons, and pyridoxal phosphate (vitamin B6) enables transaminases to shuttle amino groups.
Multienzyme Complexes and Metabolic Efficiency
Some enzymes are organized into multienzyme complexes that increase the rate of cell metabolism by channeling intermediates directly from one active site to the next, without releasing them into the cytoplasm. The pyruvate dehydrogenase complex (which converts pyruvate to acetyl CoA) contains three different enzyme activities and channels substrates through a series of reactions with remarkable efficiency.
This organization minimizes the diffusion distances for intermediates, prevents side reactions, and allows coordinate regulation of sequential steps.
Regulation of Enzyme Activity
Cells do not simply let all enzymes run at full speed. Multiple mechanisms regulate catalytic activity:
Allosteric regulation: many enzymes have two or more binding sites that interact. An allosteric effector binds at a regulatory site distinct from the active site and shifts the enzyme between active (R) and inactive (T) conformations. Allosteric activators stabilize the R state; allosteric inhibitors stabilize the T state.
A fundamental principle of allostery is that two ligands whose binding sites are coupled must reciprocally affect each other’s binding. If an activator promotes substrate binding, then substrate binding must, by the same logic, promote activator binding. This thermodynamic linkage is a general feature of allosteric systems.
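The reciprocity described above can be checked with a small thermodynamic cycle. In the Python sketch below, the doubly bound state carries one coupling free energy on top of the two individual binding energies, so the same factor must appear whichever ligand binds first. All free-energy values are hypothetical and chosen only to illustrate the bookkeeping.

```python
import math

RT_KJ = 0.008314462618 * 298.15  # kJ/mol at 298.15 K


def kd_from_binding_energy(delta_g_kj: float) -> float:
    """Dissociation constant (M) from a binding free energy (negative = favorable)."""
    return math.exp(delta_g_kj / RT_KJ)


# Hypothetical binding free energies in kJ/mol.
g_activator = -30.0   # activator binding to the free protein
g_substrate = -25.0   # substrate binding to the free protein
g_coupling = -5.0     # extra stabilization when both are bound

kd_substrate_alone = kd_from_binding_energy(g_substrate)
kd_substrate_with_activator = kd_from_binding_energy(g_substrate + g_coupling)
kd_activator_alone = kd_from_binding_energy(g_activator)
kd_activator_with_substrate = kd_from_binding_energy(g_activator + g_coupling)

# Fold tightening of each ligand caused by the other.
substrate_tightening = kd_substrate_alone / kd_substrate_with_activator
activator_tightening = kd_activator_alone / kd_activator_with_substrate
print(f"Substrate binds {substrate_tightening:.1f}x tighter with activator bound")
print(f"Activator binds {activator_tightening:.1f}x tighter with substrate bound")
```

Both ratios equal exp(−g_coupling/RT), which is the linkage relationship in numerical form.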
In symmetric protein assemblies like hemoglobin (α₂β₂), allosteric transitions produce cooperativity: the binding of one oxygen molecule shifts the equilibrium toward the R state, increasing the affinity of the remaining subunits. The result is a sigmoidal binding curve that allows hemoglobin to load oxygen efficiently in the lungs and unload it in the tissues.
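A common empirical description of such sigmoidal curves is the Hill equation. The Python sketch below compares a cooperative binder with a hypothetical non-cooperative one of the same half-saturation pressure. The parameter values (P50 of about 26 mmHg, Hill coefficient of about 2.8, and oxygen pressures of 100 and 20 mmHg for lungs and working tissue) are approximate textbook figures used here as assumptions, and the Hill equation itself is a phenomenological fit rather than a mechanistic model.

```python
def hill_saturation(p_o2: float, p50: float, hill_n: float) -> float:
    """Fractional saturation from the Hill equation."""
    return p_o2 ** hill_n / (p50 ** hill_n + p_o2 ** hill_n)


P50_MMHG = 26.0      # approximate half-saturation pressure (assumption)
LUNG_MMHG = 100.0    # approximate alveolar oxygen pressure (assumption)
TISSUE_MMHG = 20.0   # approximate pressure in working tissue (assumption)


def fraction_delivered(hill_n: float) -> float:
    """Saturation lost between lungs and tissue, i.e. oxygen released."""
    loaded = hill_saturation(LUNG_MMHG, P50_MMHG, hill_n)
    remaining = hill_saturation(TISSUE_MMHG, P50_MMHG, hill_n)
    return loaded - remaining


delivered_cooperative = fraction_delivered(2.8)
delivered_noncooperative = fraction_delivered(1.0)
print(f"Cooperative (n = 2.8): {delivered_cooperative:.0%} of sites unloaded")
print(f"Non-cooperative (n = 1): {delivered_noncooperative:.0%} of sites unloaded")
```

With the same half-saturation point, the cooperative curve is steeper across the physiological range, so more oxygen is released for the same pressure drop.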
Protein Phosphorylation
One of the most pervasive regulatory mechanisms in eukaryotic cells is reversible protein phosphorylation. Protein kinases transfer the γ-phosphate from ATP to the hydroxyl group of serine, threonine, or tyrosine residues on target proteins. Protein phosphatases remove these phosphate groups, restoring the original state.
Phosphorylation can activate or inactivate an enzyme, create new binding sites for other proteins (e.g., phosphotyrosine is recognized by SH2 domains), alter protein localization, or trigger degradation. The human genome encodes approximately 500 protein kinases and ~150 protein phosphatases; the kinases alone account for roughly 2.5% of protein-coding genes, reflecting the central importance of this regulatory mechanism.
The Src protein kinase illustrates how a protein can function as a microprocessor: its activity is controlled by the interplay of multiple phosphorylation sites, SH2 and SH3 domain interactions, and intramolecular binding. The balance between activating and inhibitory inputs determines whether Src is on or off — a molecular logic gate that integrates multiple upstream signals.
Proteins are also controlled by many other covalent modifications: ubiquitination (targeting for degradation), acetylation (especially of histones), methylation, glycosylation, lipidation (membrane anchoring), and SUMOylation. Each adds a layer of regulatory complexity.
GTP-Binding Proteins as Molecular Switches
GTP-binding proteins (G proteins) function as binary molecular switches that cycle between an active GTP-bound state and an inactive GDP-bound state:
- GEFs (guanine nucleotide exchange factors) activate them by promoting the exchange of GDP for GTP
- GAPs (GTPase-activating proteins) inactivate them by stimulating the intrinsic GTPase activity that hydrolyzes GTP to GDP
Two major families of G proteins regulate cell behavior:
Monomeric (small) GTPases — including the Ras superfamily — relay signals from cell-surface receptors to intracellular pathways. Mutations that lock Ras in its GTP-bound state (preventing GAP-stimulated hydrolysis) are found in approximately 30% of human cancers, making Ras one of the most important oncogenes.
Trimeric G proteins (αβγ complexes) couple seven-transmembrane receptors (GPCRs) to intracellular effectors like adenylyl cyclase and phospholipase C, mediating responses to hormones, neurotransmitters, and sensory stimuli.
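The GEF/GAP switch described above can be sketched as a two-state cycle in which the steady-state fraction of GTP-bound protein depends on the ratio of exchange to hydrolysis rates. The Python example below is a deliberately simplified model: it treats each step as a single first-order rate, and every rate constant is a hypothetical number chosen for illustration, not a measured value.

```python
def gtp_bound_fraction(k_exchange: float, k_hydrolysis: float) -> float:
    """Steady-state fraction in the GTP-bound (active) state for a two-state cycle.

    k_exchange:   effective GDP -> GTP exchange rate (raised by GEFs)
    k_hydrolysis: effective GTP -> GDP hydrolysis rate (raised by GAPs)
    """
    if k_exchange < 0 or k_hydrolysis < 0 or (k_exchange + k_hydrolysis) == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_exchange / (k_exchange + k_hydrolysis)


# Hypothetical rate constants in arbitrary units of 1/s.
resting = gtp_bound_fraction(k_exchange=0.01, k_hydrolysis=1.0)
gef_stimulated = gtp_bound_fraction(k_exchange=1.0, k_hydrolysis=1.0)
gap_insensitive_mutant = gtp_bound_fraction(k_exchange=0.01, k_hydrolysis=0.001)

print(f"Resting:                {resting:.1%} active")
print(f"GEF stimulated:         {gef_stimulated:.1%} active")
print(f"GAP-insensitive mutant: {gap_insensitive_mutant:.1%} active")
```

In this picture, a mutation that blocks GAP-stimulated hydrolysis keeps most of the protein active even without upstream GEF stimulation, which is the logic behind oncogenic Ras.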
Motor Proteins and Mechanical Work
Some proteins convert the chemical energy of ATP hydrolysis into mechanical movement. Small conformational changes at the nucleotide-binding site are amplified by structural lever arms into large movements:
- Myosin walks along actin filaments, powering muscle contraction and cytokinesis
- Kinesin walks along microtubules toward the cell periphery (plus end), transporting vesicles and organelles
- Dynein walks along microtubules toward the cell center (minus end), moving cargo inward and powering cilia and flagella
Membrane-bound transporters also harness energy to pump molecules across membranes. The Na⁺/K⁺-ATPase, for example, uses ATP hydrolysis to pump 3 Na⁺ out and 2 K⁺ in per cycle, maintaining the ion gradients essential for nerve impulses and cell volume regulation.
Protein Machines, Scaffolds, and Interchangeable Parts
Many cellular processes are carried out by protein machines — large multi-protein assemblies where the coordinated action of many subunits achieves functions impossible for individual proteins. The ribosome (translating mRNA into protein), the proteasome (degrading damaged proteins), the spliceosome (removing introns), and the DNA replication machinery are all protein machines.
Scaffold proteins organize signaling cascades by physically bringing together sets of interacting proteins. The scaffold concentrates components at one location, increases reaction speed, prevents cross-talk with other pathways, and can be regulated by phosphorylation or localization.
Protein complexes often use interchangeable parts to make efficient use of genetic information. The same core complex may associate with different regulatory subunits to perform different functions in different cell types or at different times — a modular strategy that multiplies functional diversity from a limited gene set.
Gene Ontology and Functional Annotation
The Gene Ontology (GO) provides a standardized vocabulary for describing gene and protein function across all organisms. GO has three branches:
- Molecular Function — what the protein does biochemically (e.g., “protein kinase activity”)
- Biological Process — the larger process it participates in (e.g., “signal transduction”)
- Cellular Component — where in the cell it acts (e.g., “plasma membrane”)
GO annotations are assigned by manual curation, computational prediction, or experimental evidence, and are essential for large-scale genomic analyses. Protein function prediction from sequence and structure uses homology (GO transfer from annotated homologs), domain content (presence of specific Pfam domains implies specific functions), and machine learning approaches trained on known annotations.
Protein-Protein Interaction Networks
Cells are not bags of independent proteins — they are networks of interacting molecules. Protein-protein interaction (PPI) databases catalog these interactions:
- STRING — integrates experimental data, co-expression, text mining, and genomic context to score predicted interactions; widely used for network analysis
- BioGRID — curated from the experimental literature, covering physical and genetic interactions
- IntAct (EBI) — a curated molecular interaction database with detailed experimental evidence
Cytoscape is the standard software for visualizing and analyzing interaction networks. It can display networks, overlay expression data or other attributes on nodes, identify clusters (densely connected subnetworks), and run enrichment analyses to find overrepresented GO terms or pathways.
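Over-representation of a GO term in a cluster is commonly assessed with a one-sided hypergeometric test. The Python sketch below shows that calculation using only the standard library. It is an illustration of the statistic, not a description of how any particular Cytoscape app implements enrichment, and the gene counts are hypothetical. A real analysis would also correct for testing many terms at once.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    max_hits = min(annotated, sample)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return favorable / comb(population, sample)


# Hypothetical example: 20,000 genes, 200 annotated with a term,
# and a 50-gene network cluster containing 5 of them.
p_value = enrichment_p_value(population=20_000, annotated=200, sample=50, hits=5)
print(f"Expected hits by chance: {50 * 200 / 20_000:.1f}")
print(f"One-sided p-value for 5 or more hits: {p_value:.2e}")
```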
Domain-domain interaction prediction infers protein interactions from the known affinities of their constituent domains. If domain A is known to bind domain B, then any protein containing A may interact with any protein containing B.
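The inference rule above can be sketched in a few lines of Python. The protein names, domain labels, and the single known domain pair below are all hypothetical placeholders. The approach over-predicts in practice, because it ignores expression, localization, and binding-site context.

```python
from itertools import combinations


def predict_interactions(protein_domains, interacting_domain_pairs):
    """Predict protein pairs that contain at least one known interacting domain pair."""
    predictions = []
    for first, second in combinations(sorted(protein_domains), 2):
        supported = any(
            frozenset((domain_a, domain_b)) in interacting_domain_pairs
            for domain_a in protein_domains[first]
            for domain_b in protein_domains[second]
        )
        if supported:
            predictions.append((first, second))
    return predictions


# Hypothetical domain architectures and one known domain-domain interaction.
protein_domains = {
    "P1": {"DomA"},
    "P2": {"DomB", "DomC"},
    "P3": {"DomC"},
    "P4": {"DomA", "DomB"},
}
known_pairs = {frozenset(("DomA", "DomB"))}

predicted = predict_interactions(protein_domains, known_pairs)
print(predicted)
```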
Molecular Docking and Drug Targets
Molecular docking computationally predicts how a small molecule (drug candidate) fits into a protein’s binding site. Docking algorithms explore the orientations and conformations of the ligand within the binding pocket and score each pose by estimated binding energy.
Virtual screening applies docking to millions of compounds to identify potential drug candidates before any experimental testing — dramatically reducing the cost and time of early-stage drug discovery. Binding site prediction algorithms identify the most likely pockets on a protein surface for ligand binding, guiding docking efforts.
Protein kinase–substrate prediction uses sequence motifs, structural data, and machine learning to predict which proteins a given kinase will phosphorylate. Phosphoproteomics, the large-scale identification of phosphorylation sites by mass spectrometry, provides experimental validation and has cataloged tens of thousands of phosphorylation sites in eukaryotic cells, although the regulatory role of most of them remains uncharacterized.
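The simplest form of motif-based kinase–substrate prediction is a pattern scan. The Python sketch below searches for a simplified protein kinase A consensus, Arg-(Arg/Lys)-X-(Ser/Thr), and reports the position of the candidate phosphoacceptor. The pattern is a reduced version of the consensus and is an assumption for illustration. Real predictors weight each position and add structural and contextual features. The test peptide is Kemptide (LRRASLG), a well-known synthetic PKA substrate.

```python
import re

# Lookahead allows overlapping matches; group 2 captures the Ser/Thr acceptor.
PKA_LIKE_MOTIF = re.compile(r"(?=(R[RK].([ST])))")


def candidate_sites(sequence: str):
    """Return (1-based position, residue) for each Ser/Thr in a PKA-like motif."""
    return [
        (match.start(2) + 1, match.group(2))
        for match in PKA_LIKE_MOTIF.finditer(sequence.upper())
    ]


kemptide = "LRRASLG"
print(candidate_sites(kemptide))
```

A motif match only nominates a site. Whether it is phosphorylated in cells still has to be confirmed experimentally, for example by phosphoproteomics.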
Comparing the chemical properties of a substrate and a known inhibitor reveals what makes an effective drug mimic. Here we examine glucose (a typical enzyme substrate) and a transition-state analog inhibitor:
let glucose = "OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O"
let inhibitor = "OC[C@H]1OC(=O)[C@H](O)[C@@H](O)[C@@H]1O"
let sub_props = Chem.properties(glucose)
let inh_props = Chem.properties(inhibitor)
print("Substrate (glucose):")
print(sub_props)
print("Transition-state analog inhibitor:")
print(inh_props)
We can quantify how structurally similar a drug candidate is to the natural substrate using the Tanimoto coefficient — a score between 0 (no similarity) and 1 (identical):
let glucose = "OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O"
let inhibitor = "OC[C@H]1OC(=O)[C@H](O)[C@@H](O)[C@@H]1O"
let aspirin = "CC(=O)Oc1ccccc1C(O)=O"
let sim_inhibitor = Chem.tanimoto(glucose, inhibitor)
let sim_aspirin = Chem.tanimoto(glucose, aspirin)
print("Glucose vs transition-state analog: " + sim_inhibitor)
print("Glucose vs aspirin (unrelated): " + sim_aspirin)
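The Tanimoto coefficient itself is simple: the number of features two molecules share divided by the number of features present in either. The Python sketch below computes it on sets of fingerprint bit indices. Which fingerprint the lesson's own similarity function uses is not stated, and the bit sets here are hypothetical, so the numbers will not match the output above.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = bits_a | bits_b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints: each integer stands for one structural feature.
substrate_bits = {1, 4, 7, 9, 12, 15}
analog_bits = {1, 4, 7, 9, 12, 20}      # differs from the substrate by one feature
unrelated_bits = {2, 3, 5, 12, 30, 31}  # shares a single feature

print(f"Substrate vs analog:    {tanimoto(substrate_bits, analog_bits):.2f}")
print(f"Substrate vs unrelated: {tanimoto(substrate_bits, unrelated_bits):.2f}")
```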
Exercise: Substrate Specificity
A cytochrome P450 enzyme metabolizes both caffeine and theophylline. Compare their molecular properties and structural similarity to predict whether the same active site can accommodate both. Molecules with Tanimoto similarity above 0.7 are generally considered structurally related.
let caffeine = "Cn1c(=O)c2c(ncn2C)n(C)c1=O"
let theophylline = "Cn1c(=O)c2[nH]cnc2n(C)c1=O"
let caff_props = Chem.properties(caffeine)
let theo_props = Chem.properties(theophylline)
let similarity = Chem.tanimoto(caffeine, theophylline)
print("Caffeine:")
print(caff_props)
print("Theophylline:")
print(theo_props)
print("Tanimoto similarity: " + similarity)
print("Structurally similar substrates")
Exercise: Competitive vs Allosteric Inhibitor Comparison
Imatinib (Gleevec) is a competitive inhibitor that binds the ATP site of BCR-ABL kinase. GNF-2 is an allosteric inhibitor that binds a distant myristate pocket. Compare the chemical properties of these two drugs and the natural substrate (ATP) to test whether the competitive inhibitor resembles the substrate more closely than the allosteric one. Keep in mind that fingerprint similarity is a coarse measure, and that ATP-site inhibitors need only occupy the same pocket, not mimic ATP closely.
let atp = "c1nc(N)c2ncn(C3OC(COP(=O)(O)OP(=O)(O)OP(=O)(O)O)C(O)C3O)c2n1"
let imatinib = "Cc1ccc(NC(=O)c2ccc(CN3CCN(C)CC3)cc2)cc1Nc1nccc(-c2cccnc2)n1"
let gnf2 = "NC(=O)c1cccc(c1)-c1cc(Nc2ccc(OC(F)(F)F)cc2)ncn1"
let sim_competitive = Chem.tanimoto(atp, imatinib)
let sim_allosteric = Chem.tanimoto(atp, gnf2)
let comp_props = Chem.properties(imatinib)
let allo_props = Chem.properties(gnf2)
print("Imatinib (competitive) properties:")
print(comp_props)
print("GNF-2 (allosteric) properties:")
print(allo_props)
print("Similarity to ATP — Imatinib: " + sim_competitive)
print("Similarity to ATP — GNF-2: " + sim_allosteric)
print("Compare the two scores: the higher one indicates closer fingerprint similarity to ATP")
Enzyme Kinetics: Visualizing Inhibition
Enzyme activity varies with substrate concentration, following the classic Michaelis-Menten curve. Competitive inhibitors raise the apparent Km without affecting Vmax. Allosteric inhibitors that act noncompetitively, as in the example data below, reduce Vmax while leaving Km unchanged:
let kinetics_data = "[{\"substrate_uM\": 1, \"no_inhibitor\": 9.1, \"competitive\": 4.8, \"allosteric\": 5.5}, {\"substrate_uM\": 2, \"no_inhibitor\": 16.7, \"competitive\": 9.1, \"allosteric\": 10.0}, {\"substrate_uM\": 5, \"no_inhibitor\": 33.3, \"competitive\": 20.0, \"allosteric\": 20.0}, {\"substrate_uM\": 10, \"no_inhibitor\": 50.0, \"competitive\": 33.3, \"allosteric\": 30.0}, {\"substrate_uM\": 25, \"no_inhibitor\": 71.4, \"competitive\": 55.6, \"allosteric\": 42.9}, {\"substrate_uM\": 50, \"no_inhibitor\": 83.3, \"competitive\": 71.4, \"allosteric\": 50.0}, {\"substrate_uM\": 100, \"no_inhibitor\": 90.9, \"competitive\": 83.3, \"allosteric\": 54.5}]"
print("Michaelis-Menten curves: no inhibitor vs competitive vs allosteric")
let chart = Viz.scatter(kinetics_data)
print(chart)
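The plotted values closely track the Michaelis-Menten equation, v = Vmax·[S] / (Km + [S]). The Python sketch below regenerates the three curves from parameters inferred by inspecting the data: Vmax = 100 and Km = 10 µM without inhibitor, an apparent Km of 20 µM for the competitive case, and an apparent Vmax of 60 for the allosteric case. These parameters and the velocity units are assumptions, since the lesson does not state them.

```python
def michaelis_menten(substrate: float, vmax: float, km: float) -> float:
    """Initial reaction velocity at a given substrate concentration."""
    return vmax * substrate / (km + substrate)


# Parameters inferred from the plotted data (assumptions, arbitrary velocity units).
VMAX, KM_UM = 100.0, 10.0
KM_APPARENT_COMPETITIVE = 20.0   # competitive inhibitor raises apparent Km
VMAX_APPARENT_ALLOSTERIC = 60.0  # noncompetitive allosteric inhibitor lowers Vmax

substrate_um = [1, 2, 5, 10, 25, 50, 100]
print(f"{'[S] uM':>7} {'none':>7} {'compet.':>8} {'allost.':>8}")
for s in substrate_um:
    uninhibited = michaelis_menten(s, VMAX, KM_UM)
    competitive = michaelis_menten(s, VMAX, KM_APPARENT_COMPETITIVE)
    allosteric = michaelis_menten(s, VMAX_APPARENT_ALLOSTERIC, KM_UM)
    print(f"{s:>7} {uninhibited:>7.1f} {competitive:>8.1f} {allosteric:>8.1f}")
```

At high substrate concentration the competitive curve converges on the uninhibited Vmax, whereas the allosteric curve plateaus at its lower apparent Vmax no matter how much substrate is added.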
Summary
In this lesson you covered how proteins function and the tools for studying them:
- All proteins function by binding to specific ligands at complementary binding sites; affinity is measured by Kd
- Surface conformation determines chemistry — the precise arrangement of the binding site matches the ligand
- Sequence conservation across protein families highlights critical binding residues
- Protein-protein interactions use large flat interfaces, modular domains, or coiled coils; antibodies exemplify exquisite specificity
- Enzymes are powerful catalysts that stabilize transition states using proximity, acid-base, covalent, and metal ion catalysis
- Cofactors (heme, B vitamins, metals) add chemical capabilities beyond amino acid side chains
- Multienzyme complexes channel intermediates for metabolic efficiency
- Allosteric regulation uses coupled binding sites; cooperative assemblies produce sigmoidal responses
- Phosphorylation by ~500 kinases (and removal by phosphatases) is one of the most pervasive eukaryotic regulatory mechanisms; Src exemplifies multi-input protein logic
- GTP-binding proteins act as switches controlled by GEFs and GAPs; oncogenic Ras mutations drive ~30% of cancers
- Motor proteins (myosin, kinesin, dynein) convert ATP into mechanical movement; transporters pump molecules across membranes
- Protein machines and scaffolds organize complex cellular processes; interchangeable subunits multiply functional diversity
- Gene Ontology (GO) standardizes functional annotation across organisms
- PPI databases (STRING, BioGRID, IntAct) and Cytoscape map and visualize interaction networks
- Molecular docking and virtual screening predict drug-target interactions; phosphoproteomics maps the kinase signaling landscape