Intracellular Organization and Protein Sorting
Discover how eukaryotic cells direct thousands of different proteins to the correct compartments — from nuclear pore complexes and mitochondrial import machines to the ER translocon, signal sequences, and the computational tools for predicting protein localization.
Introduction
A eukaryotic cell is not a homogeneous bag of enzymes. It is a system of membrane-enclosed compartments, each maintaining a distinct chemical environment and harboring a specific set of resident proteins. A human cell contains roughly 10 billion protein molecules distributed among the cytoplasm, nucleus, mitochondria, endoplasmic reticulum, Golgi apparatus, lysosomes, peroxisomes, and plasma membrane. Almost all of these proteins are synthesized on ribosomes in the cytosol, yet each must be delivered — accurately and efficiently — to the compartment where it functions.
This lesson examines the molecular mechanisms that direct proteins to their correct destinations: the signal sequences that act as molecular zip codes, the transport machinery at each organelle, and the quality-control systems that ensure accuracy. We also cover the computational tools — from signal peptide predictors to spatial proteomics — that allow researchers to predict and map protein localization across the cell.
12.1 The Compartmentalization of Cells
All Eukaryotic Cells Share the Same Basic Set of Organelles
Despite the enormous diversity of eukaryotic cell types — from yeast to neurons to plant guard cells — all share the same fundamental set of membrane-enclosed organelles:
| Compartment | Key functions | Approximate volume (%) |
|---|---|---|
| Cytosol | Protein synthesis, glycolysis, signaling | ~54% |
| Nucleus | DNA storage, transcription, RNA processing | ~6% |
| Endoplasmic reticulum (ER) | Protein folding, lipid synthesis, Ca²⁺ storage | ~12% |
| Golgi apparatus | Protein modification, sorting, secretion | ~3% |
| Mitochondria | ATP production (oxidative phosphorylation) | ~22% |
| Lysosomes/endosomes | Degradation of macromolecules | ~1% |
| Peroxisomes | Oxidative reactions, fatty acid β-oxidation | ~1% |
These percentages are approximate and vary by cell type. A hepatocyte (liver cell), for instance, has an exceptionally large ER to support its role in detoxification and secretory protein production, while a muscle cell is packed with mitochondria for energy production.
Evolutionary Origins of Organelles
The endosymbiont hypothesis explains the origin of mitochondria and chloroplasts. Mitochondria descended from an α-proteobacterium engulfed by an ancestral eukaryotic cell roughly 1.5–2 billion years ago. Chloroplasts similarly descended from an engulfed cyanobacterium. Evidence for this includes:
- Both organelles retain their own circular genomes and replicate by fission
- Both have double membranes (the inner membrane derives from the ancestral bacterium)
- Their ribosomes resemble bacterial 70S ribosomes more than eukaryotic 80S ribosomes
- Phylogenetic analysis of their rRNA genes places them firmly within bacterial lineages
Over evolutionary time, most genes originally encoded in these organellar genomes have been transferred to the nuclear genome. The human mitochondrial genome retains only 37 genes (13 proteins, 22 tRNAs, 2 rRNAs), while the ~1,500 proteins required for mitochondrial function are encoded in the nucleus, synthesized in the cytosol, and imported.
The endomembrane system (ER, Golgi, lysosomes, endosomes, nuclear envelope) likely arose through invagination of the ancestral plasma membrane. These organelles are topologically connected and exchange material via vesicular transport.
Proteins Move Between Compartments in Different Ways
There are three fundamentally different ways in which proteins move between compartments:
- Gated transport — proteins move through nuclear pore complexes between cytosol and nucleus; the pore acts as a selective gate
- Transmembrane transport — protein translocators thread unfolded polypeptides across a membrane (e.g., into mitochondria, ER, chloroplasts, peroxisomes)
- Vesicular transport — proteins enclosed in vesicles bud from one compartment and fuse with another (e.g., ER → Golgi → plasma membrane)
Signal Sequences Direct Proteins to the Correct Compartment
Most proteins destined for an organelle or for secretion carry a signal sequence, a short stretch of amino acids that acts as a molecular address label:
| Destination | Signal type | Characteristics |
|---|---|---|
| Nucleus | Nuclear localization signal (NLS) | Clusters of basic residues (Lys, Arg); internal |
| ER (secretory pathway) | N-terminal signal peptide | ~15–30 residues with a hydrophobic core; cleaved |
| Mitochondrial matrix | N-terminal presequence | Amphipathic helix, positively charged; cleaved |
| Chloroplast stroma | Transit peptide | No common motif; cleaved |
| Peroxisome matrix | PTS1 (C-terminal) or PTS2 (N-terminal) | -SKL or variant; not cleaved (PTS1) |
Signal sequences are both necessary and sufficient for targeting: attaching a signal sequence to a cytosolic protein redirects it to the corresponding compartment. After arrival, most signal sequences are cleaved by specific peptidases, but some (notably the NLS) are retained.
Most Organelles Cannot Be Constructed De Novo
Mitochondria, chloroplasts, the ER, and the Golgi apparatus cannot be made from scratch. They grow and divide by incorporating new lipids and proteins into pre-existing structures, and they are inherited from one cell generation to the next. A cell that loses its mitochondria, for example, cannot regenerate them. This continuity of organelles adds an additional layer of information beyond the genome — a form of cytoplasmic inheritance.
Subcellular Localization Prediction
Predicting where a protein will localize within the cell is one of the most practically important tasks in bioinformatics. Several classes of tools address this problem:
Sequence-based localization predictors analyze the amino acid sequence for targeting signals and compositional features:
- DeepLoc 2.0 — a deep learning model that predicts localization to 10 compartments from sequence alone; achieves ~80% accuracy for single-location proteins
- WoLF PSORT — uses sorting signal features and amino acid composition to predict localization
- iLoc-Euk — handles multi-label prediction (proteins that localize to more than one compartment)
Signal peptide analysis:
- SignalP 6.0 — a widely used predictor of N-terminal signal peptides and their cleavage sites; distinguishes Sec/SPI, Sec/SPII (lipoprotein), Sec/SPIII, Tat/SPI, and Tat/SPII signals
Spatial proteomics experimentally maps protein localization at the proteome scale:
- LOPIT (Localization of Organelle Proteins by Isotope Tagging) and its high-resolution variant hyperLOPIT fractionate organelles by density-gradient centrifugation and use quantitative mass spectrometry to assign proteins to compartments
- These approaches have mapped >6,000 proteins to subcellular locations in a single experiment
Organelle-specific databases catalog experimentally validated residents:
- MitoCarta (mitochondria), The Human Protein Atlas (multiple compartments with antibody-based imaging), PeroxisomeDB (peroxisomes)
```
let signal_peptide = "MWRLLLLAFALASSALA"
let mature_protein = "DVSGTVCLSALPPEATDTLNL"
let full_protein = signal_peptide + mature_protein
print("Signal peptide: " + signal_peptide)
print("Mature protein: " + mature_protein)
print("Full length: " + Seq.length(full_protein) + " aa")
print("Signal peptide length: " + Seq.length(signal_peptide) + " aa")
```
Signal peptides are typically 15–30 residues with a hydrophobic core rich in leucine, alanine, and valine. The example above shows a classic secretory signal peptide with the characteristic pattern of hydrophobic residues that would be recognized by the signal recognition particle (SRP).
12.2 Nuclear Transport
Nuclear Pore Complexes Perforate the Nuclear Envelope
The nuclear envelope consists of two concentric membranes (the inner and outer nuclear membranes) perforated by nuclear pore complexes (NPCs). Each NPC is an enormous protein assembly of ~120 MDa in vertebrates, built from approximately 30 different proteins called nucleoporins (Nups), present in multiples of 8 reflecting the pore’s 8-fold rotational symmetry. A typical mammalian nucleus contains ~3,000–5,000 NPCs.
The central channel of the NPC is filled with FG-repeat nucleoporins — disordered protein domains rich in phenylalanine-glycine (FG) repeats that form a selective hydrogel-like barrier. Small molecules and proteins below ~40 kDa can diffuse through passively, but larger molecules require active, signal-mediated transport.
Nuclear Localization Signals Direct Nuclear Proteins
Proteins destined for the nucleus carry a nuclear localization signal (NLS). The classical NLS, first identified in the SV40 large T antigen, is a short stretch of basic amino acids (lysine and arginine):
- Monopartite NLS — a single cluster of basic residues (e.g., PKKKRKV)
- Bipartite NLS — two clusters separated by a 10–12 amino acid spacer (e.g., KR-spacer-KKKK)
Unlike ER signal peptides, the NLS is not cleaved after import. It can be located anywhere in the polypeptide chain — it functions as a surface-exposed patch on the folded protein.
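A crude first pass at spotting classical monopartite NLSs is a motif scan for the frequently cited consensus K-(K/R)-X-(K/R). The sketch below assumes that consensus and uses a hypothetical helper, `find_monopartite_nls`; dedicated predictors such as cNLS Mapper score far more context than a regular expression can.

```python
import re

# Frequently cited consensus for a classical monopartite NLS: K-(K/R)-X-(K/R).
# The pattern sits inside a lookahead so overlapping matches are all reported.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")

def find_monopartite_nls(seq):
    """Return (1-based start, motif) pairs matching the consensus (hypothetical helper)."""
    return [(m.start() + 1, m.group(1)) for m in MONOPARTITE_NLS.finditer(seq.upper())]

# SV40 large T antigen NLS from the text
print(find_monopartite_nls("PKKKRKV"))
```

Because any basic stretch can match, such scans flag many sites that are buried or never used; they are a starting point for inspection, not a prediction.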
Nuclear Import Receptors Bind NLS and NPC Proteins
Importin α acts as an adaptor that recognizes the NLS on the cargo protein. Importin β binds importin α and mediates interaction with the FG-repeat nucleoporins, allowing the cargo-receptor complex to traverse the NPC. The entire complex moves through the pore in a process that takes only a few milliseconds.
Nuclear Export Works Like Import in Reverse
Nuclear export signals (NES) are leucine-rich sequences that are recognized by the export receptor exportin 1 (also called CRM1). The mechanism is symmetrical to import: exportin binds the NES-bearing cargo in the nucleus and releases it in the cytoplasm.
The Ran GTPase Drives Directional Transport
The directionality of nuclear transport is conferred by the small GTPase Ran, which exists in two states:
| Location | Ran state | Key effector |
|---|---|---|
| Nucleus | Ran-GTP (high concentration) | RanGEF (RCC1, chromatin-bound) |
| Cytoplasm | Ran-GDP (high concentration) | RanGAP (cytoplasmic) |
This steep gradient drives transport:
- Import: The cargo–importin complex enters the nucleus. Ran-GTP binds importin β, releasing the cargo. The Ran-GTP–importin complex is exported, and in the cytoplasm RanGAP stimulates Ran to hydrolyze its bound GTP, releasing importin for reuse.
- Export: Ran-GTP promotes binding of exportin to its cargo in the nucleus. After traversing the pore, RanGAP triggers GTP hydrolysis, releasing the cargo in the cytoplasm.
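The key point, that receptors need no sense of direction because loading and unloading depend only on the local Ran-GTP level, can be captured in a deliberately simplified model. The sketch below assumes an all-or-none, two-compartment gradient and ignores GTP hydrolysis kinetics and receptor recycling; every name in it is hypothetical.

```python
# Toy model (hypothetical): Ran-GTP is high in the nucleus, low in the cytoplasm.
RAN_GTP_HIGH = {"nucleus": True, "cytoplasm": False}

def importin_holds_cargo(compartment):
    # Ran-GTP binding to importin beta releases cargo, so importins
    # hold cargo only where Ran-GTP is low.
    return not RAN_GTP_HIGH[compartment]

def exportin_holds_cargo(compartment):
    # Exportins bind cargo together with Ran-GTP, so they hold cargo
    # only where Ran-GTP is high.
    return RAN_GTP_HIGH[compartment]

def shuttle(holds_cargo, start, end):
    """Return where the cargo ends up after one receptor trip from start to end."""
    if not holds_cargo(start):
        return start                      # receptor cannot load cargo here
    if holds_cargo(end):
        return "still bound to receptor"  # destination does not trigger release
    return end                            # cargo released at the destination

print(shuttle(importin_holds_cargo, "cytoplasm", "nucleus"))
print(shuttle(exportin_holds_cargo, "nucleus", "cytoplasm"))
print(shuttle(importin_holds_cargo, "nucleus", "cytoplasm"))
```

The third call shows why transport is one-way: an importin in the nucleus cannot pick up cargo, so nothing is carried back out.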
Transport Can Be Regulated
Nuclear transport is a powerful point of regulation. Cells control protein localization by:
- Masking or unmasking the NLS through phosphorylation or ligand binding (e.g., NF-κB is retained in the cytoplasm by IκB, which masks its NLS)
- Regulated nuclear export (e.g., cyclin B1 is exported during interphase and accumulates in the nucleus at mitosis)
Nuclear Proteomics
Computational tools for analyzing nuclear transport include:
- NLS prediction tools (NLStradamus, cNLS Mapper, NucPred) identify potential nuclear localization signals from sequence
- NES prediction (NetNES, LocNES) identifies leucine-rich nuclear export signals
- Nuclear pore complex structural analysis — cryo-EM and integrative modeling have produced near-atomic structures of the entire NPC, revealing the architecture of the transport channel
- Nuclear/cytoplasmic fractionation proteomics — biochemical separation of nuclear and cytoplasmic fractions followed by mass spectrometry quantifies the nuclear/cytoplasmic ratio of thousands of proteins simultaneously
12.3 Mitochondrial and Chloroplast Import
Signal Sequences and Protein Translocators
About 99% of mitochondrial proteins are encoded by nuclear genes, synthesized in the cytosol, and imported post-translationally. The typical mitochondrial targeting presequence is a 15–40 residue N-terminal peptide that forms an amphipathic α-helix: one face is positively charged (rich in arginine and lysine) and the other is hydrophobic. This is fundamentally different from the hydrophobic ER signal peptide.
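Amphipathicity is commonly quantified as a hydrophobic moment: each residue's hydropathy is treated as a vector pointing at that residue's angle around an ideal α-helix (about 100° per residue), and the vectors are summed. A helix with one hydrophobic face and one charged face produces a large resultant, whereas a uniformly hydrophobic helix produces almost none. This sketch substitutes the Kyte–Doolittle scale for the consensus scale usually used in such calculations (an assumption made for simplicity) and applies it to the mitochondrial targeting sequence analyzed later in this section.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(seq, angle_deg=100.0):
    """Per-residue hydrophobic moment of seq modeled as an ideal alpha-helix."""
    sin_sum = cos_sum = 0.0
    for i, aa in enumerate(seq.upper()):
        theta = math.radians(angle_deg * i)   # position of residue i around the helix
        sin_sum += KD[aa] * math.sin(theta)
        cos_sum += KD[aa] * math.cos(theta)
    return math.hypot(sin_sum, cos_sum) / len(seq)

presequence = "MLSLRQSIRFFKPATRTLCSSRYLL"  # mitochondrial targeting sequence from this section
poly_leu = "L" * 18                       # uniformly hydrophobic control helix

print(f"presequence: {hydrophobic_moment(presequence):.2f}")
print(f"poly-Leu:    {hydrophobic_moment(poly_leu):.2f}")
```

With 18 leucines the per-residue vectors complete exactly five turns and cancel, so the control scores essentially zero; the presequence scores well above zero because its hydrophobic and charged residues are unevenly distributed around the helix.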
Two multisubunit translocase complexes carry out the import:
- TOM complex (Translocase of the Outer Membrane) — the entry gate; Tom20 and Tom22 recognize the presequence, and Tom40 forms the translocation channel
- TIM complexes (Translocase of the Inner Membrane) — TIM23 imports proteins into the matrix or inner membrane; TIM22 inserts carrier proteins into the inner membrane
Proteins Are Imported as Unfolded Polypeptides
Mitochondrial import requires proteins to be in an unfolded state. Cytosolic chaperones (Hsp70 and Hsp90 family members) keep the polypeptide unfolded and competent for translocation. The protein is threaded through the TOM and TIM channels as a linear chain — a folded protein cannot pass through the narrow channels (~20 Å diameter).
ATP and the Membrane Potential Drive Import
Two energy sources power translocation into the matrix:
- The membrane potential (Δψ) across the inner membrane (negative inside) drives the positively charged presequence electrophoretically into the matrix
- Mitochondrial Hsp70 (mtHsp70/mortalin) in the matrix binds the emerging polypeptide and pulls it through the channel in an ATP-dependent ratchet mechanism
After import, the matrix processing peptidase (MPP) cleaves the N-terminal presequence, and chaperonins (Hsp60/Hsp10) assist the protein in folding to its native conformation.
Porins in the Outer Membrane
The mitochondrial outer membrane contains porins (voltage-dependent anion channels, or VDACs) — β-barrel proteins that form large aqueous channels permeable to molecules up to ~5 kDa. The outer membrane is therefore freely permeable to small molecules and ions, but not to proteins. The inner membrane, by contrast, is highly impermeable, maintaining the electrochemical gradient essential for ATP synthesis.
Multiple Routes to the Inner Membrane and Intermembrane Space
Not all mitochondrial proteins go to the matrix. Proteins have multiple routes depending on their destination:
- Matrix proteins — pass through both TOM and TIM23 channels
- Inner membrane proteins — can use TIM23 with a stop-transfer hydrophobic segment, or TIM22 for multi-pass carrier proteins
- Intermembrane space (IMS) proteins — many use the MIA pathway (Mitochondrial Intermembrane space Assembly), which relies on disulfide bond formation to trap proteins in the IMS
- Outer membrane β-barrel proteins — pass through TOM and are inserted by the SAM complex (Sorting and Assembly Machinery)
Two Signal Sequences for Thylakoid Proteins
In chloroplasts, proteins destined for the thylakoid lumen carry a bipartite signal: an N-terminal chloroplast transit peptide that directs them across the double envelope membrane, followed by a thylakoid transfer signal that directs them across the thylakoid membrane. Each signal is cleaved sequentially as the protein passes each barrier.
Organelle Genome and Proteome Analysis
Mitochondrial genome analysis is a cornerstone of population genetics and forensic science. The human mitochondrial genome (16,569 bp) is maternally inherited, has a high mutation rate, and has been classified into haplogroups that trace maternal lineage across human populations.
MitoCarta 3.0 is a curated inventory of ~1,136 human genes encoding the mitochondrial proteome, compiled from mass spectrometry, GFP tagging, and machine learning. It is a widely used reference for mitochondrial research.
Targeting peptide prediction:
- TargetP 2.0 — predicts mitochondrial targeting peptides, chloroplast transit peptides, and secretory signal peptides; uses deep learning
- MitoFates — specifically predicts mitochondrial presequences and the cleavage site of MPP
```
let mito_targeting = "MLSLRQSIRFFKPATRTLCSSRYLL"
let er_signal = "MALWMRLLPLLALLALWGPDPAAA"
print("Mitochondrial targeting peptide:")
print(Struct.protein_props(mito_targeting))
print("ER signal peptide:")
print(Struct.protein_props(er_signal))
```
The human mitochondrial genome has a GC content of about 44%. Its base composition is shaped by mutational pressures distinct from those acting on the nuclear genome, including a pronounced compositional asymmetry between its guanine-rich heavy strand and its light strand.
12.4 Peroxisomes
Peroxisomes are single-membrane-bound organelles that carry out oxidative reactions using molecular oxygen. They contain enzymes that generate hydrogen peroxide (H₂O₂) as a by-product of oxidation reactions, and catalase, which decomposes H₂O₂ to water and oxygen, preventing oxidative damage.
Key peroxisomal functions include:
- β-oxidation of very-long-chain fatty acids (chains of C22 and longer are shortened in peroxisomes before mitochondrial oxidation can proceed)
- Synthesis of plasmalogens (ether phospholipids essential for myelin in the nervous system)
- Bile acid synthesis and amino acid catabolism
Unlike import into mitochondria, chloroplasts, and the ER, peroxisomal matrix import can handle fully folded, even oligomeric, proteins; in this respect it resembles nuclear import. The import receptor Pex5 recognizes the C-terminal peroxisomal targeting signal 1 (PTS1), typically the tripeptide -SKL (Ser-Lys-Leu) or a conservative variant. A less common N-terminal signal (PTS2) is recognized by Pex7. Mutations in the peroxisomal import machinery (peroxins) cause severe diseases such as Zellweger syndrome, characterized by failure to form functional peroxisomes.
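A PTS1 check reduces to asking whether a protein ends in a matching tripeptide. The sketch below assumes the frequently cited consensus (S/A/C)-(K/R/H)-(L/M); the demo sequences are made up, and real PTS1 recognition by Pex5 is also influenced by residues upstream of the tripeptide.

```python
import re

# Assumed PTS1 consensus: (S/A/C)-(K/R/H)-(L/M) at the extreme C-terminus.
PTS1 = re.compile(r"[SAC][KRH][LM]$")

def has_pts1(seq):
    """True if seq ends in a PTS1-like tripeptide (hypothetical helper)."""
    # Strip a trailing stop symbol, if present, before checking the C-terminus.
    return bool(PTS1.search(seq.upper().rstrip("*")))

demo = [
    ("ends in SKL", "MAGHEQSKL"),
    ("ends in AKL", "MAGHEQAKL"),
    ("SKL only internal", "MSKLAGHEQ"),
]
for label, seq in demo:
    print(label, has_pts1(seq))
```

The third case illustrates the positional requirement: an -SKL tripeptide only works as PTS1 at the very end of the chain.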
12.5 The Endoplasmic Reticulum
The ER Is Structurally and Functionally Diverse
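The ER is the most extensive membrane system in most eukaryotic cells (in a typical animal cell its membrane makes up more than half of the total membrane), forming a continuous network of cisternae (flattened sacs) and tubules that extends throughout the cytoplasm. It is divided into functional domains: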
- Rough ER — studded with ribosomes; the site of synthesis for secreted, membrane-bound, and ER-resident proteins
- Smooth ER — lacks ribosomes; specializes in lipid synthesis, steroid hormone production, detoxification (especially in liver cells), and Ca²⁺ storage
The ER is the entry point for the secretory pathway: proteins that will eventually reside in the ER, Golgi, lysosomes, plasma membrane, or extracellular space all begin their journey here.
Signal Sequences Were Discovered in ER Import
Günter Blobel’s landmark experiments in the 1970s established the signal hypothesis: a hydrophobic N-terminal signal peptide directs nascent polypeptides to the ER membrane. This discovery, which earned the 1999 Nobel Prize in Physiology or Medicine, demonstrated that the information for protein sorting is encoded in the protein’s own sequence.
A typical ER signal peptide has three domains:
- A short positively charged n-region
- A hydrophobic h-region (7–15 residues) — the core of the signal
- A polar c-region containing the signal peptidase cleavage site
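The h-region can be located with a sliding-window hydropathy scan: average the Kyte–Doolittle values over every window and keep the highest-scoring one. The sketch below assumes a 7-residue window (the lower end of the h-region length given above) and applies it to the preproinsulin signal peptide used elsewhere in this lesson; the helper name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def most_hydrophobic_window(seq, window=7):
    """Return (1-based start, segment, mean hydropathy) of the most hydrophobic window."""
    best = None
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        score = sum(KD[aa] for aa in segment) / window
        if best is None or score > best[2]:
            best = (start + 1, segment, score)
    return best

signal = "MALWMRLLPLLALLALWGPDPAAA"  # preproinsulin signal peptide
start, core, score = most_hydrophobic_window(signal)
print(start, core, round(score, 2))
```

Real predictors such as SignalP model the n-, h-, and c-regions jointly rather than relying on a single window.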
SRP Directs the Ribosome to the ER Membrane
The process of co-translational translocation proceeds through a precisely choreographed series of steps:
- Translation begins on a free ribosome in the cytosol
- As the signal peptide emerges from the ribosome, the signal recognition particle (SRP) — a ribonucleoprotein complex — binds it and pauses translation
- The SRP-ribosome-mRNA complex docks on the SRP receptor on the ER membrane
- The ribosome is transferred to the Sec61 translocon — the protein-conducting channel
- SRP releases, translation resumes, and the growing polypeptide is threaded into the ER lumen
- Signal peptidase cleaves the signal peptide on the lumenal side
The Polypeptide Passes Through the Sec61 Translocon
The Sec61 complex (SecY in bacteria) forms an aqueous channel through the ER membrane. It is composed of three subunits (α, β, γ) with the α-subunit forming the hourglass-shaped pore. A short α-helical plug seals the channel when it is not in active use, maintaining the membrane’s permeability barrier.
The channel has a lateral gate that can open toward the lipid bilayer, allowing hydrophobic transmembrane segments to exit the channel and integrate into the membrane. This mechanism is critical for the insertion of membrane proteins.
Single-Pass and Multipass Transmembrane Proteins
Type I transmembrane proteins have an N-terminal signal peptide (cleaved) and an internal stop-transfer anchor sequence — a hydrophobic segment that halts translocation and anchors the protein in the membrane with the N-terminus in the lumen and the C-terminus in the cytosol.
Type II transmembrane proteins lack a cleavable signal peptide. Instead, an internal signal-anchor sequence serves as both the signal for ER targeting and the transmembrane anchor, with the N-terminus remaining in the cytosol.
Multipass transmembrane proteins (such as G protein-coupled receptors and ion channels) use alternating start-transfer and stop-transfer signals to thread multiple segments back and forth across the membrane. Each hydrophobic segment exits the Sec61 lateral gate into the lipid bilayer.
N-Linked Glycosylation in the ER
As proteins enter the ER lumen, many are modified by the addition of a preassembled 14-sugar oligosaccharide (Glc₃Man₉GlcNAc₂) to asparagine residues in the consensus sequence Asn-X-Ser/Thr (where X is any amino acid except proline). This N-linked glycosylation is catalyzed by oligosaccharyltransferase (OST), which is associated with the Sec61 translocon and modifies the protein co-translationally.
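Finding candidate sites is a simple pattern search for Asn-X-Ser/Thr with X not proline. The sketch below uses a lookahead so that overlapping sequons are all reported; the helper name and the demo sequence are hypothetical.

```python
import re

# N-X-S/T sequon where X is any residue except proline (zero-width lookahead
# so that overlapping sequons are not skipped).
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(seq):
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]

demo = "MNGTANPSQNVT"   # made-up sequence: NGT and NVT qualify, NPS does not
print(find_sequons(demo))
```

A sequon is necessary but not sufficient: whether OST actually modifies a given site depends on local sequence context and accessibility.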
N-linked glycans serve multiple functions:
- Assist protein folding by promoting hydrophilicity and constraining backbone conformations
- Serve as quality-control tags (see below)
- Protect proteins from proteolysis on the cell surface
- Mediate cell-cell recognition and signaling
Oligosaccharides as Folding Tags
The ER uses the glycan structure as a folding sensor. Two glucose residues are rapidly trimmed from the N-linked glycan by glucosidases I and II. The chaperone calnexin (membrane-bound) and its soluble counterpart calreticulin recognize monoglucosylated glycans — glycans with exactly one remaining glucose residue — and retain the protein for further folding attempts.
Glucosidase II then removes the final glucose, releasing the protein from calnexin/calreticulin. If the protein has folded correctly, it exits the ER. If it remains misfolded, the enzyme UGGT (UDP-glucose glycoprotein glucosyltransferase) re-adds a glucose residue, returning the protein to the calnexin/calreticulin cycle. UGGT thus acts as the folding sensor, while slow trimming of mannose residues by ER mannosidases acts as a timer: proteins that cannot fold within that window are directed to degradation.
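The logic of the cycle can be traced with a toy model. The sketch below is not kinetic: it assumes the protein either folds on a given round or never folds, and a fixed round limit (`max_rounds`, a hypothetical parameter) stands in for the slower clock that eventually hands unfoldable proteins to ERAD.

```python
def calnexin_cycle(folds_on_round, max_rounds=3):
    """Toy trace of the calnexin/calreticulin cycle (hypothetical model).

    folds_on_round: round on which the protein reaches its native fold,
    or None for a protein that never folds.
    Returns (fate, trace), where trace lists the glycan states visited.
    """
    trace = ["Glc3 (added by OST)", "Glc1 (glucosidases I and II)"]
    for round_number in range(1, max_rounds + 1):
        trace.append(f"round {round_number}: held by calnexin/calreticulin")
        trace.append("Glc0 (glucosidase II removes last glucose)")
        if folds_on_round is not None and round_number >= folds_on_round:
            return "exits ER", trace
        trace.append("Glc1 (UGGT re-glucosylates misfolded protein)")
    # The fixed round limit stands in for the real timer.
    return "ERAD", trace

for case in (1, 2, None):
    fate, trace = calnexin_cycle(case)
    print(case, "->", fate, f"({len(trace)} glycan steps)")
```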
ER-Associated Degradation (ERAD)
Proteins that fail to fold after repeated cycles are targeted for ER-associated degradation (ERAD):
- The misfolded protein is recognized by ERAD lectins and chaperones
- It is retrotranslocated (exported) back across the ER membrane through a protein channel
- In the cytosol, it is ubiquitinated by ER-associated E3 ligases
- The ubiquitinated protein is extracted by the p97/VCP ATPase and degraded by the 26S proteasome
ERAD is essential for preventing the accumulation of toxic misfolded proteins in the ER.
The Unfolded Protein Response (UPR)
When misfolded proteins accumulate in the ER beyond the capacity of the normal quality-control machinery, the cell activates the unfolded protein response (UPR) — a signaling network with three branches:
| Sensor | Mechanism | Downstream effect |
|---|---|---|
| IRE1 | Endoribonuclease splices XBP1 mRNA | Increases ER chaperones and ERAD components |
| PERK | Kinase phosphorylates eIF2α | Reduces global translation to lower ER load |
| ATF6 | Transcription factor released by proteolysis | Activates chaperone and ERAD gene expression |
If the UPR cannot resolve the stress, the cell initiates apoptosis (programmed cell death) — preventing the secretion of misfolded, potentially toxic proteins.
GPI Anchors
Some proteins are attached to the outer leaflet of the plasma membrane by a glycosylphosphatidylinositol (GPI) anchor rather than a transmembrane domain. These proteins enter the ER with a standard signal peptide and contain a C-terminal GPI-attachment signal. In the ER, a transamidase cleaves the C-terminal signal and simultaneously attaches the preformed GPI anchor. GPI-anchored proteins are found in lipid rafts and include cell-surface enzymes (e.g., alkaline phosphatase), adhesion molecules, and complement-regulatory proteins.
Protein Processing and Secretory Pathway Prediction
Computational tools for analyzing ER-targeted proteins and the secretory pathway include:
Signal peptide and transmembrane topology prediction:
- SignalP 6.0 predicts signal peptides and cleavage sites with high accuracy
- TMHMM and DeepTMHMM predict transmembrane helices and topology (which segments are lumenal vs. cytoplasmic)
- Phobius combines signal peptide and transmembrane prediction in a single model
Glycosylation prediction:
- NetNGlyc and NetOGlyc predict N-linked and O-linked glycosylation sites
- N-glycosylation occurs at the Asn-X-Ser/Thr sequon, but not all sequons are glycosylated — prediction tools assess the local sequence context
GPI-anchor prediction:
- PredGPI and big-PI Predictor identify C-terminal GPI-attachment signals
ER stress and UPR gene signatures:
- Transcriptomic analysis of UPR target genes (BiP/GRP78, CHOP, XBP1s) serves as a biomarker for ER stress in disease states including cancer, diabetes, and neurodegeneration
Secretome prediction:
- SecretomeP and SignalP together predict the classical (signal peptide-dependent) and non-classical (leaderless) secreted proteins of an organism — important for identifying drug targets, biomarkers, and vaccine candidates
```
let fasta = ">sp|P01308|INS_HUMAN Insulin\nMALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKT\n>sp|P02768|ALBU_HUMAN Serum albumin\nMKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQC"
let records = IO.parse_fasta(fasta)
print("Secretory proteins with signal peptides:")
print(records)
```
Both insulin and serum albumin carry N-terminal signal peptides (the initial hydrophobic stretches) that direct them into the ER lumen for eventual secretion. The records above are truncated N-terminal fragments, not full-length sequences. Tools like SignalP would identify the cleavage sites separating each signal peptide from the rest of the protein.
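Outside the scripting environment used in this lesson, the same parsing step can be sketched in plain Python. The parser below is a minimal, hypothetical stand-in rather than a reimplementation of `IO.parse_fasta`; for real work, Biopython's `SeqIO.parse(handle, "fasta")` handles the format's edge cases.

```python
def parse_fasta(text):
    """Minimal FASTA parser: return a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)           # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

fasta = (">sp|P01308|INS_HUMAN Insulin\n"
         "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKT\n"
         ">sp|P02768|ALBU_HUMAN Serum albumin\n"
         "MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQC")

for header, seq in parse_fasta(fasta):
    accession = header.split("|")[1]      # UniProt-style header: db|accession|name
    print(accession, len(seq), seq[:24])
```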
```
let signal_peptide = "MALWMRLLPLLALLALWGPDPAAA"
let mature_protein = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
print("Signal peptide properties (hydrophobic core):")
print(Struct.protein_props(signal_peptide))
print("Mature protein properties (soluble):")
print(Struct.protein_props(mature_protein))
```
Signal peptides have high hydrophobicity scores (positive GRAVY values) because they are rich in leucine, alanine, and valine, the hydrophobic residues that interact with the SRP and the Sec61 translocon channel. The mature segment shown here (the insulin B chain) has a much lower GRAVY score; mature secretory proteins are generally far less hydrophobic than their signal peptides and remain soluble in the ER lumen.
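GRAVY (grand average of hydropathy) is simply the mean Kyte–Doolittle value over all residues, so the comparison above can be reproduced in a few lines. This is a self-contained Python sketch; the variable names mirror the example above, but it does not call the lesson's `Struct.protein_props`.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KD[aa] for aa in seq.upper()) / len(seq)

signal_peptide = "MALWMRLLPLLALLALWGPDPAAA"   # preproinsulin signal peptide
b_chain = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"   # insulin B chain

print(f"signal peptide GRAVY: {gravy(signal_peptide):.2f}")
print(f"B chain GRAVY:        {gravy(b_chain):.2f}")
```

On these inputs the signal peptide scores about 1.25 and the B-chain fragment about 0.22.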
Exercise: Identify Signal Types from Sequence Features
Examine these two protein N-terminal sequences. One is a mitochondrial targeting presequence (positively charged, amphipathic) and the other is an ER signal peptide (hydrophobic). Determine which is which by analyzing their amino acid composition.
```
let seq_a = "MWRLLLLAFALASSALA"
let seq_b = "MLSRARKQNKNRLSSRL"
print("Sequence A: " + seq_a)
print("Sequence B: " + seq_b)
print("A length: " + Seq.length(seq_a) + " residues")
print("B length: " + Seq.length(seq_b) + " residues")
let answer = "ER signal peptide"
print("Sequence A is an " + answer)
print(answer)
```
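The composition-based reasoning the exercise asks for can be made explicit with two numbers per sequence: net charge (counting Lys and Arg as +1 and Asp and Glu as -1) and the fraction of strongly hydrophobic residues. The hydrophobic residue set and the thresholds in `classify` are hypothetical rules of thumb chosen for illustration, not values taken from any published predictor.

```python
# Assumed set of strongly hydrophobic residues for this sketch.
HYDROPHOBIC = set("AILMFVW")

def describe(seq):
    """Return (net charge from K/R minus D/E, fraction of hydrophobic residues)."""
    seq = seq.upper()
    net_charge = sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    return net_charge, hydrophobic_fraction

def classify(seq):
    """Crude rule of thumb (hypothetical thresholds)."""
    net_charge, hydrophobic_fraction = describe(seq)
    if net_charge >= 3:
        return "mitochondrial presequence"   # strongly basic
    if hydrophobic_fraction >= 0.5:
        return "ER signal peptide"           # dominated by hydrophobic residues
    return "unclassified"

for label, seq in [("A", "MWRLLLLAFALASSALA"), ("B", "MLSRARKQNKNRLSSRL")]:
    charge, frac = describe(seq)
    print(f"Sequence {label}: net charge {charge:+d}, "
          f"hydrophobic fraction {frac:.2f} -> {classify(seq)}")
```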
Exercise: Analyze a Mitochondrial Gene
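The human mitochondrial genome uses a slightly modified genetic code: UGA encodes tryptophan instead of stop, AUA encodes methionine instead of isoleucine, and AGA/AGG act as stop codons. Analyze this mitochondrial DNA sequence for its GC content and translate it using the standard code to see how the result would differ. (This fragment contains no UGA codon; the differences arise at its two AUA codons.)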
```
let mito_seq = "ATGACCCCAATACGCAAAACTAACCCCCTAATAAAATTAATTAACCACTCATTCATCGAC"
let gc = Seq.gc_content(mito_seq)
print("Mitochondrial sequence GC content: " + gc)
let protein = Seq.translate(mito_seq)
print("Standard code translation: " + protein)
print("Length: " + Seq.length(mito_seq) + " bp")
print(gc)
```
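To see the code difference directly, the sketch below translates the same fragment with the standard table and with the vertebrate mitochondrial table (NCBI translation table 2: UGA = Trp, AUA = Met, AGA/AGG = stop). Both codon tables are built in the code rather than imported, so no library is assumed.

```python
# Standard genetic code in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {a + b + c: AMINO[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}

# Vertebrate mitochondrial code: four reassigned codons (NCBI table 2).
VERT_MITO = dict(STANDARD, AGA="*", AGG="*", ATA="M", TGA="W")

def translate(dna, table):
    """Translate complete codons of a DNA string with the given codon table."""
    dna = dna.upper()
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))

def gc_content(dna):
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)

mito_seq = "ATGACCCCAATACGCAAAACTAACCCCCTAATAAAATTAATTAACCACTCATTCATCGAC"
standard = translate(mito_seq, STANDARD)
mito = translate(mito_seq, VERT_MITO)
print(f"GC content: {gc_content(mito_seq):.3f}")
print("standard:", standard)
print("mito:    ", mito)
print("differing codons:", [i + 1 for i, (s, m) in enumerate(zip(standard, mito)) if s != m])
```

In this fragment the two translations differ only at codons 4 and 11, both ATA, which read as Ile in the standard code and Met in the mitochondrial one; the GC content works out to 22 of 60 bases.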
Exercise: Parse Secretory Pathway Proteins
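Parse the following FASTA records, which contain short N-terminal sequences (24 residues each) of proteins from the secretory pathway, and examine their properties. Because only N-terminal stretches are given, C-terminal features such as a GPI-attachment signal are not represented.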
```
let fasta = ">SP_protein Signal peptide bearing\nMKWVTFISLLFLFSSAYSRGVFRR\n>TM_protein Transmembrane protein\nMGAAAAILLVVLLGLCCLAGPATA\n>GPI_protein GPI-anchored protein\nMGSSAGLLLLLTLHCSLGAWPRSG"
let records = IO.parse_fasta(fasta)
print("Parsed secretory pathway proteins:")
print(records)
let count = records.length
print(count)
```
Summary
In this lesson you covered the mechanisms of intracellular protein sorting and the computational tools for analyzing protein localization:
- Eukaryotic cells are compartmentalized into organelles that maintain distinct chemical environments; mitochondria and chloroplasts originated as endosymbiotic bacteria
- Three transport mechanisms move proteins: gated transport (nuclear pores), transmembrane transport (translocases), and vesicular transport
- Signal sequences are molecular zip codes — short amino acid stretches that direct proteins to the ER, nucleus, mitochondria, chloroplasts, or peroxisomes
- Nuclear transport through NPCs uses importins/exportins and is driven by the Ran GTPase gradient (Ran-GTP in nucleus, Ran-GDP in cytoplasm)
- Mitochondrial import requires unfolded proteins with N-terminal presequences, the TOM/TIM translocases, the membrane potential, and mitochondrial Hsp70
- Peroxisomes import folded proteins via a C-terminal PTS1 signal (-SKL); mutations in peroxins cause Zellweger syndrome
- ER targeting uses the SRP to direct ribosome-nascent chain complexes to the Sec61 translocon for co-translational translocation
- Transmembrane protein topology is determined by start-transfer and stop-transfer signals interacting with the Sec61 lateral gate
- N-linked glycosylation at Asn-X-Ser/Thr serves in folding, quality control, and cell-surface recognition
- Calnexin/calreticulin use glycan trimming as a folding sensor; ERAD retrotranslocates and degrades persistently misfolded proteins
- The unfolded protein response (UPR) (IRE1, PERK, ATF6) increases ER folding capacity or triggers apoptosis when stress is unresolvable
- GPI anchors attach proteins to the outer leaflet of the plasma membrane
- Localization prediction tools — SignalP (signal peptides), TargetP (targeting peptides), DeepLoc (subcellular localization), TMHMM (transmembrane topology)
- Spatial proteomics (LOPIT/hyperLOPIT) and organelle databases (MitoCarta, Human Protein Atlas) experimentally map the subcellular proteome
- Secretome prediction, glycosylation site prediction, and GPI-anchor prediction extend bioinformatic analysis of the secretory pathway
References
- Alberts B, Johnson A, Lewis J, Morgan D, Raff M, Roberts K, Walter P. Molecular Biology of the Cell, 7th ed. New York: W.W. Norton; 2022. Chapter 12: Intracellular Compartments and Protein Sorting.
- Blobel G, Dobberstein B. Transfer of proteins across membranes. I. Presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. J Cell Biol. 1975;67(3):835–851.
- Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol. 2019;37(4):420–423.
- Thul PJ, Åkesson L, Wiking M, et al. A subcellular map of the human proteome. Science. 2017;356(6340):eaal3321. https://www.proteinatlas.org/
- Emanuelsson O, Brunak S, von Heijne G, Nielsen H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. 2007;2(4):953–971.
- Horton P, Park KJ, Obayashi T, et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 2007;35(Web Server issue):W585–W587.
- Pierleoni A, Martelli PL, Casadio R. PredGPI: a GPI-anchor predictor. BMC Bioinformatics. 2008;9:392.
- Steentoft C, Vakhrushev SY, Joshi HJ, et al. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J. 2013;32(10):1478–1488.