From RNA to Protein: Translation

Intermediate Molecular Biology ~40 min


Understand how ribosomes decode mRNA into proteins — from tRNA adaptors and the genetic code to polyribosomes, protein folding, quality control, and the computational tools for proteomics and translation analysis.

Introduction

Translation — the decoding of mRNA into protein — is the final step in the flow of genetic information from DNA to functional product. The machinery of translation is extraordinarily complex: it requires the coordinated action of more than 200 different molecules, including messenger RNA, transfer RNAs, the ribosome (itself composed of ~80 proteins and several RNA molecules), and numerous accessory factors. Yet this machinery operates with remarkable speed and accuracy, producing proteins at a rate of ~20 amino acids per second in bacteria, with an error rate of only about 1 in 10,000 amino acids.

This lesson examines how mRNA sequences are decoded, the role of tRNAs as molecular adaptors, the mechanics of ribosome function, the quality-control systems that ensure protein integrity, and the computational tools — from codon usage analysis to ribosome profiling and mass spectrometry-based proteomics — that allow us to study translation and the proteome.

An mRNA Sequence Is Decoded in Sets of Three Nucleotides

The mRNA is read in codons: non-overlapping triplets of nucleotides. Each codon specifies one amino acid (or a stop signal). With four possible bases at each of the three positions, there are 4³ = 64 possible codons:

61 sense codons specify the 20 standard amino acids
3 stop codons (UAA, UAG, UGA) signal the end of translation
AUG is the near-universal start codon, encoding methionine

The genetic code is degenerate (or redundant): most amino acids are specified by more than one codon. For example, leucine has six codons (UUA, UUG, CUU, CUC, CUA, CUG), as do serine and arginine. Only methionine (AUG) and tryptophan (UGG) have a single codon each.
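These counts can be checked directly. The minimal Python sketch below (illustrative names, not the lesson's `Seq` API) builds the standard code, NCBI translation table 1, from its conventional 64-letter string with bases ordered T, C, A, G, then counts how many codons map to each amino acid. It uses DNA letters (T for U) to match the lesson's examples.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1) in DNA letters.
# Codons are enumerated with bases ordered T, C, A, G at each position
# (first position varies slowest); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Read non-overlapping triplets from position 0 (frame 1).

    Any trailing bases that do not complete a codon are ignored.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# Degeneracy: number of codons assigned to each amino acid (and to stop).
degeneracy = Counter(CODON_TABLE.values())

print("codons:", len(CODON_TABLE))
print("sense codons:", sum(n for aa, n in degeneracy.items() if aa != "*"))
print("stop codons:", degeneracy["*"])
print("Leu/Ser/Arg:", degeneracy["L"], degeneracy["S"], degeneracy["R"])
print("Met/Trp:", degeneracy["M"], degeneracy["W"])
print(translate("ATGGCTAGCAAAGACTTCACCGAGTACCTGCAGAACCTGATCGGCAAATGA"))
```

Reading non-overlapping triplets from the first base reproduces MASKDFTEYLQNLIGK* for the lesson's example gene, with `*` marking the stop codon.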

The code is also unambiguous (each codon specifies exactly one amino acid), non-overlapping (codons are read sequentially with no shared bases), and comma-free (there are no spacers between codons). The reading frame is established by the start codon and maintained throughout translation.

let gene = "ATGGCTAGCAAAGACTTCACCGAGTACCTGCAGAACCTGATCGGCAAATGA"
let protein = Seq.translate(gene)
print("DNA:     " + gene)
print("Protein: " + protein)
print("Length:  " + Seq.length(gene) + " bp → " + Seq.length(protein) + " aa")
let props = Struct.protein_props(protein)
print("Protein properties:")
print(props)

Reading Frames

Because codons are triplets read without gaps, the reading frame — determined by the starting position — critically affects which amino acids are produced. A single DNA sequence has three possible forward reading frames and three reverse reading frames (on the complementary strand), for a total of six reading frames:

let dna = "AATGGCTAGCAAAGACTGA"
let frame1 = Seq.translate("AATGGCTAGCAAAGACTGA")
let frame2 = Seq.translate("ATGGCTAGCAAAGACTGA")
let frame3 = Seq.translate("TGGCTAGCAAAGACTGA")
print("DNA:     " + dna)
print("Frame 1: " + frame1)
print("Frame 2: " + frame2)
print("Frame 3: " + frame3)

A shift of just one or two nucleotides produces a completely different protein sequence. Frameshift mutations (insertions or deletions that are not multiples of 3) are therefore almost always devastating, because they alter every codon downstream of the mutation.
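To cover all six frames rather than only the three forward frames, a minimal Python sketch can translate both the sequence and its reverse complement at offsets 0, 1, and 2. It assumes the standard code. Labeling the reverse frames -1 to -3 by offset on the reverse complement is one common convention; tools differ.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G; "*" = stop.
CODON_TABLE = {
    "".join(c): aa
    for c, aa in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse, giving the other strand 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def translate(dna: str) -> str:
    """Translate non-overlapping triplets; ignore an incomplete final codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def six_frames(dna: str) -> Dict[str, str]:
    """Translate frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

for name, peptide in six_frames("AATGGCTAGCAAAGACTGA").items():
    print(name, peptide)
```

For the lesson's 19-nt example, frame +2 gives MASKD* (the open reading frame that starts at ATG), while frame +1 runs into a stop codon after only two residues.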

tRNA Molecules Match Amino Acids to Codons in mRNA

Transfer RNA (tRNA) molecules are the adaptors that bridge the nucleic acid language of mRNA and the amino acid language of proteins. Each tRNA is a small RNA molecule (~76–90 nucleotides) that folds into a characteristic L-shaped three-dimensional structure (visualized as a cloverleaf in two dimensions).

Each tRNA has two critical features:

An anticodon (three nucleotides) that base-pairs with a complementary mRNA codon
A 3′ acceptor stem where the corresponding amino acid is covalently attached (via an ester bond to the 3′-terminal adenine)

The degeneracy of the genetic code is partly explained by wobble base pairing at the third codon position, where the first base of the anticodon can form non-standard pairs (e.g., G-U, inosine-U/C/A). This reduces the number of different tRNAs a cell needs — about 45 tRNAs in human cells can decode all 61 sense codons.

tRNAs Are Covalently Modified

tRNAs undergo extensive post-transcriptional modification before they become functional. More than 100 different chemical modifications have been identified in tRNAs across all domains of life, including methylation, pseudouridylation, deamination, and thiolation. These modifications affect tRNA folding, stability, and codon recognition.

Particularly important is the modification of adenosine to inosine at the wobble position of certain anticodons, catalyzed by adenosine deaminases. Inosine can pair with U, C, or A, expanding the decoding capacity of a single tRNA.
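Wobble pairing can be expressed as a small lookup. The first two codon positions need standard Watson-Crick pairs, while the third position follows the looser rules in `WOBBLE`, a simplified encoding of Crick's wobble rules (G-U pairs, and inosine pairing with U, C, or A). This Python sketch uses RNA letters and illustrative function names. Real tRNAs carry many other wobble-position modifications that change these rules.

```python
from itertools import product
from typing import List

# Codon positions 1 and 2 need standard Watson-Crick pairs.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Simplified wobble rules: anticodon 5' (wobble) base -> codon 3rd bases it can
# pair with. "I" is inosine. Real tRNA modifications can change these rules.
WOBBLE = {
    "C": {"G"},
    "A": {"U"},
    "G": {"C", "U"},
    "U": {"A", "G"},
    "I": {"U", "C", "A"},
}

def decodes(anticodon: str, codon: str) -> bool:
    """True if an anticodon (5'->3') can read a codon (5'->3'), in RNA letters.

    Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    base 2 with base 2, and the anticodon's 5' base with codon base 3.
    """
    return (
        WATSON_CRICK[anticodon[2]] == codon[0]
        and WATSON_CRICK[anticodon[1]] == codon[1]
        and codon[2] in WOBBLE[anticodon[0]]
    )

def codons_read(anticodon: str) -> List[str]:
    """Check all 64 codons and return those this anticodon can decode."""
    codons = ("".join(c) for c in product("UCAG", repeat=3))
    return sorted(c for c in codons if decodes(anticodon, c))

print(codons_read("IAU"))
print(codons_read("GAA"))
```

Under these rules, an anticodon 5′-IAU-3′ reads AUU, AUC, and AUA but not AUG. An anticodon 5′-GAA-3′ reads both phenylalanine codons, UUU and UUC. This is how one tRNA covers several synonymous codons.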

Aminoacyl-tRNA Synthetases Ensure Accuracy

The correct pairing of amino acids to tRNAs is achieved by aminoacyl-tRNA synthetases — a family of 20 enzymes, one for each amino acid. Each synthetase must accomplish two demanding tasks: select the correct amino acid (from a pool of chemically similar molecules) and attach it to the correct tRNA(s).

The charging reaction proceeds in two steps:

The amino acid is activated by reaction with ATP, forming an aminoacyl-AMP intermediate
The activated amino acid is transferred to the 3′ end of the appropriate tRNA

Some synthetases possess an editing site (a separate catalytic pocket) that hydrolyzes incorrectly charged tRNAs. For example, isoleucyl-tRNA synthetase must distinguish isoleucine from the structurally very similar valine — its editing site hydrolyzes Val-tRNA^Ile, correcting mistakes. This double-sieve mechanism (selection in the synthetic site, proofreading in the editing site) achieves an error rate of approximately 1 in 40,000 — critical because the ribosome cannot check whether the correct amino acid is attached to a tRNA.

The Ribosome Is a Ribozyme

The ribosome is the molecular machine that synthesizes proteins. It is a large complex of rRNA and proteins, organized into two subunits:

	Bacteria	Eukaryotes
Small subunit	30S (16S rRNA + 21 proteins)	40S (18S rRNA + 33 proteins)
Large subunit	50S (23S + 5S rRNA + 31 proteins)	60S (28S + 5.8S + 5S rRNA + 47 proteins)
Complete ribosome	70S	80S

The ribosome has three tRNA-binding sites:

A site (aminoacyl): where incoming charged tRNAs bind
P site (peptidyl): where the growing polypeptide chain is held
E site (exit): where deacylated tRNAs leave the ribosome

The key catalytic activity of the ribosome, peptide bond formation, is carried out by the 23S rRNA (28S in eukaryotes) in the large subunit, not by any protein component. The ribosome is therefore a ribozyme (an RNA enzyme). The ribosomal proteins serve primarily structural roles, stabilizing the rRNA in its catalytically active conformation. This finding is widely regarded as strong support for the RNA world hypothesis, the idea that RNA preceded proteins as the primary catalytic molecule in early life.

Nucleotide Sequences in mRNA Signal Where to Start Protein Synthesis

In bacteria, the start codon is identified by the Shine-Dalgarno sequence — a purine-rich sequence 5–10 nucleotides upstream of the AUG that base-pairs with the 3′ end of the 16S rRNA, positioning the ribosome correctly.

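In eukaryotes, the small ribosomal subunit, carrying the initiator tRNA and initiation factors, is recruited to the 5′ cap of the mRNA (which is bound by the cap-binding factor eIF4E). It then scans along the mRNA in the 5′→3′ direction until it encounters the first AUG in a favorable sequence context. The Kozak consensus is GCC(A/G)CCAUGG, where AUG is the start codon. Counting the A of AUG as +1, a purine (A or G) at position -3 and a G at position +4 are the most important features. The surrounding nucleotides influence how efficiently the AUG is recognized.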

This difference in initiation mechanism has practical consequences. Bacterial mRNAs can be polycistronic (encoding multiple proteins from a single mRNA, each with its own Shine-Dalgarno sequence), while eukaryotic mRNAs are almost always monocistronic (encoding a single protein), because scanning from the 5′ cap typically identifies only the first AUG.
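The scanning model can be sketched as "take the first AUG, then check its context." The Python function below is a deliberately simplified illustration. It reports the first AUG and labels its context by the two Kozak positions usually considered most important: a purine at -3 and G at +4, counting the A of AUG as +1. The strong/adequate/weak labels are an assumption for illustration, and the sketch does not model leaky scanning past weak AUGs.

```python
from typing import Optional, Tuple

PURINES = {"A", "G"}

def scan_first_aug(mrna: str) -> Optional[Tuple[int, str]]:
    """Simplified scanning: find the first AUG (5'->3') and label its context.

    Positions are counted with the A of AUG as +1, so -3 is three bases
    upstream of the A and +4 is the base right after the AUG.
    Returns (0-based index of the A, label) or None if there is no AUG.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA letters too
    start = mrna.find("AUG")
    if start == -1:
        return None
    minus3 = mrna[start - 3] if start >= 3 else ""
    plus4 = mrna[start + 3] if start + 3 < len(mrna) else ""
    hits = int(minus3 in PURINES) + int(plus4 == "G")
    label = {2: "strong", 1: "adequate", 0: "weak"}[hits]
    return start, label

print(scan_first_aug("GCCACCAUGG"))
print(scan_first_aug("UUUUUUAUGC"))
```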

Translation: Initiation, Elongation, and Termination

Initiation is usually the rate-limiting step and the primary point of translational regulation. In eukaryotes, it requires at least 12 initiation factors (eIFs) that coordinate cap binding, scanning, AUG recognition, and subunit joining.

Elongation proceeds through a three-step cycle. At ~20 amino acids per second, each cycle takes about 50 milliseconds per codon in bacteria:

Decoding: a charged tRNA (bound to elongation factor EF-Tu/eEF1A as a ternary complex with GTP) enters the A site; GTP hydrolysis occurs only if the codon-anticodon match is correct (kinetic proofreading)
Peptide bond formation: the peptidyl transferase center (23S rRNA) catalyzes transfer of the growing peptide chain from the P-site tRNA to the A-site amino acid
Translocation: elongation factor EF-G/eEF2 drives GTP-dependent movement of the ribosome one codon along the mRNA; the deacylated tRNA moves to the E site and the peptidyl-tRNA moves to the P site

Termination occurs when a stop codon (UAA, UAG, or UGA) enters the A site. No tRNA recognizes stop codons. Instead, release factors (RF1/RF2 in bacteria, eRF1 in eukaryotes) bind the A site, mimicking the shape of a tRNA, and trigger hydrolysis of the peptidyl-tRNA bond, releasing the completed polypeptide.

Proteins Are Made on Polyribosomes

Multiple ribosomes can translate the same mRNA simultaneously, forming a polyribosome (polysome). As one ribosome moves along the mRNA, another initiates behind it. A typical mRNA may carry 5–40 ribosomes actively translating at once, greatly increasing the rate of protein production. Polysome profiling (separation by sucrose gradient) and more recently ribosome profiling provide snapshots of translation activity.
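The numbers in this section can be combined in a back-of-the-envelope calculation. This sketch assumes the ~20 amino acids per second quoted for bacteria, constant elongation with no pausing, and evenly spaced ribosomes at steady state. Under those assumptions, one ribosome needs length / rate seconds to make a protein, and an mRNA carrying n ribosomes releases about n × rate / length proteins per second. The 300-residue protein is a hypothetical example.

```python
def synthesis_time_s(length_aa: int, rate_aa_per_s: float) -> float:
    """Seconds for one ribosome to elongate a chain of length_aa residues."""
    return length_aa / rate_aa_per_s

def polysome_output_per_s(n_ribosomes: int, length_aa: int,
                          rate_aa_per_s: float) -> float:
    """Proteins released per second by one mRNA at steady state.

    Assumes evenly spaced ribosomes moving at a constant rate (no pausing),
    so each ribosome finishes one chain every length_aa / rate seconds.
    """
    return n_ribosomes * rate_aa_per_s / length_aa

# Hypothetical 300-residue bacterial protein at ~20 aa/s.
print(synthesis_time_s(300, 20))           # 15.0
print(polysome_output_per_s(1, 300, 20))   # one ribosome
print(polysome_output_per_s(10, 300, 20))  # ten ribosomes: ten times the output
```

In this simple model, output scales linearly with the number of loaded ribosomes. In real cells, the initiation rate usually sets how many ribosomes an mRNA carries.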

There Are Minor Variations in the Standard Genetic Code

While the genetic code is nearly universal, a few variations exist:

Mitochondria use slightly modified codes (e.g., in human mitochondria UGA encodes tryptophan instead of stop, and AGA and AGG are stop codons rather than arginine)
Mycoplasma species use UGA for tryptophan
Stop codons can be recoded to insert non-standard amino acids: UGA encodes selenocysteine (the “21st amino acid,” found in many organisms including humans and incorporated only where a special SECIS element in the mRNA signals it), and UAG encodes pyrrolysine (the “22nd amino acid,” found in some methanogenic archaea and a few bacteria)

These variations are rare but important when analyzing sequences from diverse organisms or organellar genomes.
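Variant codes are usually handled as small overrides of the standard table. The Python sketch below builds NCBI translation table 1 and derives two variants from it. The first is the vertebrate mitochondrial code (NCBI table 2), which differs at four codons: UGA → Trp, AUA → Met, and AGA and AGG → stop. The second is the Mycoplasma code (NCBI table 4), where UGA → Trp. Only codon reassignments are modeled; alternative start codons are ignored.

```python
from itertools import product

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G; "*" = stop.
STANDARD = {
    "".join(c): aa
    for c, aa in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}
# Vertebrate mitochondrial code (NCBI table 2): four reassigned codons.
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}
# Mycoplasma/Spiroplasma code (NCBI table 4): UGA read as Trp.
MYCOPLASMA = {**STANDARD, "TGA": "W"}

def translate(dna: str, table: dict) -> str:
    """Translate frame 1 with the given codon table."""
    dna = dna.upper()
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

seq = "ATGTGGTGAAAA"
print(translate(seq, STANDARD))    # MW*K
print(translate(seq, VERT_MITO))   # MWWK
print(translate(seq, MYCOPLASMA))  # MWWK
```

Translating a mitochondrial or Mycoplasma gene with the standard table would stop prematurely at every UGA, which is why the correct table must be chosen before analysis.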

Inhibitors of Prokaryotic Protein Synthesis Are Used as Antibiotics

Because bacterial ribosomes (70S) differ structurally from eukaryotic ribosomes (80S), drugs can selectively inhibit bacterial translation without affecting the host:

Antibiotic	Target	Mechanism
Tetracycline	30S subunit	Blocks tRNA binding to the A site
Chloramphenicol	50S subunit	Inhibits peptidyl transferase
Erythromycin	50S subunit	Blocks the exit tunnel for the growing polypeptide
Streptomycin	30S subunit	Causes misreading of the genetic code
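Puromycin	A site (bacterial and eukaryotic ribosomes)	Mimics aminoacyl-tRNA, causing premature chain termination; not selective, so it is used as a research tool rather than a clinical antibiotic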

Understanding translation is therefore essential for antibiotic development and for understanding antibiotic resistance mechanisms (ribosomal mutations, methylation of rRNA, efflux pumps).

Quality Control Operates at Many Stages of Translation

Cells employ multiple quality-control mechanisms to prevent the accumulation of aberrant proteins:

Nonsense-mediated mRNA decay (NMD) — destroys mRNAs with premature stop codons (typically detected by ribosomes encountering a stop codon upstream of an exon-exon junction)
Non-stop decay (NSD) — targets mRNAs lacking a stop codon (the ribosome reaches the poly-A tail)
No-go decay (NGD) — resolves ribosomes stalled on damaged or structured mRNA
Ribosome-associated quality control (RQC) — ubiquitinates and degrades incomplete polypeptides from stalled ribosomes
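In mammalian cells, exon-junction-based detection is often summarized as a heuristic. A stop codon lying more than about 50–55 nucleotides upstream of the last exon-exon junction tends to trigger NMD, because the ribosome terminates before it can displace the downstream exon junction complex. The Python function below encodes that heuristic as a sketch. The 50-nt default threshold, the coordinate convention, and the function name are assumptions, and real NMD has exceptions the rule does not capture.

```python
from typing import Sequence

def predicts_nmd(stop_pos: int, junctions: Sequence[int],
                 threshold_nt: int = 50) -> bool:
    """Simplified mammalian '50-nt rule' for nonsense-mediated decay.

    stop_pos: position of the stop codon in spliced-mRNA coordinates.
    junctions: positions of exon-exon junctions in the same coordinates.
    Returns True if the stop codon lies more than threshold_nt upstream of
    the last junction (an exon junction complex would remain downstream).
    """
    if not junctions:
        return False  # intronless transcript: no junction downstream
    return max(junctions) - stop_pos > threshold_nt

# Hypothetical transcript with junctions at positions 400 and 900.
print(predicts_nmd(100, [400, 900]))  # premature stop far upstream
print(predicts_nmd(880, [400, 900]))  # stop close to the last junction
```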

Molecular Chaperones Guide Protein Folding

The newly synthesized polypeptide must fold into its correct three-dimensional structure. Molecular chaperones assist this process:

Hsp70 family (DnaK in bacteria, BiP in the ER) — bind exposed hydrophobic patches on newly synthesized or unfolded proteins, preventing aggregation and giving the protein time to fold
Chaperonins (GroEL/GroES in bacteria, TRiC/CCT in eukaryotes) — barrel-shaped complexes that provide an enclosed chamber where a single protein can fold in isolation, shielded from aggregation
Hsp90 — assists the final maturation of signaling proteins, including kinases and transcription factors

Exposed hydrophobic regions are the critical signal for quality control: in a properly folded protein, hydrophobic residues are buried in the interior, so their exposure on the surface signals misfolding.

The Proteasome Degrades Misfolded and Regulated Proteins

Proteins that fail to fold correctly — or whose regulated destruction is required — are targeted for degradation by the ubiquitin-proteasome system:

Ubiquitin (a small 76-amino-acid protein) is covalently attached to the target protein through a cascade of enzymes: E1 (ubiquitin-activating enzyme), E2 (ubiquitin-conjugating enzyme), and E3 (ubiquitin ligase, which provides substrate specificity). A chain of four or more ubiquitins (typically linked through lysine 48 of ubiquitin) signals degradation.
The 26S proteasome — a large barrel-shaped protease complex with sequestered active sites — recognizes the ubiquitin chain, unfolds the protein, feeds it into its central chamber, and degrades it into short peptides (8–10 amino acids).

This system also controls the levels of many regulatory proteins — cyclins that drive cell cycle progression, transcription factors like p53, and signaling molecules. Regulated protein destruction is as important as regulated protein synthesis for cellular control.

There Are Many Steps from DNA to Protein

The path from gene to active protein involves many regulated steps, each offering an opportunity for control:

DNA → pre-mRNA (transcription) → mature mRNA (processing) → exported mRNA (transport) → polypeptide (translation) → folded protein (chaperones) → modified protein (PTMs) → localized protein (targeting) → active protein (regulation) → degraded protein (proteasome)

The abundance of any protein in the cell reflects the balance of its rates of synthesis and degradation, both of which are regulated.
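The balance of synthesis and degradation can be written as a minimal one-compartment model. This model assumes a constant synthesis rate k_s (molecules per unit time) and first-order degradation with rate constant k_d acting on the protein level P:

```latex
\frac{dP}{dt} = k_s - k_d P
\quad\Longrightarrow\quad
P_{ss} = \frac{k_s}{k_d},
\qquad
t_{1/2} = \frac{\ln 2}{k_d}
```

Setting dP/dt = 0 gives the steady-state level P_ss. As a hypothetical example, k_s = 100 molecules/min with a 60-min half-life gives k_d = ln 2 / 60 ≈ 0.0116 min⁻¹ and P_ss ≈ 8,660 molecules. Under this model, doubling either the synthesis rate or the half-life doubles the steady-state abundance.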

let cytoplasmic_gene = "ATGGCTAGCAAAGACTTCACCGAGTACCTGCAGAACCTGATCGGCAAAGCCTTCGACTTTAAACAGATCGAAAACGCCCTGGAATGA"
let secreted_gene = "ATGAAACTGCTGCTGCTGGCCCTGGTGCTGGCTTACGCTAGCAAAGACTTCACCGAGTGA"
let cyto_prot = Seq.translate(cytoplasmic_gene)
let sec_prot = Seq.translate(secreted_gene)
print("Cytoplasmic protein properties:")
print(Struct.protein_props(cyto_prot))
print("Secreted protein properties:")
print(Struct.protein_props(sec_prot))

Secreted proteins typically begin with a hydrophobic signal peptide (15–30 amino acids) that targets them to the ER. Comparing protein properties reveals differences in hydrophobicity and charge distribution between cytoplasmic and secreted proteins.

Codon Usage Bias Analysis and Codon Optimization

Although the genetic code is degenerate, organisms do not use all synonymous codons equally. This codon usage bias correlates with the abundance of the corresponding tRNAs, although mutational pressures such as genomic GC content also contribute. Highly expressed genes are enriched in codons read by abundant tRNAs, which is thought to enable faster and more accurate translation.

Codon optimization — redesigning a gene to use the host organism’s preferred codons without changing the protein sequence — is a standard technique in biotechnology for improving expression of heterologous proteins.
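A naive form of codon optimization simply back-translates the protein using one preferred codon per amino acid. In the Python sketch below, the `PREFERRED` table is hypothetical. It is copied from the lesson's `ecoli_style` example gene and is not taken from measured E. coli codon frequencies.

```python
from typing import Dict

# HYPOTHETICAL preferred codons: one per amino acid, copied from the lesson's
# ecoli_style example gene. Illustrative only; not measured E. coli usage.
PREFERRED: Dict[str, str] = {
    "M": "ATG", "A": "GCG", "S": "AGC", "K": "AAA", "D": "GAT",
    "F": "TTT", "T": "ACC", "E": "GAA", "Y": "TAT", "L": "CTG",
    "Q": "CAG", "N": "AAT", "I": "ATT", "G": "GGC", "*": "TGA",
}

def naive_optimize(protein: str, preferred: Dict[str, str]) -> str:
    """Back-translate a protein using a single preferred codon per residue."""
    missing = sorted(set(protein) - set(preferred))
    if missing:
        raise ValueError(f"no preferred codon for: {missing}")
    return "".join(preferred[aa] for aa in protein)

print(naive_optimize("MASKDFTEYLQNLIGK*", PREFERRED))
```

Back-translating MASKDFTEYLQNLIGK* with this table reproduces the `ecoli_style` sequence exactly. Practical optimization tools typically also weigh GC content, repeats, and mRNA secondary structure rather than always using a single codon per amino acid.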

let human_style = "ATGGCTAGCAAAGACTTCACCGAGTACCTGCAGAACCTGATCGGCAAATGA"
let ecoli_style = "ATGGCGAGCAAAGATTTTACCGAATATCTGCAGAATCTGATTGGCAAATGA"
print("Human-optimized codons:")
print(Seq.codon_usage(human_style))
print("E. coli-optimized codons:")
print(Seq.codon_usage(ecoli_style))
let human_kmers = Seq.kmer_count(human_style, 3)
let ecoli_kmers = Seq.kmer_count(ecoli_style, 3)
let correlation = Stats.pearson(human_kmers, ecoli_kmers)
print("Codon usage correlation (Pearson r): " + correlation)

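Both genes encode the same protein (MASKDFTEYLQNLIGK), but 7 of their 17 codons differ, reflecting adaptation to the different tRNA pools of human cells and E. coli. The Pearson correlation of their 3-mer profiles quantifies this divergence. A value well below 1.0 means the two genes use different synonymous codons for the same amino acids; with a 17-codon gene, the value is illustrative rather than statistically meaningful. If Seq.kmer_count counts overlapping 3-mers, as k-mer counters usually do, the profile also includes out-of-frame trinucleotides and is only a proxy for true in-frame codon usage.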
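Counting codons in frame, rather than all overlapping 3-mers, gives a codon usage profile directly. The Python sketch below counts frame-1 codons and computes a Pearson correlation over the union of codons seen in either gene. The choice of codon set is an assumption: including codons absent from both genes would add (0, 0) points and change the value. The names are illustrative and are not the lesson's `Seq`/`Stats` API.

```python
import math
from collections import Counter
from typing import List

def codon_counts(dna: str) -> Counter:
    """Count in-frame codons from position 0; ignore a trailing partial codon."""
    dna = dna.upper()
    return Counter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / (sx * sy)

def codon_usage_correlation(gene_a: str, gene_b: str) -> float:
    """Correlate codon counts over the union of codons seen in either gene."""
    a, b = codon_counts(gene_a), codon_counts(gene_b)
    codons = sorted(set(a) | set(b))
    return pearson([a[c] for c in codons], [b[c] for c in codons])

human_style = "ATGGCTAGCAAAGACTTCACCGAGTACCTGCAGAACCTGATCGGCAAATGA"
ecoli_style = "ATGGCGAGCAAAGATTTTACCGAATATCTGCAGAATCTGATTGGCAAATGA"
print(codon_counts(human_style).most_common(2))
print(round(codon_usage_correlation(human_style, ecoli_style), 3))  # 0.11
```

For the two example genes, 8 codons are shared and 7 are unique to each gene, so the in-frame correlation is exactly 19/173 ≈ 0.11.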

let gene = "ATGGCTAGCAAAGACTTCACCGAGTACCTGCAGAACCTGATCGGCAAATGA"
let codons = Seq.codon_usage(gene)
let labels = '["ATG", "GCT", "AGC", "AAA", "GAC", "TTC", "ACC", "GAG", "TAC", "CTG", "CAG", "AAC", "ATC", "GGC", "TGA"]'
let values = '[1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1]'
print("Codon usage distribution:")
print(Viz.bar(labels, values))

Visualizing codon frequencies reveals which codons dominate a gene’s sequence. In highly expressed genes, preferred codons matching abundant tRNAs are overrepresented — a pattern that codon optimization exploits to boost heterologous protein production.

Ribosome Profiling (Ribo-seq)

Ribosome profiling (Ribo-seq) captures a snapshot of all ribosomes actively translating mRNAs in a cell. Translation is arrested (for example with the elongation inhibitor cycloheximide in eukaryotic cells, or by flash-freezing). The mRNA not protected by ribosomes is then digested with a nuclease, and the ~28-nucleotide ribosome-protected footprints are sequenced and mapped back to the transcriptome.

Ribo-seq reveals:

Which mRNAs are being translated (and how efficiently)
The precise positions of ribosomes on each mRNA (codon-level resolution)
Translation initiation sites (including non-AUG starts and upstream open reading frames)
Ribosome pausing and stalling sites
The translational response to stimuli (e.g., stress, drugs)
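Codon-level resolution comes from mapping each footprint to the codon in the ribosome's P site. This minimal Python sketch assumes a fixed offset from the footprint's 5′ end to the P site. Twelve nucleotides is a commonly used value for ~28-nt footprints, but in practice the offset is calibrated per read length, typically from reads at start codons. The sketch assigns reads to codons and tallies their reading frame. A strong excess in frame 0 is the three-nucleotide periodicity expected of genuine ribosome footprints. Read coordinates and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

def psite_codon_counts(read_starts: Iterable[int], cds_start: int, cds_end: int,
                       psite_offset: int = 12) -> Counter:
    """Count footprints per codon, using a fixed 5'-end-to-P-site offset.

    read_starts: 0-based transcript positions of footprint 5' ends.
    cds_start: position of the A of the start codon; cds_end: position just
    past the stop codon. Keys are codon indices (0 = start codon).
    """
    counts = Counter()
    for start in read_starts:
        p_site = start + psite_offset
        if cds_start <= p_site < cds_end:
            counts[(p_site - cds_start) // 3] += 1
    return counts

def frame_counts(read_starts: Iterable[int], cds_start: int,
                 psite_offset: int = 12) -> Counter:
    """Tally all reads' P-site positions by frame relative to the start codon."""
    return Counter((s + psite_offset - cds_start) % 3 for s in read_starts)

reads = [38, 41, 41, 44, 45, 100]  # hypothetical footprint 5' ends
print(psite_codon_counts(reads, cds_start=50, cds_end=80))
print(frame_counts(reads, cds_start=50))
```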

Mass Spectrometry-Based Proteomics

Proteomics directly measures the protein complement of cells. The dominant approach is shotgun proteomics:

Proteins are extracted and digested into peptides (typically with trypsin)
Peptides are separated by liquid chromatography (LC)
Peptides are ionized and their mass-to-charge ratios measured by mass spectrometry (MS)
Fragmentation spectra (MS/MS or MS2) are matched to protein databases to identify proteins
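The digestion and mass-measurement steps can be illustrated in silico. This Python sketch assumes trypsin's usual specificity (cleavage after K or R, except before P) and no missed cleavages. It also assumes no modifications (for example, no fixed cysteine alkylation) and uses monoisotopic residue masses. It computes each peptide's neutral mass and the m/z it would show at a given charge. Real search engines, such as the one in MaxQuant, additionally handle missed cleavages, modifications, and mass tolerances.

```python
import re
from typing import List

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # one water per peptide (free N- and C-termini)
PROTON = 1.007276   # proton mass, added once per charge

def trypsin_digest(protein: str) -> List[str]:
    """Cut after K or R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein.upper()) if p]

def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(peptide) + charge * PROTON) / charge

for pep in trypsin_digest("MASKDFTEYLQNLIGK"):
    print(pep, round(peptide_mass(pep), 4), round(mz(pep, 2), 4))
```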

Quantitative proteomics approaches include:

Label-free quantification (LFQ) — compares peptide intensities between runs
TMT/iTRAQ — isobaric chemical tags for multiplexed comparison of up to 18 samples
SILAC — metabolic labeling with heavy amino acids for precise quantification

Analysis pipelines such as MaxQuant and Proteome Discoverer handle peptide identification, protein inference, and quantification.

Post-Translational Modification Prediction and Analysis

Proteins are extensively modified after translation. Post-translational modifications (PTMs) expand the functional repertoire far beyond what the genetic code encodes:

Phosphorylation (kinases/phosphatases) — the most common regulatory modification
Ubiquitination — marks proteins for degradation or serves signaling roles
Acetylation — modifies lysines on histones and other proteins
Glycosylation — adds sugar chains, critical for secreted and membrane proteins
Methylation, sumoylation, lipidation, and many others

Computational prediction tools (NetPhos for phosphorylation, GPS for kinase-specific sites, NetOGlyc/NetNGlyc for glycosylation) predict likely modification sites from protein sequence. Experimental detection relies on enrichment strategies (e.g., phosphopeptide enrichment with TiO2 or IMAC) combined with mass spectrometry.
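For N-linked glycosylation, the minimal sequence motif is the sequon N-X-S/T, where X is any residue except proline. Many sequons are never actually glycosylated, which is why trained predictors such as NetNGlyc are used. Still, a motif scan is a useful first pass. The Python sketch below lists every sequon position and uses a regex lookahead so that overlapping sequons are all reported.

```python
import re
from typing import List

def n_glyc_sequons(protein: str) -> List[int]:
    """0-based positions of Asn in N-X-S/T sequons where X is not Pro.

    The zero-width lookahead lets overlapping sequons (e.g. "NNST") all match.
    """
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", protein.upper())]

print(n_glyc_sequons("NGSNPTNAT"))
```

In the example, the N at position 3 is skipped because it is followed by proline (NPT).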

Protein Localization Prediction

The destination of a protein is often encoded in its sequence:

SignalP — predicts N-terminal signal peptides that target proteins to the secretory pathway (ER)
TargetP — predicts targeting to the mitochondria, chloroplast, or secretory pathway
DeepLoc — uses deep learning to predict subcellular localization from sequence alone, classifying proteins into 10 compartments

Signal peptides are typically 15–30 amino acids long, built around a hydrophobic core, and are cleaved off after translocation into the ER. Nuclear localization signals (NLS), mitochondrial targeting sequences, and peroxisomal targeting signals direct proteins to other compartments.
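A crude version of the signal-peptide idea is to look for a strongly hydrophobic stretch near the N-terminus. The Python sketch below slides a window over the first 30 residues and reports the highest mean Kyte-Doolittle hydropathy. The window size and region length are arbitrary assumptions. This heuristic is not SignalP, which is a trained predictor that also locates the cleavage site.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def max_nterm_hydropathy(protein: str, window: int = 8, n_term: int = 30) -> float:
    """Highest mean Kyte-Doolittle score over sliding windows in the N-terminus.

    A trailing stop symbol "*" is removed before scoring.
    """
    region = protein.upper().rstrip("*")[:n_term]
    if not region:
        raise ValueError("empty protein sequence")
    window = min(window, len(region))
    return max(
        sum(KD[aa] for aa in region[i:i + window]) / window
        for i in range(len(region) - window + 1)
    )

print(max_nterm_hydropathy("MKLLLLALVLAYAS"))    # secreted-style leader
print(max_nterm_hydropathy("MASKDFTEYLQNLIGK"))  # cytoplasmic-style N-terminus
```

The lesson's secreted-style leader MKLLLLALVLAYAS peaks at 3.6, while the cytoplasmic-style N-terminus MASKDFTEYLQNLIGK stays below zero. This is consistent with the first being the secreted candidate.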

Exercise: Protein Property Prediction from Sequence

Translate the following gene and analyze the resulting protein’s physical and chemical properties. Use the properties to predict whether this protein is likely soluble in the cytoplasm.

let gene = "ATGGCTAGCAAAGACTTCACCGAGTACCTGCAGAACCTGATCGGCAAAGCCTTCGACTTTAAACAGATCGAAAACGCCCTGGAATGA"
let protein = Seq.translate(gene)
print("Protein: " + protein)
print("Properties:")
print(Struct.protein_props(protein))
print(protein)

Exercise: Codon Bias Analysis

Compare the codon usage of two genes that encode the same protein but are optimized for different organisms. Compute the Pearson correlation of their 3-mer profiles to quantify the divergence in codon preference.

let human_gene = "ATGGCTAGCAAAGACTTCACCGAGTACCTGCAGAACTGA"
let ecoli_gene = "ATGGCGAGCAAAGATTTTACCGAATATCTGCAGAATTGA"
print("Human protein: " + Seq.translate(human_gene))
print("E. coli protein: " + Seq.translate(ecoli_gene))
let h_kmers = Seq.kmer_count(human_gene, 3)
let e_kmers = Seq.kmer_count(ecoli_gene, 3)
let r = Stats.pearson(h_kmers, e_kmers)
print("Pearson correlation of 3-mer usage: " + r)
let answer = "different"
print(answer)

Exercise: Signal Sequence Identification

Translate two genes and compare their protein properties. One encodes a cytoplasmic protein; the other encodes a secreted protein with an N-terminal signal peptide (hydrophobic leader sequence). Identify which is the secreted protein based on its properties.

let gene_a = "ATGGCTAGCAAAGACTTCACCGAGGAAAACCTGATCGAAAACTGA"
let gene_b = "ATGAAACTGCTGCTGCTGGCCCTGGTGCTGGCTTACGCTAGCTGA"
let prot_a = Seq.translate(gene_a)
let prot_b = Seq.translate(gene_b)
print("Protein A: " + prot_a)
print(Struct.protein_props(prot_a))
print("Protein B: " + prot_b)
print(Struct.protein_props(prot_b))
let answer = "Gene B"
print(answer)


Summary

In this lesson you covered translation and proteomics in depth:

The mRNA is read in non-overlapping codons of three nucleotides; the reading frame is set by the start codon AUG
The genetic code has 61 sense codons and 3 stop codons; it is degenerate, unambiguous, and nearly universal
tRNAs are molecular adaptors with an anticodon and an attached amino acid; wobble base pairing reduces the number of tRNAs needed
Aminoacyl-tRNA synthetases (20 enzymes) charge tRNAs with extraordinary accuracy using a double-sieve mechanism
The ribosome is a ribozyme — peptide bond formation is catalyzed by rRNA, not protein
Bacterial (Shine-Dalgarno) and eukaryotic (cap-scanning, Kozak) initiation differ fundamentally
Translation proceeds through initiation (rate-limiting), elongation (~50 ms/codon in bacteria), and termination (release factors)
Polyribosomes increase protein production rate; 5–40 ribosomes can translate one mRNA simultaneously
Minor code variations exist in mitochondria and some organisms; selenocysteine and pyrrolysine are the 21st and 22nd amino acids
Many antibiotics target bacterial ribosomes (tetracycline, chloramphenicol, erythromycin, streptomycin)
Quality-control systems (NMD, NSD, NGD, RQC) eliminate aberrant mRNAs and proteins
Molecular chaperones (Hsp70, chaperonins, Hsp90) assist protein folding; exposed hydrophobic regions signal misfolding
The ubiquitin-proteasome system degrades misfolded proteins and regulates protein levels
Codon usage bias reflects tRNA abundance; codon optimization improves heterologous expression
Ribosome profiling (Ribo-seq) maps ribosome positions at codon resolution
Shotgun proteomics (LC-MS/MS) identifies and quantifies proteins; tools include MaxQuant and Proteome Discoverer
PTM prediction and protein localization prediction (SignalP, TargetP, DeepLoc) extend sequence analysis beyond the primary structure

References

Alberts B, Johnson A, Lewis J, Morgan D, Raff M, Roberts K, Walter P. Molecular Biology of the Cell, 7th ed. New York: W.W. Norton; 2022. Chapter 6: How Cells Read the Genome: From DNA to Protein.
Ramakrishnan V. Ribosome structure and the mechanism of translation. Cell. 2002;108(4):557–572.
Ban N, Nissen P, Hansen J, Moore PB, Steitz TA. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science. 2000;289(5481):905–920.
Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324(5924):218–223.
Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol. 2019;37(4):420–423.
Thul PJ, Åkesson L, Wiking M, et al. A subcellular map of the human proteome. Science. 2017;356(6340):eaal3321. https://www.proteinatlas.org/
Hornbeck PV, Zhang B, Murray B, et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 2015;43(D1):D512–D520.
Blom N, Gammeltoft S, Brunak S. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol. 1999;294(5):1351–1362.
