From DNA to RNA: Transcription

Intermediate Molecular Biology ~40 min

Learn how RNA polymerase reads DNA to produce RNA, how eukaryotic pre-mRNAs are capped, spliced, and polyadenylated, and the computational methods for analyzing transcriptomes at single-cell and spatial resolution.

Introduction

The central dogma of molecular biology describes the flow of genetic information from DNA to RNA to protein. Transcription — the copying of DNA into RNA — is the first step in gene expression and the point at which most genes are switched on or off. Understanding transcription and the elaborate processing of RNA in eukaryotic cells is fundamental to understanding how cells work, how they differentiate, and how gene expression goes wrong in disease.

This lesson examines how portions of DNA are transcribed into RNA, the machinery that accomplishes this, the extensive processing that eukaryotic mRNAs undergo before translation, and the computational tools — from bulk RNA-seq to single-cell and spatial transcriptomics — that allow us to measure and analyze transcription at genome scale.

Portions of DNA Sequence Are Transcribed into RNA

Only a fraction of the genome is transcribed at any given time, and which portions are transcribed differs between cell types. A liver cell and a neuron contain the same DNA, but they transcribe different sets of genes. Transcription is therefore the primary mechanism by which cells become specialized.

RNA polymerase copies one strand of the DNA double helix — the template strand (also called the antisense strand) — synthesizing an RNA molecule that is complementary to the template and identical in sequence to the other strand, the coding strand (also called the sense or non-template strand), except that RNA contains uracil (U) where DNA has thymine (T).

let coding_strand = "ATGGCTAGCAAAGACTTCACCGAGTGA"
let mrna = Seq.transcribe(coding_strand)
print("Coding strand (5'→3'): " + coding_strand)
print("mRNA (5'→3'):          " + mrna)

By convention, gene sequences in databases are written as the coding strand in the 5′→3′ direction. The mRNA sequence matches this strand (with U for T), because both the mRNA and the coding strand are complementary to the template strand.
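The strand relationships described above can be checked in a few lines of plain Python. This is a minimal sketch, not the implementation behind `Seq.transcribe`; the helper names `reverse_complement` and `transcribe_template` are hypothetical and exist only for this illustration.

```python
# Complement lookup for DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, written 5'->3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]

def transcribe_template(template_5to3: str) -> str:
    """Build the mRNA (5'->3') made from a template strand.

    The polymerase reads the template 3'->5', so the RNA is the reverse
    complement of the template as written 5'->3', with U in place of T.
    """
    return reverse_complement(template_5to3).replace("T", "U")

coding_strand = "ATGGCTAGCAAAGACTTCACCGAGTGA"
template_strand = reverse_complement(coding_strand)  # written 5'->3'
mrna = transcribe_template(template_strand)

print("Template (5'->3'):", template_strand)
print("mRNA     (5'->3'):", mrna)
# The mRNA equals the coding strand with U substituted for T.
print(mrna == coding_strand.replace("T", "U"))
```

Starting from the template strand rather than the coding strand makes the point of the convention explicit: the shortcut "replace T with U" works only because databases store the coding strand.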

Transcription Produces RNA Complementary to One Strand of DNA

RNA synthesis, like DNA synthesis, proceeds in the 5′→3′ direction. RNA polymerase reads the template strand 3′→5′ while building the new RNA chain 5′→3′. Unlike DNA polymerase, RNA polymerase does not require a primer — it can initiate a new chain de novo.

The substrates for RNA polymerase are the four ribonucleoside triphosphates: ATP, GTP, CTP, and UTP. As each nucleotide is added, pyrophosphate is released and hydrolyzed, providing the energy that drives the reaction forward. The error rate of transcription (~10⁻⁴, or 1 error per 10,000 nucleotides) is much higher than that of DNA replication (~10⁻⁹), but this is tolerable because many RNA copies are made from each gene, and RNA molecules are short-lived.
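To get a feel for what these error rates mean for a single transcript, the arithmetic can be sketched as follows. The sketch assumes errors are independent and occur at a uniform per-nucleotide rate, which is a simplification, and the 1,500-nt transcript length is a hypothetical example value.

```python
TRANSCRIPTION_ERROR_RATE = 1e-4  # errors per nucleotide, from the text
REPLICATION_ERROR_RATE = 1e-9    # errors per nucleotide, from the text

def expected_errors(length_nt: int, rate: float) -> float:
    """Expected number of errors in a molecule of the given length."""
    return length_nt * rate

def prob_error_free(length_nt: int, rate: float) -> float:
    """Probability that every position is copied correctly (independent errors)."""
    return (1.0 - rate) ** length_nt

transcript_length = 1500  # hypothetical mRNA length in nucleotides

for label, rate in [("transcription", TRANSCRIPTION_ERROR_RATE),
                    ("replication", REPLICATION_ERROR_RATE)]:
    print(label,
          "expected errors:", expected_errors(transcript_length, rate),
          "P(error-free):", round(prob_error_free(transcript_length, rate), 6))
```

Under these assumptions, most copies of a 1,500-nt transcript are still error-free, and an occasional faulty copy is diluted by the many correct ones made from the same gene.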

Cells Produce Several Types of RNA

Cells produce a remarkable diversity of RNA molecules, each with a different role:

RNA type | Abbreviation | Function
Messenger RNA | mRNA | Carries protein-coding information to the ribosome
Ribosomal RNA | rRNA | Structural and catalytic component of ribosomes
Transfer RNA | tRNA | Adaptor that matches amino acids to codons during translation
Small nuclear RNA | snRNA | Component of the spliceosome; guides intron removal
Small nucleolar RNA | snoRNA | Guides chemical modification of rRNA and other RNAs
MicroRNA | miRNA | Regulates gene expression by targeting mRNAs for degradation or translational repression
Long non-coding RNA | lncRNA | Diverse regulatory roles in chromatin organization, transcription, and splicing

Although mRNA receives the most attention, it constitutes only about 3–5% of total cellular RNA. The vast majority is rRNA (~80%) and tRNA (~15%), reflecting the enormous demand for protein synthesis machinery.

Signals Encoded in DNA Tell RNA Polymerase Where to Start and Stop

Transcription does not begin at random positions along the genome. Promoter sequences upstream of a gene define where RNA polymerase binds and initiates transcription. Terminator sequences signal where transcription stops.

In bacteria, the promoter contains two conserved elements: the −10 element (TATAAT, also called the Pribnow box) and the −35 element (TTGACA). These are recognized by the sigma (σ) factor, a dissociable subunit that directs the RNA polymerase holoenzyme to the promoter. Different sigma factors recognize different promoter sequences, allowing bacteria to rapidly switch which genes are transcribed — for example, during heat shock or sporulation.

Transcription start and stop signals are heterogeneous in nucleotide sequence — they show considerable variation around the consensus. This variation is functionally important: promoters that match the consensus more closely are generally stronger (drive more transcription) than those with more mismatches.
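One simple way to express "closeness to consensus" is to count mismatches against the −35 and −10 consensus hexamers. The sketch below does only that; the example promoter sequences are hypothetical rather than taken from real genes, and a mismatch count is a crude proxy for strength because it ignores which positions matter most and the spacing between the two elements.

```python
MINUS_35_CONSENSUS = "TTGACA"
MINUS_10_CONSENSUS = "TATAAT"

def mismatches(site: str, consensus: str) -> int:
    """Count positions where a site differs from the consensus (Hamming distance)."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site.upper(), consensus))

def promoter_mismatches(minus_35: str, minus_10: str) -> int:
    """Total mismatches across both bacterial promoter elements."""
    return (mismatches(minus_35, MINUS_35_CONSENSUS)
            + mismatches(minus_10, MINUS_10_CONSENSUS))

# Hypothetical (-35, -10) element pairs, for illustration only.
promoters = {
    "consensus-like": ("TTGACA", "TATAAT"),
    "one mismatch each": ("TTGACT", "TATGAT"),
    "divergent": ("CTGGCA", "GATACT"),
}

# Fewer mismatches suggests, as a rough rule, a stronger promoter.
for name, (m35, m10) in sorted(promoters.items(),
                               key=lambda item: promoter_mismatches(*item[1])):
    print(name, promoter_mismatches(m35, m10))
```

Real promoter scoring usually uses position weight matrices built from many known sites, which weight each position by how conserved it is.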

In bacteria, transcription terminates either by intrinsic termination (a hairpin followed by a poly-U stretch in the RNA) or by Rho-dependent termination (the Rho protein uses ATP to translocate along the RNA and dissociate the polymerase). In eukaryotes, termination is more complex and is linked to 3′ end processing.

RNA Polymerase II Requires General Transcription Factors

Eukaryotes have three RNA polymerases, each dedicated to different RNA types:

Polymerase | Products
RNA Pol I | Large ribosomal RNAs (28S, 18S, 5.8S)
RNA Pol II | mRNAs, most snRNAs, miRNAs, lncRNAs
RNA Pol III | tRNAs, 5S rRNA, other small RNAs

RNA polymerase II (Pol II) transcribes all protein-coding genes and is the most intensively studied. Unlike bacterial RNA polymerase, Pol II cannot recognize promoters on its own. It requires a set of general transcription factors (GTFs) — TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH — that assemble at the promoter in a specific order to form the preinitiation complex.

TFIID binds first, recognizing the TATA box (~30 bp upstream of the transcription start site) through its TBP (TATA-binding protein) subunit. Not all promoters contain a TATA box — many housekeeping genes use CpG island promoters that lack this element. Additional promoter elements include the Inr (initiator, spanning the start site) and the DPE (downstream promoter element).

Pol II Also Requires Activator, Mediator, and Chromatin-Modifying Proteins

In living cells, the general transcription factors and Pol II alone are usually not sufficient for transcription. Activator proteins (transcription factors) bind to regulatory sequences called enhancers, which can be located thousands of base pairs away from the promoter. Activators recruit the Mediator complex — a large (~30-subunit) coactivator that serves as a bridge between activators and the Pol II machinery.

Transcription in eukaryotes also requires chromatin-remodeling complexes (SWI/SNF, ISWI families) that move or eject nucleosomes to expose the promoter DNA, and histone-modifying enzymes (acetyltransferases, methyltransferases) that add marks associated with active transcription (e.g., H3K4me3 at promoters, H3K36me3 in gene bodies).

Transcription Elongation Produces Superhelical Tension

As RNA polymerase moves along the DNA, it generates superhelical tension — positive supercoils ahead of the polymerase and negative supercoils behind it. This is because the RNA polymerase tracks the helical groove of the DNA, effectively twisting the double helix as it advances. Topoisomerases relieve this tension: Topoisomerase I relaxes both positive and negative supercoils, while Topoisomerase II can remove supercoils and disentangle DNA.

In genes that are very actively transcribed, the torsional stress can become extreme. Failure to resolve supercoiling impairs elongation and can cause DNA damage.

Transcription Elongation Is Tightly Coupled to RNA Processing

A distinguishing feature of eukaryotic gene expression is that transcription and RNA processing are coupled — the pre-mRNA is processed while it is still being synthesized. The C-terminal domain (CTD) of the largest Pol II subunit plays a critical role: it consists of multiple repeats of the heptapeptide sequence YSPTSPS, and different phosphorylation patterns on the CTD recruit different processing factors at different stages of transcription.

This coupling ensures that processing occurs efficiently and in the correct order: 5′ capping occurs first (when the transcript is only ~20–30 nucleotides long), then splicing begins (while transcription continues), and finally 3′ cleavage and polyadenylation terminate the process.

RNA Capping Is the First Modification

The 5′ cap is added when the nascent RNA is only about 20–30 nucleotides long. Capping enzymes add a 7-methylguanosine residue linked to the RNA by an unusual 5′–5′ triphosphate bridge. This cap:

  • Protects the mRNA from degradation by 5′ exonucleases
  • Is recognized by the nuclear cap-binding complex (CBC), which aids nuclear export
  • Is recognized by eIF4E during translation initiation, promoting ribosome recruitment
  • Helps distinguish mRNA from other RNA species

RNA Splicing Removes Intron Sequences

Most eukaryotic genes contain introns — non-coding sequences that interrupt the coding region. The average human gene has ~8 introns, and some genes have far more (the titin gene has 363 exons). Introns are removed from the pre-mRNA by RNA splicing, and the flanking exons are joined together to produce the mature mRNA.

Splice sites are defined by conserved sequences at the boundaries between introns and exons:

  • The 5′ splice site: GU (almost invariant) at the start of the intron
  • The 3′ splice site: AG (almost invariant) at the end of the intron
  • The branch point: an adenine residue 18–40 nt upstream of the 3′ splice site
  • The polypyrimidine tract: a stretch of pyrimidines between the branch point and the 3′ splice site
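The four signals listed above can be checked on an intron sequence with a short script. This is a rough sketch under stated assumptions: the function name `check_intron_signals`, the 20-nt window used for the polypyrimidine tract, and the example intron are all hypothetical, and distance to the branch point is measured here from the adenine to the last nucleotide of the intron.

```python
def check_intron_signals(intron: str, tract_window: int = 20) -> dict:
    """Report basic splice signals for an intron given 5'->3'.

    The sequence must start at the 5' splice site and end at the 3' splice site.
    """
    rna = intron.upper().replace("T", "U")
    n = len(rna)
    # Candidate branch points: any A lying 18-40 nt upstream of the intron's end.
    branch_candidates = [i for i, base in enumerate(rna)
                         if base == "A" and 18 <= (n - 1 - i) <= 40]
    # Pyrimidine (C/U) fraction in the window just before the terminal AG.
    tract = rna[max(0, n - 2 - tract_window):n - 2]
    pyrimidine_fraction = (sum(b in "CU" for b in tract) / len(tract)) if tract else 0.0
    return {
        "starts_with_GU": rna.startswith("GU"),
        "ends_with_AG": rna.endswith("AG"),
        "branch_point_candidates": branch_candidates,
        "pyrimidine_fraction": pyrimidine_fraction,
    }

# Hypothetical intron: 5' splice site, filler, branch-point motif,
# pyrimidine-rich tract, then the 3' splice site.
example_intron = ("GUAAGU" + "ACGG" * 10 + "UACUAAC"
                  + "UCUUUCCUUUUCCUCUUUC" + "CAG")

result = check_intron_signals(example_intron)
for key, value in result.items():
    print(key, value)
```

The check is deliberately permissive: it lists every adenine in the allowed window rather than choosing one, since the text specifies only a distance range for the branch point.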

The Spliceosome Performs RNA Splicing

The spliceosome is one of the largest and most complex molecular machines in the cell, composed of five small nuclear RNAs (U1, U2, U4, U5, U6 snRNAs) and more than 200 associated proteins. The snRNAs, packaged with proteins as snRNPs (small nuclear ribonucleoprotein particles, pronounced “snurps”), recognize the splice sites through RNA-RNA base pairing.

The splicing reaction involves two sequential transesterification reactions that proceed through a lariat intermediate:

  1. The 2′-OH of the branch-point adenine attacks the 5′ splice site, cutting the RNA and forming the lariat
  2. The free 3′-OH of the upstream exon attacks the 3′ splice site, joining the exons and releasing the lariat intron

The spliceosome requires ATP hydrolysis to drive the complex series of RNA–RNA rearrangements that bring the splice sites together and ensure accuracy. Multiple DExD/H-box RNA helicases participate in these rearrangements.

The catalytic core of the spliceosome is RNA-based — the splicing reaction is catalyzed by the snRNAs, not the protein components. This supports the idea that spliceosomal splicing evolved from self-splicing ribozymes (group II introns), which catalyze the same two-step transesterification mechanism and are found in bacterial and organellar genomes.

Chromatin Structure Affects RNA Splicing

The rate of Pol II elongation and the local chromatin environment influence splice site selection. When Pol II pauses (for example, at a nucleosome), the spliceosome has more time to recognize a weak splice site, promoting inclusion of the upstream exon. Histone modifications can also recruit splicing factors: for example, H3K36me3 in gene bodies recruits the MRG15 protein, which influences splice site choice.

Alternative Splicing Creates Protein Diversity

A single gene can produce multiple mRNA variants — and therefore multiple protein isoforms — through alternative splicing. The main types of alternative splicing include:

  • Exon skipping: an exon is included in some mRNAs and excluded from others
  • Alternative 5′ or 3′ splice sites: the boundary of an exon shifts
  • Intron retention: an intron is retained in the mature mRNA
  • Mutually exclusive exons: exactly one exon from a set of two or more is included in each mRNA, never several together

Over 95% of human multi-exon genes undergo alternative splicing. This is a major source of protein diversity, allowing ~20,000 genes to produce an estimated 80,000–100,000 distinct protein isoforms. The Drosophila DSCAM gene holds the record: through alternative splicing, it can potentially produce 38,016 different mRNA variants from a single gene.
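The DSCAM figure is a product of independent choices: the gene contains several clusters of mutually exclusive exons, and each mRNA takes exactly one exon from each cluster. The cluster sizes used below (12, 48, 33, and 2 alternatives for exons 4, 6, 9, and 17) are the commonly cited values and are not stated in this lesson, so treat them as an assumption of the sketch.

```python
from math import prod

# Alternatives per mutually exclusive exon cluster in Drosophila DSCAM.
# Commonly cited values; not given in the lesson text.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One exon is chosen from each cluster, so the counts multiply.
total_isoforms = prod(DSCAM_CLUSTERS.values())
print("Potential DSCAM mRNA variants:", total_isoforms)

# For comparison: n independent cassette exons (each either included or
# skipped) give 2**n combinations.
cassette_exons = 3  # hypothetical gene
print("Variants from", cassette_exons, "cassette exons:", 2 ** cassette_exons)
```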

Alternative splicing is regulated by splicing regulatory proteins: SR proteins, which generally promote exon inclusion, and hnRNP proteins, which generally promote exon skipping. These proteins bind to exonic and intronic splicing enhancers (ESE, ISE) and silencers (ESS, ISS) in the pre-mRNA.

3′ End Processing: Cleavage and Polyadenylation

The 3′ end of a eukaryotic mRNA is generated by cleavage of the pre-mRNA followed by the addition of a poly(A) tail, a stretch of 100–250 adenosine residues added by poly(A) polymerase without a DNA template. The cleavage/polyadenylation signal includes the AAUAAA hexamer (~10–30 nt upstream of the cleavage site) and a GU-rich or U-rich element downstream of the cleavage site.
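A simple scan for the AAUAAA hexamer shows how the 10–30 nt spacing translates into a predicted cleavage window. This is a sketch, not a polyadenylation-site predictor: the function name, the example 3′ UTR, and the choice to measure the gap from the end of the hexamer are all assumptions made for illustration.

```python
import re

def find_polya_signals(rna: str, min_gap: int = 10, max_gap: int = 30) -> list:
    """Return (hexamer_start, (window_start, window_end)) for each AAUAAA.

    The window is the half-open index range where cleavage would be expected,
    measured downstream from the end of the hexamer.
    """
    seq = rna.upper().replace("T", "U")
    results = []
    # A lookahead pattern lets overlapping occurrences be reported too.
    for match in re.finditer(r"(?=AAUAAA)", seq):
        hexamer_end = match.start() + 6
        window_start = min(hexamer_end + min_gap, len(seq))
        window_end = min(hexamer_end + max_gap, len(seq))
        results.append((match.start(), (window_start, window_end)))
    return results

# Hypothetical 3' UTR fragment: signal, spacer, a CA dinucleotide, GU-rich tail.
utr = "GCUAGC" + "AAUAAA" + "GCAUGCAUGCAUGCAUGCAU" + "CA" + "UGUGUUGUGUU"

signals = find_polya_signals(utr)
for start, (w_start, w_end) in signals:
    print("AAUAAA at", start, "-> expected cleavage within", (w_start, w_end))
```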

The poly(A) tail:

  • Stabilizes the mRNA against 3′ exonuclease degradation
  • Is required for efficient nuclear export
  • Enhances translation initiation (through interaction with poly(A)-binding protein and eIF4G)
  • Serves as a timer: gradual shortening of the tail eventually triggers mRNA decay

Mature mRNAs Are Selectively Exported from the Nucleus

Only fully processed mRNAs are exported from the nucleus to the cytoplasm. The nuclear pore complex, aided by export factors (notably TAP/NXF1 and Aly/REF), selects mRNAs that have been properly capped, spliced, and polyadenylated. Incompletely processed transcripts are retained and degraded in the nucleus by the nuclear exosome, a multi-subunit RNA degradation complex.

This quality-control checkpoint helps keep defective mRNAs, such as those with retained introns, from reaching the ribosome. Transcripts with premature stop codons that do reach the cytoplasm are typically removed there by nonsense-mediated decay.

Non-coding RNAs Are Also Synthesized and Processed in the Nucleus

Beyond mRNAs, the nucleus is a factory for producing non-coding RNAs:

  • rRNA is transcribed by Pol I in the nucleolus as a large precursor (45S in humans) that is then cleaved and modified to produce the 18S, 5.8S, and 28S rRNAs
  • tRNAs are transcribed by Pol III and processed by RNase P (a ribozyme) and other enzymes
  • snRNAs (U1–U6) are transcribed by Pol II or Pol III and assembled with proteins into snRNPs
  • miRNAs are transcribed as long primary transcripts (pri-miRNAs), processed in the nucleus by Drosha to pre-miRNAs, exported, and then processed by Dicer in the cytoplasm to mature ~22-nt miRNAs
  • lncRNAs are transcribed by Pol II and can regulate gene expression through diverse mechanisms

The Nucleolus Is a Ribosome-Producing Factory

The nucleolus is the most prominent structure within the nucleus and the site of ribosome biogenesis. It forms around the rDNA repeat clusters (located on five human chromosomes), where Pol I transcribes the 45S precursor rRNA at extremely high rates. The 45S pre-rRNA is processed, modified (by snoRNAs that guide 2′-O-methylation and pseudouridylation), and assembled with ribosomal proteins imported from the cytoplasm.

A growing human cell produces approximately 7,500 ribosomal subunits per minute, requiring a massive commitment of cellular resources. The size and appearance of the nucleolus reflect the cell’s rate of growth — nucleoli are large in rapidly dividing cells and small in quiescent cells.

let promoter = "GCGCGCTATAAAGCGCATCGCGCGCGCTATAAATGCGCGC"
let kmers = Seq.kmer_count(promoter, 4)
print("4-mer counts in a TATA-box promoter region:")
print(kmers)

Promoter sequences contain recurring short motifs that transcription factors recognize. Counting k-mers (short subsequences of length k) reveals which motifs are overrepresented. The TATA box consensus (TATAA) and related motifs stand out in regions upstream of the transcription start site.
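For readers who want to see what k-mer counting involves, here is a minimal standard-library Python sketch. It is not the implementation of `Seq.kmer_count`, whose exact behavior (for example, output ordering or strand handling) is not documented here; the function name `kmer_count` below is simply reused for readability.

```python
from collections import Counter

def kmer_count(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

promoter = "GCGCGCTATAAAGCGCATCGCGCGCGCTATAAATGCGCGC"
counts = kmer_count(promoter, 4)

# Show the most frequent 4-mers, then the TATA-containing ones.
for kmer, n in counts.most_common(5):
    print(kmer, n)
print({kmer: n for kmer, n in counts.items() if "TATA" in kmer})
```

A sequence of length L contains L − k + 1 overlapping k-mers, so this 40-nt promoter yields 37 four-mers in total.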

let tata_promoter = "GCGCTATAAAGCGCTATAAAGCGCGC"
let cpg_promoter = "CGCGCGCGCGCGCGCGCGCGCGCGCG"
let tata_kmers = Seq.kmer_count(tata_promoter, 2)
let cpg_kmers = Seq.kmer_count(cpg_promoter, 2)
let tata_values = "[" + Stats.shannon(tata_kmers) + "]"
let cpg_values = "[" + Stats.shannon(cpg_kmers) + "]"
print("TATA promoter 2-mer diversity (Shannon): " + tata_values)
print("CpG island promoter 2-mer diversity (Shannon): " + cpg_values)

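The Shannon entropy of k-mer frequencies measures the diversity of short sequences in a region: it is low when a few k-mers dominate and high when many k-mers occur at similar frequencies. In this toy example the CpG sequence is a perfect CG repeat containing only two distinct 2-mers (CG and GC), so it scores lower than the TATA promoter, which mixes GC-rich flanks with an AT-rich core. Real CpG island promoters are GC-rich but far less repetitive, so entropy comparisons on real sequences depend on the window and the value of k chosen.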
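The entropy calculation itself is short. The sketch below computes Shannon entropy in bits from k-mer counts in plain Python; it assumes that `Stats.shannon` does something equivalent, which this lesson does not confirm (the logarithm base in particular may differ).

```python
from collections import Counter
from math import log2

def kmer_count(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy, in bits, of the frequency distribution of the counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values() if n > 0)

tata_promoter = "GCGCTATAAAGCGCTATAAAGCGCGC"
cpg_promoter = "CGCGCGCGCGCGCGCGCGCGCGCGCG"

tata_entropy = shannon_entropy(kmer_count(tata_promoter, 2))
cpg_entropy = shannon_entropy(kmer_count(cpg_promoter, 2))

print("Distinct 2-mers (TATA, CpG):",
      len(kmer_count(tata_promoter, 2)), len(kmer_count(cpg_promoter, 2)))
print("Entropy in bits (TATA, CpG):", round(tata_entropy, 3), round(cpg_entropy, 3))
```

With these particular sequences, the strict CG repeat contains only two distinct 2-mers and therefore has an entropy just under 1 bit, while the mixed TATA sequence scores higher.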

RNA-seq: Measuring the Transcriptome

RNA-seq (RNA sequencing) is the standard method for measuring gene expression genome-wide. The basic workflow:

  1. Extract RNA from cells or tissue
  2. Convert to cDNA using reverse transcriptase
  3. Fragment and sequence on a high-throughput platform (Illumina, PacBio, or Oxford Nanopore)
  4. Align reads to a reference genome or transcriptome
  5. Count reads per gene to quantify expression levels

Key alignment tools include STAR and HISAT2 for genome-based alignment (splice-aware aligners that can map reads spanning exon-exon junctions) and Salmon and Kallisto for transcript-level quantification using pseudoalignment, which is much faster than traditional alignment.

Transcript assembly tools such as StringTie reconstruct full-length transcript models from RNA-seq reads, identifying novel splice variants and non-coding transcripts.

Differential Gene Expression Analysis

A central application of RNA-seq is identifying genes whose expression changes between conditions (e.g., treated vs. untreated, tumor vs. normal). Differential expression analysis uses statistical models to test whether the observed difference in read counts exceeds what would be expected from technical and biological variability.

The most widely used tools include:

  • DESeq2: uses a negative binomial model with shrinkage estimation of dispersions and fold changes; robust for small sample sizes
  • edgeR: also uses a negative binomial model, with empirical Bayes moderation of gene-wise dispersions
  • limma-voom: adapts the limma linear model framework to RNA-seq data by modeling the mean-variance trend of log-transformed counts and assigning a precision weight to each observation

These tools handle the key statistical challenges: RNA-seq data are count-based (not normally distributed), overdispersed (variance exceeds the mean), and affected by differences in library size (total reads per sample) that must be normalized.
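The effect of library size is easiest to see on a toy example. The sketch below applies simple counts-per-million (CPM) scaling and computes log2 fold changes; the read counts are hypothetical, and this is plain library-size scaling with a pseudocount, not the normalization or statistical testing performed by DESeq2, edgeR, or limma-voom.

```python
from math import log2

# Hypothetical raw read counts per gene; the treated library is twice as deep.
counts = {
    "control": {"geneA": 100, "geneB": 300, "geneC": 600},
    "treated": {"geneA": 400, "geneB": 600, "geneC": 1000},
}

def counts_per_million(sample_counts: dict) -> dict:
    """Scale raw counts by library size (total reads in the sample)."""
    library_size = sum(sample_counts.values())
    return {gene: c * 1_000_000 / library_size for gene, c in sample_counts.items()}

def log2_fold_change(reference: dict, condition: dict, pseudocount: float = 1.0) -> dict:
    """log2(condition / reference) per gene; the pseudocount avoids log of zero."""
    return {gene: log2((condition[gene] + pseudocount) / (reference[gene] + pseudocount))
            for gene in reference}

cpm = {sample: counts_per_million(c) for sample, c in counts.items()}
lfc = log2_fold_change(cpm["control"], cpm["treated"])

for gene, value in lfc.items():
    print(gene, "raw:", counts["control"][gene], "->", counts["treated"][gene],
          "log2FC after CPM:", round(value, 3))
```

In this example geneB doubles in raw counts but is unchanged after scaling, because the treated library simply contains twice as many reads. A dedicated tool would additionally model dispersion across replicates before calling any gene differentially expressed.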

Splice Variant Detection and Alternative Splicing Analysis

Specialized computational tools detect and quantify alternative splicing events from RNA-seq data:

  • rMATS (replicate Multivariate Analysis of Transcript Splicing) — detects differential splicing between conditions, classifying events by type (exon skipping, alternative 5′/3′ splice sites, intron retention, mutually exclusive exons)
  • SUPPA2 — quantifies percent spliced-in (PSI, or Ψ) for each splicing event across conditions
  • Leafcutter — identifies differential splicing by analyzing clusters of intron-excision events without requiring transcript annotations
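The percent spliced-in value used by these tools has a simple core definition: the fraction of transcripts that include the exon. The sketch below estimates it from junction read counts; the counts are hypothetical, and this bare ratio omits the corrections real tools apply (for example, adjusting for the different effective lengths of the inclusion and exclusion forms).

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """PSI = inclusion reads / (inclusion reads + exclusion reads)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no junction reads support either isoform")
    return inclusion_reads / total

# Hypothetical junction read counts for one cassette exon in two conditions.
psi_control = percent_spliced_in(inclusion_reads=80, exclusion_reads=20)
psi_treated = percent_spliced_in(inclusion_reads=30, exclusion_reads=70)
delta_psi = psi_treated - psi_control

print("PSI control:", psi_control)
print("PSI treated:", psi_treated)
print("Delta PSI:  ", round(delta_psi, 3))
```

A negative delta PSI indicates that the exon is skipped more often in the second condition. Whether the change is statistically supported depends on read depth and replicate variability, which is what the dedicated tools model.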

Promoter and Transcription Start Site Analysis

Precisely mapping where transcription begins is critical for understanding gene regulation:

  • CAGE-seq (Cap Analysis of Gene Expression) — sequences the 5′ ends of capped mRNAs to map transcription start sites at single-nucleotide resolution
  • PRO-seq (Precision Run-On sequencing) — maps the positions of actively engaged RNA polymerases genome-wide

These methods reveal that many genes have multiple transcription start sites, producing mRNA variants with different 5′ UTRs that may differ in translational efficiency or stability.

Long Non-coding RNA Identification and Functional Prediction

The human genome encodes tens of thousands of lncRNAs — non-coding transcripts longer than 200 nucleotides. Computational identification of lncRNAs from RNA-seq data relies on coding potential assessment (tools like CPC2 and CPAT) to distinguish lncRNAs from mRNAs. Functional prediction is more challenging and uses guilt-by-association (co-expression with known genes), chromatin signatures, and conservation analysis.

Single-Cell RNA-seq Analysis

Single-cell RNA-seq (scRNA-seq) measures gene expression in individual cells, revealing cell-type heterogeneity that bulk RNA-seq averages away. A single scRNA-seq experiment can profile thousands to millions of cells.

The major analysis frameworks are:

  • Seurat (R) and Scanpy (Python) — provide comprehensive pipelines for quality control, normalization, dimensionality reduction (PCA, UMAP, t-SNE), clustering, and differential expression
  • Typical workflow: filter low-quality cells → normalize counts → identify highly variable genes → reduce dimensions → cluster cells → identify marker genes for each cluster → annotate cell types
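The first two steps of the workflow above (filtering and normalization) can be illustrated without any framework. This is a toy sketch in plain Python, not Seurat or Scanpy code: the count matrix, the QC threshold, and the function names are hypothetical, and scaling each cell to 10,000 total counts followed by a log(1 + x) transform is used here as one common convention.

```python
from math import log1p

# Hypothetical count matrix: cell -> gene -> molecule count.
cells = {
    "cell1": {"CD3E": 12, "MS4A1": 0, "ACTB": 88},
    "cell2": {"CD3E": 0, "MS4A1": 9, "ACTB": 41},
    "cell3": {"CD3E": 0, "MS4A1": 0, "ACTB": 2},  # very few counts: low quality
}

MIN_TOTAL_COUNTS = 10  # hypothetical QC threshold
TARGET_SUM = 10_000    # scale every cell to this total before the log transform

def filter_cells(counts_by_cell: dict, min_total: int) -> dict:
    """Keep only cells whose total count reaches the threshold."""
    return {cell: genes for cell, genes in counts_by_cell.items()
            if sum(genes.values()) >= min_total}

def normalize_and_log(gene_counts: dict, target_sum: int) -> dict:
    """Scale a cell's counts to target_sum, then apply log(1 + x)."""
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("cannot normalize a cell with zero counts")
    return {gene: log1p(count * target_sum / total)
            for gene, count in gene_counts.items()}

kept = filter_cells(cells, MIN_TOTAL_COUNTS)
processed = {cell: normalize_and_log(genes, TARGET_SUM) for cell, genes in kept.items()}

for cell, genes in processed.items():
    print(cell, {gene: round(value, 2) for gene, value in genes.items()})
```

After this step the two remaining cells are comparable despite their different sequencing depths, which is the precondition for the variable-gene selection, dimensionality reduction, and clustering that follow.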

scRNA-seq has transformed our understanding of tissue composition, developmental trajectories, and disease heterogeneity, particularly in cancer biology and immunology.

Spatial Transcriptomics

Spatial transcriptomics combines gene expression measurement with tissue architecture, preserving the spatial location of each measured transcript. Technologies include Visium (10x Genomics), which captures mRNA on barcoded spots on a tissue section, and MERFISH and seqFISH, which use multiplexed fluorescence in situ hybridization to detect individual RNA molecules in intact tissue.

Computational analysis of spatial transcriptomics data integrates gene expression with spatial coordinates to identify spatially variable genes, tissue domains, and cell-cell communication patterns that depend on physical proximity.

let labels = '["rRNA (Pol I)", "mRNA (Pol II)", "tRNA (Pol III)", "snRNA (Pol II)", "lncRNA (Pol II)"]'
let rates = '[45, 12, 8, 3, 2]'
print("Relative transcription rates by RNA type:")
print(Viz.bar(labels, rates))

Different RNA types are transcribed at vastly different rates. rRNA genes, transcribed by Pol I in the nucleolus, are the most actively transcribed sequences in the genome, reflecting the enormous demand for ribosomes. This visualization shows relative transcription output by RNA class; the values are illustrative, in arbitrary units, and are not the steady-state abundances quoted earlier (rRNA ~80%, tRNA ~15%, mRNA 3–5%).

let strong_promoter = "GCGCGCTATAAAGCGATCGATCGCGC"
let weak_promoter = "GCGCGCTTTACAGCGATCGATCGCGC"
let strong_kmers = Seq.kmer_count(strong_promoter, 4)
let weak_kmers = Seq.kmer_count(weak_promoter, 4)
print("Strong promoter (consensus TATA) 4-mer profile:")
print(strong_kmers)
print("Weak promoter (divergent TATA) 4-mer profile:")
print(weak_kmers)

Promoter base composition differs between genes and can affect transcription efficiency. Promoters that match the consensus TATA box more closely are generally stronger (drive more transcription) than those with mismatches. The two sequences above differ only within the TATA element (TATAAA versus TTTACA), so their 4-mer profiles differ only in the k-mers that overlap it.

Exercise: Promoter Motif Analysis

Analyze the k-mer composition of a promoter region to identify the TATA box motif. Count 4-mers and find the motif that includes the TATA consensus.

let promoter = "GCGATCGCGCTATAAAGCGATCG"
let kmers = Seq.kmer_count(promoter, 4)
print("4-mer counts in promoter region:")
print(kmers)
let tata_region = "TATAAAGCGATCG"
print(tata_region)

Exercise: Transcription Factor Binding Site Identification

Compare k-mer profiles of two genomic regions to determine which is more likely to contain a transcription factor binding site. Use the Shannon entropy of k-mer frequencies — a region dominated by a specific motif will have lower diversity.

let region_a = "GCCGCCGCCGCCGCCGCCGCCGCC"
let region_b = "ATCGATCGATCGATCGATCGATCG"
let kmers_a = Seq.kmer_count(region_a, 3)
let kmers_b = Seq.kmer_count(region_b, 3)
let entropy_a = Stats.shannon(kmers_a)
let entropy_b = Stats.shannon(kmers_b)
print("Region A entropy: " + entropy_a)
print("Region B entropy: " + entropy_b)
let answer = "Region A"
print(answer)

Exercise: Information Content of Promoter Elements

Calculate the information content (Shannon entropy) of a TATA box promoter versus a CpG island promoter to quantify the difference in sequence complexity. Then visualize the comparison using a bar chart.

let tata = "TATATATATATATATATATATATATAT"
let cpg = "CGCGCGATCGCGATCGCGCGATCGCG"
let tata_ent = Stats.shannon(Seq.kmer_count(tata, 2))
let cpg_ent = Stats.shannon(Seq.kmer_count(cpg, 2))
print("TATA promoter entropy: " + tata_ent)
print("CpG island entropy: " + cpg_ent)
let labels = '["TATA box", "CpG island"]'
let values = "[" + tata_ent + ", " + cpg_ent + "]"
print(Viz.bar(labels, values))
let higher = "CpG island"
print(higher)

Summary

In this lesson you covered transcription and RNA processing in depth:

  • Transcription copies DNA into RNA, with RNA polymerase reading the template strand 3′→5′ and synthesizing RNA 5′→3′
  • Promoters signal where transcription starts; start/stop signals are heterogeneous and vary in strength
  • Eukaryotes have three RNA polymerases: Pol I (rRNA), Pol II (mRNA, miRNA, lncRNA), Pol III (tRNA, 5S rRNA)
  • Pol II requires general transcription factors (TFIIA–TFIIH), activators, Mediator, and chromatin remodeling complexes
  • Transcription elongation generates superhelical tension resolved by topoisomerases
  • Eukaryotic mRNA processing is coupled to transcription via the Pol II CTD
  • 5′ capping (7-methylguanosine), splicing (intron removal by the spliceosome), and 3′ polyadenylation produce mature mRNA
  • The spliceosome uses two transesterification reactions via a lariat intermediate; its catalytic core is RNA-based
  • Alternative splicing affects >95% of human multi-exon genes, generating enormous protein diversity
  • Chromatin structure influences splice site selection
  • Only fully processed mRNAs are exported from the nucleus; defective transcripts are degraded
  • The nucleolus produces ~7,500 ribosomal subunits per minute in growing cells
  • RNA-seq with tools like STAR, HISAT2, Salmon, and Kallisto measures transcriptomes at genome scale
  • Differential expression analysis (DESeq2, edgeR, limma-voom) identifies gene expression changes between conditions
  • Splice variant detection (rMATS, SUPPA2) quantifies alternative splicing from RNA-seq data
  • CAGE-seq and PRO-seq map transcription start sites and active polymerases
  • scRNA-seq (Seurat, Scanpy) profiles gene expression in individual cells
  • Spatial transcriptomics preserves tissue architecture alongside gene expression measurements
