
Quantitative Image Analysis and Spatial Biology

Intermediate Experimental Methods ~35 min

Learn how computational tools extract quantitative data from microscopy images — from cell segmentation and morphological profiling to high-content screening, spatial transcriptomics, and AI-driven image analysis.

Introduction

The previous lesson surveyed the microscopy techniques used to visualize cells and molecules — from phase-contrast light microscopy to cryo-EM. But generating images is only half the challenge. Modern microscopy experiments routinely produce millions of images containing billions of cells, far more data than any human could analyze by eye. Quantitative image analysis uses computational tools to extract reproducible, objective measurements from images — transforming pixels into biological insight.

This lesson covers the computational pipeline for analyzing microscopy images: segmentation (finding individual cells), feature extraction (measuring their properties), and downstream analysis (clustering, classification, and statistical testing). We then explore three frontiers that are transforming cell biology: morphological profiling and the Cell Painting assay, high-content screening for drug discovery, and spatial transcriptomics — technologies that measure gene expression while preserving spatial context within tissues.

The Image Analysis Pipeline

Every quantitative microscopy experiment follows the same general pipeline:

  1. Image acquisition — collect raw images on the microscope (brightfield, fluorescence, or a combination)
  2. Preprocessing — correct for illumination unevenness, remove noise, subtract background
  3. Segmentation — identify individual cells (or nuclei, organelles, or other objects) and draw boundaries around them
  4. Feature extraction — measure properties of each segmented object (size, shape, intensity, texture)
  5. Analysis — use statistical or machine learning methods to classify cells, detect phenotypes, or test hypotheses

Each step must be optimized and validated for the specific biological system. Errors in early steps (especially segmentation) propagate through the entire pipeline and can invalidate downstream conclusions.

Let’s examine a typical dataset of cell measurements and compute basic descriptive statistics to understand the distribution of cell properties:

let cell_areas = [185, 210, 195, 220, 175, 205, 230, 190, 215, 200, 225, 180, 240, 198, 212]
let stats = Stats.describe(cell_areas)
print("Cell area measurements (μm²):")
print(stats)

The descriptive statistics reveal the central tendency (mean), spread (standard deviation), and range of cell areas in our population. In a typical image analysis experiment, we would compute dozens of such features for each of thousands of cells.

Cell Segmentation: From Thresholding to Deep Learning

Segmentation — accurately identifying individual cells in an image — is the most critical and challenging step in the pipeline. The quality of all downstream analysis depends on segmentation accuracy.

Classical Methods

Intensity thresholding is the simplest approach: pixels above a threshold intensity are classified as “object” and pixels below as “background.” Methods like Otsu’s algorithm automatically determine the optimal threshold by maximizing the between-class variance. Thresholding works well for bright, well-separated objects (fluorescent nuclei against a dark background) but fails when cells are dim, densely packed, or unevenly illuminated.
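To make the between-class-variance idea concrete, here is a minimal pure-Python sketch of Otsu's method on a flat list of pixel intensities. The pixel data is illustrative; real pipelines use optimized histogram-based implementations such as scikit-image's `threshold_otsu`.

```python
# Otsu's method: pick the threshold that maximizes between-class variance.
# Minimal sketch on a 1-D list of intensities (0-255); toy data below.

def otsu_threshold(pixels):
    """Return the threshold t maximizing between-class variance."""
    n = len(pixels)
    best_t, best_var = 0, -1.0
    for t in range(min(pixels) + 1, max(pixels) + 1):
        bg = [p for p in pixels if p < t]   # "background" class
        fg = [p for p in pixels if p >= t]  # "object" class
        if not bg or not fg:
            continue
        w_bg, w_fg = len(bg) / n, len(fg) / n        # class weights
        mu_bg, mu_fg = sum(bg) / len(bg), sum(fg) / len(fg)
        between_var = w_bg * w_fg * (mu_bg - mu_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Bimodal "image": dim background (~30) and bright nuclei (~200)
pixels = [28, 30, 32, 29, 31, 27, 33, 30, 198, 202, 200, 205, 199, 201]
print(otsu_threshold(pixels))  # lands between the two intensity modes
```

Because the between-class variance is identical for any threshold that separates the two modes, the sketch returns the first such value; histogram-based implementations behave similarly.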

Watershed segmentation treats the image as a topographic surface and “floods” it from local minima, placing boundaries where “water” from different basins meets. Combined with a distance transform, watershed effectively separates touching nuclei — a problem that defeats simple thresholding.

Deep Learning Methods

The limitations of classical methods drove the development of deep learning approaches that learn to segment cells directly from annotated training data:

U-Net (2015) is a convolutional neural network architecture designed for biomedical image segmentation. Its encoder-decoder structure with skip connections preserves fine spatial details while capturing large-scale context. U-Net achieves excellent segmentation with remarkably small training sets (as few as 30 annotated images), making it practical for biologists who cannot curate massive training datasets.

Cellpose (2020) is a generalist segmentation model trained on a diverse collection of cell images across many microscopy modalities and cell types. Its key innovation is predicting the gradient field pointing from each pixel toward the cell center, then following these gradients to group pixels into cells. Cellpose generalizes across image types without retraining — a major advance over U-Net, which typically requires task-specific training.

StarDist (2018) is optimized for roundish, star-convex objects such as nuclei. It predicts a star-convex polygon for each nucleus by estimating radial distances from the center to the boundary at fixed angles. StarDist is fast, accurate, and handles touching nuclei better than watershed methods.

Evaluating Segmentation Quality

The Intersection over Union (IoU) metric quantifies segmentation accuracy by comparing the predicted segmentation to a ground-truth annotation:

IoU = Area of overlap / Area of union

An IoU of 1.0 means perfect agreement; 0 means no overlap. An IoU ≥ 0.5 is typically considered a successful detection (a “true positive”). The average precision (AP) at different IoU thresholds summarizes overall segmentation quality.
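The IoU computation itself is straightforward when masks are represented as sets of pixel coordinates. A Python sketch with two toy masks (real code operates on boolean arrays):

```python
# IoU of a predicted mask vs. a ground-truth mask, as pixel-coordinate sets.

def iou(pred, truth):
    """Intersection over Union of two pixel sets."""
    inter = len(pred & truth)
    union = len(pred | truth)
    return inter / union if union else 0.0

truth = {(r, c) for r in range(0, 4) for c in range(0, 4)}  # 4x4 ground-truth cell
pred  = {(r, c) for r in range(1, 5) for c in range(1, 5)}  # prediction shifted by 1 px

score = iou(pred, truth)              # overlap 3x3 = 9 px, union = 23 px
print(round(score, 3))
print("detected" if score >= 0.5 else "missed")
```

Even a one-pixel misalignment costs substantial IoU on small objects, which is why per-cell IoU distributions (as below) are more informative than a single average.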

Let’s compute IoU scores for a simulated segmentation experiment comparing three methods. We’ll model each method’s performance as a distribution of per-cell IoU scores:

let threshold_iou = [0.45, 0.52, 0.38, 0.61, 0.49, 0.55, 0.42, 0.50, 0.47, 0.58]
let watershed_iou = [0.62, 0.71, 0.55, 0.68, 0.73, 0.65, 0.59, 0.70, 0.66, 0.74]
let cellpose_iou = [0.82, 0.88, 0.79, 0.91, 0.85, 0.87, 0.83, 0.90, 0.86, 0.92]
print("Thresholding IoU scores:")
print(Stats.describe(threshold_iou))
print("Watershed IoU scores:")
print(Stats.describe(watershed_iou))
print("Cellpose IoU scores:")
print(Stats.describe(cellpose_iou))

The results show a clear progression: deep learning methods like Cellpose consistently achieve higher IoU scores than classical approaches, especially on challenging images with touching or irregularly shaped cells.
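That gap can be checked statistically. A Welch's t-statistic on the watershed and Cellpose samples (arrays copied from the listing above; a pure-Python sketch — `scipy.stats.ttest_ind(..., equal_var=False)` would also give a p-value):

```python
# Welch's t-statistic: difference in means scaled by its standard error.
from statistics import mean, variance

watershed_iou = [0.62, 0.71, 0.55, 0.68, 0.73, 0.65, 0.59, 0.70, 0.66, 0.74]
cellpose_iou  = [0.82, 0.88, 0.79, 0.91, 0.85, 0.87, 0.83, 0.90, 0.86, 0.92]

def welch_t(a, b):
    se2 = variance(a) / len(a) + variance(b) / len(b)  # squared standard error
    return (mean(a) - mean(b)) / se2 ** 0.5

t = welch_t(cellpose_iou, watershed_iou)
print(round(t, 1))  # large positive t: the gap far exceeds sampling noise
```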

Exercise: Compare Segmentation Methods

You have IoU scores from two segmentation algorithms applied to the same images. Compute the mean IoU for each and determine which method performs better. Also compute the Pearson correlation between the two methods’ per-image IoU scores to see if they agree on which images are easy or hard:

let method_a = [0.55, 0.62, 0.48, 0.70, 0.58, 0.63, 0.52, 0.67]
let method_b = [0.78, 0.85, 0.72, 0.91, 0.80, 0.86, 0.75, 0.89]
print("Method A IoU scores:")
print(Stats.describe(method_a))
print("Method B IoU scores:")
print(Stats.describe(method_b))
let r = Stats.pearson(method_a, method_b)
print("Pearson correlation (A vs B): " + r)
// Which method has higher mean IoU?
let answer = "method_B"
print(answer)

Morphological Profiling and Cell Painting

The Cell Painting Assay

Cell Painting (Bray et al., 2016) is a standardized morphological profiling assay that uses six fluorescent dyes to stain eight cellular compartments:

Dye | Target | Compartment(s)
Hoechst 33342 | DNA | Nucleus
Concanavalin A | ER glycoproteins | Endoplasmic reticulum
SYTO 14 | RNA | Nucleoli, cytoplasmic RNA
Phalloidin | F-actin | Actin cytoskeleton
Wheat germ agglutinin | Golgi, plasma membrane | Golgi, cell surface
MitoTracker | Mitochondria | Mitochondria

From each cell, CellProfiler extracts approximately 1,500 features describing the size, shape, intensity, texture, and spatial relationships of these compartments. The resulting morphological profile — a 1,500-dimensional vector for each cell — serves as a rich, unbiased readout of cellular state.

Dimensionality Reduction for Morphological Profiles

With 1,500 features per cell, direct visualization is impossible. Dimensionality reduction methods compress the data into two or three dimensions for visualization and analysis:

PCA (Principal Component Analysis) finds the axes of greatest variance in the data. The first few principal components often capture biologically meaningful variation — for example, PC1 might separate treated from untreated cells, while PC2 separates cell cycle stages.

UMAP and t-SNE are nonlinear methods that preserve local neighborhood structure, producing clusters of similar cells. They are better than PCA at revealing distinct subpopulations but do not preserve global distances.

Let’s perform PCA on a set of cell morphological measurements to see how treated and untreated cells separate in reduced dimensions:

// 8 cells, 6 features each: [area, perimeter, intensity_mean, intensity_std, texture, roundness]
// Cells 1-4: untreated control; Cells 5-8: drug-treated
let features = '[120, 42, 0.65, 0.12, 0.30, 0.91, 125, 44, 0.62, 0.14, 0.28, 0.89, 118, 41, 0.68, 0.11, 0.32, 0.93, 122, 43, 0.64, 0.13, 0.29, 0.90, 280, 68, 0.35, 0.25, 0.72, 0.55, 295, 72, 0.32, 0.28, 0.75, 0.51, 270, 65, 0.38, 0.23, 0.69, 0.58, 288, 70, 0.34, 0.26, 0.73, 0.53]'
let pca_result = ML.pca(features, 6, 2)
print("PCA of cell morphological profiles:")
print(pca_result)

The PCA clearly separates drug-treated cells (larger, more irregular, altered intensity) from untreated controls. In a real Cell Painting experiment with 1,500 features and thousands of cells, this same approach reveals subtle morphological changes caused by drugs, gene knockouts, or disease states.

Exercise: Cluster Cells by Morphological Profile

Use k-means clustering to group cells based on their morphological features. Determine how many distinct phenotypic clusters exist in a mixed population:

// 9 cells, 4 features each: [area, intensity, roundness, texture]
// 3 phenotypes: small/round, large/elongated, medium/irregular
let data = '[80, 0.82, 0.95, 0.20, 85, 0.79, 0.93, 0.22, 82, 0.81, 0.94, 0.21, 250, 0.45, 0.52, 0.68, 260, 0.42, 0.48, 0.72, 255, 0.44, 0.50, 0.70, 150, 0.60, 0.70, 0.45, 155, 0.58, 0.68, 0.48, 148, 0.62, 0.72, 0.43]'
let clusters = ML.kmeans(data, 4, 3)
print("K-means clustering (k=3):")
print(clusters)
// How many distinct phenotypic clusters did we identify?
let answer = "3"
print(answer)

High-Content Screening

High-content screening (HCS) combines automated microscopy with quantitative image analysis to measure the effects of thousands of perturbations (drugs, siRNAs, CRISPR guides) on cell morphology and behavior. Unlike traditional high-throughput screening, which measures a single readout (e.g., viability), HCS captures rich, multidimensional phenotypic information from every cell.

The HCS Workflow

  1. Plate cells in 384- or 1536-well microplates
  2. Add perturbations (compound library, genetic perturbations)
  3. Fix, stain, and image (automated microscope images every well at multiple wavelengths)
  4. Segment and extract features (CellProfiler pipeline)
  5. Analyze — compare perturbation profiles to controls, cluster similar perturbations, identify hits

Assay Quality: The Z′ Factor

The Z′ factor measures the statistical quality of a screening assay — the separation between positive and negative controls:

Z′ = 1 − (3σ₊ + 3σ₋) / |μ₊ − μ₋|

A Z′ ≥ 0.5 indicates an excellent assay with good separation between controls. Z′ between 0 and 0.5 is marginal. Z′ < 0 means the signal and noise distributions overlap — the assay is unreliable.

Let’s examine the positive and negative controls for a screening assay — their means and standard deviations are the inputs to Z′:

let pos_ctrl = [0.85, 0.88, 0.82, 0.90, 0.87, 0.84, 0.89, 0.86]
let neg_ctrl = [0.12, 0.15, 0.10, 0.18, 0.14, 0.11, 0.16, 0.13]
let pos_stats = Stats.describe(pos_ctrl)
let neg_stats = Stats.describe(neg_ctrl)
print("Positive control statistics:")
print(pos_stats)
print("Negative control statistics:")
print(neg_stats)
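Plugging those control statistics into the Z′ formula directly — a pure-Python sketch with the control values reproduced from above (sample standard deviations are assumed; Stats.describe's exact convention may differ):

```python
# Z-prime factor from positive/negative control wells.
from statistics import mean, stdev

pos_ctrl = [0.85, 0.88, 0.82, 0.90, 0.87, 0.84, 0.89, 0.86]
neg_ctrl = [0.12, 0.15, 0.10, 0.18, 0.14, 0.11, 0.16, 0.13]

def z_prime(pos, neg):
    """Z' = 1 - 3(sd_pos + sd_neg) / |mean_pos - mean_neg|"""
    return 1 - 3 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))

zp = z_prime(pos_ctrl, neg_ctrl)
print(round(zp, 2))  # well above 0.5: an excellent assay
```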

Dose-Response and Mechanism of Action

In drug screening, compounds that produce a significant effect are called hits. Hits are validated by measuring their dose-response relationship — the effect as a function of drug concentration. The key parameter is the IC50 (or EC50) — the concentration at which the compound produces half its maximal effect.

Mechanism of action (MoA) prediction uses morphological profiles to infer how a drug works. Compounds with the same mechanism produce similar morphological fingerprints. By comparing a novel compound’s profile to a reference library of compounds with known mechanisms, computational methods can predict its MoA — even for completely novel chemical scaffolds.
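The shape of a dose-response curve follows the Hill equation. This sketch uses a hypothetical IC50 of 1.5 μM and a Hill coefficient of 1 purely for illustration (bottom fixed at 0%, top at 100%):

```python
# Hill equation for % inhibition as a function of dose.

def percent_inhibition(dose_uM, ic50_uM, hill=1.0):
    """Effect rises from 0 to 100%; equals 50% exactly at dose = IC50."""
    return 100.0 / (1.0 + (ic50_uM / dose_uM) ** hill)

for dose in [0.01, 0.1, 1.0, 10.0, 100.0]:
    print(dose, round(percent_inhibition(dose, ic50_uM=1.5), 1))
```

Fitting this curve to measured responses (typically by nonlinear least squares) is how the IC50 of a hit compound is estimated in practice.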

Let’s simulate dose-response data for two compounds and visualize their potency:

let doses = '[{"label": "0.01 μM", "value": 5}, {"label": "0.1 μM", "value": 15}, {"label": "1 μM", "value": 45}, {"label": "10 μM", "value": 82}, {"label": "100 μM", "value": 95}]'
let chart = Viz.bar(doses, '{"title": "Dose-Response: Compound A (% Inhibition)", "color": "#8B5CF6"}')
print(chart)

Exercise: Evaluate Screening Assay Quality

Given positive and negative control data from a high-content screen, compute the mean and standard deviation for each control group. A well-designed assay has tight distributions (low standard deviation) and large separation between controls:

let positive = [0.92, 0.89, 0.95, 0.91, 0.88, 0.93, 0.90, 0.94]
let negative = [0.08, 0.11, 0.06, 0.10, 0.09, 0.07, 0.12, 0.08]
print("Positive control:")
print(Stats.describe(positive))
print("Negative control:")
print(Stats.describe(negative))
// Based on the separation between controls, is this assay quality excellent, marginal, or poor?
let answer = "excellent"
print(answer)

Spatial Transcriptomics

Traditional transcriptomics (RNA-seq) measures gene expression in bulk tissue homogenates or dissociated single cells — but destroys the spatial context. Spatial transcriptomics preserves the physical location of gene expression within intact tissue, revealing how cells communicate with their neighbors and how gene expression varies across tissue architecture.

Major Spatial Profiling Technologies

Technology | Approach | Resolution | Analytes measured | Key application
10x Visium | Spatially barcoded capture spots (~55 µm) | ~55 µm (multiple cells) | Whole transcriptome | Tissue-scale gene expression mapping
MERFISH | Multiplexed error-robust FISH | Subcellular | 100–10,000 genes | Single-molecule, subcellular resolution
seqFISH+ | Sequential hybridization | Subcellular | 10,000+ genes | Dense subcellular profiling
CODEX | Iterative antibody staining | Subcellular | 40–60 proteins | Spatial proteomics of immune cells
Slide-seq | DNA-barcoded beads | ~10 µm | Whole transcriptome | Near-single-cell spatial resolution

Spatial Analysis Methods

Spatial autocorrelation measures whether cells expressing a particular gene tend to cluster together (positive autocorrelation) or be dispersed (negative autocorrelation). Moran’s I is the most common metric: I = +1 indicates perfect spatial clustering, 0 indicates random distribution, and −1 indicates perfect dispersion.
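Moran's I can be sketched in a few lines for spots arranged on a 1-D strip with nearest-neighbor weights. The data here is a toy example; real analyses use spatial statistics packages with k-nearest-neighbor graphs over spot coordinates:

```python
# Moran's I: (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2

def morans_i(values, weights):
    """weights: dict mapping (i, j) -> w_ij for neighboring pairs."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    num = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    den = sum(d * d for d in dev)
    W = sum(weights.values())
    return (n / W) * num / den

def chain_weights(n):
    """Adjacent positions on a 1-D strip are neighbors (both directions)."""
    w = {}
    for i in range(n - 1):
        w[(i, i + 1)] = 1.0
        w[(i + 1, i)] = 1.0
    return w

clustered = [9, 8, 9, 8, 1, 2, 1, 2]  # a high zone next to a low zone
dispersed = [9, 1, 9, 1, 9, 1, 9, 1]  # alternating high/low

w = chain_weights(8)
print(round(morans_i(clustered, w), 2))  # positive: spatial clustering
print(round(morans_i(dispersed, w), 2))  # negative: spatial dispersion
```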

Cell-cell communication analysis uses spatial transcriptomics to identify ligand-receptor interactions between neighboring cells. Tools like CellChat and COMMOT map signaling networks in tissue context, revealing which cell types are talking to each other and through which pathways.

Spatial domain identification uses unsupervised clustering on spatially resolved expression profiles to identify tissue domains — regions with distinct gene expression programs that correspond to anatomical structures or functional zones.

Let’s analyze the spatial distribution of gene expression across tissue regions using Shannon entropy to quantify expression heterogeneity:

let region_a_expr = [12.5, 11.8, 13.2, 12.1, 12.8]
let region_b_expr = [2.1, 8.5, 15.3, 0.8, 22.4]
print("Region A (homogeneous) gene expression:")
print(Stats.describe(region_a_expr))
print("Region B (heterogeneous) gene expression:")
print(Stats.describe(region_b_expr))
let entropy_a = Stats.shannon(region_a_expr)
let entropy_b = Stats.shannon(region_b_expr)
print("Shannon entropy (Region A): " + entropy_a)
print("Shannon entropy (Region B): " + entropy_b)

Region A shows low variability (homogeneous expression), while Region B shows high variability (heterogeneous expression) — as reflected in the standard deviation and Shannon entropy. In a real spatial transcriptomics experiment, this analysis reveals tissue regions with distinct expression programs.
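For reference, the entropy computation can be sketched in Python, assuming (as Stats.shannon presumably does) that expression values are first normalized to a probability distribution. Under that convention a near-uniform profile like Region A scores close to the maximum of log2(n) bits, while a skewed profile like Region B scores lower:

```python
# Shannon entropy of a (normalized) expression profile, in bits.
from math import log2

def shannon_entropy(values):
    total = sum(values)
    probs = [v / total for v in values if v > 0]   # normalize to a distribution
    return -sum(p * log2(p) for p in probs)

region_a = [12.5, 11.8, 13.2, 12.1, 12.8]  # homogeneous
region_b = [2.1, 8.5, 15.3, 0.8, 22.4]     # heterogeneous

print(round(shannon_entropy(region_a), 3))  # near log2(5) ≈ 2.322
print(round(shannon_entropy(region_b), 3))  # lower: a few spots dominate
```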

Exercise: Identify Spatially Correlated Genes

Two genes are measured across spatial positions in a tissue section. Compute the Pearson correlation between their spatial expression patterns to determine if they are co-expressed in the same regions:

let gene_a = [8.2, 7.5, 9.1, 8.8, 7.9, 2.1, 1.5, 2.8, 1.9, 2.3]
let gene_b = [7.8, 7.1, 8.5, 8.2, 7.4, 2.5, 1.8, 3.1, 2.2, 2.6]
let r = Stats.pearson(gene_a, gene_b)
print("Pearson r (Gene A vs Gene B): " + r)
print("Gene A expression:")
print(Stats.describe(gene_a))
print("Gene B expression:")
print(Stats.describe(gene_b))
// Are Gene A and Gene B co-expressed or anti-correlated?
let answer = "co-expressed"
print(answer)

AI in Microscopy

Artificial intelligence is transforming every stage of the microscopy pipeline:

Convolutional Neural Networks for Image Classification

CNNs trained on annotated microscopy images can classify cell phenotypes with superhuman accuracy. A CNN processes an image through layers of learned filters that detect progressively more complex features — from edges and textures to organelle shapes and cell morphologies. Applications include identifying cell cycle stages, classifying cancer vs. normal tissue in histopathology, and detecting rare cell types in heterogeneous populations.

Virtual Staining

Virtual staining uses deep learning to predict what a fluorescently stained image would look like from a label-free (brightfield or phase-contrast) input image. A neural network trained on paired labeled/unlabeled images learns the mapping between transmitted light and fluorescence. This eliminates the need for fluorescent labels — avoiding phototoxicity, spectral overlap, and sample preparation artifacts — while still providing the same information content.

Foundation Models for Microscopy

Foundation models — large neural networks pretrained on massive unlabeled image datasets using self-supervised learning — are being adapted for microscopy. These models learn general visual representations that transfer to diverse downstream tasks:

  • CellSAM and Segment Anything for Microscopy — adapt the Segment Anything Model (SAM) to biological images, enabling zero-shot segmentation without task-specific training
  • MAE-based models — use masked autoencoder pretraining on millions of microscopy images, then fine-tune for classification, segmentation, or feature extraction

The key advantage of foundation models is generalization: a single pretrained model can be adapted to many different microscopy tasks with minimal additional training data.

Let’s visualize the performance comparison of different AI approaches for a cell classification task:

let performance = '[{"label": "Manual annotation", "value": 78}, {"label": "Classical ML", "value": 84}, {"label": "CNN (trained)", "value": 93}, {"label": "Foundation model (fine-tuned)", "value": 96}]'
let chart = Viz.bar(performance, '{"title": "Cell Classification Accuracy (%)", "color": "#10B981"}')
print(chart)

Foundation models fine-tuned with even small amounts of domain-specific data outperform both classical machine learning and purpose-trained CNNs, representing the current state of the art in bioimage analysis.


Summary

In this lesson you covered computational image analysis and spatial biology:

  • The image analysis pipeline proceeds through acquisition, preprocessing, segmentation, feature extraction, and analysis — errors in early steps propagate downstream
  • Cell segmentation evolved from intensity thresholding and watershed to deep learning methods: U-Net (encoder-decoder CNN), Cellpose (gradient-field prediction, generalizes without retraining), and StarDist (star-convex polygons for nuclei)
  • Intersection over Union (IoU) quantifies segmentation accuracy by measuring overlap between predicted and ground-truth boundaries
  • Cell Painting uses six dyes and eight compartments to extract ~1,500 morphological features per cell, creating a rich unbiased phenotypic readout
  • Dimensionality reduction (PCA, UMAP) compresses high-dimensional profiles for visualization; k-means clustering identifies phenotypic subpopulations
  • High-content screening combines automated microscopy with quantitative analysis; the Z′ factor measures assay quality; dose-response curves yield IC50 values
  • Mechanism of action prediction compares drug morphological profiles to reference libraries of compounds with known mechanisms
  • Spatial transcriptomics (10x Visium, MERFISH, seqFISH, CODEX) measures gene expression while preserving tissue architecture
  • Spatial autocorrelation (Moran’s I) tests whether gene expression clusters spatially; cell-cell communication analysis maps ligand-receptor interactions between neighbors
  • CNNs classify cell phenotypes with high accuracy; virtual staining predicts fluorescence from label-free images
  • Foundation models (CellSAM, MAE-based) pretrained on massive microscopy datasets generalize to diverse tasks with minimal fine-tuning

References

  1. Alberts B, Johnson A, Lewis J, Morgan D, Raff M, Roberts K, Walter P. Molecular Biology of the Cell, 7th ed. New York: W.W. Norton; 2022. Chapter 9: Visualizing Cells.
  2. Carpenter AE, Jones TR, Lamprecht MR, et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 2006;7:R100.
  3. Stringer C, Wang T, Michaelos M, Pachitariu M. Cellpose: a generalist algorithm for cellular segmentation. Nat Methods. 2021;18(1):100–106.
  4. Bray MA, Singh S, Han H, et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc. 2016;11(9):1757–1774.
  5. Schmidt U, Weigert M, Broaddus C, Myers G. Cell detection with star-convex polygons. In: MICCAI. 2018:265–273.
  6. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: MICCAI. 2015:234–241.
  7. Moses L, Bhatt R, Pachter L. Museum of spatial transcriptomics. Nat Methods. 2022;19(5):534–546.
  8. Ståhl PL, Salmén F, Vickovic S, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353(6294):78–82.

Powered by

cyanea-ml cyanea-stats cyanea-viz