cyanea-stats

v0.3.0 Analysis

From t-tests to Tajima's D — statistics built for biology.

Analysis layer | Apache-2.0 | 11 functions | Interactive playground

Statistical computing for bioinformatics — descriptive stats, hypothesis testing, multiple testing correction, survival analysis, population genetics, and ecological diversity.

Overview

cyanea-stats is the statistical engine for the ecosystem. It covers four domains that are essential in bioinformatics:

Descriptive statistics — Summarize data distributions with mean, median, standard deviation, and quartiles.
Hypothesis testing — Parametric tests (t-test, Pearson) and non-parametric tests (Mann-Whitney, Spearman) with multiple testing correction (Bonferroni, Benjamini-Hochberg).
Survival analysis — Kaplan-Meier curves and log-rank tests for clinical and time-to-event data.
Population genetics and ecology — Wright-Fisher drift simulation, Tajima’s D, Shannon and Simpson diversity indices, and F-statistics.
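
To make the diversity indices in the last item concrete, here is a minimal standalone Python sketch. It is not the crate's implementation: the natural-log base for Shannon and the 1 - sum(p^2) form of Simpson are assumptions, since the page does not say which conventions cyanea-stats uses, and the function names are illustrative.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over non-zero taxa (natural log assumed)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p^2) (assumed variant of the Simpson index)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Two hypothetical communities with the same richness (4 taxa) but different evenness.
even = [10, 10, 10, 10]
skewed = [37, 1, 1, 1]
print(shannon_index(even), shannon_index(skewed))
print(simpson_index(even), simpson_index(skewed))
```

Both indices come out lower for the skewed community even though the number of taxa is identical, which is why they are read as measures of richness and evenness together.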

Key Concepts

Multiple Testing Correction

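When testing thousands of genes for differential expression at a fixed per-test threshold, false positives accumulate: the probability of at least one (the family-wise error rate) approaches 1. Bonferroni controls the family-wise error rate (strict); Benjamini-Hochberg controls the false discovery rate, the expected fraction of false positives among the rejected hypotheses (less strict, more powerful). Both accept a vector of p-values and return adjusted p-values.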
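
The two corrections can be sketched in a few lines of plain Python. This is an illustration of the standard procedures, not the crate's code; the function names mirror the API Surface entries only for readability, and the sample p-values are made up.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

For every input the BH-adjusted value is no larger than the Bonferroni-adjusted one, which is the sense in which BH is the more powerful of the two.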

Kaplan-Meier Estimation

The Kaplan-Meier estimator produces a step-function survival curve from right-censored data. Each event (death, relapse) causes a step down; censored observations (for example, patients lost to follow-up) stay in the risk set until their censoring time and then drop out without being counted as events. The result includes survival probabilities and confidence intervals at each time point.
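
A minimal sketch of the product-limit calculation, in plain Python, may help. It assumes status is 1 for an observed event and 0 for a censored observation (the page does not state the encoding), it omits the confidence intervals, and its return shape is illustrative rather than the crate's actual output.

```python
def kaplan_meier(times, status):
    """Return [(time, survival)] at each distinct event time.

    Assumes status[i] == 1 means an observed event and 0 means censored.
    """
    data = sorted(zip(times, status))  # process subjects in time order
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all subjects sharing the same time (ties).
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events > 0:
            # Step down by the fraction of at-risk subjects with an event at t.
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        at_risk -= leaving
    return curve


for t, s in kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
    print(t, round(s, 4))
```

The censored subject at time 2 produces no step of its own, but it shrinks the risk set, so the step at time 3 is taken out of three subjects rather than four.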

Tajima’s D

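Tajima’s D compares two estimators of the population mutation rate θ: Watterson’s estimator, derived from the number of segregating sites (S), and the average number of pairwise differences (π). Under neutrality and constant population size the two have the same expectation, so D is close to zero. A negative D (an excess of rare variants) is consistent with population expansion or a recent selective sweep; a positive D (an excess of intermediate-frequency variants) is consistent with balancing selection or a population contraction.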
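
As a worked illustration, the statistic can be computed from S, the sample size n, and π with the normalizing constants from Tajima's original definition. The sketch below is standalone Python, not the crate's implementation; the argument order follows the tajimas_d entry in the API Surface, and the input values are hypothetical.

```python
import math


def tajimas_d(seg_sites, n, pi):
    """Tajima's D from segregating sites S, sample size n, and mean pairwise differences pi."""
    if n < 2 or seg_sites < 1:
        raise ValueError("need n >= 2 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Numerator: pi minus Watterson's estimator S / a1.
    diff = pi - seg_sites / a1
    # Denominator: estimated standard deviation of that difference.
    return diff / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


# Hypothetical sample: 10 sequences, 16 segregating sites, pi = 3.888889.
print(tajimas_d(16, 10, 3.888889))
```

Here π is smaller than S / a1, so D is negative, the direction associated with an excess of rare variants.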

Code Examples

Rust

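use cyanea_stats::{describe, t_test, kaplan_meier};

// `data`, `times`, and `status` are assumed to be defined by the caller.
let summary = describe(&data);          // mean, median, std, quartiles
let test = t_test(&data, 0.0);          // one-sample t-test against mu = 0.0
let km = kaplan_meier(&times, &status); // survival curve from times and status flags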

Python

import cyanea

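# Summary statistics for a small sample.
summary = cyanea.describe([2.1, 3.4, 5.6, 1.2, 4.3])
# Survival curve from observation times and their event/censoring status flags.
km = cyanea.kaplan_meier(times=[1, 2, 3, 4, 5], status=[1, 0, 1, 1, 0])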

JavaScript (WASM)

import { describe, t_test, kaplan_meier, benjamini_hochberg } from '/wasm/cyanea_wasm.js';

const summary = JSON.parse(describe(JSON.stringify([2.1, 3.4, 5.6, 1.2, 4.3])));
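// Same sample inputs as the Python example above.
const times = [1, 2, 3, 4, 5];
const status = [1, 0, 1, 1, 0];
const km = JSON.parse(kaplan_meier(JSON.stringify(times), JSON.stringify(status)));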

Use Cases

Differential expression — Test thousands of genes, correct with BH, filter by adjusted p-value.
Clinical trials — Estimate survival curves and compare treatment arms with log-rank tests.
Metagenomics — Compute Shannon diversity across samples to measure community diversity (richness and evenness).
Population genetics — Simulate drift with Wright-Fisher and test for selection with Tajima’s D.
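
The drift simulation in the last use case can be sketched in plain Python. This is an illustration of the Wright-Fisher model, not the crate's code: the parameters mirror the (N, freq, gen, seed) signature listed in the API Surface, while the diploid assumption (2N allele copies) and the returned list of frequencies are assumptions.

```python
import random


def wright_fisher(pop_size, freq, generations, seed):
    """Return the allele-frequency trajectory, starting with the initial frequency."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    copies = 2 * pop_size      # assumption: diploid population, 2N allele copies
    trajectory = [freq]
    for _ in range(generations):
        # Binomial sampling: each copy in the next generation carries the
        # allele with probability equal to the current frequency.
        count = sum(1 for _copy in range(copies) if rng.random() < freq)
        freq = count / copies
        trajectory.append(freq)
    return trajectory


# Hypothetical run: 50 individuals, starting frequency 0.5, 100 generations.
path = wright_fisher(50, 0.5, 100, seed=42)
print(path[0], path[-1])
```

Smaller populations drift faster, so rerunning with a lower pop_size tends to reach fixation (frequency 1) or loss (frequency 0) in fewer generations.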

API Surface

describe(data: JSON) -> JSON: Descriptive statistics (mean, median, std, quartiles)
pearson(x, y: JSON) -> JSON: Pearson correlation coefficient and p-value
spearman(x, y: JSON) -> JSON: Spearman rank correlation and p-value
t_test(data: JSON, mu: f64) -> JSON: One-sample t-test against a hypothesized mean
mann_whitney_u(x, y: JSON) -> JSON: Non-parametric Mann-Whitney U test
bonferroni(p: JSON) -> JSON: Bonferroni multiple testing correction
benjamini_hochberg(p: JSON) -> JSON: Benjamini-Hochberg FDR correction
kaplan_meier(times, status: JSON) -> JSON: Kaplan-Meier survival curve estimation
wright_fisher(N, freq, gen, seed) -> JSON: Wright-Fisher genetic drift simulation
shannon_index(counts: JSON) -> f64: Shannon diversity index
tajimas_d(S, n, pi) -> f64: Tajima’s D test for neutrality

Depends on

cyanea-core

Depended on by

cyanea-ml cyanea-omics cyanea-wasm