cyanea-stats
v0.3.0 · Analysis
From t-tests to Tajima's D: statistics built for biology.
Statistical computing for bioinformatics — descriptive stats, hypothesis testing, multiple testing correction, survival analysis, population genetics, and ecological diversity.
Overview
cyanea-stats is the statistical engine of the Cyanea ecosystem. It covers four domains that are central to bioinformatics:
- Descriptive statistics — Summarize data distributions with mean, median, standard deviation, and quartiles.
- Hypothesis testing — Parametric tests (t-test, Pearson) and non-parametric tests (Mann-Whitney, Spearman) with multiple testing correction (Bonferroni, Benjamini-Hochberg).
- Survival analysis — Kaplan-Meier curves and log-rank tests for clinical and time-to-event data.
- Population genetics and ecology — Wright-Fisher drift simulation, Tajima’s D, Shannon and Simpson diversity indices, and F-statistics.
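The diversity indices are short formulas over species proportions p_i. Here is a minimal pure-Python sketch of the textbook definitions, not cyanea-stats' own implementation. Shannon's index is H' = -Σ p_i ln p_i. "Simpson diversity" has several variants in the literature, and this sketch assumes the Gini-Simpson form 1 - Σ p_i², which may not be the variant cyanea-stats reports. The function names and species counts are hypothetical.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), skipping species with zero count."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2): chance two draws (with replacement) differ in species."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical species counts for one perfectly even sample
sample = [10, 10, 10, 10]
print(round(shannon(sample), 4))  # 1.3863 (= ln 4)
print(gini_simpson(sample))       # 0.75
```

A perfectly even community of S species has H' = ln S and Gini-Simpson 1 - 1/S; any unevenness lowers both.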
Key Concepts
Multiple Testing Correction
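When thousands of genes are tested for differential expression, the family-wise error rate (the probability of at least one false positive) grows rapidly: with m independent tests at α = 0.05 it is 1 - (1 - 0.05)^m, already about 0.99 for m = 100. Bonferroni controls the family-wise error rate directly (strict); Benjamini-Hochberg controls the false discovery rate, the expected proportion of false positives among the rejected hypotheses (less strict, more powerful). Both accept a vector of p-values and return adjusted p-values.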
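Both adjustments are short enough to sketch directly. This minimal pure-Python version follows the standard definitions rather than cyanea-stats' code, and the function names are hypothetical. Bonferroni multiplies each p-value by the number of tests m. Benjamini-Hochberg scales the p-value of rank k by m/k, then takes a running minimum from the largest p-value downward so adjusted values never decrease with rank.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment; returns adjusted p-values in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to rank 1,
    # keeping a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(pvals))  # [0.04, 0.16, 0.12, 0.02]
print(benjamini_hochberg(pvals))
```

In a differential-expression workflow, the BH-adjusted values are what you compare against a cutoff such as 0.05.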
Kaplan-Meier Estimation
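The Kaplan-Meier estimator produces a step-function survival curve from censored data. Each event (death, relapse) causes a step down. Censored observations (for example, patients lost to follow-up) stay in the risk set until their censoring time and are never counted as events, so their partial follow-up still contributes information. This is valid as long as censoring is non-informative, i.e. unrelated to the risk of the event. The result includes survival probabilities and confidence intervals at each time point.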
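The estimator is S(t) = Π over event times t_i ≤ t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number still at risk just before it. Below is a minimal pure-Python sketch that assumes the same status encoding as the Python example on this page (1 = event, 0 = censored). It returns only the point estimates and omits confidence intervals, which are usually derived from Greenwood's variance formula. The function is hypothetical, not cyanea-stats' API.

```python
def kaplan_meier(times, status):
    """Return [(t, S(t))] at each distinct event time.

    status[i] == 1 means an event was observed at times[i]; 0 means censored.
    """
    data = sorted(zip(times, status))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Group all observations tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                events += 1
            else:
                censored += 1
            i += 1
        if events > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Both events and censorings at t leave the risk set afterwards
        at_risk -= events + censored
    return curve

print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]))
```

With this page's example data, the subject censored at t = 2 produces no step but leaves the risk set, so 3 subjects are at risk at t = 3. The sketch gives S = 0.8 after t = 1, about 0.533 after t = 3, and about 0.267 after t = 4.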
Tajima’s D
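Tajima’s D compares two estimators of the population mutation rate θ: Watterson's estimator, derived from the number of segregating sites (S), and the average number of pairwise differences (π). Under the standard neutral model with constant population size, both have the same expectation, so D is close to 0. A negative D (an excess of rare variants) is consistent with population expansion or a recent selective sweep. A positive D (an excess of intermediate-frequency variants) is consistent with balancing selection or a population bottleneck.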
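Here is a minimal pure-Python sketch of the calculation. It assumes a gap-free alignment of equal-length sequences with at least four samples and at least one segregating site, because the variance term is zero for n < 4. It uses the standard normalizing constants a1, a2, b1, b2, c1, c2, e1, e2 and computes D = (π - S/a1) / sqrt(e1·S + e2·S·(S - 1)), where S/a1 is Watterson's estimator. The function name and example sequences are hypothetical, and this is not cyanea-stats' code.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D from aligned, equal-length sequences (assumes n >= 4 and S > 0)."""
    n = len(seqs)
    # S: alignment columns carrying more than one allele
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    # pi: mean number of differing sites over all n*(n-1)/2 sequence pairs
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_w = S / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))

# Every variant is a singleton (private to one sequence): expect D < 0
print(tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"]))
# Every variant is at 50% frequency: expect D > 0
print(tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"]))
```

The two calls illustrate the sign convention from the text: rare variants pull π below Watterson's estimate, while intermediate-frequency variants push it above.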
Code Examples
Rust
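use cyanea_stats::{describe, t_test, kaplan_meier};
// `data`, `times`, and `status` are assumed to be numeric slices defined earlier
let summary = describe(&data);          // mean, median, SD, quartiles
let test = t_test(&data, 0.0);          // one-sample t-test; 0.0 = hypothesized mean
let km = kaplan_meier(&times, &status); // survival times and event/censoring indicators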
Python
import cyanea
summary = cyanea.describe([2.1, 3.4, 5.6, 1.2, 4.3])  # mean, median, SD, quartiles
# status: 1 = event observed, 0 = censored
km = cyanea.kaplan_meier(times=[1, 2, 3, 4, 5], status=[1, 0, 1, 1, 0])
JavaScript (WASM)
import { describe, t_test, kaplan_meier, benjamini_hochberg } from '/wasm/cyanea_wasm.js';
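// The WASM bindings exchange JSON strings: serialize inputs, parse outputs
const summary = JSON.parse(describe(JSON.stringify([2.1, 3.4, 5.6, 1.2, 4.3])));
const times = [1, 2, 3, 4, 5];
const status = [1, 0, 1, 1, 0]; // 1 = event observed, 0 = censored
const km = JSON.parse(kaplan_meier(JSON.stringify(times), JSON.stringify(status)));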
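The `t_test(&data, 0.0)` call in the Rust example appears to be a one-sample test against a hypothesized mean of 0.0. To illustrate what such a test computes (not a description of cyanea-stats internals), this pure-Python sketch derives the statistic t = (x̄ - μ0) / (s / √n) with n - 1 degrees of freedom. The function name is hypothetical. Turning t into a p-value requires the Student's t CDF, which the Python standard library does not provide.

```python
import math
import statistics

def one_sample_t(data, mu0=0.0):
    """One-sample t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1

t, df = one_sample_t([2.1, 3.4, 5.6, 1.2, 4.3], mu0=0.0)
print(f"t = {t:.3f}, df = {df}")
```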
Use Cases
- Differential expression — Test thousands of genes, correct with BH, filter by adjusted p-value.
- Clinical trials — Estimate survival curves and compare treatment arms with log-rank tests.
- Metagenomics — Compute Shannon diversity across samples to compare community diversity (richness and evenness).
- Population genetics — Simulate drift with Wright-Fisher and test for selection with Tajima’s D.
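As a minimal sketch of the drift use case, here is a textbook Wright-Fisher model. It assumes a diploid population of constant size N with no selection, mutation, or migration, where each generation resamples 2N gene copies from the previous allele frequency. It is an illustration, not cyanea-stats' simulator, and the function name and parameters are hypothetical.

```python
import random

def wright_fisher(p0, pop_size, generations, seed=None):
    """Simulate neutral allele-frequency drift in a diploid Wright-Fisher population.

    Returns the frequency trajectory, starting with p0.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size  # 2N gene copies per generation
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial(2N, p) draw built from 2N Bernoulli trials (fine for small N)
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory

traj = wright_fisher(p0=0.5, pop_size=50, generations=200, seed=42)
print(traj[-1])
```

Frequencies of 0 and 1 are absorbing: once an allele is lost or fixed, the trajectory stays there. Smaller N means more sampling noise per generation, and therefore faster drift.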