cyanea-align

v0.2.4 Analysis

Pairwise and multiple sequence alignment with full CIGAR support.

Analysis layer | Apache-2.0 | 7 functions | Interactive playground

Sequence alignment algorithms — Smith-Waterman, Needleman-Wunsch, banded alignment, progressive MSA, POA consensus, and CIGAR string manipulation.

Overview

cyanea-align implements the classic dynamic-programming alignment algorithms behind an API that returns structured results: the aligned strings, the alignment score, and a CIGAR representation.

It supports three modes for both DNA and protein sequences: local (Smith-Waterman), global (Needleman-Wunsch), and semi-global. Banded alignment gives large speedups on long sequences, at the cost of missing the optimal alignment when that alignment strays outside the band. For more than two sequences, progressive MSA and partial-order alignment (POA) are available.
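
To make the difference between global and local mode concrete, here is a minimal Python sketch of the dynamic-programming recurrence they share. It is not cyanea-align's implementation: the function name `dp_score`, the linear gap penalty, and the default scores are assumptions chosen for brevity, and semi-global mode is left out.

```python
def dp_score(q, t, mode="global", match=2, mismatch=-1, gap=-2):
    """Best alignment score under a linear gap penalty (illustrative sketch only)."""
    if mode not in ("global", "local"):
        raise ValueError("mode must be 'global' or 'local'")
    rows, cols = len(q) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    if mode == "global":
        # Global mode charges for gaps at the ends, so the borders are penalized.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if q[i - 1] == t[j - 1] else mismatch
            cell = max(H[i - 1][j - 1] + s,  # align q[i-1] with t[j-1]
                       H[i - 1][j] + gap,    # gap in t
                       H[i][j - 1] + gap)    # gap in q
            if mode == "local":
                # Local mode may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
            H[i][j] = cell
            best = max(best, cell)
    # Global: score of aligning both sequences end to end.
    # Local: best score of any pair of substrings.
    return H[-1][-1] if mode == "global" else best
```

With these defaults, `dp_score("ACGTACGT", "ACGACGT")` gives 12: seven matches at +2 and one gap at -2.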

Key Concepts

Scoring Models

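DNA alignment is scored with four parameters: match, mismatch, gap-open, and gap-extend. Separate gap-open and gap-extend penalties form an affine gap model, in which starting a gap costs more than lengthening one. Protein alignment uses substitution matrices (BLOSUM62, PAM250, etc.) that reflect evolutionary substitution rates between amino acid pairs.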
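
As an illustration of how the four DNA parameters combine, the sketch below scores an alignment that is already written as two gapped strings. The function name `score_alignment` is hypothetical, and the gap convention (the first gap column costs `gap_open`, each further column costs `gap_extend`) is an assumption; some tools charge open plus extend for the first column, and this page does not say which convention cyanea-align follows.

```python
def score_alignment(a, b, match=2, mismatch=-1, gap_open=-5, gap_extend=-2):
    """Score two equal-length gapped strings under an affine gap model (sketch)."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    score = 0
    prev_gap = None  # which row held a gap in the previous column, if any
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Assumed convention: opening costs gap_open, continuing costs gap_extend.
            score += gap_extend if prev_gap == side else gap_open
            prev_gap = side
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score
```

For example, `score_alignment("ACGTACGT", "ACG-ACGT")` is 7 × 2 − 5 = 9 under this convention.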

Banded Alignment

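Full dynamic programming is O(mn) in time and space for sequences of lengths m and n. Banded alignment restricts computation to a diagonal band of width 2k+1, reducing the cost to O(k·max(m,n)). This is appropriate when you expect the alignment to stay close to the main diagonal, which is the common case for similar sequences. For a global alignment the band must be at least as wide as the length difference (k ≥ |m − n|), otherwise the final cell lies outside it.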
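
The band restriction can be sketched in a few lines of Python. This is an illustrative sketch, not the library's `align_banded`: the name `banded_global_score`, the linear gap penalty, and the dictionary-based table are assumptions made to keep the example short. It returns the score together with the number of cells it filled, so the saving over the full table is visible.

```python
NEG_INF = float("-inf")

def banded_global_score(q, t, k, match=2, mismatch=-1, gap=-2):
    """Global alignment score computed only on cells with |i - j| <= k (sketch)."""
    m, n = len(q), len(t)
    if abs(m - n) > k:
        raise ValueError("band too narrow: the end cell lies outside it")
    H = {(0, 0): 0}  # sparse table holding band cells only
    cells = 1
    for i in range(m + 1):
        # Only columns within k of the diagonal are visited in each row.
        for j in range(max(0, i - k), min(n, i + k) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                s = match if q[i - 1] == t[j - 1] else mismatch
                best = max(best, H.get((i - 1, j - 1), NEG_INF) + s)
            # Neighbours outside the band are missing and count as -inf.
            best = max(best, H.get((i - 1, j), NEG_INF) + gap)
            best = max(best, H.get((i, j - 1), NEG_INF) + gap)
            H[(i, j)] = best
            cells += 1
    return H[(m, n)], cells
```

For "ACGTACGT" against "ACGACGT" with k = 1, the sketch fills 23 of the 72 cells in the full table and still finds the score of 12, because the best alignment needs only a single one-base gap.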

CIGAR Strings

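A CIGAR string (e.g., 8M2I4M1D3M) is a compact run-length encoding of an alignment. Each operation is a length followed by a code: M = aligned column (match or mismatch), I = insertion (bases present in the query but not in the target), D = deletion (bases present in the target but missing from the query). cigar_stats extracts counts; parse_cigar returns the full operations array.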
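
The encoding is simple enough to decode by hand. The following pure-Python sketch is independent of cyanea-align's `parse_cigar` and `cigar_stats`, whose JSON output format is not shown on this page; the helper names `parse_cigar_ops` and `cigar_lengths` are hypothetical. The operation codes and the rules for which ones consume query or reference bases follow the SAM specification.

```python
import re
from collections import Counter

# One CIGAR operation: a positive run length followed by a SAM op code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar_ops(cigar):
    """Split a CIGAR string into (length, op) tuples, rejecting stray characters."""
    ops, pos = [], 0
    for m in _CIGAR_OP.finditer(cigar):
        if m.start() != pos:  # something unparseable sits between two operations
            raise ValueError(f"invalid CIGAR: {cigar!r}")
        ops.append((int(m.group(1)), m.group(2)))
        pos = m.end()
    if not ops or pos != len(cigar):
        raise ValueError(f"invalid CIGAR: {cigar!r}")
    return ops

def cigar_lengths(ops):
    """Return per-op totals plus the query and reference lengths the ops span."""
    totals = Counter()
    for length, op in ops:
        totals[op] += length
    query_len = sum(n for n, op in ops if op in "MIS=X")  # ops that consume query
    ref_len = sum(n for n, op in ops if op in "MDN=X")    # ops that consume reference
    return totals, query_len, ref_len
```

For 8M2I4M1D3M this yields 15 M, 2 I, and 1 D columns, covering 17 query bases and 16 reference bases.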

Multiple Sequence Alignment

Progressive MSA builds a guide tree from pairwise distances, then aligns sequences along it. POA consensus constructs a partial-order graph and extracts the consensus path — useful for error correction in long reads.
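
The guide-tree step can be sketched as follows. The specifics are assumptions made for illustration, since this page does not say which distance or clustering method cyanea-align uses: here the distance is one minus the Jaccard similarity of 3-mer sets, clusters are merged by average linkage, and the names `kmer_distance` and `guide_tree` are hypothetical. The result is a nested tuple of sequence indices giving the order in which a progressive aligner would merge them.

```python
from itertools import combinations

def kmer_distance(a, b, k=3):
    """Assumed distance: 1 - Jaccard similarity of the two k-mer sets."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0

def guide_tree(seqs, k=3):
    """Greedy average-linkage clustering; returns nested tuples of indices."""
    if not seqs:
        raise ValueError("need at least one sequence")
    dist = {frozenset(p): kmer_distance(seqs[p[0]], seqs[p[1]], k)
            for p in combinations(range(len(seqs)), 2)}

    def linkage(x, y):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((i, j))] for i in x for j in y) / (len(x) * len(y))

    clusters = [(i, [i]) for i in range(len(seqs))]  # (subtree, member indices)
    while len(clusters) > 1:
        # Merge the closest pair of clusters first.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]][1], clusters[p[1]][1]))
        (ta, ma), (tb, mb) = clusters[a], clusters[b]
        rest = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters = rest + [((ta, tb), ma + mb)]
    return clusters[0][0]
```

For the four sequences ACGTACGT, ACGTACGA, TTTTGGGG, and TTTTGGGC, the sketch pairs the two similar sequences in each group before joining the groups, giving ((0, 1), (2, 3)).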

Code Examples

Rust

use cyanea_align::{align_dna, AlignMode};

// `?` propagates alignment errors, so this must run inside a function that returns a Result.
let result = align_dna("ACGTACGT", "ACGACGT", AlignMode::Global)?;
println!("Score: {}, CIGAR: {}", result.score, result.cigar);

Python

import cyanea

# Global (Needleman-Wunsch) alignment; the result is indexed by key.
result = cyanea.align_dna("ACGTACGT", "ACGACGT", mode="global")
print(f"Score: {result['score']}, CIGAR: {result['cigar']}")

JavaScript (WASM)

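import { align_dna_custom } from '/wasm/cyanea_wasm.js';

// Arguments: query, target, mode, then match, mismatch, gap-open, gap-extend scores.
// The call returns a JSON string; the alignment is under the `ok` key.
const result = JSON.parse(align_dna_custom("ACGTACGT", "ACGACGT", "global", 2, -1, -5, -2));
console.log(`Score: ${result.ok.score}, CIGAR: ${result.ok.cigar}`);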

Use Cases

  • Variant discovery — Align reads to a reference to find SNPs and indels.
  • Homology search — Score protein alignments with BLOSUM62 to detect remote homologs.
  • Consensus building — Combine noisy long reads into a polished consensus via POA.
  • SAM/BAM QC — Parse and validate CIGAR strings from alignment files.

API Surface

  • align_dna_custom(q, t, mode, m, mm, go, ge) -> JSON: Align two DNA sequences with custom scoring.
  • align_protein(q, t, mode, matrix) -> JSON: Align protein sequences using substitution matrices.
  • align_banded(q, t, mode, bw, m, mm, go, ge) -> JSON: Banded alignment for speed on long sequences.
  • progressive_msa(seqs, m, mm, go, ge) -> JSON: Progressive multiple sequence alignment.
  • poa_consensus(seqs: JSON) -> String: Partial-order alignment consensus sequence.
  • cigar_stats(cigar: &str) -> JSON: Compute match/mismatch/indel counts from CIGAR.
  • parse_cigar(cigar: &str) -> JSON: Parse CIGAR string into operations array.

Tags

Alignment, Smith-Waterman, Needleman-Wunsch, MSA, CIGAR