cyanea-align
v0.2.4 Analysis
Pairwise and multiple sequence alignment with full CIGAR support.
Sequence alignment algorithms — Smith-Waterman, Needleman-Wunsch, banded alignment, progressive MSA, POA consensus, and CIGAR string manipulation.
Overview
cyanea-align implements the classic dynamic-programming alignment algorithms and wraps them in a clean API that returns structured results with alignment strings, scores, and CIGAR representations.
It supports three modes for both DNA and protein sequences: local (Smith-Waterman), global (Needleman-Wunsch), and semi-global. Banded alignment restricts the search to a diagonal band, which is much faster on long sequences but can miss the optimal alignment if it strays outside the band. For multiple sequences, progressive MSA and partial-order alignment (POA) are available.
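To make the difference between the three modes concrete, here is a minimal pure-Python sketch of the underlying recurrence. It is not cyanea-align's implementation: the function name align_score, the linear gap penalty, and the default scores are assumptions chosen for brevity, and "semi-global" is taken here to mean that end gaps in either sequence are free, which is one common definition.

```python
def align_score(a, b, mode="global", match=2, mismatch=-1, gap=-2):
    """Best alignment score of a vs b under a linear gap penalty (illustrative)."""
    if mode not in ("global", "local", "semiglobal"):
        raise ValueError(f"unknown mode: {mode}")
    m, n = len(a), len(b)
    # H[i][j] holds the best score for a[:i] against b[:j].
    H = [[0] * (n + 1) for _ in range(m + 1)]
    if mode == "global":
        # Only global mode charges for leading gaps.
        for i in range(1, m + 1):
            H[i][0] = i * gap
        for j in range(1, n + 1):
            H[0][j] = j * gap
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if mode == "local":
                # Local mode may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            H[i][j] = cell
    if mode == "global":
        return H[m][n]              # must end in the bottom-right corner
    if mode == "local":
        return best                 # may end anywhere
    # Semi-global: trailing gaps are free, so end anywhere on the last row or column.
    return max(max(H[m]), max(row[n] for row in H))


read, ref = "ACGT", "GGACGTGG"
for mode in ("global", "semiglobal", "local"):
    print(mode, align_score(read, ref, mode))
# global 0, semiglobal 8, local 8
```

Global mode pays for the four flanking reference bases, while the other two modes place the read inside the reference at no cost.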
Key Concepts
Scoring Models
DNA alignment uses match/mismatch/gap-open/gap-extend parameters. Protein alignment uses substitution matrices (BLOSUM62, PAM250, etc.) that reflect evolutionary substitution rates between amino acid pairs.
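As an illustration of how the four DNA parameters combine, the sketch below scores an already-gapped alignment column by column. The helper name affine_score is hypothetical, and the defaults mirror the numbers passed in the JavaScript example on this page on the assumption that they are given in match, mismatch, gap-open, gap-extend order. It also assumes the convention where the first column of a gap costs gap_open and each further column costs gap_extend; libraries differ on this, and the page does not say which convention cyanea-align follows.

```python
def affine_score(aligned_a, aligned_b, match=2, mismatch=-1,
                 gap_open=-5, gap_extend=-2):
    """Score a gapped pairwise alignment under an affine gap model (illustrative)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have equal length")
    score = 0
    prev_gap = None  # which row ("a" or "b") held a gap in the previous column
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-" or y == "-":
            gap_row = "a" if x == "-" else "b"
            # Continuing a gap in the same row is cheaper than opening a new one.
            score += gap_extend if gap_row == prev_gap else gap_open
            prev_gap = gap_row
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


print(affine_score("ACGTACGT", "ACG-ACGT"))  # 7 matches, one gap opened: 14 - 5 = 9
print(affine_score("ACGTACGT", "AC--ACGT"))  # 6 matches, gap of length 2: 12 - 5 - 2 = 5
```

For protein sequences the match/mismatch term would be replaced by a lookup in the chosen substitution matrix; the gap handling stays the same.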
Banded Alignment
Full dynamic programming takes O(mn) time and space for sequences of lengths m and n. Banded alignment restricts computation to a diagonal band of width 2k+1, reducing the cost to O(k·max(m,n)). The result matches the full computation only if the optimal alignment stays within k cells of the main diagonal, which is the common case for similar sequences of similar length.
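The band restriction can be sketched in a few lines of Python. This is an illustrative global-alignment scorer with a linear gap penalty, not the library's banded routine: the name banded_global_score and the scoring defaults are assumptions. It only evaluates cells with |i - j| <= k and also reports how many cells it touched.

```python
NEG_INF = float("-inf")


def banded_global_score(a, b, k, match=2, mismatch=-1, gap=-2):
    """Global alignment score using only cells with |i - j| <= k (illustrative).

    Returns (score, number_of_cells_evaluated).
    """
    m, n = len(a), len(b)
    if abs(m - n) > k:
        raise ValueError("band too narrow: the end cell (m, n) lies outside it")
    # Row 0 of the band: leading gaps in a.
    prev = {j: j * gap for j in range(0, min(n, k) + 1)}
    cells = len(prev)
    for i in range(1, m + 1):
        cur = {}
        for j in range(max(0, i - k), min(n, i + k) + 1):
            best = NEG_INF
            if j > 0 and (j - 1) in prev:        # diagonal move
                sub = match if a[i - 1] == b[j - 1] else mismatch
                best = prev[j - 1] + sub
            if j in prev:                         # gap in b
                best = max(best, prev[j] + gap)
            if (j - 1) in cur:                    # gap in a
                best = max(best, cur[j - 1] + gap)
            cur[j] = best
        cells += len(cur)
        prev = cur
    return prev[n], cells


score, cells = banded_global_score("ACGTACGT", "ACGACGT", k=1)
full_cells = (len("ACGTACGT") + 1) * (len("ACGACGT") + 1)
print(score, cells, full_cells)
```

With k large enough to cover the whole matrix the function degenerates to the full computation, which is a convenient way to check whether a narrower band changed the score.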
CIGAR Strings
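A CIGAR string (e.g., 8M2I4M1D3M) is a compact, run-length-encoded description of an alignment. M = aligned position (match or mismatch), I = insertion (bases present in the query but not in the reference), D = deletion (bases present in the reference but not in the query). cigar_stats extracts counts; parse_cigar returns the full operations array.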
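The encoding is simple enough to decode by hand. The following standalone Python sketch is not the library's parse_cigar or cigar_stats (their return shapes are not documented here); the helper names parse_cigar_ops and cigar_lengths are hypothetical. It splits a CIGAR string into (length, operation) pairs and derives how many query and reference bases the alignment spans, using the operation set defined by the SAM format.

```python
import re

# One run: a positive count followed by a SAM operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the query / of the reference (per the SAM spec).
QUERY_OPS = set("MIS=X")
REF_OPS = set("MDN=X")


def parse_cigar_ops(cigar):
    """Split a CIGAR string into a list of (length, op) tuples, validating as we go."""
    ops = []
    pos = 0
    for run in _CIGAR_OP.finditer(cigar):
        if run.start() != pos:
            raise ValueError(f"unexpected characters at offset {pos} in {cigar!r}")
        ops.append((int(run.group(1)), run.group(2)))
        pos = run.end()
    if pos != len(cigar) or not ops:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def cigar_lengths(ops):
    """Return (query_length, reference_length) implied by the operations."""
    query = sum(n for n, op in ops if op in QUERY_OPS)
    ref = sum(n for n, op in ops if op in REF_OPS)
    return query, ref


ops = parse_cigar_ops("8M2I4M1D3M")
print(ops)                 # [(8, 'M'), (2, 'I'), (4, 'M'), (1, 'D'), (3, 'M')]
print(cigar_lengths(ops))  # (17, 16)
```

Comparing the implied query length with the actual read length is a cheap consistency check when validating records from alignment files.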
Multiple Sequence Alignment
Progressive MSA builds a guide tree from pairwise distances, then aligns sequences along it. POA consensus constructs a partial-order graph and extracts the consensus path — useful for error correction in long reads.
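The guide-tree step can be illustrated without any alignment machinery. The sketch below is an assumption-laden stand-in, not cyanea-align's method: it uses plain edit distance as the pairwise distance and average-linkage (UPGMA-style) clustering to decide the merge order, and the names edit_distance and guide_tree are hypothetical. The page does not state which distance or clustering method the library uses.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


def guide_tree(seqs):
    """Return nested tuples of sequence indices describing the merge order."""
    if not seqs:
        raise ValueError("need at least one sequence")
    dist = {(i, j): edit_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

    def avg(m1, m2):
        # Average distance between all cross-cluster pairs of original sequences.
        total = sum(dist[min(p, q), max(p, q)] for p in m1 for q in m2)
        return total / (len(m1) * len(m2))

    # cluster id -> (subtree, member indices)
    clusters = {i: (i, [i]) for i in range(len(seqs))}
    next_id = len(seqs)
    while len(clusters) > 1:
        x, y = min(combinations(sorted(clusters), 2),
                   key=lambda pair: avg(clusters[pair[0]][1], clusters[pair[1]][1]))
        tree_x, members_x = clusters.pop(x)
        tree_y, members_y = clusters.pop(y)
        clusters[next_id] = ((tree_x, tree_y), members_x + members_y)
        next_id += 1
    return next(iter(clusters.values()))[0]


seqs = ["ACGTACGT", "ACGTACGA", "TTGTCCGT", "TTGTCCGA"]
print(guide_tree(seqs))  # ((0, 1), (2, 3))
```

A progressive aligner would then walk this tree from the leaves up, aligning the closest pairs first and merging the resulting profiles.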
Code Examples
Rust
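use cyanea_align::{align_dna, AlignMode};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = align_dna("ACGTACGT", "ACGACGT", AlignMode::Global)?; // assumes the crate's error type implements std::error::Error
    println!("Score: {}, CIGAR: {}", result.score, result.cigar);
    Ok(())
}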
Python
import cyanea
result = cyanea.align_dna("ACGTACGT", "ACGACGT", mode="global")
print(f"Score: {result['score']}, CIGAR: {result['cigar']}")
JavaScript (WASM)
import { align_dna_custom, progressive_msa } from '/wasm/cyanea_wasm.js';
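// The four numbers are the scoring parameters, presumably in match, mismatch, gap-open, gap-extend order.
const result = JSON.parse(align_dna_custom("ACGTACGT", "ACGACGT", "global", 2, -1, -5, -2));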
console.log(`Score: ${result.ok.score}, CIGAR: ${result.ok.cigar}`);
Use Cases
- Variant discovery — Align reads to a reference to find SNPs and indels.
- Homology search — Score protein alignments with BLOSUM62 to detect remote homologs.
- Consensus building — Combine noisy long reads into a polished consensus via POA.
- SAM/BAM QC — Parse and validate CIGAR strings from alignment files.