Introduction to Sequence Alignment

Beginner | Genomics & Bioinformatics | ~25 min

Learn the fundamentals of pairwise sequence alignment — how and why we compare biological sequences.

Why Align Sequences?

Sequence alignment is one of the most fundamental operations in bioinformatics. By placing two sequences side by side and introducing gaps to maximize similarity, we can:

  • Identify homology — sequences that share a common ancestor
  • Find conserved regions — functionally important parts of genes or proteins
  • Predict function — a new gene that aligns well to a known gene likely has a similar function
  • Study evolution — mutations, insertions, and deletions reveal evolutionary relationships

What Is an Alignment?

An alignment pairs up positions in two sequences, inserting gaps (-) where needed:

Sequence 1:  A T G G C T A - G C
Sequence 2:  A T G - C T A G G C
                   *       *

The * marks columns where the two sequences differ; in this example, both marked columns contain a gap. A gap (-) represents an insertion or deletion (collectively called indels).

Scoring Alignments

Every alignment has a score that measures its quality. The scoring scheme assigns:

Event      Typical score
Match      +1 or +2
Mismatch   -1
Gap        -2 (gap penalty)

Let’s compute a simple alignment score:

let seq1 = "ATGGCTAG"
let seq2 = "ATGCCTAG"
let result = Align.global(seq1, seq2)
print("Score: " + result.score)
print(result.alignment)
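
To make the scoring scheme concrete, here is a minimal sketch in Python (not the tutorial's own scripting language) that scores an alignment that has already been built. It assumes match = +1, mismatch = -1, and gap = -2 from the table above. The function name `score_alignment` is hypothetical and not part of any Cyanea API.

```python
def score_alignment(row1, row2, match=1, mismatch=-1, gap=-2):
    """Sum the per-column scores of two equal-length aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            total += gap          # one side is a gap: indel column
        elif a == b:
            total += match        # identical characters
        else:
            total += mismatch     # substitution
    return total


# The example alignment from "What Is an Alignment?":
# 8 matching columns and 2 gap columns.
example_score = score_alignment("ATGGCTA-GC", "ATG-CTAGGC")
print(example_score)  # 8 * (+1) + 2 * (-2) = 4
```

Note that this only evaluates a given alignment; finding the best-scoring alignment is what the algorithms in the next section do.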

Global vs. Local Alignment

There are two fundamental approaches:

Global Alignment (Needleman-Wunsch)

Aligns sequences end to end. Best when sequences are similar in length and expected to be related across their full length.

Use case: comparing two homologous genes from related species.

let seq1 = "GCATGCG"
let seq2 = "GATTACA"
let result = Align.global(seq1, seq2)
print("Global alignment (score: " + result.score + ")")
print(result.alignment)
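
For readers curious about what happens under the hood, the following is a minimal Python sketch of Needleman-Wunsch with a linear gap penalty. It is an illustration under assumed scores (match +1, mismatch -1, gap -2), not the implementation behind `Align.global`; the function name `needleman_wunsch` is hypothetical.

```python
def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_s1, aligned_s2) for one optimal global alignment."""
    n, m = len(s1), len(s2)
    # F[i][j] holds the best score for aligning s1[:i] with s2[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # s1 prefix aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap  # s2 prefix aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,  # pair the two characters
                          F[i - 1][j] + gap,      # gap in s2
                          F[i][j - 1] + gap)      # gap in s1

    # Trace back from the bottom-right corner to recover one optimal alignment.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + sub:
                a1.append(s1[i - 1])
                a2.append(s2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a1)), "".join(reversed(a2))


score, row1, row2 = needleman_wunsch("GCATGCG", "GATTACA")
print("Global alignment (score: " + str(score) + ")")
print(row1)
print(row2)
```

Several alignments can share the optimal score; the traceback here simply prefers pairing characters over opening a gap, so a different tie-breaking rule may print a different but equally good alignment.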

Local Alignment (Smith-Waterman)

Finds the highest-scoring pair of matching regions, one from each sequence. Best when you suspect only part of one sequence matches part of another.

Use case: finding a conserved domain within a larger protein, or searching a database for similar sequences.

let seq1 = "AAATTTGCATGCGAAATTT"
let seq2 = "GATTACA"
let result = Align.local(seq1, seq2)
print("Local alignment (score: " + result.score + ")")
print(result.alignment)

Notice how local alignment finds the best-matching region and ignores the flanking sequences that don’t match.
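
The idea of ignoring poorly matching flanks can be sketched in Python as a small Smith-Waterman implementation. The key difference from global alignment is that cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell instead of the corner. The scores (match +1, mismatch -1, gap -2) and the name `smith_waterman` are assumptions for illustration, not the behaviour of `Align.local`.

```python
def smith_waterman(s1, s2, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_s1, aligned_s2, start_in_s1) for one best local alignment."""
    n, m = len(s1), len(s2)
    # H[i][j] holds the best score of a local alignment ending at s1[i-1], s2[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0,                      # start a fresh alignment here
                          H[i - 1][j - 1] + sub,  # pair the two characters
                          H[i - 1][j] + gap,      # gap in s2
                          H[i][j - 1] + gap)      # gap in s1
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until the score falls to zero.
    a1, a2 = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    # i is now the 0-based position in s1 where the aligned region begins.
    return best, "".join(reversed(a1)), "".join(reversed(a2)), i


score, row1, row2, start = smith_waterman("AAATTTGCATGCGAAATTT", "GATTACA")
print("Local alignment (score: " + str(score) + ", starts at " + str(start) + ")")
print(row1)
print(row2)
```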

Gap Penalties

Gap penalties strongly influence alignment results. There are two common models:

Linear gap penalty: Each gap position costs the same (e.g., -2 per position), so a gap spanning three positions costs -6.

Affine gap penalty: Opening a gap is expensive, but extending it is cheaper:

  • Gap open: -5
  • Gap extend: -1

Affine penalties produce more biologically realistic alignments because real insertions/deletions tend to occur in blocks rather than as scattered single-base events.

let seq1 = "ATGCCCTAGCG"
let seq2 = "ATGTAGCG"
let result = Align.global(seq1, seq2)
print("Alignment:")
print(result.alignment)
print("Score: " + result.score)
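
To see why the affine model favours blocks, the short Python sketch below compares what the two models charge for one 3-base gap (like the missing CCC in the sequences above) versus three separate 1-base gaps. It assumes the first position of a gap pays the open penalty and every further position pays the extend penalty; some tools instead charge open plus extend for the first position, so treat the exact numbers as one convention among several. The function names are hypothetical.

```python
def linear_gap_cost(length, per_position=-2):
    """Every gap position costs the same."""
    return length * per_position


def affine_gap_cost(length, gap_open=-5, gap_extend=-1):
    """Assumed convention: first position pays gap_open, the rest pay gap_extend."""
    if length == 0:
        return 0
    return gap_open + (length - 1) * gap_extend


# One contiguous 3-base gap versus three scattered 1-base gaps.
linear_block = linear_gap_cost(3)          # 3 * -2 = -6
linear_scattered = 3 * linear_gap_cost(1)  # 3 * -2 = -6
affine_block = affine_gap_cost(3)          # -5 + 2 * -1 = -7
affine_scattered = 3 * affine_gap_cost(1)  # 3 * -5 = -15

print("linear:", linear_block, linear_scattered)  # linear: -6 -6
print("affine:", affine_block, affine_scattered)  # affine: -7 -15
```

Under the linear model the two layouts cost the same, so the aligner has no reason to prefer one block. Under the affine model the scattered layout is more than twice as expensive.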

Scoring Matrices

For protein alignment, simple match/mismatch scoring isn’t enough. Some amino acid substitutions are far more common than others (e.g., leucine → isoleucine is a frequent, conservative change between chemically similar residues, while glycine → tryptophan is rare).

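BLOSUM62 is one of the most widely used protein scoring matrices. It was derived from observed substitution frequencies in aligned, ungapped protein blocks, with sequences that are at least 62% identical clustered together (hence the name).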

Key insight: BLOSUM62 assigns positive scores to amino acid pairs that substitute more often than expected by chance, and negative scores to pairs that substitute less often.
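
This "more often than expected" idea is usually written as a log-odds score. As a sketch of the general form (the symbols below are introduced here for illustration and do not appear elsewhere in this unit):

```latex
S_{ij} = \frac{1}{\lambda} \, \log \frac{p_{ij}}{q_i \, q_j}
```

Here p_ij is the observed frequency of amino acids i and j appearing in the same alignment column, q_i and q_j are their background frequencies, and λ is a scaling factor chosen so that the rounded scores are convenient integers. When a pair occurs more often than chance predicts (p_ij > q_i q_j), the ratio exceeds 1 and the score is positive; when it occurs less often, the score is negative.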

Exercise: Compare Two Sequences

Align these two DNA sequences using both global and local alignment. Which gives a higher score?

let seq1 = "AATCGATCGATCG"
let seq2 = "TCGATCG"
let global_result = Align.global(seq1, seq2)
let local_result = Align.local(seq1, seq2)
print("Global score: " + global_result.score)
print("Local score: " + local_result.score)

Exercise: Find the Best Match

Given a short query and a longer reference, use local alignment to find where the query best matches.

let reference = "AAACCCTTTGGGAAATTTCCCGGG"
let query = "TTTGGG"
let result = Align.local(reference, query)
print("Best match score: " + result.score)
print(result.alignment)

Summary

In this unit you learned:

  • Sequence alignment compares sequences by inserting gaps to maximize similarity
  • Global alignment (Needleman-Wunsch) aligns end-to-end — best for full-length comparisons
  • Local alignment (Smith-Waterman) finds the best matching subsequence — best for partial matches
  • Scoring uses match/mismatch scores and gap penalties
  • Affine gap penalties (open + extend) produce more realistic biological alignments
  • Protein alignments use substitution matrices like BLOSUM62

Powered by

cyanea-seq cyanea-align