Skip to main content
Alpha Cyanea is in public alpha. We're building in the open — expect rough edges and rapid iteration. See what's live

Biological Sequences

Beginner Genomics & Bioinformatics ~20 min

Understand DNA, RNA, and protein sequences — the foundational data types of bioinformatics.

What Are Biological Sequences?

At the heart of bioinformatics is a simple idea: biological information can be represented as sequences of characters. Just as a sentence is a string of letters, a gene is a string of nucleotides, and a protein is a string of amino acids.

There are three primary types of biological sequences:

TypeAlphabetExample
DNAA, T, G, CATGGCTAGCAAAGAC
RNAA, U, G, CAUGGCUAGCAAAGAC
Protein20 amino acidsMASKD

DNA Sequences

DNA (deoxyribonucleic acid) uses a four-letter alphabet: Adenine, Thymine, Guanine, and Cytosine. These bases pair in a specific way: A with T, and G with C.

Let’s start by working with a simple DNA sequence:

let dna = "ATGGCTAGCAAAGACTGA"
print("Sequence: " + dna)
print("Length: " + Seq.length(dna))

Complementary Strands

DNA is double-stranded. The complement of a sequence replaces each base with its pair: A↔T and G↔C.

let dna = "ATGGCTAGCAAAGAC"
let comp = Seq.complement(dna)
print("Original:    " + dna)
print("Complement:  " + comp)

The reverse complement is the complement read in reverse — this is the sequence of the opposite strand in the 5’→3’ direction.

let dna = "ATGGCTAGCAAAGAC"
let rc = Seq.reverse_complement(dna)
print("Original:           " + dna)
print("Reverse complement: " + rc)

From DNA to RNA: Transcription

During transcription, DNA is copied into RNA. The key change is that thymine (T) is replaced by uracil (U).

let dna = "ATGGCTAGCAAAGAC"
let rna = Seq.transcribe(dna)
print("DNA: " + dna)
print("RNA: " + rna)

Exercise: Transcribe and Translate

Given a DNA sequence, transcribe it to RNA and then translate it to determine the protein it encodes.

let dna = "ATGGCTAGCATTTGA"
let rna = Seq.transcribe(dna)
let protein = Seq.translate(dna)
print("RNA: " + rna)
print(protein)

From RNA to Protein: Translation

The genetic code maps three-nucleotide codons to amino acids. Translation reads the mRNA in triplets, starting from a start codon (AUG) and ending at a stop codon (UAA, UAG, or UGA).

let dna = "ATGGCTAGCAAAGACTGA"
let protein = Seq.translate(dna)
print("DNA:     " + dna)
print("Protein: " + protein)

GC Content

The GC content is the proportion of G and C bases in a DNA sequence. It affects DNA stability (G-C pairs have three hydrogen bonds vs. two for A-T) and is characteristic of different organisms.

let dna = "ATGGCTAGCAAAGACTGA"
let gc = Seq.gc_content(dna)
print("GC content: " + gc)

Exercise: Dinucleotide Frequencies

K-mers are subsequences of length k. Dinucleotides (k=2) reveal patterns that simple base frequencies miss. Use Seq.kmer_count() to find the most common dinucleotide in this AT-rich sequence.

let dna = "AATAAAGATAATAAATTTAA"
let kmers = Seq.kmer_count(dna, 2)
print(kmers)
let most_common = "AA"
print(most_common)

Exercise: Analyze a Sequence

Given a DNA sequence, compute its reverse complement and GC content.

let dna = "ATGGCTAGCGACTGGAAGC"
let rc = Seq.reverse_complement(dna)
print(rc)

Knowledge Check

Summary

In this unit you learned:

  • DNA, RNA, and protein are all representable as character sequences
  • DNA bases pair: A↔T and G↔C
  • The reverse complement gives the opposite strand in 5’→3’ direction
  • Transcription replaces T with U (DNA → RNA)
  • Translation reads codons (triplets) to produce amino acids
  • GC content measures sequence stability and varies across organisms

Powered by

cyanea-seq
Sequences DNA RNA Protein Bioinformatics