igsr/1000 Genomes 30x

Demo
H. sapiens | 8.2 TB | CC0 | vPhase 3 | GRCh38 | Updated 1 week ago
This is a demo page showing what a dataset detail page looks like on Cyanea. The data shown is illustrative.

High-coverage whole genome sequences for 3,202 samples across 26 populations worldwide.

The 1000 Genomes Project produced high-coverage (30x) whole genome sequencing data for 3,202 samples from 26 populations across 5 continents. This dataset is one of the most widely used references for population genetics and serves as a foundational imputation panel.

What’s included

  • CRAM alignments for all 3,202 samples against GRCh38
  • Joint-called VCFs with SNPs, indels, and structural variants
  • Population metadata including super-population and sub-population labels
  • Phased haplotypes suitable for imputation reference panels
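
As a rough illustration of how the population metadata might be used, the sketch below counts samples per label from a tab-separated panel such as integrated_call_samples_v3.panel. The column names (sample, pop, super_pop) and the rows in PANEL_TEXT are assumptions made for the example, not contents taken from this page; check the header of the real file before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a panel file. The column names and rows are
# assumptions for illustration only, not data copied from the dataset.
PANEL_TEXT = (
    "sample\tpop\tsuper_pop\n"
    "SAMPLE_A\tGBR\tEUR\n"
    "SAMPLE_B\tGBR\tEUR\n"
    "SAMPLE_C\tCHB\tEAS\n"
    "SAMPLE_D\tLWK\tAFR\n"
)


def count_by_label(panel_text, column):
    """Count samples per value of `column` in a tab-separated panel."""
    reader = csv.DictReader(io.StringIO(panel_text), delimiter="\t")
    return Counter(row[column] for row in reader)


super_pop_counts = count_by_label(PANEL_TEXT, "super_pop")
pop_counts = count_by_label(PANEL_TEXT, "pop")

for label, n in sorted(super_pop_counts.items()):
    print(label, n)
```

For a real file, the same function could be given the text read from disk, for example `open(path).read()`, provided the header matches the assumed column names.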

Use cases

The dataset is widely used as a reference for population allele frequencies, as an imputation panel, and as a benchmark for variant callers. Common applications include GWAS, pharmacogenomics, and ancestry estimation.
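
To make "population allele frequencies" concrete, here is a minimal sketch that computes the ALT allele frequency per population at a single biallelic site from phased genotype strings. The function name, the sample names, and the genotypes are hypothetical; a real analysis would read genotypes from the VCF with a dedicated parser rather than from a hand-written dictionary.

```python
from collections import defaultdict


def alt_allele_frequency(genotypes, sample_to_pop):
    """Return {population: ALT allele frequency} for one biallelic site.

    genotypes: {sample: "0|1"}-style phased GT strings, where "." marks
    a missing allele. Missing alleles are excluded from the denominator.
    """
    alt = defaultdict(int)
    total = defaultdict(int)
    for sample, gt in genotypes.items():
        pop = sample_to_pop[sample]
        for allele in gt.split("|"):
            if allele == ".":
                continue  # skip missing calls
            total[pop] += 1
            if allele == "1":
                alt[pop] += 1
    return {pop: alt[pop] / total[pop] for pop in total}


# Hypothetical samples and genotypes, for illustration only.
genotypes = {"S1": "0|1", "S2": "1|1", "S3": "0|0", "S4": ".|1"}
sample_to_pop = {"S1": "EUR", "S2": "EUR", "S3": "AFR", "S4": "AFR"}

freqs = alt_allele_frequency(genotypes, sample_to_pop)
print(freqs)
```

Here EUR has 3 ALT alleles out of 4 called alleles, and AFR has 1 out of 3 because the missing allele in S4 is not counted.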

Files

  • GRCh38_full_analysis_set/ (7.1 TB)
  • integrated_call_samples_v3.panel (124 KB)
  • ALL.wgs.shapeit2_integrated.v1.GRCh38.vcf.gz (890 GB)
  • sequence_index.tsv (2.1 MB)
  • README.md (6.2 KB)

Formats

CRAM, VCF, FASTQ

Tags

population genetics, reference panel, WGS, diversity