igsr/1000 Genomes 30x
High-coverage whole genome sequences for 3,202 samples across 26 populations worldwide.
The 1000 Genomes Project produced high-coverage (30x) whole-genome sequencing data for 3,202 samples from 26 populations in five continental groups. This dataset is one of the most widely used references in population genetics and is commonly used as an imputation reference panel.
What’s included
- CRAM alignments for all 3,202 samples against GRCh38
- Joint-called VCFs with SNPs, indels, and structural variants
- Population metadata including super-population and sub-population labels
- Phased haplotypes suitable for imputation reference panels
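As a minimal sketch of how the population metadata might be used, the Python below groups samples by super-population label. It assumes a tab-separated panel file with `sample`, `pop`, and `super_pop` columns. That column layout, the sample IDs, and the helper name `count_by_super_population` are illustrative assumptions, not details confirmed by this page.

```python
import csv
import io
from collections import Counter

# Illustrative rows only. The column layout (sample, pop, super_pop) is an
# assumption about the panel file; the sample IDs are hypothetical.
PANEL_TEXT = (
    "sample\tpop\tsuper_pop\n"
    "SAMPLE_A\tGBR\tEUR\n"
    "SAMPLE_B\tFIN\tEUR\n"
    "SAMPLE_C\tYRI\tAFR\n"
)


def count_by_super_population(handle):
    """Return a Counter mapping each super-population label to its sample count."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["super_pop"] for row in reader)


counts = count_by_super_population(io.StringIO(PANEL_TEXT))
print(dict(counts))
```

To run this against a real file, pass an open file handle instead of the `io.StringIO` wrapper, after checking that the header names match.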
Use cases
Typical uses include estimating population allele frequencies, building imputation reference panels, and benchmarking variant callers. The data is commonly used in GWAS, pharmacogenomics, and ancestry estimation.
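To make the allele-frequency use case concrete, here is a small sketch that computes per-population ALT allele frequencies at a single biallelic site from VCF-style genotype strings such as `0|1`. The function name, the input dictionaries, and the sample and population labels are hypothetical. A real workflow would read genotypes from the VCF with a dedicated parser rather than from hand-built dictionaries.

```python
from collections import defaultdict


def alt_allele_frequencies(genotypes, sample_to_pop):
    """Compute the ALT allele frequency per population for one biallelic site.

    genotypes: dict mapping sample ID to a GT string such as "0|1" or "0/1"
    sample_to_pop: dict mapping sample ID to a population label
    Missing alleles (".") are excluded from both numerator and denominator.
    """
    alt = defaultdict(int)
    total = defaultdict(int)
    for sample, gt in genotypes.items():
        pop = sample_to_pop[sample]
        # Treat phased ("|") and unphased ("/") separators the same way.
        for allele in gt.replace("/", "|").split("|"):
            if allele == ".":
                continue
            total[pop] += 1
            if allele == "1":
                alt[pop] += 1
    return {pop: alt[pop] / total[pop] for pop in total}


# Hypothetical genotypes at one site for four samples in two populations.
genotypes = {"S1": "0|1", "S2": "1|1", "S3": "0|0", "S4": ".|1"}
sample_to_pop = {"S1": "EUR", "S2": "EUR", "S3": "AFR", "S4": "AFR"}

freqs = alt_allele_frequencies(genotypes, sample_to_pop)
print(freqs)
```

In this toy input, EUR carries three ALT alleles out of four called alleles, and AFR carries one out of three because the missing allele is skipped.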
Files
- GRCh38_full_analysis_set/ (7.1 TB)
- integrated_call_samples_v3.panel (124 KB)
- ALL.wgs.shapeit2_integrated.v1.GRCh38.vcf.gz (890 GB)
- sequence_index.tsv (2.1 MB)
- README.md (6.2 KB)