Introducing the Y-chromosomal Ancestral-like Reference Sequence—Improving the Capture of Human Evolutionary Information

World
2025

Reference sequences are essential for reproducible genetic analyses but are often chosen without regard to their evolutionary relevance within the analyzed species. The human Y chromosome is widely used in evolutionary studies, yet current references represent evolutionarily young sequences, which can lead to misleading variant calls. To address this issue, we constructed a Y-chromosomal ancestral-like reference sequence to improve the detection of evolutionarily informative variants on the Y chromosome. The sequence was constructed by applying a weighted maximum parsimony approach to human and primate Y chromosome sequences. To benchmark its performance, 40 Y chromosome short-read sequences from diverse haplogroups were aligned to the Y-chromosomal ancestral-like reference sequence and to existing references (GRCh37, GRCh38, and T2T-CHM13). Overall, the Y-chromosomal ancestral-like reference sequence yielded the highest and most consistent number of SNPs per sample (mean = 1,400; SD = 77), whereas the other references yielded fewer variants on average (mean = 866 to 968) and showed greater variability across samples (SD = 457 to 531), depending on the samples' phylogenetic distance from the reference. Additionally, alignments to the Y-chromosomal ancestral-like reference sequence yielded only SNPs with evolutionarily derived alleles, whereas on average 46% of the SNPs called against the other references carried ancestral alleles. This study demonstrates that existing reference sequences fail to capture the full range of evolutionary information on the Y chromosome. The Y-chromosomal ancestral-like reference sequence improves the capture of evolutionary information on the Y chromosome, making it a valuable resource for various evolutionary applications, such as estimation of the time to the most recent common ancestor (TMRCA) and phylogenetic analyses.
Finally, alongside the Y-chromosomal ancestral-like reference sequence, we provide a publicly available tool, polaryzer, to annotate variants as ancestral or derived in pre-aligned Y chromosome data.
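
To illustrate what annotating a variant as ancestral or derived means, the following is a minimal Python sketch. It is not the implementation of polaryzer, whose interface is not described here. The `Variant` class, the `polarize` function, the positions, and the alleles are all hypothetical and chosen only for illustration. The sketch assumes that an inferred ancestral allele is available per position. A SNP whose alternate allele matches the ancestral state indicates that the alignment reference itself carries the derived allele, which is the situation that arises when reads are aligned to an evolutionarily young reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int  # 1-based position on the Y chromosome (hypothetical values below)
    ref: str  # allele in the reference the reads were aligned to
    alt: str  # allele observed in the sample


def polarize(variant: Variant, ancestral_alleles: dict[int, str]) -> str:
    """Label a called SNP as 'derived', 'ancestral', or 'unknown'.

    ancestral_alleles maps a position to its inferred ancestral base
    (an assumed input, e.g. taken from an ancestral-like reference).
    """
    anc = ancestral_alleles.get(variant.pos)
    if anc is None:
        return "unknown"      # no ancestral state inferred at this position
    if variant.alt == anc:
        return "ancestral"    # the reference carries the derived allele
    if variant.ref == anc:
        return "derived"      # the sample carries the derived allele
    return "unknown"          # neither allele matches the ancestral state


# Toy data, invented for illustration only.
ancestral = {100: "A", 200: "G"}
calls = [
    Variant(100, "A", "T"),   # reference is ancestral, sample is derived
    Variant(200, "C", "G"),   # reference is derived, sample is ancestral
    Variant(300, "T", "C"),   # position without an inferred ancestral state
]
labels = [polarize(v, ancestral) for v in calls]
```

Under this scheme, aligning to a reference that already matches the ancestral state at every position can only produce calls of the first kind, which is consistent with the abstract's observation that the ancestral-like reference yielded solely SNPs with derived alleles.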