Allele frequency patterns of 78,122,254 SNPs illuminate diverse possible evolutionary trajectories of human populations
World
Study Information
Abstract
The history of human population migration and admixture is reflected in the allele frequency variation of SNPs across the genome. We propose an algorithm to infer the population allele frequency pattern of each individual locus and identify 71 unique patterns among the 78,122,254 loci of the 1000 Genomes (1000G) data. In nearly 90% of the loci, the major allele predominates in subjects from all populations. Most of the remaining loci exhibit one of 38 “linear” patterns. In these patterns, the population frequencies of homozygous major and homozygous minor genotypes follow a sequential order (e.g., ABC) on a 2D plane. The inferred patterns of over 70% of the loci are verified in independent data from the Human Genome Diversity Project (HGDP). For most allele frequency patterns, the loci are unevenly distributed along the chromosomes and concentrated in narrow hot spots. The hot spots of some patterns are significantly enriched for loci associated with selected phenotypes and gene functions. One example is the hot spot at chr6 32–33 Mb, which harbors the Human Leukocyte Antigen (HLA) genes. Based on the linear patterns observed at the continental and population levels, we develop a local ancestry inference algorithm that dissects the chromosomes of an admixed subject into tracts of different ancestral origins, and we validate it on 1000G and HGDP data. Finally, most but not all of the observed linear patterns are compatible with simulations of a population genetics model driven by neutral drift. Yet the small differences in the occurrence of the top-ranking patterns between the empirical and simulated data are consistent and statistically significant, implying the presence of evolutionary processes beyond neutral drift.
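To make the notion of a “linear” pattern concrete, here is a minimal sketch under stated assumptions. It is not the authors' inference algorithm. Each population is represented as the 2D point (frequency of homozygous major genotypes, frequency of homozygous minor genotypes). Populations are then ordered by their projection onto the principal axis of these points, and the share of variance along that axis serves as a rough collinearity score. The function names `genotype_point` and `linear_order`, the 0/1/2 minor-allele genotype coding, and the principal-axis projection are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def genotype_point(genotypes: List[int]) -> Point:
    """Return (freq. homozygous major, freq. homozygous minor) for one population.

    Hypothetical coding: genotypes are minor-allele counts per subject (0, 1 or 2),
    where "minor" is the allele that is rarer in the pooled sample.
    """
    n = len(genotypes)
    hom_major = sum(1 for g in genotypes if g == 0) / n
    hom_minor = sum(1 for g in genotypes if g == 2) / n
    return hom_major, hom_minor


def linear_order(points: Dict[str, Point]) -> Tuple[List[str], float]:
    """Order populations along the principal axis of their 2D genotype points.

    Returns the labels sorted along that axis and the fraction of variance the
    axis explains (1.0 means the points are exactly collinear).
    """
    labels = list(points)
    n = len(labels)
    mx = sum(points[k][0] for k in labels) / n
    my = sum(points[k][1] for k in labels) / n
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum((points[k][0] - mx) ** 2 for k in labels) / n
    c = sum((points[k][1] - my) ** 2 for k in labels) / n
    b = sum((points[k][0] - mx) * (points[k][1] - my) for k in labels) / n
    total = a + c
    if total == 0:
        raise ValueError("all populations have identical genotype frequencies")
    # Largest eigenvalue of the symmetric 2x2 matrix and a matching eigenvector.
    lam = total / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = lam - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    # Give the axis a non-negative component along (-1, 1), i.e. toward fewer
    # major homozygotes and more minor homozygotes, so the order is canonical.
    if vy - vx < 0:
        vx, vy = -vx, -vy
    proj = {k: (points[k][0] - mx) * vx + (points[k][1] - my) * vy for k in labels}
    return sorted(labels, key=proj.__getitem__), lam / total


if __name__ == "__main__":
    # Toy genotypes for three hypothetical populations (not real data).
    toy = {
        "C": [0] * 2 + [1] * 3 + [2] * 5,
        "A": [0] * 8 + [1] * 2,
        "B": [0] * 5 + [1] * 3 + [2] * 2,
    }
    pts = {pop: genotype_point(g) for pop, g in toy.items()}
    order, explained = linear_order(pts)
    print(order, round(explained, 3))
```

For the toy data, the populations come out ordered A, B, C regardless of dictionary order, with nearly all variance on one axis. A principal-axis projection is used here only because it is a simple, orientation-free way to read off a sequence from roughly collinear points.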
The causes of the observed linear patterns are by no means settled. Consistency with the neutral evolution hypothesis is tentative and model-dependent, and the dominance of neutral drift in genome evolution has been challenged by both theoretical and empirical studies. In summary, we demonstrate the diverse population allele frequency patterns of individual loci and indicate their structural and functional implications for the human genome.
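The neutral-drift comparison can be illustrated with a textbook Wright–Fisher model. This is a minimal sketch and not the population genetics model used in the study, whose structure and parameters are not given in this abstract. Here, daughter populations drift independently from a shared ancestral allele frequency with a constant effective size `n_e`, and genotype frequencies are derived assuming Hardy–Weinberg equilibrium. The function names `wright_fisher`, `simulate_split` and `hwe_point`, and all parameter values, are hypothetical.

```python
import random
from typing import List, Tuple


def wright_fisher(q: float, n_e: int, generations: int, rng: random.Random) -> float:
    """Evolve the frequency q of one allele under neutral Wright-Fisher drift.

    Each generation resamples 2 * n_e gene copies binomially from the current
    frequency; selection, mutation and migration are deliberately ignored.
    """
    copies = 2 * n_e
    for _ in range(generations):
        if q in (0.0, 1.0):  # loss and fixation are absorbing states
            break
        q = sum(rng.random() < q for _ in range(copies)) / copies
    return q


def simulate_split(q0: float, n_e: int, generations: int, n_pops: int,
                   seed: int = 0) -> List[float]:
    """Let n_pops daughter populations drift independently from ancestral frequency q0."""
    rng = random.Random(seed)  # seeded for reproducible runs
    return [wright_fisher(q0, n_e, generations, rng) for _ in range(n_pops)]


def hwe_point(q: float) -> Tuple[float, float]:
    """(homozygous for the other allele, homozygous for the tracked allele) under Hardy-Weinberg."""
    return (1 - q) ** 2, q ** 2


if __name__ == "__main__":
    final = simulate_split(q0=0.3, n_e=200, generations=100, n_pops=5, seed=42)
    for pop, q in enumerate(final):
        print(pop, round(q, 3), hwe_point(q))
```

Because loss and fixation are absorbing, a small `n_e` or many generations push many daughter populations to frequency 0 or 1. A model intended for comparison with empirical pattern counts would need a realistic population tree and demography, which this toy does not attempt.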