Project the genomes of a thousand Europeans into a 25-dimensional mathematical space, reduce that space to its two principal axes of variation, and something extraordinary happens: a map of Europe materialises. Swedes appear top-centre, Greeks bottom-right, Irish off to the upper left, Poles upper-right — positioned almost exactly where a cartographer would place them. This phenomenon, first documented at genome-wide resolution by Novembre et al. in Nature in 2008, is not an artefact. It is the cumulative signature of ten thousand years of preferential local mating, of marriages that followed river valleys and coastlines, and of migrations whose routes geography constrained as surely as it constrained any Bronze Age wagon. The genome remembers where people lived and whom they married — and it draws a map accordingly.
Yet within this remarkably ordered picture, certain populations sit conspicuously outside where geography would assign them. The Basques, the Sardinians, the Ashkenazi Jews, and the Sicilians and southern Italians drift from their expected positions in ways that are not random noise but interpretable history. Understanding why requires tracking the three great founding events of European population history: the Neolithic farming expansion, the Bronze Age steppe migrations, and the long-running exchange of people across the Mediterranean world.
Key Findings at a Glance
- A PCA of modern Europeans closely mirrors the geography of the continent, with genetic distance correlating with geographic distance at r² ≈ 0.80 (Novembre et al. 2008).
- The primary axes track the gradient between Anatolian Neolithic farmer ancestry (EEF) — dominant in the south — and steppe-related ancestry (Yamnaya) — dominant in the north and east — with a WHG hunter-gatherer sub-axis running Atlantic to continental.
- Sardinians are the extreme EEF outlier at ~92 % Anatolian Neolithic, preserved by island isolation from every major Bronze Age and medieval migration.
- Basques carry ~20 % WHG and ~27 % steppe ancestry — substantially less steppe than their continental neighbours — reflecting partial insulation from the Bell Beaker transformation of Iberia.
- Ashkenazi Jews plot outside the European cluster entirely, midway toward the Levant, reflecting a Near Eastern founding population, southern European Greco-Roman admixture, and ~700 years of endogamy.
- Sicilians and southern Italians are displaced toward the Near East by sequential Phoenician, Greek, Roman Imperial, and Arab gene-flow spanning ~1,500 years.
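The r² figure in the first finding is a squared correlation between two sets of distances. As a rough illustration of what such a number measures, the sketch below computes a Pearson correlation on a handful of invented population pairs. The distances are hypothetical placeholders, not data from Novembre et al., and the procedure is a generic one rather than necessarily the one used in that paper.

```python
import math

# Hypothetical pairwise values for seven population pairs (NOT real data):
# geographic distance in km and an arbitrary genetic-distance score.
geo_km = [300, 650, 900, 1400, 1800, 2300, 2900]
gen_dist = [0.011, 0.018, 0.022, 0.035, 0.037, 0.052, 0.058]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(geo_km, gen_dist)
r_squared = r ** 2   # share of variance in one distance explained by the other
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Because pairwise distances share populations and are not independent observations, significance for this kind of comparison is usually assessed with a permutation approach such as a Mantel test rather than a standard p-value.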
I. The Geographic Mirror — Why Genes Track Geography
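The three ancestral components that together explain the bulk of modern European genetic variation were introduced sequentially: first Western hunter-gatherer (WHG) ancestry, then Anatolian Neolithic farmer (EEF) ancestry, then steppe-related (Yamnaya) ancestry. Their geographic footprints followed the logic of the physical landscape.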
II. The PCA — Europe's Genetic Map
The scatter plot below replicates the approach of Novembre et al. (2008): individual data points labelled with two-letter country codes, coloured by geographic region. Each country is represented by a simulated cloud of individuals distributed around its population mean in G25 PCA space. Principal Component 1 (vertical axis) tracks the north-south ancestry gradient; Principal Component 2 (horizontal axis) tracks the west-east gradient. The result is a map of Europe rotated roughly 90°, with the genetic topology mirroring the geographic one.
Simulated individual scatter around G25 population means (scaled). PC1 ≈ north–south (steppe vs EEF); PC2 ≈ west–east. Outlier populations annotated with a border. Source: Global25 scaled reference averages; Novembre et al. 2008.
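The construction described above, simulated individuals scattered around population means and then reduced to two principal components, can be sketched as follows. This is a minimal illustration assuming NumPy is available. The four population means are invented placeholders in a 25-dimensional space, not real G25 coordinates, and the scatter width and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population means in a 25-dimensional space (NOT real G25 values).
# Only the first two dimensions carry structure here; the rest are zero.
labels = ["SE", "GR", "IE", "PL"]
means = np.zeros((4, 25))
means[:, 0] = [3.0, -3.0, 2.0, 2.0]    # stand-in for a north-south gradient
means[:, 1] = [0.0, 2.0, -3.0, 3.0]    # stand-in for a west-east gradient

# Simulate 50 individuals per population as Gaussian scatter around each mean.
n_per_pop = 50
samples = np.vstack(
    [m + rng.normal(scale=0.3, size=(n_per_pop, 25)) for m in means]
)
pop_index = np.repeat(np.arange(len(labels)), n_per_pop)

# PCA: centre the data, then take the singular value decomposition.
centred = samples - samples.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
pcs = centred @ vt[:2].T              # coordinates on PC1 and PC2
explained = s**2 / np.sum(s**2)       # fraction of variance per component

for i, label in enumerate(labels):
    pc1, pc2 = pcs[pop_index == i].mean(axis=0)
    print(f"{label}: PC1={pc1:+.2f}  PC2={pc2:+.2f}")
print(f"variance explained by PC1+PC2: {explained[:2].sum():.2f}")
```

The sign and orientation of each principal component are arbitrary, which is one reason a genetic map of this kind can come out rotated or mirrored relative to a conventional map.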
III. The Basques — High WHG, Low Steppe, Linguistic Mystery
The Basque position on the European PCA is anomalous for their geography. Rather than clustering with the Spanish and French populations that surround them, they drift toward the corner occupied by Atlantic populations rich in hunter-gatherer ancestry. They carry elevated WHG ancestry (~20 %) and ~27 % steppe ancestry, which is less than surrounding continental populations but not as extreme a deficit as sometimes reported. Compared with the French (~40 % steppe, ~13 % WHG), the Basques have noticeably more WHG and less steppe than their neighbours.
The Basque Ancestry Profile — Corrected Figures
G25 NNLS modelling (3-source: Sardinian Neolithic / Loschbour WHG / Yamnaya) places modern Basques at approximately:
EEF (Sardinian proxy): ~53 % • WHG (Loschbour proxy): ~20 % • Steppe (Yamnaya): ~27 %
This shifts them toward pre-Bell Beaker Iberian Chalcolithic individuals in G25 space, slightly away from the modern Spanish average (~30 % steppe) and more clearly away from the modern French average (~40 % steppe). The WHG figure, roughly one and a half times the French value of ~13 %, is the primary driver of their displaced PCA position relative to Iberia and south-west France.
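To make the modelling step concrete, here is a minimal sketch of a three-source non-negative least-squares fit, assuming NumPy is available. The source coordinates are random placeholders rather than real G25 averages, and the target is synthesised from the proportions quoted above. `nnls_small` is a hypothetical brute-force helper standing in for a library solver such as `scipy.optimize.nnls`. None of this is the pipeline that produced the figures in this article.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 25-dimensional coordinates for three source populations.
# These are random placeholders, NOT real G25 reference averages.
source_names = ["EEF", "WHG", "Steppe"]
sources = rng.normal(size=(3, 25))

# Build a synthetic target from the mixture quoted in the text, plus slight noise.
true_mix = np.array([0.53, 0.20, 0.27])
target = true_mix @ sources + rng.normal(scale=0.01, size=25)

def nnls_small(a, b):
    """Non-negative least squares by enumerating supports (only for a few sources)."""
    n_sources = a.shape[1]
    best_coef, best_resid = np.zeros(n_sources), np.linalg.norm(b)
    for k in range(1, n_sources + 1):
        for support in combinations(range(n_sources), k):
            cols = list(support)
            # Unconstrained least squares restricted to this subset of sources.
            coef, *_ = np.linalg.lstsq(a[:, cols], b, rcond=None)
            if np.all(coef >= 0):
                resid = np.linalg.norm(a[:, cols] @ coef - b)
                if resid < best_resid:
                    best_resid = resid
                    best_coef = np.zeros(n_sources)
                    best_coef[cols] = coef
    return best_coef, best_resid

coef, resid = nnls_small(sources.T, target)
proportions = coef / coef.sum()    # rescale so the fractions sum to 1

for name, p in zip(source_names, proportions):
    print(f"{name}: {p:.1%}")
print(f"fit distance: {resid:.4f}")
```

The non-negativity constraint is what makes the coefficients readable as ancestry fractions: an unconstrained fit could assign a negative share to a source, which has no demographic interpretation.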
The ancient DNA record clarifies the mechanism. Olalde et al. (2019) showed that Bell Beaker culture arrived in Iberia around 2,500 BCE and triggered one of the most dramatic genetic transformations documented in prehistory: steppe ancestry rose from near-zero to 30–40 % across the peninsula within a few centuries, with near-complete replacement of Y-chromosome lineages. In the Basque country, this transformation was real — the Basques do carry R1b-DF27, proving Bell Beaker reached them — but attenuated. Less steppe entered, and the pre-existing WHG component, already higher in Atlantic populations than in Mediterranean ones, was less diluted. The Basque genome is accordingly a partially preserved version of the Atlantic Neolithic-plus-WHG profile that Bell Beaker encountered on the fringes of its expansion.
IV. Ancestry Profiles Across the European Gradient
The following visualises the four-source ancestry decomposition for key populations using G25 NNLS modelling (Sardinian Neolithic as EEF proxy, Loschbour WHG, Yamnaya steppe, and a Levantine Iron Age source for post-Neolithic Near Eastern input). The corrected proportions for Basques and French replace earlier erroneous figures.
V. The Sardinians — Living Relics of the European Neolithic
The Sardinian outlier is the most extreme in Europe and the most straightforward to explain. With approximately 92 % Anatolian Neolithic farmer ancestry, modern Sardinians are genetically closer to European Neolithic populations of 5,000–6,000 BCE than to any continental European population of the present day. Their steppe fraction of ~3 % is the lowest recorded in any modern western European population — compared to 40 % in France, 50 % in Sweden, or 47 % in Britain.
Sardinians as the Reference Population for Anatolian Neolithic Ancestry
Because modern Sardinians so closely approximate the Anatolian Neolithic farmers who first colonised Europe, they are used as the standard EEF proxy in G25 NNLS modelling and in formal qpAdm analyses. When researchers write that an ancient population was “70 % EEF”, what this means operationally is that 70 % of their ancestry is best modelled by a modern Sardinian-like source. The island's isolation from the Bell Beaker transformation (~2,500–2,000 BCE), from Roman Imperial population flows, and from most medieval Germanic migrations explains this preservation. In Y-chromosome terms, the dominance of haplogroups I2 and G2a, the lineages Bell Beaker replaced continent-wide, is the most vivid patrilineal testament to this continuity.
VI. The Ashkenazi Jews — Levantine Origin, European Admixture, Endogamic Drift
Ashkenazi Jews cluster consistently outside the European genetic space, in an intermediate zone between southern Europe and the Levant. This position is not a product of recent geography — communities lived in central Europe for over a thousand years — but of the founding population's origin and the endogamic practices that prevented substantial admixture with surrounding populations.
Formation of the Ashkenazi Genetic Profile
Judean founding (~1,000–500 BCE)
→ Greco-Roman Diaspora (southern European admixture, ~200 BCE–400 CE)
→ Rhineland Settlement (~800–1,000 CE)
→ Eastern European Expansion (bottleneck of ~300–400 founders, ~1200–1500 CE)
→ Modern Ashkenazi (endogamy preserved the Levantine profile)
The Founding Bottleneck and Its Medical Consequences
Carmi et al. (2014) estimated that the ancestral Ashkenazi population passed through a severe bottleneck of approximately 300–400 individuals roughly 600–800 years ago (25–32 generations). This bottleneck, followed by centuries of endogamy, dramatically reduced heterozygosity and elevated the frequency of certain alleles through genetic drift. Several rare recessive conditions (Tay-Sachs disease, Gaucher disease, familial dysautonomia) and specific BRCA1/2 pathogenic variants are significantly more frequent in this population, a direct medical consequence of this demographic history rather than of any selective pressure.
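The arithmetic behind this founder effect is simple enough to work through. The sketch below takes 350 founders, the midpoint of the 300–400 range quoted above, and a hypothetical pre-bottleneck allele frequency of 0.01 %. The latter is an assumption chosen for illustration, not a figure from Carmi et al.

```python
# Founder-effect arithmetic for a rare allele passing through a bottleneck.
founders = 350                 # midpoint of the 300-400 range quoted above
chromosomes = 2 * founders     # each diploid founder carries two copies per autosome

# Hypothetical frequency of a rare allele before the bottleneck (an assumption).
source_freq = 0.0001

# Chance that at least one founder chromosome carries the allele.
prob_present = 1 - (1 - source_freq) ** chromosomes

# If exactly one founder chromosome carries it, this is its frequency
# immediately after the bottleneck, and its enrichment over the source.
min_freq_after = 1 / chromosomes
enrichment = min_freq_after / source_freq

print(f"P(allele present among founders)    = {prob_present:.3f}")
print(f"frequency if carried by one founder = {min_freq_after:.5f}")
print(f"enrichment over source population   = {enrichment:.1f}x")
```

Under these assumptions most such rare alleles are lost at the bottleneck, but any that happen to pass through start out roughly 14 times more common than before, with no selection involved. Subsequent drift in a small endogamous population can then push a few of them higher still.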
VII. Sicily and Southern Italy — 1,500 Years of Mediterranean Layering
Sicilians and southern Italians are not categorically detached from the European cluster the way Sardinians or Ashkenazi Jews are, but their displacement toward the Levant relative to northern Italians is robust across independent datasets. It represents the accumulated genetic record of the Mediterranean's role as the ancient world's principal artery of human movement.
VIII. Comparison Table
| Population | Steppe ~% | EEF ~% | WHG ~% | Levant ~% | Key note |
|---|---|---|---|---|---|
| Sardinians | ~3 % | ~92 % | ~5 % | <1 % | Extreme EEF outlier; Neolithic proxy for G25 modelling |
| Basques | ~27 % | ~53 % | ~20 % | <1 % | Highest WHG in W. Europe; partial Bell Beaker insulation; Euskara survivor |
| Ashkenazi Jews | ~13 % | ~50 % | ~7 % | ~30 % | Outside EU cluster; Levantine founding + Greco-Roman admixture + endogamy |
| Sicilians | ~17 % | ~57 % | ~6 % | ~20 % | Displaced toward Levant; Phoenician, Roman Imperial, Arab layers |
| Italians, South | ~20 % | ~62 % | ~6 % | ~12 % | Levantine displacement; Roman Imperial dominant contributor |
| Italians, North (Tuscany) | ~24 % | ~65 % | ~7 % | ~4 % | Lombard addition; Etruscan Bronze Age substrate |
| Greeks | ~22 % | ~65 % | ~6 % | ~7 % | High EEF; modest Levantine layer; eastern Aegean affinities |
| French (average) | ~40 % | ~47 % | ~13 % | ~0 % | Typical western European; Atlantic–continental gradient within France |
| British & Irish | ~47 % | ~40 % | ~13 % | <1 % | Atlantic zone; Bell Beaker dominant; Viking marginal addition (Norman coast) |
| Swedes / Norwegians | ~50 % | ~34 % | ~16 % | <1 % | Steppe-rich; elevated WHG Scandinavian Mesolithic substrate |
| Finns | ~45 % | ~25 % | ~15 % | <1 % | Siberian hunter-gatherer ancestry displaces them from the N-W European cluster |
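
A quick consistency check on the table is to add up each row. The sketch below transcribes the four percentage columns and reports any remainder the four components do not cover. Treating "<1 %" and "~0 %" entries as zero is an assumption made for the arithmetic.

```python
# Ancestry percentages transcribed from the comparison table, in the order
# (Steppe, EEF, WHG, Levant). Assumption: "<1 %" and "~0 %" are treated as 0.
profiles = {
    "Sardinians":          (3, 92, 5, 0),
    "Basques":             (27, 53, 20, 0),
    "Ashkenazi Jews":      (13, 50, 7, 30),
    "Sicilians":           (17, 57, 6, 20),
    "Italians, South":     (20, 62, 6, 12),
    "Italians, North":     (24, 65, 7, 4),
    "Greeks":              (22, 65, 6, 7),
    "French (average)":    (40, 47, 13, 0),
    "British & Irish":     (47, 40, 13, 0),
    "Swedes / Norwegians": (50, 34, 16, 0),
    "Finns":               (45, 25, 15, 0),
}

def unmodelled_remainder(row):
    """Percentage points not accounted for by the four listed components."""
    return 100 - sum(row)

remainders = {pop: unmodelled_remainder(row) for pop, row in profiles.items()}

for pop, rem in remainders.items():
    note = "" if rem == 0 else f"  <- {rem} % not covered by the four columns"
    print(f"{pop:<20} total = {100 - rem:>3} %{note}")
```

Every row sums to 100 % except the Finns, who are left with about 15 % unassigned. That gap is consistent with the Siberian hunter-gatherer ancestry named in the row's note, although the table itself does not give a figure for it.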
IX. Myths and Realities
Common Misconception
“The Basques are Palaeolithic survivors who hid in the Pyrenees while everyone else was replaced.”
Genetic Reality
Basques are primarily descended from Anatolian Neolithic farmers (~53 % EEF), not Palaeolithic hunter-gatherers. Their distinctiveness lies in an elevated WHG fraction (~20 %) and a steppe fraction (~27 %) that is lower than in surrounding populations but not negligible, reflecting an incomplete Bell Beaker replacement. They are a Neolithic-plus-WHG relic, not a Palaeolithic one. (Olalde et al. 2019)
Common Misconception
“Sardinians are genetically unusual because of recent island inbreeding.”
Genetic Reality
Sardinian distinctiveness is not primarily a product of recent inbreeding but of deep-time isolation from the Bronze Age migrations (~2,500–2,000 BCE) that transformed mainland Europe. They resemble the European Neolithic population of 5,000 BCE because they largely are it, genetically speaking. (Haak et al. 2015; Mathieson et al. 2015)
Common Misconception
“Ashkenazi Jews are primarily European since they have lived in Europe for over a millennium.”
Genetic Reality
Duration of residence does not equal genetic admixture. Endogamous marriage practices limited European admixture to an estimated ~35–55 % of the total. The remainder, roughly half or more of Ashkenazi ancestry on these estimates, traces to a Levantine founding population, and the PCA reflects that founding rather than the geography of subsequent residence. (Carmi et al. 2014)
Common Misconception
“The Arab conquest explains all the Near Eastern ancestry in Sicilians.”
Genetic Reality
The Arab Emirate of Sicily (827–1072 CE) left a detectable but modest signature. The larger contributors to Sicilian Levantine ancestry are the Roman Imperial-era slave economy and the pre-existing EEF Neolithic base, which was always more Levantine-adjacent than continental EEF. Physical pigmentation variation does not map cleanly onto these ancestry fractions. (Vai et al. 2019)
References
- Novembre J. et al. (2008). Genes mirror geography within Europe. Nature 456: 98–101. DOI:10.1038/nature07331
- Haak W. et al. (2015). Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522: 207–211. DOI:10.1038/nature14317
- Mathieson I. et al. (2015). Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528: 499–503. DOI:10.1038/nature16544
- Olalde I. et al. (2019). The genomic history of the Iberian Peninsula over the past 8,000 years. Science 363: 1230–1234. DOI:10.1126/science.aav4040
- Carmi S. et al. (2014). Sequencing an Ashkenazi reference panel. Nature Communications 5: 4835. DOI:10.1038/ncomms5835
- Posth C. et al. (2021). The origin and legacy of the Etruscans. Science Advances 7(39): eabi7673. DOI:10.1126/sciadv.abi7673
- Vai S. et al. (2019). Ancestral mitochondrial N haplogroup genomes in Sicilian populations. Scientific Reports 9: 9581.
- Wright S. (1943). Isolation by distance. Genetics 28(2): 114–138.
- Global25 (G25) scaled reference averages: Eurogenes project (Davidski). French regional means: ExploreYourDNA dataset, N = 116.