Project the genomes of a thousand Europeans into a 25-dimensional mathematical space, reduce that space to its two principal axes of variation, and something extraordinary happens: a map of Europe materialises. Swedes appear top-centre, Greeks bottom-right, Irish off to the upper left, Poles upper-right — positioned almost exactly where a cartographer would place them. This phenomenon, first documented at genome-wide resolution by Novembre et al. in Nature in 2008, is not an artefact. It is the cumulative signature of ten thousand years of preferential local mating, of marriages that followed river valleys and coastlines, and of migrations whose routes geography constrained as surely as it constrained any Bronze Age wagon. The genome remembers where people lived and whom they married — and it draws a map accordingly.

Yet within this remarkably ordered picture, certain populations sit conspicuously outside where geography would assign them. The Basques, the Sardinians, the Ashkenazi Jews, and the Sicilians and southern Italians drift from their expected positions in ways that are not random noise but interpretable history. Understanding why requires tracking the three great founding events of European population history: the Neolithic farming expansion, the Bronze Age steppe migrations, and the long-running exchange of people across the Mediterranean world.

Key Findings at a Glance

  • A PCA of modern Europeans closely mirrors the geography of the continent, with genetic distance correlating with geographic distance at r² ≈ 0.80 (Novembre et al. 2008).
  • The primary axes track the gradient between Anatolian Neolithic farmer ancestry (EEF) — dominant in the south — and steppe-related ancestry (Yamnaya) — dominant in the north and east — with a WHG hunter-gatherer sub-axis running Atlantic to continental.
  • Sardinians are the extreme EEF outlier at ~92 % Anatolian Neolithic, preserved by island isolation from every major Bronze Age and medieval migration.
  • Basques carry ~20 % WHG and ~27 % steppe ancestry — substantially less steppe than their continental neighbours — reflecting partial insulation from the Bell Beaker transformation of Iberia.
  • Ashkenazi Jews plot outside the European cluster entirely, midway toward the Levant, reflecting a Near Eastern founding population, southern European Greco-Roman admixture, and ~700 years of endogamy.
  • Sicilians and southern Italians are displaced toward the Near East by sequential Phoenician, Greek, Roman Imperial, and Arab gene-flow spanning ~1,500 years.

I. The Geographic Mirror — Why Genes Track Geography

The three ancestral components that together explain the bulk of modern European genetic variation were introduced sequentially, and their geographic footprints followed the logic of the physical landscape:

~8,000–5,000 BCE
WHG Substrate
Western Hunter-Gatherers occupied Europe before the farmers. Their ancestry was partially absorbed into the farming population and remains elevated, at up to 20 %, in Atlantic Europe and among the Basques.
~6,000–4,000 BCE
Neolithic Farming Spread
Anatolian EEF farmers spread north-west from the Near East via the Danubian corridor and the Mediterranean coast. Their ancestry remains dominant in southern and insular Europe today.
~3,000–2,000 BCE
Steppe Migration
Yamnaya-related pastoralists swept from the Pontic-Caspian steppe, transforming northern and central Europe. French populations today carry ~40 % steppe ancestry; Scandinavians reach ~50 %.
~500 BCE–500 CE
Mediterranean Exchange
Phoenician, Greek, and Roman networks moved people at unprecedented scale. Levantine and eastern Mediterranean ancestry was added to coastal southern Europe, tilting those populations eastward on the PCA.
~400–900 CE
Medieval Migrations
Germanic migrations redistributed northern European ancestry southward. Viking contacts extended northern affinities into Normandy and the British Isles, reinforcing the north-south steppe gradient.
~900 CE – present
Relative Stability
From the High Middle Ages, major population-level genetic shifts ceased across most of Europe. The PCA structure visible today was largely set by ~1000 CE and maintained by isolation-by-distance.

II. The PCA — Europe's Genetic Map

The scatter plot below replicates the approach of Novembre et al. (2008): individual data points labelled with two-letter country codes, coloured by geographic region. Each country is represented by a simulated cloud of individuals distributed around its population mean in G25 PCA space. Principal Component 1 (vertical axis) tracks the north-south ancestry gradient; Principal Component 2 (horizontal axis) tracks the west-east gradient. With the axes oriented this way, the result is a recognisable map of Europe, with the genetic topology mirroring the geographic one.

[Figure: Principal Component Analysis of European populations (Novembre-style, G25 reference), coloured by geographic region. Simulated individual scatter around G25 population means (scaled). PC1 ≈ north–south (steppe vs EEF); PC2 ≈ west–east. Outlier populations are annotated with a border. Source: Global25 scaled reference averages; Novembre et al. 2008.]
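The idea of a simulated cloud of individuals around a population mean can be sketched in a few lines. This is a minimal illustration, not the code behind the original plot: the function name `simulate_cloud`, the spread of 0.004 and the sample size of 30 are assumptions, and the first two G25 coordinates of the France_Brittany average (listed with the anchor populations at the end of this article) merely stand in for the plotted axes.

```python
import random


def simulate_cloud(mean_x, mean_y, n=30, spread=0.004, seed=0):
    """Return n hypothetical individuals scattered around one population mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [(rng.gauss(mean_x, spread), rng.gauss(mean_y, spread)) for _ in range(n)]


# First two G25 coordinates of the France_Brittany regional average.
BRITTANY_MEAN = (0.1310976, 0.1364394)

cloud = simulate_cloud(*BRITTANY_MEAN)
centre = (
    sum(x for x, _ in cloud) / len(cloud),
    sum(y for _, y in cloud) / len(cloud),
)
print(f"{len(cloud)} simulated individuals, cloud centre = ({centre[0]:.4f}, {centre[1]:.4f})")
```

A Gaussian scatter is only one possible choice; it keeps each country's cloud centred on its mean while making overlap between neighbouring populations visible.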

III. The Basques — High WHG, Low Steppe, Linguistic Mystery

The Basque position on the European PCA is anomalous for their geography. Rather than clustering with the Spanish and French populations that surround them, they drift toward the corner occupied by Atlantic hunter-gatherer-rich populations. They carry elevated WHG ancestry (~20 %) and ~27 % steppe ancestry, which is less than surrounding continental populations but not as extreme a deficit as sometimes reported. Compared with the French (~40 % steppe, ~13 % WHG), Basques have clearly more WHG and less steppe than their neighbours.

The Basque Ancestry Profile — Corrected Figures

G25 NNLS modelling (3-source: Sardinian Neolithic / Loschbour WHG / Yamnaya) places modern Basques at approximately:

  • EEF (Sardinian proxy): ~53 %
  • WHG (Loschbour proxy): ~20 %
  • Steppe (Yamnaya): ~27 %

This places them closer to pre-Bell Beaker Iberian Chalcolithic individuals in G25 space than the modern Spanish average (~30–32 % steppe) or the modern French average (~40 % steppe). The WHG figure, roughly one and a half times the French value (~20 % vs ~13 %), is the primary driver of their displaced PCA position relative to Iberia and south-west France.
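A three-source fit of this kind asks for non-negative weights w that sum to 1 and bring the weighted average of the source coordinates as close as possible to the target. The sketch below approximates that with a plain grid search in 1 % steps rather than a true NNLS solver, so it illustrates the idea rather than the method used for the figures quoted here. The source and target vectors are the G25 rows listed at the end of this article; France_Brittany is used as the target purely as an example (no Basque coordinates are given), and the names `blend` and `fit_three_way` are hypothetical.

```python
from math import dist

# G25 scaled coordinates copied from the anchor rows at the end of this article.
SARDINIA_N = [0.1245924, 0.1770148, 0.0430786, -0.0589599, 0.0789968, -0.0310642,
              -0.0053689, -0.0030176, 0.0624742, 0.0996131, 0.0036849, 0.0183414,
              -0.0352669, -0.0129788, -0.0199822, -0.0060072, 0.0088762, 0.0019295,
              0.0056758, -0.0115633, 0.0018525, 0.0027775, -0.012211, -0.0296705,
              -0.0002857]
WHG = [0.1246365, 0.116278, 0.184789, 0.189279, 0.1546445, 0.0464355, 0.0131605,
       0.0372675, 0.0890705, 0.017768, -0.0153455, -0.015811, 0.0159065, -0.0030275,
       0.053338, 0.0582065, 0.00502, 0.016343, -0.0093015, 0.055589, 0.0944585,
       0.0111905, -0.049607, -0.160866, 0.0170045]
YAMNAYA = [0.1255849, 0.089028, 0.0426986, 0.1153479, -0.0287232, 0.0450564,
           0.0036033, -0.0025642, -0.0559032, -0.0728943, 0.0018222, 3.32e-05,
           -0.0026924, -0.0233041, 0.0366141, 0.0157633, -0.0012316, -0.0017879,
           -0.0038408, 0.0137704, -0.0031749, 0.0007557, 0.0110649, 0.0186102,
           -0.004537]
BRITTANY = [0.1310976, 0.1364394, 0.0606275, 0.04142, 0.0388126, 0.0148468,
            0.0035112, 0.0042759, 0.0053176, 0.0075039, -0.0054256, 0.0051041,
            -0.0152945, -0.0138998, 0.0197514, 0.0053894, -0.0060436, 0.0012744,
            0.0020628, 0.0034281, 0.0044774, 0.0037096, -2.89e-05, 0.0104053,
            0.0010777]  # example target


def blend(weights, sources):
    """Weighted average of the source coordinate vectors."""
    return [sum(w * s[i] for w, s in zip(weights, sources))
            for i in range(len(sources[0]))]


def fit_three_way(target, sources, steps=100):
    """Grid-search non-negative weights summing to 1 (resolution 1/steps)."""
    best_dist, best_weights = float("inf"), None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            weights = (a / steps, b / steps, (steps - a - b) / steps)
            d = dist(target, blend(weights, sources))
            if d < best_dist:
                best_dist, best_weights = d, weights
    return best_dist, best_weights


fit_dist, fit_weights = fit_three_way(BRITTANY, [SARDINIA_N, WHG, YAMNAYA])
for label, w in zip(("EEF", "WHG", "Steppe"), fit_weights):
    print(f"{label}: {w:.0%}")
print(f"distance to target: {fit_dist:.4f}")
```

Dedicated tools may use different optimisers, reference sets and source labels, so the weights this prints should not be expected to reproduce the percentages quoted in this article.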

The ancient DNA record clarifies the mechanism. Olalde et al. (2019) showed that Bell Beaker culture arrived in Iberia around 2,500 BCE and triggered one of the most dramatic genetic transformations documented in prehistory: steppe ancestry rose from near-zero to 30–40 % across the peninsula within a few centuries, with near-complete replacement of Y-chromosome lineages. In the Basque country, this transformation was real — the Basques do carry R1b-DF27, proving Bell Beaker reached them — but attenuated. Less steppe entered, and the pre-existing WHG component, already higher in Atlantic populations than in Mediterranean ones, was less diluted. The Basque genome is accordingly a partially preserved version of the Atlantic Neolithic-plus-WHG profile that Bell Beaker encountered on the fringes of its expansion.

WHG ancestry — Basques vs French
~20 % vs ~13 %
The elevated WHG fraction among Basques is the primary axis of PCA displacement. Atlantic hunter-gatherers were never fully replaced by EEF farmers in the western Pyrenean zone, and the Bell Beaker steppe wave diluted them less than elsewhere.
Steppe ancestry — Basques vs French
~27 % vs ~40 %
The Basque steppe fraction (~27 %) is lower than the French (~40 %), Spanish (~32 %), or British (~47 %), reflecting a less complete Bell Beaker demographic replacement in the Pyrenean zone, consistent with the survival of non-Indo-European Euskara alongside a reduced but real steppe input.
Language isolate — pre-Indo-European survivor
Euskara
The only pre-Indo-European language still spoken in western Europe. Its survival mirrors the genetic pattern: Bell Beaker arrived but did not fully replace. Where the steppe demographic replacement was less complete, the associated Indo-European language replacement was also resisted.

IV. Ancestry Profiles Across the European Gradient

The following shows the ancestry decomposition for key populations using G25 NNLS modelling with three core sources (Sardinian Neolithic as EEF proxy, Loschbour WHG, Yamnaya steppe) plus, where relevant, a Levantine Iron Age source for post-Neolithic Near Eastern input. The corrected proportions for Basques and French replace earlier erroneous figures.

Sardinians: extreme EEF outlier, Neolithic proxy
Modern Sardinia • Haak et al. 2015
  • EEF, Anatolian Neolithic-related: ~92 %
  • WHG, Western Hunter-Gatherer: ~5 %
  • Steppe, Yamnaya-related: ~3 %

Basques: elevated WHG (~20 %), moderate steppe (~27 %); partial Bell Beaker insulation
Modern Basque Country • Olalde et al. 2019; G25 NNLS
  • EEF, Anatolian Neolithic-related: ~53 %
  • Steppe, Yamnaya-related: ~27 %
  • WHG, Western Hunter-Gatherer: ~20 %

French: typical west European; ~40 % steppe, ~13 % WHG
National average • ExploreYourDNA N=116; G25 NNLS
  • Steppe, Yamnaya-related: ~40 %
  • EEF, Anatolian Neolithic-related: ~47 %
  • WHG, Western Hunter-Gatherer: ~13 %

British & Irish: Atlantic zone, high steppe (~47 %)
England / Ireland / Scotland • G25 NNLS
  • Steppe, Yamnaya-related: ~47 %
  • EEF, Anatolian Neolithic-related: ~40 %
  • WHG, Western Hunter-Gatherer: ~13 %

Swedes: northern steppe-rich profile
Modern Sweden • G25 NNLS
  • Steppe, Yamnaya-related: ~50 %
  • EEF, Anatolian Neolithic-related: ~34 %
  • WHG, Western Hunter-Gatherer: ~16 %

Italians, North (Tuscany): Etruscan Bronze Age substrate + Lombard addition
Modern Tuscany • G25 NNLS; Posth et al. 2021
  • EEF, Anatolian Neolithic-related: ~65 %
  • Steppe, Yamnaya-related: ~24 %
  • WHG, Western Hunter-Gatherer: ~7 %
  • Levant / Near Eastern (Roman Imperial residual): ~4 %

Sicilians: accumulated Levantine flux from Phoenician, Roman and Arab periods
Modern Sicily • Vai et al. 2019; G25 NNLS
  • EEF, Anatolian Neolithic-related: ~57 %
  • Levant / Near Eastern (post-Neolithic influx): ~20 %
  • Steppe, Yamnaya-related: ~17 %
  • WHG, Western Hunter-Gatherer: ~6 %

Ashkenazi Jews: Levantine founding + southern European admixture + endogamic drift
Modern Ashkenazi diaspora • Carmi et al. 2014; G25 NNLS
  • EEF / Levantine Neolithic base: ~50 %
  • Levant Iron Age-related (Judean founding): ~30 %
  • Steppe (via European admixture): ~13 %
  • WHG, Western Hunter-Gatherer: ~7 %

V. The Sardinians — Living Relics of the European Neolithic

The Sardinian outlier is the most extreme in Europe and the most straightforward to explain. With approximately 92 % Anatolian Neolithic farmer ancestry, modern Sardinians are genetically closer to European Neolithic populations of 6,000–5,000 BCE than to any continental European population of the present day. Their steppe fraction of ~3 % is the lowest recorded in any modern western European population, compared with ~40 % in France, ~50 % in Sweden, or ~47 % in Britain.

Sardinians as the Reference Population for Anatolian Neolithic Ancestry

Because modern Sardinians so closely approximate the Anatolian Neolithic farmers who first colonised Europe, they are used as the standard EEF proxy in G25 NNLS modelling and in formal qpAdm analyses. When researchers write that an ancient population was “70 % EEF”, what this means operationally is that 70 % of their ancestry is best modelled by a modern Sardinian-like source. The island's isolation from the Bell Beaker transformation (~2,500–2,000 BCE), from Roman Imperial population flows, and from most medieval Germanic migrations explains this preservation. In Y-chromosome terms, the dominance of haplogroups I2 and G2a, the lineages Bell Beaker replaced continent-wide, is the most vivid patrilineal testament to this continuity.

VI. The Ashkenazi Jews — Levantine Origin, European Admixture, Endogamic Drift

Ashkenazi Jews cluster consistently outside the European genetic space, in an intermediate zone between southern Europe and the Levant. This position is not a product of recent geography — communities lived in central Europe for over a thousand years — but of the founding population's origin and the endogamic practices that prevented substantial admixture with surrounding populations.

Formation of the Ashkenazi Genetic Profile

Levantine Iron Age (Judean founding, ~1,000–500 BCE)
→ Greco-Roman Diaspora (southern European admixture, ~200 BCE–400 CE)
→ Rhineland Settlement (~800–1,000 CE)
→ Eastern European Expansion (bottleneck of ~300–400 founders, ~1200–1500 CE)
→ Modern Ashkenazi (endogamy preserved the Levantine profile)

The Founding Bottleneck and Its Medical Consequences

Carmi et al. (2014) estimated that the ancestral Ashkenazi population passed through a severe bottleneck of approximately 300–400 individuals around 600–800 years ago (roughly 1200–1400 CE). This bottleneck, followed by centuries of endogamy, reduced heterozygosity and elevated the frequency of certain alleles through genetic drift. Several rare recessive conditions (Tay-Sachs disease, Gaucher disease, familial dysautonomia), along with specific BRCA1/2 pathogenic founder variants, are significantly more frequent in this population, a direct medical consequence of this demographic history rather than any selective pressure.
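How a small founding group lets chance alone move allele frequencies can be illustrated with a toy Wright-Fisher simulation. This is a sketch under stated assumptions, not a reconstruction of the Carmi et al. analysis: the population is held constant at 350 individuals (the midpoint of the 300–400 range), while the starting frequency of 0.5 %, the 10 generations and the 200 replicate alleles are hypothetical values chosen only for illustration.

```python
import random


def drift(p_start, n_individuals, generations, rng):
    """Wright-Fisher drift: redraw all 2N allele copies each generation."""
    copies = 2 * n_individuals
    p = p_start
    for _ in range(generations):
        carriers = sum(1 for _ in range(copies) if rng.random() < p)
        p = carriers / copies
    return p


rng = random.Random(42)   # fixed seed for reproducibility
N_FOUNDERS = 350          # midpoint of the 300-400 range quoted above
P_START = 0.005           # hypothetical starting allele frequency (0.5 %)
GENERATIONS = 10          # hypothetical bottleneck duration
REPLICATES = 200          # independent rare alleles simulated

final = [drift(P_START, N_FOUNDERS, GENERATIONS, rng) for _ in range(REPLICATES)]
lost = sum(1 for p in final if p == 0.0)
doubled = sum(1 for p in final if p >= 2 * P_START)
print(f"lost: {lost}/{REPLICATES}, at least doubled: {doubled}/{REPLICATES}")
```

No selection is modelled: any allele that ends up well above its starting frequency got there by sampling chance alone, which is the mechanism the paragraph above attributes to the bottleneck.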

VII. Sicily and Southern Italy — 1,500 Years of Mediterranean Layering

Sicilians and southern Italians are not categorically detached from the European cluster the way Sardinians or Ashkenazi Jews are, but their displacement toward the Levant relative to northern Italians is robust across independent datasets. It represents the accumulated genetic record of the Mediterranean's role as the ancient world's principal artery of human movement.

~900–300 BCE
Phoenician & Greek Colonisation
Carthaginian colonies in western Sicily and Greek colonies across the east introduced Levantine and eastern Mediterranean populations into coastal urban centres. Limited but real genetic contribution to the island.
~241 BCE–476 CE
Roman Sicily
Sicily was Rome's grain basket, worked largely by enslaved people drawn from across the eastern Mediterranean. Ancient DNA from Roman-period Sicilian sites shows elevated Near Eastern profiles consistent with this slave-labour economy.
~827–1072 CE
Arab Emirate of Sicily
Muslim conquest established an emirate ruling Sicily for over two centuries. North African Berber and Arab populations settled parts of the island. Vai et al. (2019) identify a detectable North African component in modern Sicilians consistent with this period.
~1072–1194 CE
Norman Kingdom
Norman conquest introduced a small northern European component, partially offsetting the Near Eastern displacement accumulated over the preceding centuries. The net result is a mixed profile unique in the Mediterranean.

VIII. Comparison Table

Population | Steppe | EEF | WHG | Levant | Key note
Sardinians | ~3 % | ~92 % | ~5 % | <1 % | Extreme EEF outlier; Neolithic proxy for G25 modelling
Basques | ~27 % | ~53 % | ~20 % | <1 % | Highest WHG in W. Europe; partial Bell Beaker insulation; Euskara survivor
Ashkenazi Jews | ~13 % | ~50 % | ~7 % | ~30 % | Outside European cluster; Levantine founding + Greco-Roman admixture + endogamy
Sicilians | ~17 % | ~57 % | ~6 % | ~20 % | Displaced toward Levant; Phoenician, Roman Imperial, Arab layers
Italians, South | ~20 % | ~62 % | ~6 % | ~12 % | Levantine displacement; Roman Imperial dominant contributor
Italians, North (Tuscany) | ~24 % | ~65 % | ~7 % | ~4 % | Lombard addition; Etruscan Bronze Age substrate
Greeks | ~22 % | ~65 % | ~6 % | ~7 % | High EEF; modest Levantine layer; eastern Aegean affinities
French (average) | ~40 % | ~47 % | ~13 % | ~0 % | Typical western European; Atlantic–continental gradient within France
British & Irish | ~47 % | ~40 % | ~13 % | <1 % | Atlantic zone; Bell Beaker dominant; Viking marginal addition (Norman coast)
Swedes / Norwegians | ~50 % | ~34 % | ~16 % | <1 % | Steppe-rich; elevated WHG Scandinavian Mesolithic substrate
Finns | ~45 % | ~25 % | ~15 % | <1 % | Siberian hunter-gatherer ancestry (not itemised here; the listed components sum to ~85 %) displaces them from the N-W European cluster

IX. Myths and Realities

Common Misconception

“The Basques are Palaeolithic survivors who hid in the Pyrenees while everyone else was replaced.”

Genetic Reality

Basques are primarily descended from Anatolian Neolithic farmers (~53 % EEF), not Palaeolithic hunter-gatherers. Their distinctiveness lies in an elevated WHG fraction (~20 %) and a steppe fraction (~27 %) that is lower than in surrounding populations but not negligible, reflecting an incomplete Bell Beaker replacement. They are a Neolithic-plus-WHG relic, not a Palaeolithic one. (Olalde et al. 2019)

Common Misconception

“Sardinians are genetically unusual because of recent island inbreeding.”

Genetic Reality

Sardinian distinctiveness is not primarily a product of recent inbreeding but of deep-time isolation from the Bronze Age migrations (~2,500–2,000 BCE) that transformed mainland Europe. They resemble the European Neolithic population of 5,000 BCE because they largely are it, genetically speaking. (Haak et al. 2015; Mathieson et al. 2015)

Common Misconception

“Ashkenazi Jews are primarily European since they have lived in Europe for over a millennium.”

Genetic Reality

Duration of residence does not equal genetic admixture. Endogamic marriage practices limited European admixture to ~35–55 % of the total. The remaining ~45–65 % of Ashkenazi ancestry traces to a Levantine founding population, and the PCA reflects that founding rather than the geography of subsequent residence. (Carmi et al. 2014)

Common Misconception

“The Arab conquest explains all the Near Eastern ancestry in Sicilians.”

Genetic Reality

The Arab Emirate of Sicily (827–1072 CE) left a detectable but modest signature. The larger contributors to Sicilian Levantine ancestry are the Roman Imperial-era slave economy and the pre-existing EEF Neolithic base, which was always more Levantine-adjacent than continental EEF. Physical pigmentation variation does not map cleanly onto these ancestry fractions. (Vai et al. 2019)

Anchor populations & French regional averages — G25 scaled, for use in Vahaduo
Italy_Sardinia_N,0.1245924,0.1770148,0.0430786,-0.0589599,0.0789968,-0.0310642,-0.0053689,-0.0030176,0.0624742,0.0996131,0.0036849,0.0183414,-0.0352669,-0.0129788,-0.0199822,-0.0060072,0.0088762,0.0019295,0.0056758,-0.0115633,0.0018525,0.0027775,-0.012211,-0.0296705,-0.0002857
WHG,0.1246365,0.116278,0.184789,0.189279,0.1546445,0.0464355,0.0131605,0.0372675,0.0890705,0.017768,-0.0153455,-0.015811,0.0159065,-0.0030275,0.053338,0.0582065,0.00502,0.016343,-0.0093015,0.055589,0.0944585,0.0111905,-0.049607,-0.160866,0.0170045
Yamnaya_RUS_Samara,0.1255849,0.089028,0.0426986,0.1153479,-0.0287232,0.0450564,0.0036033,-0.0025642,-0.0559032,-0.0728943,0.0018222,3.32e-05,-0.0026924,-0.0233041,0.0366141,0.0157633,-0.0012316,-0.0017879,-0.0038408,0.0137704,-0.0031749,0.0007557,0.0110649,0.0186102,-0.004537
TUR_Barcin_N,0.1175998,0.180118,0.0035312,-0.101158,0.0510443,-0.0483875,-0.0043582,-0.0069334,0.0362287,0.0807473,0.0079718,0.0118803,-0.0234545,0.0004691,-0.0419807,-0.0101913,0.0233091,0.0019866,0.0136954,-0.0097489,-0.0142249,0.0057723,-0.0041232,-0.0031658,-0.0043437
Levant_PPNB,0.072847,0.1639064,-0.0316026,-0.1361132,0.0332986,-0.0645352,-0.0134426,-0.0147684,0.0741604,0.03601,0.0188046,-0.0150764,0.035738,0.0025596,-0.0217696,0.006099,0.0098048,-0.0013176,-0.0047264,0.0188088,-0.001797,0.0071472,0.0008872,-0.0056874,-0.0037602
France_Brittany_(N=17),0.1310976,0.1364394,0.0606275,0.04142,0.0388126,0.0148468,0.0035112,0.0042759,0.0053176,0.0075039,-0.0054256,0.0051041,-0.0152945,-0.0138998,0.0197514,0.0053894,-0.0060436,0.0012744,0.0020628,0.0034281,0.0044774,0.0037096,-2.89e-05,0.0104053,0.0010777
France_Alsace_(N=3),0.1271023,0.1418357,0.051917,0.0162577,0.0377507,0.0143163,0.0038383,-0.0003077,-0.0008867,0.0066213,-0.003681,1e-04,-0.0119423,-0.0057343,0.0086407,-0.001591,-0.0068233,0.0021533,0.0036033,0.003293,0.0067383,0.0023903,-0.0025883,0.007953,-0.002395
France_Languedoc-Roussillon_(N=5),0.1299858,0.1464396,0.044802,0.0060078,0.0390226,0.000223,0.000282,0.0040614,0.0220888,0.0280644,-0.0026306,0.0067738,-0.0143012,-0.0097436,0.010939,-0.0040572,-0.0096222,0.003015,0.000176,-0.0033766,0.006189,0.0028194,-0.001972,-0.001663,0.0015806
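Each record above is a population name followed by 25 comma-separated G25 coordinates. As a small sketch of how such rows might be read and compared outside a dedicated tool, the snippet below parses two of the French regional rows and reports the Euclidean distance between them. The helper name `parse_g25` and the 25-coordinate check are assumptions for illustration, not part of any published G25 tooling.

```python
from math import dist

# Two rows copied verbatim from the list above.
ROWS = """\
France_Alsace_(N=3),0.1271023,0.1418357,0.051917,0.0162577,0.0377507,0.0143163,0.0038383,-0.0003077,-0.0008867,0.0066213,-0.003681,1e-04,-0.0119423,-0.0057343,0.0086407,-0.001591,-0.0068233,0.0021533,0.0036033,0.003293,0.0067383,0.0023903,-0.0025883,0.007953,-0.002395
France_Languedoc-Roussillon_(N=5),0.1299858,0.1464396,0.044802,0.0060078,0.0390226,0.000223,0.000282,0.0040614,0.0220888,0.0280644,-0.0026306,0.0067738,-0.0143012,-0.0097436,0.010939,-0.0040572,-0.0096222,0.003015,0.000176,-0.0033766,0.006189,0.0028194,-0.001972,-0.001663,0.0015806
"""


def parse_g25(text):
    """Parse 'Name,v1,...,v25' lines into a {name: [floats]} mapping."""
    table = {}
    for line in text.strip().splitlines():
        name, *values = line.strip().split(",")
        coords = [float(v) for v in values]  # float() also accepts 1e-04 style
        if len(coords) != 25:
            raise ValueError(f"{name}: expected 25 coordinates, got {len(coords)}")
        table[name] = coords
    return table


populations = parse_g25(ROWS)
alsace = populations["France_Alsace_(N=3)"]
languedoc = populations["France_Languedoc-Roussillon_(N=5)"]
print(f"Alsace to Languedoc-Roussillon distance: {dist(alsace, languedoc):.4f}")
```

Smaller distances indicate more similar average profiles, though with regional samples as small as N = 3 and N = 5 the averages themselves are only rough estimates.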

References

  1. Novembre J. et al. (2008). Genes mirror geography within Europe. Nature 456: 98–101. DOI:10.1038/nature07331
  2. Haak W. et al. (2015). Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522: 207–211. DOI:10.1038/nature14317
  3. Mathieson I. et al. (2015). Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528: 499–503. DOI:10.1038/nature16544
  4. Olalde I. et al. (2019). The genomic history of the Iberian Peninsula over the past 8,000 years. Science 363: 1230–1234. DOI:10.1126/science.aav4040
  5. Carmi S. et al. (2014). Sequencing an Ashkenazi reference panel. Nature Communications 5: 4835. DOI:10.1038/ncomms5835
  6. Posth C. et al. (2021). The origin and legacy of the Etruscans. Science Advances 7(39): eabi7673. DOI:10.1126/sciadv.abi7673
  7. Vai S. et al. (2019). Ancestral mitochondrial N haplogroup genomes in Sicilian populations. Scientific Reports 9: 9581.
  8. Wright S. (1943). Isolation by distance. Genetics 28(2): 114–138.
  9. Global25 (G25) scaled reference averages: Eurogenes project (Davidski), computed from published genotype datasets. French regional means: ExploreYourDNA dataset, N = 116.