This article is a guest contribution by Steven Parker. It presents his interpretation of the genetic relationship between Ashkenazi Jews and Southern Italians, based on Global25 modeling and reference-panel analysis. The views and conclusions expressed here are those of the author and do not necessarily reflect the position of ExploreYourDNA.
This piece was written in response to our article "Why Ashkenazi Jews and Southern Italians Cluster Together Genetically". An editorial response from Jérôme follows at the end.
Ashkenazi Origins and Why Southern Italy Matters
Context: The Article Under Discussion
This guest contribution responds to "Why Ashkenazi Jews and Southern Italians Cluster Together Genetically", which argues that the close PCA and G25 proximity between the two populations reflects convergent demographic history rather than direct shared ancestry. Steven Parker offers an alternative interpretation, treating Southern Italy as a direct ancestral source rather than a modeling proxy.
Introduction
I appreciate Jérôme and ExploreYourDNA for hosting substantive disagreement on a topic that is usually treated as closed. Public, testable arguments are more valuable than private email debates, especially when the disagreement is about modeling assumptions rather than about whether the clustering exists.
I am responding to Jérôme's post, "Why Ashkenazi Jews and Southern Italians Cluster Together Genetically." The empirical observation is not in dispute: across PCA, Global25 (G25), and standard genetic distance measures such as FST, Ashkenazi Jews repeatedly sit extremely close to Southern Italians and Sicilians. The dispute is what that proximity is allowed to mean.
My claim is straightforward: the tight overlap between Ashkenazi Jews and Southern Italians reflects shared ancestry transmitted through a common population history in Italy and the central Mediterranean. Explanations that reclassify this overlap as non-ancestral "convergence" do not match the behavior of genome-wide data once Southern Italy and Italian Jews are treated as real source populations rather than as interchangeable background.
1. What the geometry constrains
PCA and distance space are summaries of genome-wide covariance. When two populations sit nearly on top of each other across several independent summaries, that structure imposes hard constraints. If Ashkenazi Jews carried substantially more ancestry from components absent in Southern Italians, that difference would manifest as displacement in PCA space and higher pairwise FST. When tight overlap persists across PCA, FST, and G25 distances, plausible ancestry proportions are bounded regardless of how any supervised components are labeled.
"Convergent demographic history" can be a real phenomenon, but it is a hypothesis with expectations, not a label that replaces measurement. Convergence means distinct historical pathways producing broadly similar mixtures. When that happens, similarity is usually incomplete: populations may approach one another along a coarse axis while remaining separable in PCA, maintaining nontrivial pairwise FST, or showing asymmetric distance profiles at higher resolution. Near-identity across multiple measures is the pattern more naturally expected under shared ancestry transmitted through a common population history.
Applied here, the key point is that large Levantine estimates are not free parameters. If a model claims Ashkenazim are dramatically more Levantine-shifted than Southern Italians, it must also explain why this does not produce a corresponding Levantine displacement in the unsupervised geometry. The geometry is the constraint; labels are downstream.
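The constraint can be made concrete with a minimal sketch. It assumes that admixture behaves approximately linearly in PCA coordinates and that distances are Euclidean. The coordinates are hypothetical toy values, not real Global25 data, and the function names are illustrative only. Under those assumptions, a population carrying an extra fraction of a Levantine source relative to a Southern Italian baseline is displaced from that baseline by that fraction times the distance between baseline and source.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two coordinate vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mix(base, source, fraction):
    """Linear mixture: (1 - fraction) * base + fraction * source."""
    return [(1 - fraction) * b + fraction * s for b, s in zip(base, source)]

# Hypothetical 3-dimensional coordinates (toy values, NOT real G25 data).
southern_italy = [0.100, 0.140, -0.010]
levant_source = [0.080, 0.150, -0.040]

baseline_gap = euclidean(southern_italy, levant_source)

for extra in (0.1, 0.3, 0.5):
    target = mix(southern_italy, levant_source, extra)
    displacement = euclidean(target, southern_italy)
    # Under linear mixing, the displacement equals extra * baseline_gap.
    print(f"extra={extra:.1f}  displacement={displacement:.4f}  "
          f"expected={extra * baseline_gap:.4f}")
```

In this toy setting the displacement grows in direct proportion to the extra fraction, which is the sense in which large differences in a labeled component should leave a visible trace in the unsupervised geometry.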
2. The proxy problem: how "Italian" gets defined
The core failure in the prevailing framework is not simply that Southern Italy is underemphasized. It is that Southern Italy is often treated as a dead zone in the modeling logic. Many pipelines omit Southern Italy and Italian Jews as explicit sources and instead define "Italian" ancestry using northern-shifted proxies, typically Northern Italian or Tuscan references.
Those proxies do not occupy the same genetic space as Southern Italians, Magna Graecia populations, or extant Italian Jewish groups. When Southern Italy and Italian Jews are excluded from the plausible source set, proxy substitution becomes unavoidable. Shared southern Mediterranean ancestry cannot be captured by the "Italian" component as defined, so it is displaced into residual components, most often labeled "Levantine." This displacement is a mechanical consequence of reference choice, not an inference about population history.
Stated simply, if the Italian endpoint is defined too far north, the model is forced to compensate elsewhere. That compensation typically appears as inflated Near Eastern terms that perform double duty, absorbing genuine Near Eastern ancestry alongside southern Mediterranean ancestry that northern-shifted Italian proxies cannot represent.
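The dependence on the chosen endpoint can be illustrated with a one-dimensional toy model. This is a sketch under stated assumptions: the positions along the axis are hypothetical values picked for illustration, the two-source linear mixture is a simplification, and the function name is not taken from any modeling tool.

```python
def source_b_fraction(target, source_a, source_b):
    """Fraction f of source_b in a two-source mixture on a single axis.

    Solves target = (1 - f) * source_a + f * source_b for f.
    """
    if source_a == source_b:
        raise ValueError("sources must differ along the axis")
    return (target - source_a) / (source_b - source_a)

# Hypothetical positions along one axis (toy values, NOT real coordinates).
NORTHERN_PROXY = 0.0   # northern-shifted "Italian" reference
SOUTHERN_ITALY = 0.4   # southern reference
LEVANT = 1.0           # Near Eastern reference
TARGET = 0.5           # population being modeled

with_northern_proxy = source_b_fraction(TARGET, NORTHERN_PROXY, LEVANT)
with_southern_source = source_b_fraction(TARGET, SOUTHERN_ITALY, LEVANT)

print(f"Levantine fraction with northern proxy:  {with_northern_proxy:.2f}")
print(f"Levantine fraction with southern source: {with_southern_source:.2f}")
```

With these toy positions the same target is assigned a Levantine fraction of 0.50 against the northern proxy and about 0.17 against the southern source. The toy does not say which endpoint is historically appropriate; it only shows that the estimate moves mechanically with the reference.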
3. Southern Italy and Italian Jews are not optional
Italian Jewish communities (the Italkim) are the historically documented bridge between Roman and Late Antique Jewish communities in Southern Italy and later European Jewish populations. If the goal is to model how Ashkenazim formed, Italkim and Southern Italy cannot be treated as peripheral or interchangeable background. They are the relevant ancestral corridor.
Figure 1. Global25 similarity maps for Maltese and for Italkim Jews. In both cases, the strongest similarity localizes to Southern Italy, Sicily, and adjacent central Mediterranean regions. Neither map shows primary anchoring in Northern Italy or Tuscany, and the overall geographic structure of the two maps is closely comparable.
Figure 2. Global25 distance between Italkim Jews and Maltese is 0.0239. That value falls within the range observed between Maltese and multiple Southern Italian populations, which is consistent with normal intra-regional variation rather than with a distinct external source population.
Figure 3. Global25 distance rankings show Ashkenazi populations (for example, Germany and Russia) with their closest affinities to Maltese and to multiple Southern Italian populations. The distances are comparable to those observed among Southern Italians themselves, while non-European Jewish populations sit substantially farther away in the same ranking. This is what you expect if the southern Italian and central Mediterranean space is part of the direct ancestral history, not a coincidental destination created by parallel mixtures.
Figure 4. A control that clarifies asymmetric interpretation of identical geometric signals. It reports Global25 distances from an Ashkenazi German reference to Southern Italian and Sicilian populations (Maltese 0.0180; Calabria 0.0182; Campania 0.0207; Sicilian East 0.0245; Apulia 0.0278; Sicilian West 0.0288) and, in parallel, distances from a Yemenite Jewish reference to Yemeni Muslim populations (Al Bayda 0.0216; Dhamar 0.0217; Amran 0.0255). The magnitudes are comparable: in both cases, the Jewish reference lies essentially on top of the nearest regional populations in the same distance space.
In the Yemenite case, this pattern is routinely treated as evidence of local ancestry transmission through conversion and assimilation, with no special explanatory category invoked to avoid the obvious inference. Yet when Ashkenazim fall at comparable distances to Southern Italians and Sicilians, the same geometry is often reclassified as non-ancestral and explained away as "convergent demographic history." That asymmetry does not arise from the data. It arises from prior narrative commitments about Ashkenazi origins.
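The magnitudes reported in Figure 4 can be compared directly. The short sketch below uses only the distances quoted in the caption; the variable and function names are illustrative.

```python
from statistics import mean

# Global25 distances quoted in Figure 4.
ashkenazi_german_to_south = {
    "Maltese": 0.0180,
    "Calabria": 0.0182,
    "Campania": 0.0207,
    "Sicilian East": 0.0245,
    "Apulia": 0.0278,
    "Sicilian West": 0.0288,
}
yemenite_jewish_to_yemeni = {
    "Al Bayda": 0.0216,
    "Dhamar": 0.0217,
    "Amran": 0.0255,
}

def summarize(distances):
    """Return (minimum, mean, maximum) of a mapping of distances."""
    values = list(distances.values())
    return min(values), mean(values), max(values)

for label, table in (
    ("Ashkenazi German vs Southern Italian/Sicilian", ashkenazi_german_to_south),
    ("Yemenite Jewish vs Yemeni Muslim", yemenite_jewish_to_yemeni),
):
    lo, avg, hi = summarize(table)
    print(f"{label}: min={lo:.4f} mean={avg:.4f} max={hi:.4f}")
```

Both sets average roughly 0.023, and the nearest Ashkenazi match (Maltese, 0.0180) is closer than the nearest Yemenite match (Al Bayda, 0.0216).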
4. An ancient-only check
Ancient-only analysis, using only ancient samples as references, provides a separate way to test whether the central Mediterranean signal is merely a proxy effect. If large Levantine ancestry is truly required to anchor Ashkenazi similarity, then when Levantine ancient references are abundant, explicitly Levantine sources should emerge as necessary similarity anchors.
Figure 5. Ancient-only Global25 fit for Ashkenazi Jews using DNAGenics G25 Studio "G25 Official Ancient Individuals (Oct 2024)" (algorithm: Montecarlo v3; distance metric: Chebyshev; selected populations: 11,514 of 11,546; fit: 0.49, where lower values indicate a closer match between the modeled mixture and the target in Global25 space). The best fit is achieved using almost exclusively central and eastern Mediterranean sources. The largest contributors are Italy Medieval (28.60%) and Italy Lazio Viterbo Earlymedieval (23.60%), followed by Croatia Roman Lateimperial (12.60%) and Italy Imperial (10.60%). Additional signal is absorbed by Italy Collegno Langobards Earlymedieval (8.60%), Turkey Marmara Balikesir Byzantine (6.40%), Italy Medieval OAegean (5.60%), Italy Lateantiquity (2.00%), Italy Medieval Earlymodern (1.40%), and Italy Medieval O2 (0.60%).
The point is not that Levantine ancient populations are absent from the reference space. They are present in abundance, including broad pan-Levantine coverage and a large set of Bronze and Iron Age southern Levant individuals (Israelite-period and adjacent contexts), alongside many later Levantine samples. Rather, even under exhaustive Levantine representation, the best-fitting anchors for Ashkenazim localize to Roman, late antique, and medieval Italy and adjacent eastern Mediterranean provincial contexts. This outcome is difficult to reconcile with narratives that treat Ashkenazi similarity to Southern Italians as a coincidence produced by parallel mixtures of "Southern European plus Levantine."
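The percentages in Figure 5 can be tallied with a short sketch. The grouping is by the leading word of each reference label only, which is an assumption made for illustration: a label records where a sample was found and does not by itself settle the ancestry of the individuals behind it.

```python
# Contributions (percent) as listed in Figure 5.
fit = {
    "Italy Medieval": 28.60,
    "Italy Lazio Viterbo Earlymedieval": 23.60,
    "Croatia Roman Lateimperial": 12.60,
    "Italy Imperial": 10.60,
    "Italy Collegno Langobards Earlymedieval": 8.60,
    "Turkey Marmara Balikesir Byzantine": 6.40,
    "Italy Medieval OAegean": 5.60,
    "Italy Lateantiquity": 2.00,
    "Italy Medieval Earlymodern": 1.40,
    "Italy Medieval O2": 0.60,
}

# Sum contributions by the first word of each label (the country tag).
by_region = {}
for label, percent in fit.items():
    region = label.split()[0]
    by_region[region] = by_region.get(region, 0.0) + percent

total = sum(fit.values())
for region, percent in sorted(by_region.items(), key=lambda kv: -kv[1]):
    print(f"{region}: {percent:.2f}%")
print(f"Total: {total:.2f}%")
```

The listed contributions sum to 100%, with references labeled Italy accounting for 81% and the Croatia and Turkey references for the remaining 19%.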
Conclusion
The proximity between Ashkenazi Jews and Southern Italians is not disputed. What is disputed is the modeling framework that decides in advance which populations are allowed to count as sources. When Southern Italy and Italian Jews are omitted, and "Italian" is defined using northern-shifted proxies, misassignment is inevitable: shared southern Mediterranean ancestry is displaced into residual terms and the residual is labeled Levantine. The resulting proportions can look stable across papers because the same proxy substitutions and omissions are being repeated.
By contrast, if Southern Italy and Italian Jews are treated as real ancestral populations, the overall pattern becomes coherent without special pleading. Tight PCA overlap, low pairwise distances, and ancient-only similarity anchors align with a predominantly Italo-Mediterranean and Greco-Roman core, with later northern and eastern inputs layered on top.
A falsifiable prediction follows. Any model that includes Southern Italy and Italkim as explicit sources and still insists on very large Levantine proportions should also predict a measurable Levantine displacement relative to Southern Italians in the unsupervised geometry. If that displacement is not there, the claimed proportions are not consistent with the constraints imposed by the data structure.
For the full technical treatment, figures, and reference-panel discussion:
Read the full preprint on Preprints.org.
Editorial Response by Jérôme (ExploreYourDNA)
Steven raises a valid methodological point: if Southern Italy is excluded as a source, the “Levantine” label in admixture models ends up doing double duty, absorbing both genuine Near Eastern ancestry and unmodeled southern Mediterranean signal. That concern is fair.
Where I disagree is on magnitude. Southern Italian proxies themselves carry significant Levantine-related ancestry from centuries of Phoenician, Greek, and Roman-era contact. Using them as a source mechanically compresses the Levantine estimate downward, which is the mirror image of the inflation problem Steven describes. Ancient DNA from Erfurt and Norwich, IBD analyses, and the broader literature all support a Levantine contribution well above roughly 20%.
The real answer probably lies between Steven’s estimate and older dual-ancestry framings. This article sharpens the question, and I encourage readers to engage with the data and draw their own conclusions.
Jérôme, ExploreYourDNA