A few thousand polytheists in the high valleys of the Hindu Kush, the Kalash attract a particular kind of legend: lost soldiers of Alexander the Great, blue-eyed survivors of a pure European or Aryan past, a people somehow frozen in time. Ancient and modern DNA now let us test every part of that story. The short answer is double-edged. The Kalash are a real Indo-Iranian-speaking population carrying genuine steppe ancestry, and they sit firmly among their Hindu Kush neighbours. But the romantic relic, the Greek descent and the idea of a frozen ancient people are myths. What makes the Kalash look exotic on a genetic map is not antiquity. It is drift.
Key points
- The Kalash speak a Dardic (Indo-Aryan) language and carry the same three ancestral ingredients as everyone around them: an indigenous South Asian component (AASI), an Iranian-farmer component, and Bronze Age steppe (Steppe_MLBA).
- Their closest living relatives are right next door. The Kalash average sits about 23 Global25 units from the Nuristani, 27 to 30 from Pashtun groups, and 35 from the Kho. They are a Hindu Kush population, not a transplant from elsewhere.
- The Iron Age Swat valley (about 1000 BCE) is one of their closest ancient matches, about 49 units away, pointing to deep local continuity rather than a recent arrival.
- They are among the most steppe-shifted South Asians: about 37 percent Steppe_MLBA in a deep model, more than the Iron Age Swat population itself and similar to the Pashtun, with the lowest indigenous (AASI) share of the regional panel.
- Steppe_MLBA is itself about 80 percent Yamnaya. So in the standard model the Kalash carry roughly 30 percent Yamnaya-equivalent ancestry, and their steppe is, if anything, slightly more Yamnaya-like than their neighbours'.
- There is no support for Greek descent from Alexander's army: the Y-chromosome and mitochondrial lineages carry no Greek signal.
- Their "ancient and divergent" look on a PCA is a drift artefact of a tiny, long-isolated population (effective size on the order of 2,300 to 2,600), not the fingerprint of a people preserved unchanged.
1. A legend in the mountains
No South Asian population has collected more romance than the Kalash. Their pre-Islamic religion, their fair features in a handful of individuals, and their isolation in three remote valleys of Chitral have fed a recurring set of claims: that they descend from soldiers Alexander the Great left behind in the fourth century BCE, that they are a surviving fragment of an original European or Aryan stock, or that they are a living museum of an ancient people frozen by their mountains. These are testable claims, and genetics has tested them. The picture that emerges is more interesting than the legend, because it explains both why the Kalash really are distinctive and why almost everything the legend says about that distinctiveness is wrong.
2. Neighbours first: where the Kalash actually sit
The cleanest way to place a population is to ask what it is nearest to. The chart below gives scaled Global25 Euclidean distances (multiplied by 1000) from the modern Kalash average to a panel of living neighbours and to the ancient reference poles.
The order is unambiguous. The Kalash are nearest to the people who live around them: the Nuristani of eastern Afghanistan, the Pashtun, the Kho of Chitral. They are far from the indigenous South Asian pole (the Onge proxy, about 339 units), far from raw steppe (about 184), and far from Iranian farmers (about 154). They are a local Hindu Kush population, and the one ancient sample that sits almost among their living neighbours is the Iron Age Swat valley, about 49 units away. Nothing in this picture points to the Mediterranean or to a separate European stock.
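The distance measure itself is simple enough to check by hand. The sketch below is a minimal illustration, not the article's own script: it takes the Kalash and Nuristani averages from the coordinate list at the end of the article and computes the scaled Euclidean distance between them. The function name `g25_distance` is an assumption introduced here for illustration.

```python
import math

# Scaled Global25 averages, copied from the coordinate list at the end of the article.
KALASH = [0.083883, 0.024991, -0.084032, 0.066594, -0.071679, 0.040306, 0.003116,
          0.002017, -0.030928, -0.025157, -0.005592, -0.000495, -0.002392, -0.010902,
          0.016611, 0.008762, -0.013821, 0.002104, 0.000672, -0.012832, -0.003922,
          -0.005392, 0.002862, -0.003426, 0.003353]
NURISTANI = [0.086221, 0.026658, -0.079761, 0.063227, -0.073552, 0.039323, -0.000176,
             -0.002423, -0.030167, -0.026606, -0.008079, -0.000337, -0.001152, -0.016136,
             0.013843, 0.017535, 0.002738, -0.000348, -0.004148, -0.008035, -0.004118,
             -0.006801, 0.003235, -0.002621, 0.004191]


def g25_distance(a, b):
    """Euclidean distance between two scaled Global25 vectors, multiplied by 1000."""
    if len(a) != len(b):
        raise ValueError("coordinate vectors must have the same length")
    return 1000 * math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Should land close to the "about 23 units" quoted for the Nuristani.
print(round(g25_distance(KALASH, NURISTANI), 1))
```

Swapping in any other pair of rows from the coordinate list gives the corresponding distance on the same scale.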
3. The drift trap: why an isolate looks ancient
If the Kalash are so firmly local, why do they look like an outlier on so many plots? The answer is genetic drift. The Kalash have lived for many generations as a very small, endogamous community. The foundational study of their genomes estimated a long-term effective population size of only about 2,300 to 2,600 individuals and found a strong bottleneck (Ayub et al. 2015). In a population that small, allele frequencies wander far from their starting point purely by chance. On a PCA that wandering pulls the group out along its own private axis, and it inflates every distance measured to it. The result is a population that looks deeply diverged and "ancient" when in fact it is a recent local lineage that has simply drifted hard.
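How much wandering a small population size allows can be sketched with the textbook Wright-Fisher expectation, in which the variance of an allele frequency after t generations is p0(1 - p0)[1 - (1 - 1/(2Ne))^t]. This is an idealised illustration (constant size, no migration, no selection), not the method of Ayub et al.; the generation count and the large comparison population are hypothetical values chosen only to show the contrast.

```python
import math


def drift_sd(p0, ne, generations):
    """Expected standard deviation of an allele frequency after pure drift.

    Assumes an ideal Wright-Fisher population of constant effective size `ne`.
    """
    retained = (1 - 1 / (2 * ne)) ** generations  # share of starting heterozygosity kept
    return math.sqrt(p0 * (1 - p0) * (1 - retained))


GENERATIONS = 400  # hypothetical span, chosen for illustration only

# 2,500 sits inside the 2,300 to 2,600 range quoted above;
# 100,000 is a hypothetical large, undrifted comparison population.
for ne in (2_500, 100_000):
    print(ne, round(drift_sd(0.5, ne, GENERATIONS), 3))
```

Under these assumptions the small population's allele frequencies spread several times further from their starting point than the large one's, which is the effect that stretches an isolate out along its own axis on a PCA.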
Our own models carry the same fingerprint. When we fit the Kalash with the standard ancient sources, the leftover error (the residual) is markedly higher for them than for their undrifted neighbours: about 51 units for the Kalash against about 42 for the Balochi or Brahui. That excess residual is exactly what drift produces, a private signal that no combination of real ancestral populations can capture. It is the honest measure of how much of the Kalash "uniqueness" is noise rather than ancestry.
4. The same three ingredients as everyone around them
Strip away the drift and model the Kalash with the three sources that the Reich laboratory showed describe essentially all South Asians since the Bronze Age (Narasimhan et al. 2019): an indigenous Ancient Ancestral South Asian component (AASI, here proxied by the Onge), an Iranian-farmer component (Ganj Dareh Neolithic), and Bronze Age steppe (Steppe_MLBA, here Sintashta and Andronovo). The Kalash resolve into the same trio as their neighbours, only at a slightly different setting.
Chart: Deep three-way ancestry (AASI + Iranian farmer + steppe)
The Kalash carry the lowest AASI share and one of the highest steppe shares of the whole regional panel, more steppe even than the Iron Age Swat valley population that precedes them in time. This is the kernel of truth behind the legend: the Kalash really are steppe-shifted. But so are the Pashtun, and the shift is a position on a shared gradient, not a foreign ancestry. Figures are proxy-dependent and carry a high residual for the Kalash; read them as directions, not exact percentages.
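A three-way fit of this kind can be sketched without any numerical library. The example below is a stand-in for the non-negative least squares modelling mentioned in the data note, not a reproduction of it: it brute-forces mixture weights in 1 percent steps, constrained to be non-negative and to sum to one, and keeps the combination closest to the Kalash average. The function name `fit_three_way` and the source labels are assumptions made for illustration, and because solver and constraints differ, the output need not match the article's percentages exactly.

```python
import math

# Scaled Global25 averages, copied from the coordinate list at the end of the article.
KALASH = [0.083883, 0.024991, -0.084032, 0.066594, -0.071679, 0.040306, 0.003116,
          0.002017, -0.030928, -0.025157, -0.005592, -0.000495, -0.002392, -0.010902,
          0.016611, 0.008762, -0.013821, 0.002104, 0.000672, -0.012832, -0.003922,
          -0.005392, 0.002862, -0.003426, 0.003353]
SOURCES = {
    "AASI (Onge)": [-0.022525, -0.244529, -0.132429, 0.095965, 0.029933, -0.004756,
                    -0.007644, 0.007579, 0.054823, 0.024439, 0.023495, 0.003218,
                    -0.004061, 0.008475, -0.012693, -0.011145, 0.010918, -0.001620,
                    -0.005981, 0.028829, -0.003711, 0.009690, -0.012824, -0.001123,
                    0.004343],
    "Iranian farmer (Ganj Dareh N)": [0.047331, 0.066433, -0.154054, 0.006352, -0.123023,
                                      0.022265, 0.014160, -0.002327, -0.081179, -0.055400,
                                      -0.002246, 0.000100, 0.006281, -0.009519, 0.032200,
                                      0.056759, -0.008399, 0.008731, 0.011239, -0.036059,
                                      0.006956, -0.028698, -0.008525, -0.035869, 0.019968],
    "Steppe_MLBA": [0.121814, 0.097786, 0.057359, 0.083789, 0.002394, 0.031462, 0.002198,
                    0.001327, -0.017867, -0.032878, 0.000163, -0.000434, -0.002365,
                    -0.025907, 0.024221, 0.011106, -0.007392, -0.000968, -0.001205,
                    -0.000484, -0.008849, 0.003714, -0.000041, 0.010192, -0.005620],
}


def fit_three_way(target, sources, step_count=100):
    """Grid-search three non-negative weights (summing to 1) that minimise the distance.

    Returns (residual, weights); the residual is the Euclidean distance times 1000.
    """
    (n1, s1), (n2, s2), (n3, s3) = sources.items()
    best_resid, best_weights = math.inf, None
    for i in range(step_count + 1):
        for j in range(step_count + 1 - i):
            k = step_count - i - j  # remaining share goes to the third source
            w1, w2, w3 = i / step_count, j / step_count, k / step_count
            fitted = [w1 * a + w2 * b + w3 * c for a, b, c in zip(s1, s2, s3)]
            resid = 1000 * math.dist(target, fitted)
            if resid < best_resid:
                best_resid, best_weights = resid, {n1: w1, n2: w2, n3: w3}
    return best_resid, best_weights


residual, weights = fit_three_way(KALASH, SOURCES)
print(round(residual, 1), {name: round(w, 2) for name, w in weights.items()})
```

The residual returned here is the quantity discussed in the drift section: the part of the Kalash position that no combination of the three sources reaches.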
5. The Yamnaya question
A natural follow-up: do the Kalash carry Yamnaya ancestry, the Early Bronze Age steppe pastoralist ancestry of the western steppe? The answer is yes, but the route matters. In the standard model the steppe ancestry of South Asians is not raw Yamnaya. It is Steppe_MLBA, the Middle to Late Bronze Age steppe (Sintashta, then Andronovo) that carried Indo-Iranian languages south. And Steppe_MLBA is not pure Yamnaya: when we decompose it, it resolves to roughly four fifths Yamnaya plus a slice of early European farmer.
Chart: What the steppe source is made of
Steppe_MLBA is about 80 percent Yamnaya. So the Kalash share of about 37 percent Steppe_MLBA translates to roughly 30 percent Yamnaya-equivalent ancestry, transmitted through the Sintashta-Andronovo world rather than directly from the western steppe.
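The conversion is plain multiplication, spelled out here with the two rounded figures quoted in the text (both are the article's approximations, not exact estimates):

```python
STEPPE_MLBA_SHARE = 0.37       # Kalash Steppe_MLBA share, as quoted in the text
YAMNAYA_IN_STEPPE_MLBA = 0.80  # Yamnaya fraction of Steppe_MLBA, as quoted in the text

# Yamnaya-equivalent ancestry is the product of the two shares.
yamnaya_equivalent = STEPPE_MLBA_SHARE * YAMNAYA_IN_STEPPE_MLBA
print(f"{yamnaya_equivalent:.1%}")  # 29.6%
```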
There is a small twist that is worth an honest look. If we swap raw Yamnaya in for Steppe_MLBA, the Kalash fit just as well (about 39 percent Yamnaya, with essentially the same error), and they barely call for the European-farmer slice that normally distinguishes Steppe_MLBA from Yamnaya. Their Pashtun neighbours behave differently: offered the same options, the Pashtun clearly prefer Steppe_MLBA and pull in more farmer ancestry.
Chart: Offered raw Yamnaya plus farmer instead of Steppe_MLBA
The Kalash take about 36 percent Yamnaya and only about 4 percent farmer; the Pashtun take about 29 percent Yamnaya and about 9 percent farmer. The Kalash steppe is a touch more Yamnaya-like, less shifted toward European farmer, than their neighbours'. This is consistent with their isolate history: having absorbed little further ancestry from the wider region since, their Bronze Age signal is left comparatively bare. The effect is small and sits within the drift noise, so it is a tendency, not a hard result; the robust contrast is that the Pashtun firmly demand Steppe_MLBA while the Kalash can do without it.
6. The Iron Age Swat valley and the continuity debate
The Swat valley, just south of the Kalash lands, has yielded one of the richest ancient DNA transects in the region, more than a hundred individuals spanning roughly 1200 BCE to 1 CE (Narasimhan et al. 2019). Those Iron Age people already carried the full three-way mixture, with steppe ancestry that had arrived in the centuries before. In our distances the Iron Age Swat sits about 49 units from the modern Kalash, closer than most living Pakistani groups. The Kalash look like a drifted continuation of that Iron Age northwestern population, not a separate intrusion.
This bears on a genuine scientific disagreement. Ayub et al. (2015) read the Kalash as an ancient isolate, splitting from neighbouring populations on the order of twelve thousand years ago and receiving no detectable later gene flow. Hellenthal and colleagues (2016), reanalysing overlapping data and building on their earlier admixture atlas (Hellenthal et al. 2014), found instead a detectable pulse of admixture dated to roughly 990 to 210 BCE, the same window in which steppe-derived ancestry was spreading through the northwest. The two readings can be reconciled: the Kalash are an old local lineage that drifted heavily in isolation, but not a sealed vault. They took part, like their neighbours, in the Bronze and Iron Age reshaping of the region before isolation set in.
7. The Alexander myth, specifically
The most popular single claim, descent from Alexander's Macedonian and Greek soldiers, fails on the clearest evidence of all. If a few hundred Greek men had founded the Kalash line, it would show in the male lineages. It does not. The Y-chromosome haplogroups of Kalash men are South and Central Asian, dominated by lineages like R1a and L, with no Greek or Balkan signal; the mitochondrial picture is likewise West and South Asian (Ayub et al. 2015). The fair features that fuel the legend are simply the high-frequency end of variation that exists across the wider region, amplified by drift in a small population, not a Mediterranean inheritance.
8. So: relics or myth?
Both halves of the question have an answer, and they pull in opposite directions. Are the Kalash an Indo-European population with real steppe ancestry? Yes. They speak a Dardic language, they carry roughly a third Bronze Age steppe ancestry, and that steppe is genuinely Yamnaya-derived, like the Indo-Iranian ancestry of their neighbours. In that narrow sense they are exactly what the word "relic" wants them to be. But are they a frozen fragment of ancient Europeans, the children of Alexander, a people preserved outside of time? No. They are a local Hindu Kush population, nearest to the Nuristani and the Pashtun, descended from the same Iron Age northwestern stock as the Swat valley dead, and made to look exotic by the drift of a tiny isolated community rather than by any foreign or ancient pedigree. The legend mistook the signature of isolation for the signature of origin. The Kalash are not Europe's lost cousins. They are the Hindu Kush, distilled.
Claim and reality
- Claim: The Kalash descend from Greek soldiers left behind by Alexander the Great.
  Reality: No Greek signal in the male or female lineages. Kalash Y-chromosomes and mitochondria are South and Central Asian. The legend has no genetic support.
- Claim: They are a surviving fragment of an ancient European or pure Aryan stock.
  Reality: They are a Hindu Kush population, nearest to the Nuristani, Pashtun and Kho, and far from any European pole. Their ancestry is the standard South Asian trio of AASI, Iranian farmer and steppe.
- Claim: They are a people frozen in time, preserved unchanged by their mountains.
  Reality: Their distinctiveness is drift, not antiquity. A tiny isolated community (effective size about 2,300 to 2,600) wandered along its own private axis, inflating every distance and leaving a high residual no real source can capture.
- Claim: Their steppe and Yamnaya ancestry sets them apart as specially Indo-European.
  Reality: They do carry real steppe ancestry, about 37 percent Steppe_MLBA, roughly 30 percent Yamnaya-equivalent. But so do the Pashtun. It is a shared northwestern signal, not a unique inheritance.
Reproduce it yourself
Paste the coordinates below into Vahaduo (the Global25 tool) to rebuild the comparisons in this article: the modern Kalash and their Hindu Kush neighbours, the Iron Age Swat average, and the source poles (AASI as Onge, Iranian farmer as Ganj Dareh Neolithic, steppe as a Sintashta-Andronovo average and as Yamnaya from Samara, and the Indus_Periphery proxy from Shahr-i-Sokhta). Modern averages are scaled Global25 from Davidski; the ancient averages were computed here from the individual coordinates.
Kalash_(n=23),0.083883,0.024991,-0.084032,0.066594,-0.071679,0.040306,0.003116,0.002017,-0.030928,-0.025157,-0.005592,-0.000495,-0.002392,-0.010902,0.016611,0.008762,-0.013821,0.002104,0.000672,-0.012832,-0.003922,-0.005392,0.002862,-0.003426,0.003353
Nuristani_(n=4),0.086221,0.026658,-0.079761,0.063227,-0.073552,0.039323,-0.000176,-0.002423,-0.030167,-0.026606,-0.008079,-0.000337,-0.001152,-0.016136,0.013843,0.017535,0.002738,-0.000348,-0.004148,-0.008035,-0.004118,-0.006801,0.003235,-0.002621,0.004191
Kho_(n=3),0.085367,0.008124,-0.071150,0.059109,-0.067500,0.035791,0.000705,0.000385,-0.018543,-0.020714,-0.005738,-0.002198,0.001635,-0.010597,0.010451,0.015469,0.001478,0.000718,0.003268,-0.003418,-0.006946,-0.001855,0.000698,-0.003735,0.004031
Pashtun_Yusufzai_(n=34),0.080078,0.019624,-0.089322,0.059204,-0.065189,0.037347,0.003539,0.002694,-0.013968,-0.012397,-0.005020,0.000101,-0.000988,-0.009043,0.010642,0.013883,-0.000732,0.000529,0.001287,-0.008692,-0.003134,-0.005961,0.002508,-0.003438,0.003596
Burusho_(n=20),0.071253,-0.037321,-0.086040,0.061548,-0.065904,0.038710,0.003407,0.003115,-0.010891,-0.010342,-0.013024,-0.001529,0.000178,-0.010411,0.009086,0.009732,0.000013,-0.001185,0.000138,-0.006497,-0.003849,-0.001150,0.003741,-0.000349,0.004514
Balochi_Pakistan_(n=14),0.068782,0.050777,-0.102119,0.031723,-0.073750,0.031654,0.005439,0.001681,-0.022731,-0.021178,-0.003596,-0.002152,0.004226,-0.007815,0.018225,0.030221,-0.011828,0.003430,0.006877,-0.028362,-0.000811,-0.018583,0.001074,-0.019745,0.015764
Brahui_(n=20),0.068863,0.049101,-0.103520,0.031929,-0.073275,0.029534,0.005887,0.002146,-0.021403,-0.020201,-0.003158,-0.003589,0.005500,-0.010322,0.015934,0.029786,-0.010418,0.004364,0.005725,-0.025981,-0.000499,-0.015401,-0.000863,-0.016792,0.012879
Onge_(n=19),-0.022525,-0.244529,-0.132429,0.095965,0.029933,-0.004756,-0.007644,0.007579,0.054823,0.024439,0.023495,0.003218,-0.004061,0.008475,-0.012693,-0.011145,0.010918,-0.001620,-0.005981,0.028829,-0.003711,0.009690,-0.012824,-0.001123,0.004343
Pakistan_Swat_IA_avg_(n=86),0.067354,-0.002043,-0.111825,0.068562,-0.081232,0.043500,0.003222,0.001825,-0.017722,-0.014600,-0.005687,0.001622,-0.002351,-0.008352,0.011733,0.011259,-0.001637,-0.000035,0.000780,-0.009499,-0.000961,-0.006016,0.001896,-0.005718,0.003244
Iran_GanjDareh_N_avg_(n=12),0.047331,0.066433,-0.154054,0.006352,-0.123023,0.022265,0.014160,-0.002327,-0.081179,-0.055400,-0.002246,0.000100,0.006281,-0.009519,0.032200,0.056759,-0.008399,0.008731,0.011239,-0.036059,0.006956,-0.028698,-0.008525,-0.035869,0.019968
Steppe_MLBA_Sintashta_Andronovo_avg,0.121814,0.097786,0.057359,0.083789,0.002394,0.031462,0.002198,0.001327,-0.017867,-0.032878,0.000163,-0.000434,-0.002365,-0.025907,0.024221,0.011106,-0.007392,-0.000968,-0.001205,-0.000484,-0.008849,0.003714,-0.000041,0.010192,-0.005620
Russia_Samara_EBA_Yamnaya_avg_(n=60),0.122303,0.088080,0.043004,0.114790,-0.028123,0.045585,0.003928,-0.001950,-0.054676,-0.074908,0.000579,-0.000412,-0.001658,-0.021841,0.036907,0.012061,-0.006041,-0.002082,-0.002755,0.010217,-0.004041,0.001616,0.009800,0.020621,-0.004069
Iran_ShahrISokhta_BA1_avg_(n=11),0.075227,0.062132,-0.132301,0.022170,-0.108048,0.031286,0.010319,-0.002434,-0.062027,-0.035702,-0.001978,0.003215,-0.005311,-0.006806,0.021543,0.030267,-0.004445,0.001981,0.003245,-0.023136,0.008201,-0.016502,-0.001972,-0.018699,0.012432
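For readers who would rather work in Python than in Vahaduo, the coordinate records can be loaded with a few lines. This is a minimal sketch that assumes the format used above: a label containing no spaces, followed by 25 comma-separated values, with records separated by any whitespace. The function name `parse_g25` is hypothetical, and the sample holds only the first two records.

```python
SAMPLE = """
Kalash_(n=23),0.083883,0.024991,-0.084032,0.066594,-0.071679,0.040306,0.003116,0.002017,-0.030928,-0.025157,-0.005592,-0.000495,-0.002392,-0.010902,0.016611,0.008762,-0.013821,0.002104,0.000672,-0.012832,-0.003922,-0.005392,0.002862,-0.003426,0.003353
Nuristani_(n=4),0.086221,0.026658,-0.079761,0.063227,-0.073552,0.039323,-0.000176,-0.002423,-0.030167,-0.026606,-0.008079,-0.000337,-0.001152,-0.016136,0.013843,0.017535,0.002738,-0.000348,-0.004148,-0.008035,-0.004118,-0.006801,0.003235,-0.002621,0.004191
"""


def parse_g25(text, dimensions=25):
    """Parse 'Label,v1,...,v25' records separated by whitespace into {label: [floats]}."""
    populations = {}
    for record in text.split():  # splits on newlines and spaces alike
        label, *values = record.split(",")
        coords = [float(v) for v in values]
        if len(coords) != dimensions:
            raise ValueError(f"{label}: expected {dimensions} values, got {len(coords)}")
        populations[label] = coords
    return populations


populations = parse_g25(SAMPLE)
print(list(populations))
```

Pasting the full coordinate list into `SAMPLE` yields one vector per population, ready for distance or mixture calculations.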
References and sources
- 1 Ayub, Q., Mezzavilla, M., Pagani, L., Haber, M., Mohyuddin, A., Khaliq, S., Mehdi, S. Q., Tyler-Smith, C. (2015). The Kalash Genetic Isolate: Ancient Divergence, Drift, and Selection. American Journal of Human Genetics 96(5), 775 to 783.
- 2 Hellenthal, G., Falush, D., Myers, S., Reich, D., Busby, G. B., Lipson, M., Capelli, C., Patterson, N. (2016). The Kalash Genetic Isolate? The Evidence for Recent Admixture. American Journal of Human Genetics 98(2), 396 to 397.
- 3 Hellenthal, G., Busby, G. B. J., Band, G., et al. (2014). A Genetic Atlas of Human Admixture History. Science 343, 747 to 751. The original analysis dating a Kalash admixture event to the first millennium BCE.
- 4 Narasimhan, V. M., Patterson, N., Moorjani, P., et al. (2019). The formation of human populations in South and Central Asia. Science 365, eaat7487. Source of the AASI plus Indus_Periphery plus Steppe_MLBA framework and the Swat valley transect.
- 5 Global25 coordinates: Davidski (Eurogenes), with modern and ancient averages drawn from the Moriopoulos 2025 collection. Global25 spreadsheet tooling: Vahaduo.
Modern and ancient Global25 coordinates: Davidski (Global25) and the Moriopoulos 2025 collection. Iron Age Swat, Ganj Dareh Neolithic, Sintashta-Andronovo, Yamnaya (Samara) and Shahr-i-Sokhta averages computed here from the individual coordinates. Global25 spreadsheet tooling: Vahaduo. Analysis: scaled Global25 Euclidean distances and non-negative least squares modelling in Python. Ancestry fractions are proxy-dependent and, given the Kalash drift, are best read as directions rather than exact percentages.