The Tajiks are the great Iranian-speaking people of Central Asia, spread across Tajikistan, Afghanistan and Uzbekistan, and they speak a form of Persian, which belongs to the western branch of the Iranian languages. That speech is the historical newcomer. It spread into the region with the Persian and Islamic expansion of the early medieval centuries, while the people who adopted it had lived in the oases and mountain valleys of the Oxus for thousands of years before, speaking eastern Iranian tongues. The genome records that older story. A Tajik is not a product of the Iranian plateau to the west but of a local Bronze and Iron Age population, formed when the farmers of the Oxus civilisation mixed with steppe herders around 1500 BCE. Three branches show it with unusual clarity: the lowland Tajiks, the Sogdian-descended Yaghnobis who froze the Iron Age genome almost in place, and the highland Pamiris who added a signal of their own from the deep north of Eurasia.
Key points
- Tajiks speak Persian, a western Iranian language, but that language arrived only in the early medieval period. Genetically they descend from a much older eastern Iranian population of the Oxus, in place since the Bronze Age. The tongue shifted; the people did not.
- The shared foundation is a mix of two ancient sources: the farmers of the Bronze Age Oxus civilisation (BMAC, here Gonur Tepe) and the Andronovo steppe herders who arrived around 1500 BCE. Together these make up roughly 80 to 90 percent of every Tajik and Pamiri genome modelled here.
- The Yaghnobis, who speak the only living descendant of ancient Sogdian, are the closest thing to a living Iron Age population in the region. They model as about half BMAC and just over 40 percent Andronovo, with around 8 percent East Asian and no measurable South Asian input, and they sit about 33 Global25 units from a Kushan-era sample of the first to fourth centuries CE.
- Lowland Tajiks carry the same Iron Age base with a little more eastward drift: a modest East Asian share near 11 percent and a few percent of South Asian ancestry, the trace of later contact along the Silk Road and with the subcontinent.
- The highland Pamiris (Wakhi, Sarikoli, Shugnan) carry a distinctive extra component related to the Bronze Age Tarim Basin mummies, a deeply isolated Ancient North Eurasian-rich population. It is absent in the Yaghnobis and lowland Tajiks and reflects the highland isolation of the Pamir, alongside genuine high-altitude adaptation.
- Set against their Turkic-speaking neighbours, the Tajiks are the side of the great Central Asian divide that kept the East Asian dilution to a minimum. The East Asian share climbs from about 8 percent in a Yaghnobi to 27 percent in a Turkmen, 34 percent in an Uzbek, 42 percent in an Uyghur and around 60 percent or more in Kazakhs and Kyrgyz.
- Figures are proxy-dependent and best read as directions. Where the model cannot separate older from later layers, the published qpAdm estimates from the same data point the same way.
1. A Persian tongue on an older people
Spoken language places the Tajiks firmly in the Persian world. Tajik Persian is a western Iranian language, a close relative of the Farsi of Iran and the Dari of Afghanistan, and it carries the literary inheritance of Samarkand and Bukhara, the historic Tajik-speaking cities now inside Uzbekistan. But western Iranian speech is not native to Central Asia. It moved north and east with the spread of Persian administration and culture and with the Islamic period, taking lasting root only in roughly the last fifteen hundred years. Before that, the people of the Oxus valleys spoke eastern Iranian languages: Bactrian, Sogdian, the ancestors of the Pamir tongues. The shift from eastern to western Iranian was a change of language, not a replacement of population.
This is the same kind of question raised by other peoples whose language points one way and whose ancestry points another. We can ask it of the Tajiks in three steps. Who are they nearest to, in the living world and the ancient one? What is the deep foundation of the genome made of? And does any part of the ancestry track the recent Persian language, or did speech and descent part ways here as they have elsewhere?
2. Three branches of one people
The Tajik name covers more than the settled Persian-speakers of the lowlands. Two other branches sharpen the picture. The Yaghnobis are a small community of the upper Zarafshan valley who speak the only surviving descendant of Sogdian, the eastern Iranian lingua franca of the Silk Road. By tradition they are the heirs of Sogdians who withdrew into the mountains after the Arab conquest, and their isolation has preserved an archaic language. The Pamiris of the high Badakhshan ranges (Wakhi, Sarikoli, Shugnan, Rushan and others) speak their own eastern Iranian languages, largely follow Ismaili Islam, and live at altitudes that often exceed three thousand metres. The three branches let us see how a shared Iron Age inheritance fared under three different histories: drift and contact in the lowlands, deep isolation in the Yaghnob, and high-altitude seclusion in the Pamir.
3. Neighbours first: where a Yaghnobi sits
The clearest way to place a population is to ask what it is nearest to. The chart below gives scaled Global25 Euclidean distances (multiplied by 1000) from the Yaghnobi average, the most conservative of the three branches, to ancient reference points, to other Iranian-speakers, and to the Turkic-speaking neighbours of Central Asia.
The order is its own argument. The Yaghnobi sit nearest to a Kushan-era individual from Tajikistan of the first to fourth centuries CE, at about 33 units, and to the lowland Tajik sample from Kulob, before reaching an Iron Age sample from Turkmenistan and the living Persians of Khorasan. The Pamiris and the other Iranian-speakers follow. Only well beyond them come the Turkic neighbours: a Turkmen at 121 units, an Uzbek at 178, an Uyghur at 236, a Kyrgyz at 342. The deep source poles lie farthest of all, the Tarim mummies and the Baikal hunter-gatherers off at the right edge. A Yaghnobi is closer to a person who lived in the Kushan period than to a modern Uzbek living next door. That is what genetic continuity looks like.
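The distances quoted above can be checked by hand. The sketch below is a minimal illustration, not the author's script: it copies four rows from the coordinate list in "Reproduce it yourself", computes the plain Euclidean distance over all 25 dimensions, and multiplies by 1000. The short dictionary keys and the helper names `parse` and `g25_distance` are chosen here for illustration only.

```python
import math

# Scaled Global25 rows copied from the coordinate list in "Reproduce it yourself".
ROWS = {
    "Yaghnobi": "0.100847,0.059002,-0.028473,0.024160,-0.046070,0.018658,0.006463,-0.000992,-0.035802,-0.026479,-0.004774,0.000315,-0.000134,-0.009393,0.013314,0.017760,-0.003592,0.000462,0.001427,-0.011030,-0.007849,-0.005002,0.001485,0.002356,0.003551",
    "Kushan_Ksirov": "0.095991,0.070410,-0.034444,0.012812,-0.048727,0.010133,-0.002115,-0.005308,-0.032042,-0.020714,0.007740,0.005795,-0.004311,-0.009725,0.008279,0.013082,0.006476,0.003252,0.011564,-0.010505,-0.006530,0.000165,0.001356,-0.007109,0.005189",
    "Turkmen": "0.082931,-0.047770,-0.007407,-0.004774,-0.041565,-0.003436,0.006275,0.003715,-0.017605,-0.010719,-0.009123,-0.001025,0.001350,-0.003460,0.002166,0.005290,-0.000503,0.000317,0.000503,-0.002028,-0.007592,-0.002416,-0.003796,-0.001723,0.002431",
    "Kyrgyz": "0.058736,-0.263231,0.045498,-0.012426,-0.040876,-0.014728,0.011402,0.013007,-0.003567,0.001136,-0.026940,-0.002766,0.000796,-0.002151,0.003195,0.000991,-0.002324,-0.000917,0.002499,0.007943,-0.015164,-0.003420,-0.012144,0.001045,0.002008",
}


def parse(row):
    """Turn a comma-separated coordinate string into a list of floats."""
    return [float(value) for value in row.split(",")]


def g25_distance(a, b):
    """Euclidean distance between two coordinate vectors, multiplied by 1000."""
    return 1000 * math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


coords = {name: parse(row) for name, row in ROWS.items()}
target = coords["Yaghnobi"]
distances = {
    name: g25_distance(target, vector)
    for name, vector in coords.items()
    if name != "Yaghnobi"
}

# Print the neighbours from nearest to farthest.
for name, value in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name:15s} {value:6.1f}")
```

Adding more rows to `ROWS` extends the ranking to the other populations in the list.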
4. The Iron Age core
Model the three Tajik branches as a mixture of four ancient sources (the Bronze Age Oxus farmers of Gonur Tepe as the BMAC pole, the Andronovo steppe herders, a deep South Asian source for any subcontinental input, and a Baikal hunter-gatherer source for East Asian ancestry) and a consistent pattern appears. The bulk of every genome is the same two-part Iron Age recipe: the settled Oxus farmer base and the steppe layer that joined it in the late Bronze Age.
The shared Iron Age foundation, and what sets each branch apart
All three branches resolve to roughly four fifths or more Oxus farmer plus Andronovo steppe, the mixture that formed in southern Central Asia at the end of the Bronze Age. The Yaghnobi carry that base almost neat. The lowland Tajiks add a little East Asian and a few percent South Asian. The Pamiris carry the most South Asian of the three. Figures are proxy-dependent; read them as directions, not exact percentages.
The steppe layer is the historically important one. Across southern Central Asia the main admixture event at the base of Indo-Iranian ancestry took place at the end of the Bronze Age, when local Oxus groups mixed with Andronovo-related steppe populations, around the time the Oxus civilisation came to a close. That is the event these models recover, and it is the genetic substrate on which every later language, eastern Iranian and then Persian alike, was carried.
5. The Yaghnobis: a living Iron Age
The Yaghnobis are the case that proves how little the genome moved. Their language is a direct descendant of Sogdian, the only one left alive, and their ancestry is just as conservative. They model as about half BMAC and just over 40 percent Andronovo, with a small East Asian share near 8 percent and, tellingly, no measurable South Asian component at all. In the distance chart they sit about 33 units from a Kushan-period individual and around 51 from an Iron Age sample of Turkmenistan. Published work using formal methods reaches the same conclusion from the same data, modelling the Yaghnobis as roughly 90 percent or more Iron Age Turkmenistan with a small Baikal-related remainder. Whether read through these simple mixture models or through qpAdm, the Yaghnobis are about as close to an unbroken Iron Age population as Central Asia has to offer.
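The four-source mixture model can be sketched in a few lines. The article says only that it used non-negative least squares in Python, so the solver below is an assumption: a dependency-free projected gradient descent that keeps the weights non-negative and summing to one, which is the same kind of constrained fit. The helper names `project_to_simplex` and `fit_mixture` are illustrative, and the exact percentages will depend on the solver and the sources offered.

```python
import math

# Scaled Global25 rows copied from the coordinate list in "Reproduce it yourself".
ROWS = {
    "Yaghnobi": "0.100847,0.059002,-0.028473,0.024160,-0.046070,0.018658,0.006463,-0.000992,-0.035802,-0.026479,-0.004774,0.000315,-0.000134,-0.009393,0.013314,0.017760,-0.003592,0.000462,0.001427,-0.011030,-0.007849,-0.005002,0.001485,0.002356,0.003551",
    "BMAC": "0.079771,0.078873,-0.110999,0.008317,-0.098582,0.021033,0.007011,-0.006269,-0.063880,-0.040791,-0.002233,0.002648,-0.005501,-0.006468,0.018311,0.024894,-0.004792,0.000739,0.004546,-0.023251,-0.002080,-0.014313,-0.001654,-0.015755,0.011177",
    "Andronovo": "0.123742,0.119397,0.052258,0.074013,0.011694,0.029841,0.007419,0.004352,-0.013995,-0.025539,-0.002900,0.000471,-0.004651,-0.020958,0.024158,0.009546,-0.006184,-0.000815,-0.001167,0.001197,-0.005062,0.001660,-0.002923,0.003408,-0.004738",
    "Onge": "-0.022525,-0.244529,-0.132429,0.095965,0.029933,-0.004756,-0.007644,0.007579,0.054823,0.024439,0.023495,0.003218,-0.004061,0.008475,-0.012693,-0.011145,0.010918,-0.001620,-0.005981,0.028829,-0.003711,0.009690,-0.012824,-0.001123,0.004343",
    "Baikal_HG": "0.030960,-0.414742,0.095034,-0.015795,-0.086847,-0.048220,0.005381,0.010061,0.014705,0.012520,-0.016060,-0.001814,0.003583,-0.007803,-0.004655,-0.003209,-0.002516,0.000583,0.006461,0.017833,-0.017045,-0.005948,-0.016515,-0.003205,0.004694",
}
coords = {name: [float(v) for v in row.split(",")] for name, row in ROWS.items()}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_to_simplex(v):
    """Euclidean projection onto weights that are >= 0 and sum to 1."""
    running, theta = 0.0, 0.0
    for count, value in enumerate(sorted(v, reverse=True), start=1):
        running += value
        shift = (running - 1.0) / count
        if value - shift > 0:
            theta = shift
    return [max(x - theta, 0.0) for x in v]


def fit_mixture(target, source_names, steps=20000):
    """Minimise |sum_j w_j * source_j - target|^2 over the simplex."""
    vectors = [coords[name] for name in source_names]
    k = len(vectors)
    gram = [[dot(a, b) for b in vectors] for a in vectors]
    rhs = [dot(a, target) for a in vectors]
    # The trace of the Gram matrix bounds its largest eigenvalue, so this step is safe.
    step = 1.0 / sum(gram[i][i] for i in range(k))
    w = [1.0 / k] * k
    for _ in range(steps):
        grad = [dot(gram[i], w) - rhs[i] for i in range(k)]
        w = project_to_simplex([w[i] - step * grad[i] for i in range(k)])
    model = [dot(w, column) for column in zip(*vectors)]
    fit = 1000 * math.sqrt(sum((m - t) ** 2 for m, t in zip(model, target)))
    return dict(zip(source_names, w)), fit


weights, fit = fit_mixture(coords["Yaghnobi"], ["BMAC", "Andronovo", "Onge", "Baikal_HG"])
for name, share in weights.items():
    print(f"{name:10s} {100 * share:5.1f}%")
print(f"fit distance (x1000): {fit:.1f}")
```

Swapping `"Yaghnobi"` for another target row from the coordinate list reruns the same model for that population.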
6. The highland signature of the Pamir
The Pamiris share the Iron Age base, but the high valleys added something the lowlands never received. When a Bronze Age Tarim Basin source (the Xiaohe mummies, a deeply isolated population rich in Ancient North Eurasian ancestry) is offered to the model alongside the usual four sources, the Pamiri branches draw a small but real share of it, and their fit improves. The Yaghnobis and lowland Tajiks draw none, and their fit does not change at all.
A Tarim-related highland component, present only in the Pamir
Adding the Tarim source draws 5 to 7 percent in the Pamiri branches and improves their model fit by several units, while drawing nothing in the Yaghnobi and lowland Tajik and leaving their fit unchanged. This matches the published finding that highland Tajik groups received extra gene flow from the Tarim mummies. It is the genetic mark of the Pamir's isolation, layered on the common Iron Age base.
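The comparison with and without the Tarim source is a nested-model test: fit each target once with four sources and once with five, then look at the Tarim weight and the change in fit distance. The sketch below shows the mechanics for one Yaghnobi and one Pamiri (Wakhi) average. It assumes the same simplex-constrained projected gradient solver as a stand-in for the article's non-negative least squares, and the helper names are illustrative. No output is quoted here, since the exact figures depend on the solver.

```python
import math

# Scaled Global25 rows copied from the coordinate list in "Reproduce it yourself".
ROWS = {
    "Yaghnobi": "0.100847,0.059002,-0.028473,0.024160,-0.046070,0.018658,0.006463,-0.000992,-0.035802,-0.026479,-0.004774,0.000315,-0.000134,-0.009393,0.013314,0.017760,-0.003592,0.000462,0.001427,-0.011030,-0.007849,-0.005002,0.001485,0.002356,0.003551",
    "Wakhi": "0.086079,0.009718,-0.044025,0.053074,-0.053433,0.028842,0.005349,-0.000986,-0.024231,-0.022437,-0.009783,-0.001223,-0.000298,-0.014666,0.015317,0.011490,-0.002969,-0.000454,-0.000766,-0.006426,-0.006645,-0.000856,0.001517,0.001329,0.001693",
    "BMAC": "0.079771,0.078873,-0.110999,0.008317,-0.098582,0.021033,0.007011,-0.006269,-0.063880,-0.040791,-0.002233,0.002648,-0.005501,-0.006468,0.018311,0.024894,-0.004792,0.000739,0.004546,-0.023251,-0.002080,-0.014313,-0.001654,-0.015755,0.011177",
    "Andronovo": "0.123742,0.119397,0.052258,0.074013,0.011694,0.029841,0.007419,0.004352,-0.013995,-0.025539,-0.002900,0.000471,-0.004651,-0.020958,0.024158,0.009546,-0.006184,-0.000815,-0.001167,0.001197,-0.005062,0.001660,-0.002923,0.003408,-0.004738",
    "Onge": "-0.022525,-0.244529,-0.132429,0.095965,0.029933,-0.004756,-0.007644,0.007579,0.054823,0.024439,0.023495,0.003218,-0.004061,0.008475,-0.012693,-0.011145,0.010918,-0.001620,-0.005981,0.028829,-0.003711,0.009690,-0.012824,-0.001123,0.004343",
    "Baikal_HG": "0.030960,-0.414742,0.095034,-0.015795,-0.086847,-0.048220,0.005381,0.010061,0.014705,0.012520,-0.016060,-0.001814,0.003583,-0.007803,-0.004655,-0.003209,-0.002516,0.000583,0.006461,0.017833,-0.017045,-0.005948,-0.016515,-0.003205,0.004694",
    "Tarim_Xiaohe": "0.103237,-0.100030,0.072973,0.203394,-0.115222,0.047690,-0.049423,-0.060759,-0.041866,-0.096202,0.028548,-0.013368,0.024529,-0.071399,0.026248,0.022726,-0.018997,-0.001368,0.001999,-0.006528,-0.051197,0.007827,0.023910,0.014375,-0.008538",
}
coords = {name: [float(v) for v in row.split(",")] for name, row in ROWS.items()}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_to_simplex(v):
    """Euclidean projection onto weights that are >= 0 and sum to 1."""
    running, theta = 0.0, 0.0
    for count, value in enumerate(sorted(v, reverse=True), start=1):
        running += value
        shift = (running - 1.0) / count
        if value - shift > 0:
            theta = shift
    return [max(x - theta, 0.0) for x in v]


def fit_mixture(target, source_names, steps=20000):
    """Return (weights, fit distance x1000) for a simplex-constrained least squares fit."""
    vectors = [coords[name] for name in source_names]
    k = len(vectors)
    gram = [[dot(a, b) for b in vectors] for a in vectors]
    rhs = [dot(a, target) for a in vectors]
    step = 1.0 / sum(gram[i][i] for i in range(k))  # trace bounds the top eigenvalue
    w = [1.0 / k] * k
    for _ in range(steps):
        grad = [dot(gram[i], w) - rhs[i] for i in range(k)]
        w = project_to_simplex([w[i] - step * grad[i] for i in range(k)])
    model = [dot(w, column) for column in zip(*vectors)]
    fit = 1000 * math.sqrt(sum((m - t) ** 2 for m, t in zip(model, target)))
    return dict(zip(source_names, w)), fit


FOUR = ["BMAC", "Andronovo", "Onge", "Baikal_HG"]
results = {}
for target in ("Yaghnobi", "Wakhi"):
    _, fit4 = fit_mixture(coords[target], FOUR)
    weights5, fit5 = fit_mixture(coords[target], FOUR + ["Tarim_Xiaohe"])
    results[target] = (fit4, fit5, weights5["Tarim_Xiaohe"])
    print(f"{target:9s} fit {fit4:5.1f} -> {fit5:5.1f}, Tarim share {100 * weights5['Tarim_Xiaohe']:4.1f}%")
```

Because the five-source model contains the four-source one, its fit can only stay the same or improve. A Tarim share of zero with an unchanged fit means the extra source was not needed for that target.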
This highland inheritance sits alongside a separate, well-documented biological story: the Pamiri and Xinjiang highland Tajiks carry genetic adaptations to life above three thousand metres, where cold, low oxygen and intense ultraviolet exposure are constant. The Tarim-related ancestry and the altitude adaptations are different things, but both are products of the same long seclusion in the high mountains.
7. The divide they stayed on
Central Asia is split between two worlds, the Iranian-speaking and the Turkic-speaking, and the genome marks the divide as plainly as the map of languages. The Turkic expansion brought East Asian ancestry westward in successive waves, and the modern Turkic-speaking peoples carry it in large measure. The Iranian-speakers are the populations that absorbed the least. Ranking the groups by their East Asian share traces the whole gradient in one line.
East Asian (Baikal-related) share, from the Iranian to the Turkic world
The Iranian-speakers (green to the Tajik range) hold the East Asian share under about a fifth of the genome, and the Yaghnobis and Pamiris under a tenth. The Turkic-speakers (blue) climb from a quarter in Turkmens to two thirds in Kyrgyz. Within the Tajiks themselves there is a west to east cline, from the conservative valley populations to the more eastern samples, a small-scale echo of the same gradient.
The contrast is the heart of the matter. The Tajiks were not isolated from the Turkic world; they bordered it, traded with it and were ruled by it for long stretches. But where the Turkic languages spread together with substantial East Asian ancestry, the Persian language spread into Central Asia largely as a cultural and administrative inheritance, carried by populations already resident. The Tajiks took the western Iranian tongue and kept the eastern Iranian genome.
8. So, does the genome match the language?
Only the old language, not the new one. The Tajiks speak Persian, a western Iranian tongue, but their genome is the record of an eastern Iranian people of the Oxus, formed in the late Bronze Age from settled Oxus farmers and Andronovo steppe herders and carried with remarkable steadiness into the present. The Yaghnobis, who kept an eastern Iranian language alive in the mountains, kept the Iron Age genome with it, sitting almost on top of a Kushan-era sample. The Pamiris added a highland signal from the Tarim Basin that the lowlands never saw. And all of them stayed on the Iranian side of the great Central Asian divide, holding the East Asian dilution to a fraction of what their Turkic neighbours carry. The Persian speech is real and it is theirs, but it is the most recent layer of a much older inheritance. In the Tajiks, the language is young and the genes are old.
Claim and reality
Claim: Because the Tajiks speak Persian, they are essentially a western Iranian people who moved in from the Iranian plateau.
Reality: The genome is a local Oxus one, built from Bronze Age Oxus farmers and Andronovo steppe. A Tajik sits far from the Persians of Fars and close to ancient Central Asian samples. The Persian language is a later cultural layer.
Claim: The Yaghnobis are just another Tajik group with an unusual dialect.
Reality: They speak the only living descendant of Sogdian and carry an Iron Age genome with no South Asian input, sitting about 33 units from a Kushan-era individual. They are close to a living Iron Age population.
Claim: All Tajik groups are genetically interchangeable.
Reality: The highland Pamiris carry a Tarim Basin component absent in the lowlands and the Yaghnob, the lowland Tajiks carry more East Asian ancestry than the Pamiris, and the Pamiris carry the most South Asian. The branches share a base but diverge in their later layers.
Claim: Tajiks and their Turkic-speaking neighbours are much the same people with different languages.
Reality: The East Asian share separates them clearly, from under a tenth in Yaghnobis and Pamiris to a quarter or more in Turkmens, Uzbeks and beyond. The Iranian and Turkic worlds are distinct on the genome.
Reproduce it yourself
Paste the coordinates below into Vahaduo (the Global25 tool) to rebuild the comparisons in this article: the Yaghnobi, lowland Tajik (Kulob, Ayni, Badakhshan and an eastern sample) and Pamiri (Wakhi, Sarikoli, Shugnan) averages, the Iranian-speaking and Turkic-speaking neighbours, the four modelling sources (BMAC Gonur Tepe, Andronovo, Onge for deep South Asian, Baikal hunter-gatherer for East Asian), the Tarim Basin highland source (Xiaohe), and the ancient anchors (Kushan-era Ksirov and Iron Age Takhirbai). All coordinates are scaled Global25 from the Moriopoulos 2025 collection.
Yaghnobi_(n=20),0.100847,0.059002,-0.028473,0.024160,-0.046070,0.018658,0.006463,-0.000992,-0.035802,-0.026479,-0.004774,0.000315,-0.000134,-0.009393,0.013314,0.017760,-0.003592,0.000462,0.001427,-0.011030,-0.007849,-0.005002,0.001485,0.002356,0.003551
Tajik_Kulob_lowland_(n=8),0.096323,0.024542,-0.033297,0.025625,-0.048663,0.021358,0.001410,-0.000981,-0.032017,-0.025149,-0.007064,-0.002423,-0.002905,-0.008137,0.012006,0.013579,0.000788,0.000311,0.000864,-0.008249,-0.006436,-0.002303,0.000164,0.002636,0.001716
Tajik_Ayni_(n=12),0.091533,0.009394,-0.029101,0.018276,-0.040751,0.014526,0.006796,-0.000654,-0.026281,-0.020122,-0.008756,0.001549,0.000681,-0.007913,0.011299,0.008088,-0.009812,-0.000581,0.000891,-0.008275,-0.006572,-0.003720,0.001633,0.001145,0.005059
Tajik_Badakhshan_(n=6),0.092007,0.027927,-0.039912,0.044251,-0.049804,0.025797,0.005287,0.002116,-0.029110,-0.026211,-0.008038,-0.001448,-0.000619,-0.011399,0.016400,0.010276,-0.006041,0.000253,0.002514,-0.007504,-0.007237,-0.004410,0.001992,0.001727,0.001357
Tajik_eastshifted_(n=13),0.083091,-0.044996,-0.019233,0.013392,-0.037616,0.011713,0.005965,0.003905,-0.019870,-0.011046,-0.012916,-0.002190,0.002299,-0.010523,0.005951,0.010832,0.000311,0.000916,0.001518,-0.002434,-0.006892,-0.002739,0.000825,0.000797,0.002459
Wakhi_(n=22),0.086079,0.009718,-0.044025,0.053074,-0.053433,0.028842,0.005349,-0.000986,-0.024231,-0.022437,-0.009783,-0.001223,-0.000298,-0.014666,0.015317,0.011490,-0.002969,-0.000454,-0.000766,-0.006426,-0.006645,-0.000856,0.001517,0.001329,0.001693
Pamiri_Sarikoli_highland_(n=50),0.087075,-0.003168,-0.036716,0.047378,-0.049585,0.028452,0.004451,-0.000134,-0.024171,-0.023334,-0.010299,-0.000390,-0.001112,-0.013437,0.014937,0.011718,-0.002756,0.000626,0.001438,-0.007561,-0.005900,-0.002641,0.001600,-0.000227,0.001624
Pamiri_Shugnan_(n=8),0.095896,0.040113,-0.031113,0.045583,-0.048663,0.025240,0.001028,-0.003750,-0.027841,-0.028520,-0.004344,-0.001368,0.000594,-0.016652,0.014200,0.016822,-0.002526,0.001742,0.002168,-0.008567,-0.005646,-0.004560,0.001417,0.003118,0.004296
Persian_Khorasan_(n=12),0.086316,0.072272,-0.064456,-0.005437,-0.048419,0.007391,0.003936,-0.001731,-0.025395,-0.016796,-0.002206,-0.000387,0.001623,-0.006319,0.007623,0.015546,-0.000641,0.001668,0.002682,-0.009036,-0.001861,-0.003473,-0.001972,-0.002681,0.006576
Persian_Fars_(n=12),0.084134,0.100833,-0.067725,-0.025152,-0.043162,0.001139,0.003486,-0.003250,-0.027297,-0.016781,0.001177,-0.000724,0.003072,-0.002305,0.007035,0.012817,-0.005194,0.001890,0.001760,-0.010463,0.001425,-0.003297,-0.000308,-0.004368,0.004361
Pashtun_Durrani_(n=7),0.083614,0.031985,-0.071009,0.046494,-0.061067,0.030069,0.002349,-0.000357,-0.017598,-0.019224,-0.002624,-0.000536,0.001055,-0.007274,0.014432,0.011353,-0.005268,0.000660,0.001641,-0.011011,-0.002198,-0.004955,0.004948,-0.002842,0.001897
Uzbek_(n=15),0.074516,-0.109271,-0.000729,0.008398,-0.033935,0.006675,0.009306,0.005661,-0.014589,-0.008772,-0.017224,-0.003277,0.002517,-0.004670,0.005103,0.003960,-0.004059,-0.000752,0.000696,-0.000550,-0.007753,-0.001558,-0.004338,0.000996,0.002571
Turkmen_(n=50),0.082931,-0.047770,-0.007407,-0.004774,-0.041565,-0.003436,0.006275,0.003715,-0.017605,-0.010719,-0.009123,-0.001025,0.001350,-0.003460,0.002166,0.005290,-0.000503,0.000317,0.000503,-0.002028,-0.007592,-0.002416,-0.003796,-0.001723,0.002431
Uyghur_(n=19),0.065598,-0.166280,0.001171,-0.001972,-0.026466,0.007868,0.006258,0.003146,-0.014263,-0.009198,-0.027871,-0.002706,0.001338,-0.006504,0.003814,0.007439,0.000885,-0.001374,-0.001019,0.000184,-0.008327,-0.000078,-0.000577,0.002353,0.002975
Kyrgyz_(n=68),0.058736,-0.263231,0.045498,-0.012426,-0.040876,-0.014728,0.011402,0.013007,-0.003567,0.001136,-0.026940,-0.002766,0.000796,-0.002151,0.003195,0.000991,-0.002324,-0.000917,0.002499,0.007943,-0.015164,-0.003420,-0.012144,0.001045,0.002008
Kazakh_(n=20),0.066757,-0.215343,0.044595,-0.008010,-0.036914,-0.012968,0.010387,0.011342,-0.005031,-0.000519,-0.022328,-0.002802,0.000766,-0.000268,0.001249,0.001081,-0.003260,-0.000285,0.001031,0.005465,-0.012509,-0.004835,-0.008535,-0.000211,-0.000527
Hazara_Pakistan_(n=17),0.064076,-0.172461,-0.001220,0.001976,-0.040677,0.002412,0.009552,0.008335,-0.012271,-0.004245,-0.025829,-0.001490,0.001880,-0.002032,0.003776,0.005202,-0.003375,0.000604,0.000495,-0.000530,-0.007157,-0.002997,-0.004691,-0.002183,0.000275
Kalash_(n=23),0.083883,0.024991,-0.084032,0.066594,-0.071679,0.040306,0.003116,0.002017,-0.030928,-0.025157,-0.005592,-0.000495,-0.002392,-0.010902,0.016611,0.008762,-0.013821,0.002104,0.000672,-0.012832,-0.003922,-0.005392,0.002862,-0.003426,0.003353
BMAC_Gonur_Tepe_BA_(n=12),0.079771,0.078873,-0.110999,0.008317,-0.098582,0.021033,0.007011,-0.006269,-0.063880,-0.040791,-0.002233,0.002648,-0.005501,-0.006468,0.018311,0.024894,-0.004792,0.000739,0.004546,-0.023251,-0.002080,-0.014313,-0.001654,-0.015755,0.011177
Andronovo_Alakul_Maitan_MLBA_(n=7),0.123742,0.119397,0.052258,0.074013,0.011694,0.029841,0.007419,0.004352,-0.013995,-0.025539,-0.002900,0.000471,-0.004651,-0.020958,0.024158,0.009546,-0.006184,-0.000815,-0.001167,0.001197,-0.005062,0.001660,-0.002923,0.003408,-0.004738
Onge_(n=19),-0.022525,-0.244529,-0.132429,0.095965,0.029933,-0.004756,-0.007644,0.007579,0.054823,0.024439,0.023495,0.003218,-0.004061,0.008475,-0.012693,-0.011145,0.010918,-0.001620,-0.005981,0.028829,-0.003711,0.009690,-0.012824,-0.001123,0.004343
Baikal_HG_Shamanka_EN_(n=10),0.030960,-0.414742,0.095034,-0.015795,-0.086847,-0.048220,0.005381,0.010061,0.014705,0.012520,-0.016060,-0.001814,0.003583,-0.007803,-0.004655,-0.003209,-0.002516,0.000583,0.006461,0.017833,-0.017045,-0.005948,-0.016515,-0.003205,0.004694
Tarim_Xiaohe_EMBA_(n=10),0.103237,-0.100030,0.072973,0.203394,-0.115222,0.047690,-0.049423,-0.060759,-0.041866,-0.096202,0.028548,-0.013368,0.024529,-0.071399,0.026248,0.022726,-0.018997,-0.001368,0.001999,-0.006528,-0.051197,0.007827,0.023910,0.014375,-0.008538
Tajikistan_Kushan_Ksirov_(n=3),0.095991,0.070410,-0.034444,0.012812,-0.048727,0.010133,-0.002115,-0.005308,-0.032042,-0.020714,0.007740,0.005795,-0.004311,-0.009725,0.008279,0.013082,0.006476,0.003252,0.011564,-0.010505,-0.006530,0.000165,0.001356,-0.007109,0.005189
Turkmenistan_IronAge_Takhirbai_(n=1),0.103579,0.093429,-0.018102,0.042313,-0.035083,0.025937,0.003525,-0.003461,-0.044382,-0.040821,-0.011042,-0.000450,0.000595,-0.018579,0.024158,0.017369,-0.013560,-0.000887,0.001508,-0.011380,0.001248,0.000989,0.001849,0.009037,0.001916
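For readers who would rather work in Python than in Vahaduo, the coordinate list can be loaded with a small parser. This is a minimal sketch, assuming each record is a name followed by 25 comma-separated values and that names contain no spaces, as in the list above. The function name `parse_g25` is illustrative. Splitting on whitespace means it accepts records separated by either newlines or spaces.

```python
def parse_g25(text, dimensions=25):
    """Parse 'Name,v1,...,v25' records into a dict of name -> list of floats."""
    populations = {}
    for record in text.split():  # records are separated by newlines or spaces
        name, *values = record.split(",")
        vector = [float(value) for value in values]
        if len(vector) != dimensions:
            raise ValueError(f"{name}: expected {dimensions} values, got {len(vector)}")
        populations[name] = vector
    return populations


# Two records copied from the list above, as a small self-check.
sample = """
Yaghnobi_(n=20),0.100847,0.059002,-0.028473,0.024160,-0.046070,0.018658,0.006463,-0.000992,-0.035802,-0.026479,-0.004774,0.000315,-0.000134,-0.009393,0.013314,0.017760,-0.003592,0.000462,0.001427,-0.011030,-0.007849,-0.005002,0.001485,0.002356,0.003551
Uzbek_(n=15),0.074516,-0.109271,-0.000729,0.008398,-0.033935,0.006675,0.009306,0.005661,-0.014589,-0.008772,-0.017224,-0.003277,0.002517,-0.004670,0.005103,0.003960,-0.004059,-0.000752,0.000696,-0.000550,-0.007753,-0.001558,-0.004338,0.000996,0.002571
"""
populations = parse_g25(sample)
print(sorted(populations))
```

Passing the full coordinate list as `text` yields one 25-value vector per population, ready for distance or mixture calculations.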
References and sources
- 1 Guarino-Vignon, P., Marchi, N., Bon, C., et al. (2022). Genetic continuity of Indo-Iranian speakers since the Iron Age in southern Central Asia. Scientific Reports 12, 733. The study modelling Yaghnobis and Tajiks against ancient individuals and showing continuity with Iron Age Turkmenistan and Tajikistan.
- 2 Dai, S.-S., Sulaiman, X., Isakova, J., et al. (2022). The genetic echo of the Tarim mummies in modern Central Asians. Molecular Biology and Evolution 39, msac179. High-coverage genomes of Tajiks and Kyrgyz, and the additional Tarim-related gene flow in highland Tajiks.
- 3 Narasimhan, V. M., Patterson, N., Moorjani, P., et al. (2019). The formation of human populations in South and Central Asia. Science 365, eaat7487. The BMAC and Andronovo framework for southern Central Asian ancestry.
- 4 Zhang, F., Ning, C., Scott, A., et al. (2021). The genomic origins of the Bronze Age Tarim Basin mummies. Nature 599, 256-261. The Xiaohe population and its Ancient North Eurasian-rich, isolated profile.
- 5 Cilli, E., Gabbianelli, F., Ciucani, M. M., et al. (2011 to 2013). Ethno-anthropological and genetic study of the Yaghnobis, an isolated community in Central Asia. Work documenting the Yaghnobis as a relic community speaking a living descendant of Sogdian.
- 6 Global25 coordinates: Davidski (Eurogenes), with modern and ancient averages drawn from the Moriopoulos 2025 collection. Global25 spreadsheet tooling: Vahaduo.
Tajik, Yaghnobi and Pamiri averages and the comparison populations are scaled Global25 from the Moriopoulos 2025 collection; the four modelling sources and the ancient anchors are collection averages on the same scale. Analysis: scaled Global25 Euclidean distances and non-negative least squares modelling in Python. Ancestry fractions are proxy-dependent and best read as directions rather than exact percentages; where a four-source model cannot separate old from recent layers, published qpAdm estimates from the same data are preferred. The eastern Tajik and single-sample ancient points are used as indicative only.