The Parsis are the descendants of Zoroastrians who fled Persia after the Arab conquest of the Sasanian Empire and settled on the west coast of India more than a thousand years ago. Today numbering only a few tens of thousands, the community has remained strikingly endogamous since its arrival. The genetic evidence does more than confirm a Persian origin. It lets us measure, component by component, roughly how much Indian ancestry the Parsis absorbed, and which part of their ancestry has stayed frozen since Sasanian times.
A Migration to Gujarat
Zoroastrianism emerged in Persia during the second millennium BCE and became the state religion of both the Achaemenid and Sasanian empires. The Arab Muslim conquest of the seventh century upended that religious order and drove part of the Zoroastrian population into exile. According to the tradition recorded in the Qissa-e-Sanjan, a small group of refugees landed on the coast of Gujarat and was granted permission by a local ruler to settle there, on condition that they integrate into the surrounding society without disrupting it. The community's name, Parsi, derives from Pars, the Persian province its founders came from.
The archaeological site of Sanjan, excavated since 2001 by the World Zarathusti Cultural Foundation and the Indian Archaeological Society, has helped document this settlement and, more recently, has yielded the first ancient DNA evidence directly tied to the community.
What the Genetics Already Show
The key study on the subject, published by Sengupta, Gunnarsdottir and colleagues, compared autosomal, Y-chromosome and mitochondrial DNA from Iranian Zoroastrians and Indian Parsis with neighbouring populations in both countries. The authors found that Zoroastrians in Iran and Parsis in India are markedly more genetically homogeneous than their respective neighbours. That signature is consistent with strict, long-standing endogamy.
More striking still, the study dates the admixture between Parsis and local Indian populations to between 690 and 1390 CE. It also provides strong evidence that this admixture was sex-biased: Persian Zoroastrian ancestry was carried mainly through the paternal line, while a significant share of the local South Asian contribution entered through the maternal line. Earlier mitochondrial DNA work corroborates this. It identified a notable proportion of South Asian-specific maternal lineages among Parsis, which suggests that local women were assimilated during the initial settlement.
Niraj Rai's team at the Birbal Sahni Institute of Palaeosciences later sequenced ancient mitochondrial DNA from the Sanjan tower of silence. They reported that Indian Parsis are genetically closer to Neolithic Iranians than present-day Iranians are. In other words, it is paradoxically in India, far from the Zoroastrian heartland, that the ancient Persian signal appears best preserved.
Where the Parsis Sit: Closest Populations
Before modelling anything, the simplest test is to rank every population in the Global25 dataset by Euclidean distance from the Parsi_India average. The result is unambiguous. The first match is Parsi_Pakistan, effectively the same population, at a distance of 0.0056. After it, the nearest neighbours are all Iranian: Khorasan Persians at 0.034, Bandari of the southern Iranian coast at 0.040, Balochi groups at around 0.051, and a cluster of Iranian Zoroastrian and Yazd outlier samples. By contrast, the surrounding South Asian populations sit far away: Sindhi at 0.111, Punjabi at 0.123, and the neighbouring Gujarati at 0.195. That last figure is roughly five to six times the distance to the nearest Iranians.
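A minimal sketch of this ranking in Python follows. It assumes the five Global25 rows reproduced in this article stand in for the full dataset. The Khorasan, Bandari, Balochi, Sindhi and Punjabi averages used in the full ranking are not reproduced here, so only these five rows are compared. The helper names (`parse_g25`, `euclidean`, `rank_by_distance`) are hypothetical, and the distance is plain unweighted Euclidean distance over all 25 dimensions.

```python
import math

# The five Global25 rows printed in this article: a population label
# followed by its 25 coordinates.
G25_ROWS = """\
Parsi_India,0.079245,0.066394,-0.091744,0.002462,-0.053357,0.014974,0.003622,-0.00027,-0.010903,-0.006868,-0.002497,-0.003468,0.005603,-0.001134,0.003206,0.009455,-0.004046,0.002486,0.003333,-0.008228,0.000723,-0.004191,-0.001683,-0.003258,0.002907
Parsi_Pakistan,0.080666,0.067599,-0.089443,0.002514,-0.052799,0.013362,0.001778,0.000141,-0.011258,-0.008898,-0.000607,-0.002685,0.006321,-0.001879,0.004561,0.00893,-0.004427,0.00168,0.002437,-0.008515,0,-0.004333,-0.001581,-0.002158,0.00188
Iranian_Zoroastrian,0.091924,0.107565,-0.063039,-0.021951,-0.045227,0.004139,0.001852,-0.005307,-0.027979,-0.016693,-0.000201,-0.000743,0.004448,-0.004217,0.007698,0.013121,-0.004256,0.002047,0.001901,-0.008284,-0.003479,-0.003131,0.000163,-0.002998,0.005188
Iranian_Persian_Yazd,0.092348,0.098913,-0.063733,-0.018669,-0.04645,0.001934,0.00586,-0.003615,-0.028729,-0.015381,0.001353,-0.002708,0.00557,-0.004184,0.006705,0.01226,-0.006823,0.001537,0.002179,-0.011414,-0.002754,-0.00366,-0.001471,-0.002547,0.004175
Gujarati,0.051391,-0.057733,-0.156053,0.110822,-0.080584,0.062193,-0.000223,0.011607,0.033,0.018852,-0.006496,8.2e-05,-0.002178,0.001741,-0.000679,0.000378,0.00088,-0.000171,-0.001427,-0.003933,0.001429,-0.003314,0.001109,0.001127,-0.002976
"""


def parse_g25(text):
    """Parse 'Name,c1,...,c25' lines into a dict mapping name -> list of floats."""
    pops = {}
    for line in text.strip().splitlines():
        name, *coords = line.split(",")
        pops[name] = [float(c) for c in coords]
    return pops


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(target, pops):
    """Return (name, distance) pairs sorted closest first, excluding the target itself."""
    ref = pops[target]
    dists = [(name, euclidean(ref, coords))
             for name, coords in pops.items() if name != target]
    return sorted(dists, key=lambda pair: pair[1])


POPS = parse_g25(G25_ROWS)
for name, dist in rank_by_distance("Parsi_India", POPS):
    print(f"{name:<22} {dist:.4f}")
```

On these five rows, Parsi_Pakistan should come first at about 0.0056 and Gujarati last at about 0.195, which matches the figures above. The intermediate ranks differ from the full list because the other reference populations are missing.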
Among ancient samples, the closest matches fall between 0.061 and 0.072. They are Late Bronze Age and Iron Age groups from the Iranian plateau and Central Asia (Dinkha Tepe and Hasanlu in northwest Iran, Sappali Tepe in Uzbekistan), together with medieval Ghaznavid-era samples from Pakistan. The Parsis sit toward the eastern Iranian and South Central Asian end of the broad genetic cline. That is where an Iranian refugee population carrying a modest South Asian admixture would be expected to fall.
The Key Question: How Much of India Did They Absorb?
A simple distance ranking shows that the Parsis are Iranian-like, but it cannot separate their ancestral layers. For that we use a distal NNLS (non-negative least squares) model, which decomposes each population into five ancestral sources. Together these sources span the genetic variation of southwest and South Asia before the historical period:
Iran_N: Zagros Neolithic, Ganj Dareh.
CHG: Caucasus hunter-gatherer, Kotias.
Anatolia_N: Anatolian Neolithic farmers, Barcin.
Steppe: the Indo-Iranian input, Sintashta.
AASI: the deep Ancient Ancestral South Indian component, proxied by the Onge.
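The decomposition itself is a small constrained least-squares problem. The goal is to find non-negative weights that sum to one and whose weighted average of the source coordinates lies as close as possible, in Euclidean distance, to the target's coordinates. Vahaduo's own solver is not reproduced here. As a minimal sketch, the Python below solves the same kind of problem with a swapped-in technique: projected gradient descent onto the probability simplex. The function names are hypothetical. This article does not give the Global25 coordinates of the Ganj Dareh, Kotias, Barcin, Sintashta and Onge sources, so the demo uses toy three-dimensional vectors rather than real data.

```python
import math


def project_to_simplex(v):
    """Euclidean projection of v onto the simplex {w : w_i >= 0, sum(w) == 1}.

    Sort-based algorithm: find the threshold theta such that the values
    max(v_i - theta, 0) sum to one.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        candidate = (cumulative - 1.0) / i
        if u_i - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def fit_mixture(target, sources, iterations=2000):
    """Projected gradient descent for min ||sum_i w_i * sources[i] - target||^2.

    The weights are kept non-negative and summing to one. Returns the weights
    and the Euclidean distance of the fitted mixture from the target.
    """
    k, dims = len(sources), len(target)

    def mixture(w):
        return [sum(w[i] * sources[i][j] for i in range(k)) for j in range(dims)]

    # The squared Frobenius norm of the sources bounds the largest eigenvalue of
    # their Gram matrix, so this step size is small enough to converge.
    step = 1.0 / (2.0 * sum(x * x for s in sources for x in s))
    w = [1.0 / k] * k  # start from an equal mix of all sources
    for _ in range(iterations):
        residual = [m - t for m, t in zip(mixture(w), target)]
        gradient = [2.0 * sum(s_j * r_j for s_j, r_j in zip(s, residual))
                    for s in sources]
        w = project_to_simplex([w_i - step * g_i for w_i, g_i in zip(w, gradient)])
    distance = math.sqrt(sum((m - t) ** 2 for m, t in zip(mixture(w), target)))
    return w, distance


# Toy 3-dimensional "sources" (NOT real Global25 coordinates) and a target built
# as an exact 50/30/20 mix of them, to check that the solver recovers the weights.
toy_sources = {"A": [1.0, 0.2, 0.0], "B": [0.1, 1.0, 0.3], "C": [0.0, 0.3, 1.0]}
true_weights = [0.5, 0.3, 0.2]
toy_target = [sum(w * s[j] for w, s in zip(true_weights, toy_sources.values()))
              for j in range(3)]

weights, fit_distance = fit_mixture(toy_target, list(toy_sources.values()))
for name, w in zip(toy_sources, weights):
    print(f"{name}: {w:.3f}")
print(f"fit distance: {fit_distance:.2e}")
```

Projected gradient was chosen here because it needs no external libraries. With SciPy available, a common alternative is `scipy.optimize.nnls` on a system augmented with a heavily weighted row of ones, which approximately enforces the sum-to-one constraint.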
The AASI component is the decisive one, because it is the one piece of ancestry that Iranians essentially lack. Iranian Zoroastrians and Yazd Persians carry only 2 to 4 percent AASI, Parsis carry roughly 16 percent, and the surrounding Gujaratis carry around 41 percent. The Parsi figure sits cleanly between its Iranian source and its Indian neighbour. The rise from a few percent to about 16 percent therefore acts as a direct tracer of the local South Asian ancestry the community absorbed after arriving in Gujarat. Note that 16 percent is the AASI share, not the total local contribution. Gujaratis are themselves less than half AASI, so the full Indian share of Parsi ancestry is larger than the AASI figure alone. Crucially, the Iran_N share barely moves across all these groups (around 40 percent). This confirms that the Parsis did not lose their Persian farmer base; they simply layered a measurable Indian component on top of it.
This matters because steppe ancestry alone cannot explain the Parsi shift toward India. Both Persians and north Indians carry substantial Indo-Iranian steppe ancestry, around 15 to 18 percent in this model, so a naive reading might attribute Parsi distinctiveness to shared steppe input. The AASI tracer cuts through that ambiguity. Steppe stays roughly constant between Iranian Zoroastrians and Parsis, while AASI is what actually rises. What marks the Indian contribution to the Parsis is South Asian hunter-gatherer ancestry, not extra steppe.
The Proximal Model
The distal decomposition tells us which ancestral layers changed. A proximal model, which uses present-day populations as sources, states the headline mixture in plain terms. Modelling Parsi_India directly as Iranian_Zoroastrian plus Gujarati returns 76 percent Iranian Zoroastrian and 24 percent Gujarati, with Parsi_Pakistan almost identical at 77 to 23. This is the same story the AASI tracer tells, expressed as a two-way mix: roughly three quarters frozen Iranian Zoroastrian and one quarter local Gujarati. The figure also matches the published qpAdm-based literature, which independently estimated the Indian contribution at around a quarter.
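For two sources, the constrained fit has a closed form. The best share of the first source is the projection of (target minus second source) onto (first source minus second source), clipped to the interval from 0 to 1. The sketch below applies that formula to the Parsi_India, Iranian_Zoroastrian and Gujarati rows reproduced in this article. The function name `two_way_mix` is hypothetical. A tool such as Vahaduo may differ in implementation details, so small deviations from the quoted 76 to 24 split are possible.

```python
import math

# Global25 rows copied from the coordinates listed in this article.
PARSI_INDIA = [0.079245, 0.066394, -0.091744, 0.002462, -0.053357, 0.014974, 0.003622, -0.00027,
               -0.010903, -0.006868, -0.002497, -0.003468, 0.005603, -0.001134, 0.003206, 0.009455,
               -0.004046, 0.002486, 0.003333, -0.008228, 0.000723, -0.004191, -0.001683, -0.003258,
               0.002907]
IRANIAN_ZOROASTRIAN = [0.091924, 0.107565, -0.063039, -0.021951, -0.045227, 0.004139, 0.001852, -0.005307,
                       -0.027979, -0.016693, -0.000201, -0.000743, 0.004448, -0.004217, 0.007698, 0.013121,
                       -0.004256, 0.002047, 0.001901, -0.008284, -0.003479, -0.003131, 0.000163, -0.002998,
                       0.005188]
GUJARATI = [0.051391, -0.057733, -0.156053, 0.110822, -0.080584, 0.062193, -0.000223, 0.011607,
            0.033, 0.018852, -0.006496, 8.2e-05, -0.002178, 0.001741, -0.000679, 0.000378,
            0.00088, -0.000171, -0.001427, -0.003933, 0.001429, -0.003314, 0.001109, 0.001127,
            -0.002976]


def two_way_mix(target, source_a, source_b):
    """Fit target ~ a * source_a + (1 - a) * source_b with 0 <= a <= 1.

    Returns the share a of source_a and the Euclidean distance of the fit.
    """
    diff = [x - y for x, y in zip(source_a, source_b)]
    offset = [t - y for t, y in zip(target, source_b)]
    a = sum(d * o for d, o in zip(diff, offset)) / sum(d * d for d in diff)
    # The objective is a convex quadratic in a, so clipping the unconstrained
    # optimum to [0, 1] gives the constrained optimum.
    a = min(max(a, 0.0), 1.0)
    fitted = [a * x + (1.0 - a) * y for x, y in zip(source_a, source_b)]
    distance = math.sqrt(sum((f - t) ** 2 for f, t in zip(fitted, target)))
    return a, distance


share, fit_distance = two_way_mix(PARSI_INDIA, IRANIAN_ZOROASTRIAN, GUJARATI)
print(f"Iranian_Zoroastrian {share:.1%} + Gujarati {1 - share:.1%} "
      f"(fit distance {fit_distance:.4f})")
```

On these rows, the Iranian_Zoroastrian share comes out at roughly three quarters, in line with the proximal result above. Other rows from the same coordinate table, such as Parsi_Pakistan as the target or Iranian_Persian_Yazd as the Iranian source, can be passed in the same way.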
Yazd, notably, is historically the principal stronghold of the Zoroastrians who remained in Iran after the Islamic conquest, so the prominence of the Yazd and Fars Persian sources in the regional model is fitting on historical grounds, not merely statistical.
A Sasanian Era Anchor
The modern Iranian sources used above have not themselves remained frozen since the Sasanian era, so they only approximate the seventh- to tenth-century source population. A 2025 study by Amjadi and colleagues provides a more direct anchor: ancient genomes from northern Iran spanning the Achaemenid to medieval periods. They include an individual from Vestemin in Mazandaran dated to around 332 CE. That date falls within the late Parthian to middle Sasanian window, a few centuries before the historical Parsi migration. Modelling Parsi_India as a two-way mixture of this single Sasanian-era genome and Gujarati returns approximately 74 percent Iran-type ancestry and 26 percent Gujarati, remarkably close to the modern-source model.
This is directional corroboration rather than a headline result. The Vestemin genome comes from a single individual sequenced at low coverage, around 0.2x, so this fit is noticeably noisier than those built on population averages. Still, the agreement between a Sasanian-era genome and modern Persian references matches the broader finding of the same study. Historical-period genomes from the northern Iranian plateau show strong continuity with earlier local populations rather than sharp discontinuity. For a community whose identity rests on a claim of unbroken descent from Sasanian Persia, that continuity is a useful data point.
Global25 coordinates for the main populations compared in this article:
Parsi_India,0.079245,0.066394,-0.091744,0.002462,-0.053357,0.014974,0.003622,-0.00027,-0.010903,-0.006868,-0.002497,-0.003468,0.005603,-0.001134,0.003206,0.009455,-0.004046,0.002486,0.003333,-0.008228,0.000723,-0.004191,-0.001683,-0.003258,0.002907
Parsi_Pakistan,0.080666,0.067599,-0.089443,0.002514,-0.052799,0.013362,0.001778,0.000141,-0.011258,-0.008898,-0.000607,-0.002685,0.006321,-0.001879,0.004561,0.00893,-0.004427,0.00168,0.002437,-0.008515,0,-0.004333,-0.001581,-0.002158,0.00188
Iranian_Zoroastrian,0.091924,0.107565,-0.063039,-0.021951,-0.045227,0.004139,0.001852,-0.005307,-0.027979,-0.016693,-0.000201,-0.000743,0.004448,-0.004217,0.007698,0.013121,-0.004256,0.002047,0.001901,-0.008284,-0.003479,-0.003131,0.000163,-0.002998,0.005188
Iranian_Persian_Yazd,0.092348,0.098913,-0.063733,-0.018669,-0.04645,0.001934,0.00586,-0.003615,-0.028729,-0.015381,0.001353,-0.002708,0.00557,-0.004184,0.006705,0.01226,-0.006823,0.001537,0.002179,-0.011414,-0.002754,-0.00366,-0.001471,-0.002547,0.004175
Gujarati,0.051391,-0.057733,-0.156053,0.110822,-0.080584,0.062193,-0.000223,0.011607,0.033,0.018852,-0.006496,8.2e-05,-0.002178,0.001741,-0.000679,0.000378,0.00088,-0.000171,-0.001427,-0.003933,0.001429,-0.003314,0.001109,0.001127,-0.002976
Limits and Caveats
A few caveats are worth stating clearly. First, the AASI proxy here is the Onge, an Andamanese population that approximates but does not equal the true Ancient Ancestral South Indian source, which has never been sampled directly. The absolute AASI percentages should therefore be read as relative values, comparable across the populations in this analysis, rather than as exact ancestry fractions. The Gujarati and Sindhi distal fits are also looser, with distances of 0.13 and 0.08. The reason is that a five-source West Eurasian plus AASI model only approximates the full structure of South Asian populations, which carry additional regional substructure.
Second, G25 and NNLS capture an overall autosomal signal and cannot distinguish the paternal or maternal origin of any segment. The sex-biased admixture pattern reported by Sengupta and colleagues, in which the Persian share was transmitted preferentially through men, is not directly visible here. Confirming it requires uniparental Y-chromosome and mitochondrial markers.
Third, Global25 tends to push steppe shares slightly higher than formal qpAdm does. The likely reason is that the Sintashta steppe source overlaps with Iran_N and CHG through shared Caucasus-related ancestry, so the steppe figures here should be treated as approximate. Finally, the small sample size behind Parsi_Pakistan in public datasets calls for caution when reading fine-grained differences between the two Parsi groups. They remain a single population, recently separated by a political border.
Conclusion
The genetic evidence aligns with the historical tradition of the Parsis and lets us quantify it. A founding migration from Persia left an Iranian farmer base that has barely shifted: around 40 percent Iran_N, the same as in Iranian Zoroastrians today. Onto that base, centuries of limited and sex-biased local admixture added roughly 16 percent deep South Indian (AASI) ancestry. Iranians largely lack this component, and it is what genetically separates Parsis from their Persian cousins. Expressed as a two-way mix, the result is about three quarters frozen Iranian Zoroastrian and one quarter Gujarati. That figure holds whether it is measured through modern sources, a Sasanian-era genome, or the published qpAdm literature. The Persian signal, remarkably, is better preserved among the Parsis of India and Pakistan than among many of the Iranian populations that share the same ancestral origin.
- Sengupta et al. The Genetic Legacy of Zoroastrianism in Iran and India: Insights into Population Structure, Gene Flow, and Selection. American Journal of Human Genetics, 2017.
- Amjadi et al. Ancient DNA Indicates 3,000 Years of Genetic Continuity in the Northern Iranian Plateau, from the Copper Age to the Sassanid Empire. Scientific Reports, 2025.
- Narasimhan et al. The Formation of Human Populations in South and Central Asia. Science, 2019 (ANI/ASI/steppe framework and AASI).
- Rai et al. Ancient mitochondrial DNA from the Sanjan tower of silence. Birbal Sahni Institute of Palaeosciences, reported in the Indian press, 2023.
- Davidski. Global25 coordinates dataset.
- Vahaduo. G25 analysis tool, used here for NNLS modelling.