The Roma, known as Gitanos in Spain, Tziganes in France, and Zigeuner in Germany, are among the most genetically distinct populations in Europe. Despite a thousand years of European presence, their autosomal DNA continues to carry an unmistakable signature from the Indian subcontinent. This article uses G25 coordinates from the Moriopoulos 2025 modern collection to model the ancestry of nine Roma diaspora populations across the Balkans, Central Europe, and the Iberian Peninsula, and to quantify both their shared South Asian heritage and the variable degree of local European admixture accumulated along their long journey west.
Key Findings
- All Roma populations retain between 18% and 42% South Asian ancestry, detectable even in populations that have lived in Iberia for over 600 years.
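- Balkan Roma from Bosnia-Herzegovina and the Turkey/Balkans sample show the highest South Asian component (~39–42%), consistent with their position closest to the ancestral entry point into Europe; the single Serbian individual (~18%) is the exception.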
- Iberian Roma (Gitanos) show the strongest local admixture, with up to 49% Iberian ancestry, yet retain a clearly non-European signal absent in any other Spanish population.
- A Middle Eastern / Iranian “passage layer” is visible in most populations, reflecting the historical route through Persia and the Caucasus before entering Byzantine territory.
- The South Asian proxy most consistently selected by the NNLS model is Gujarati-like, likely reflecting populations from the northwest of the Indian subcontinent with higher Iranian-related ancestry relative to more eastern South Asian groups.
- Y-DNA haplogroup H-M82 (H1a1a) and mtDNA haplogroup M5a1, both exclusively South Asian, are the most common lineages among Roma men and women respectively.
I. Historical Migration Route
The Roma are not a monolithic group but a cluster of related populations who share a common origin in northwestern India, most likely in the region spanning today's Punjab, Rajasthan and Sindh. The earliest genetic divergence from South Asian sister populations is estimated between the 5th and 11th centuries CE, with most genomic studies converging on a departure from the subcontinent around the 9th–10th century CE.
From the Indian subcontinent, ancestral Roma populations moved into Persia and the Caucasus, a passage clearly reflected in the autosomal signal. Byzantine Greek sources mention a group called the Atsingani in 9th–11th-century Anatolia, widely considered the first historical reference to Roma in the Mediterranean world. By the 13th century they were documented in the Balkans, and by the 15th century they had spread across most of Western Europe, reaching Spain (1425 CE), England (~1500 CE) and Scandinavia.
Ancestral Migration Route
Northwestern India (departure, 9th–10th c. CE)
→ Persia / Afghanistan (~10th c. CE)
→ Armenia / Caucasus (~10th–11th c. CE)
→ Byzantine Anatolia (~11th c. CE)
→ Balkans (12th–14th c. CE)
→ Western Europe (14th–15th c. CE)
This route is not merely a historical narrative: it leaves a layered genetic trace that remains partially visible in modern Roma populations and that the G25 model below attempts to decompose systematically.
II. Haplogroup Signatures of South Asian Origin
Before examining the autosomal G25 results, it is worth noting the uniparental haplogroup evidence, which provides the clearest possible signal of South Asian ancestry and is entirely independent of admixture modeling.
III. G25 Admixture Results by Diaspora
The G25 coordinates below were extracted from the Moriopoulos 2025 Modern Population Collection (no-simulations averages). Admixture modeling was performed using NNLS (non-negative least squares) in the G25 scaled space, with five macro-level source categories:
- South Asian (proxies: Gujarati, Jat_Haryana, Punjabi_Hindu_India, Sindhi): the ancestral Indian source population.
- Balkan / SE European (proxies: Greek Peloponnese, Greek Macedonia, Bulgarian, Romanian, Macedonian): Byzantine and post-Byzantine host admixture.
- Iberian (proxies: Spanish Andalucia, Extremadura, Cataluña): relevant for the Gitano populations of Spain and Portugal.
- Iranian / MENA (proxies: Iranian Persian, Iranian Zoroastrian, Armenian): the Caucasus and Persian passage layer.
- Turkish / Ottoman (proxy: Turkish Rumeli): Ottoman Balkan admixture visible in some groups.
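The fitting step described above can be illustrated with a minimal sketch. It is not the calculator's actual implementation: it assumes the weights are constrained to be non-negative and to sum to 1, and it swaps in plain projected gradient descent for a dedicated NNLS solver so that it runs on the Python standard library alone. The source names and the 4-dimensional coordinates are made up for illustration (real G25 rows have 25 dimensions).

```python
import math

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    ordered = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, value in enumerate(ordered, start=1):
        running += value
        candidate = (running - 1.0) / i
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

def mix(weights, sources):
    """Weighted average of the source coordinate vectors."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]

def fit_mixture(target, sources, steps=2000):
    """Minimise the distance between mix(w) and target with w >= 0, sum(w) = 1."""
    k = len(sources)
    weights = [1.0 / k] * k
    # Conservative step size: the sum of squared source norms bounds the curvature.
    step = 1.0 / sum(x * x for s in sources for x in s)
    for _ in range(steps):
        residual = [m - t for m, t in zip(mix(weights, sources), target)]
        gradient = [sum(r * x for r, x in zip(residual, s)) for s in sources]
        weights = project_to_simplex(
            [w - step * g for w, g in zip(weights, gradient)]
        )
    # Fit quality: Euclidean distance between the fitted mixture and the target.
    distance = math.dist(mix(weights, sources), target)
    return weights, distance

# Hypothetical source coordinates (NOT real G25 values).
sources = {
    "SourceA": [0.10, 0.02, -0.01, 0.00],
    "SourceB": [0.02, 0.12, 0.01, 0.03],
    "SourceC": [-0.03, 0.01, 0.09, -0.02],
}
# Synthetic target built as an exact 40/60 blend of SourceA and SourceB.
target = [0.4 * a + 0.6 * b for a, b in zip(sources["SourceA"], sources["SourceB"])]

weights, distance = fit_mixture(target, list(sources.values()))
for name, w in zip(sources, weights):
    print(f"{name}: {100 * w:.1f}%")
print(f"Distance: {distance:.5f}")
```

Because the synthetic target is an exact blend of two sources, the fit should recover that blend with a distance close to zero; real targets leave a non-zero residual, which is what the Distance column of a results table reports.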
You can run your own G25 coordinates against these Roma population averages, or model Roma populations, on Calculator #186, the Migration Era Calculator (Age of the Huns, Scaled), available on this site.
Note on the Serbian and Turkey/Balkans results: Both populations are based on a single individual (n=1), meaning individual-level variance may distort macro-component proportions. They should be interpreted with greater caution than the multi-individual averages.
Iberian Roma: The Gitanos
The Iberian Roma, collectively known as Gitanos, represent one of the most studied Roma communities in Europe. Their entry into Spain is historically documented to 1425 CE, when King Alfonso V of Aragon granted a safe-conduct to a group of Roma pilgrims. Over the following centuries they spread throughout the Iberian Peninsula. The genetic results below show that six centuries of Iberian residence have significantly increased their local admixture, yet their South Asian ancestry remains clearly detectable and sharply distinguishes them from all non-Roma Spanish populations.
IV. Comparative Table: All Roma Populations
| Population | n | South Asian | Balkan / SE Eur. | Iberian | Iranian / MENA | Turkish | Distance |
|---|---|---|---|---|---|---|---|
| Roma, Turkey / Balkans | 1 | 41.8% | 45.8% | – | 12.4% | – | 0.00528 |
| Roma, Bosnia-Herzegovina | 3 | 39.0% | 58.9% | – | 2.1% | – | 0.00289 |
| Roma, Czechia | 4 | 37.5% | 41.6% | 4.8% | 8.3% | 7.9% | 0.00134 |
| Roma, Granada (Andalucia) | 7 | 35.8% | 32.5% | 21.6% | 10.1% | – | 0.00341 |
| Roma, Porto (Portugal) | 4 | 35.5% | 45.7% | 16.3% | 2.5% | – | 0.00365 |
| Roma, Bilbao (Basque Country) | 8 | 34.5% | 48.5% | 17.0% | – | – | 0.00346 |
| Roma, Madrid | 4 | 33.6% | 35.8% | 21.7% | 8.9% | – | 0.00320 |
| Roma, Barcelona | 6 | 20.1% | 27.1% | 48.7% | – | 4.1% | 0.00199 |
| Roma, Serbia | 1 | 18.1% | 55.2% | 3.7% | – | 23.1% | 0.00411 |
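As a quick sanity check on the table, the short sketch below transcribes the percentages and confirms that each row sums to roughly 100% and where the South Asian extremes fall. Treating empty cells as 0.0 and allowing a 0.2-point rounding tolerance are assumptions of this sketch, not part of the original model output.

```python
# Columns: n, South Asian, Balkan / SE Eur., Iberian, Iranian / MENA, Turkish.
# Empty table cells are entered as 0.0 (an assumption of this sketch).
ROWS = {
    "Turkey / Balkans": (1, 41.8, 45.8, 0.0, 12.4, 0.0),
    "Bosnia-Herzegovina": (3, 39.0, 58.9, 0.0, 2.1, 0.0),
    "Czechia": (4, 37.5, 41.6, 4.8, 8.3, 7.9),
    "Granada": (7, 35.8, 32.5, 21.6, 10.1, 0.0),
    "Porto": (4, 35.5, 45.7, 16.3, 2.5, 0.0),
    "Bilbao": (8, 34.5, 48.5, 17.0, 0.0, 0.0),
    "Madrid": (4, 33.6, 35.8, 21.7, 8.9, 0.0),
    "Barcelona": (6, 20.1, 27.1, 48.7, 0.0, 4.1),
    "Serbia": (1, 18.1, 55.2, 3.7, 0.0, 23.1),
}

# Sum of the five components per population, rounded to one decimal.
totals = {name: round(sum(row[1:]), 1) for name, row in ROWS.items()}

# Rows whose total strays from 100% by more than a rounding tolerance.
suspicious = [name for name, total in totals.items() if abs(total - 100.0) > 0.2]

# Populations with the lowest and highest South Asian component.
south_asian = {name: row[1] for name, row in ROWS.items()}
lowest = min(south_asian, key=south_asian.get)
highest = max(south_asian, key=south_asian.get)

# Populations resting on a single individual.
single_sample = [name for name, row in ROWS.items() if row[0] == 1]

print("Row totals:", totals)
print("Rows outside tolerance:", suspicious)
print("South Asian range:", lowest, south_asian[lowest], "to", highest, south_asian[highest])
print("Single-sample populations:", single_sample)
```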
A Note on Single-Sample Populations
Roma_Serbia and Roma_Turkey_Balkans are each based on a single sequenced individual (n=1). Individual Roma genomes vary considerably due to the combined effects of the historical bottleneck and variable admixture with local host populations. These two entries should be treated as indicative of the range of variation rather than as stable population averages. The populations with n ≥ 3 (Bosnia-Herzegovina, Czechia, Porto, Madrid, Barcelona, Granada, and Bilbao) provide more statistically robust estimates.
V. Interpretation: What the Results Tell Us
The South Asian Persistence
The most striking result across all nine Roma populations is the consistent presence of a South Asian genetic component ranging from approximately 18% to 42%. This is remarkable for a population that has been resident in Europe for up to a millennium. For comparison, no non-Roma European population, including Sephardic Jewish communities whose presence in Iberia pre-dates the Roma, shows any detectable South Asian autosomal signal.
The NNLS model consistently selects a Gujarati-like proxy to represent the South Asian component rather than more easterly or more purely South Asian populations (Bengali, Dravidian). This is consistent with what population genetics research suggests about the ancestral Roma homeland: a Northwest Indian population with relatively high Iranian-related ancestry, as found today among communities from Punjab, Rajasthan, and Gujarat, particularly among upper-caste and Rajput-related groups. Some researchers have specifically proposed Rajasthan as the most likely departure point.
The Balkan Layer: Byzantine Admixture
In every Roma population except Barcelona, including those now living elsewhere in Iberia or in Central Europe, a strong Balkan / Greek component forms the largest share of the non-South-Asian ancestry. This is not the result of recent mixing with Balkan populations, but rather reflects the extended period, roughly 200 to 400 years, that ancestral Roma spent within the Byzantine Empire before beginning their westward dispersal. The consistent selection of the Greek Peloponnese and Greek Macedonia proxies is coherent with this history.
The implication is significant: most of what is “European” in Roma ancestry is Byzantine-Greek-Balkan, not the ancestry of the western European host populations they later settled among. In the Granada sample, for example, the Balkan / SE European component (32.5%) exceeds the Iberian one (21.6%).
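The weight of the Balkan layer can be checked arithmetically from the comparative table by dividing each population's Balkan / SE European percentage by its total non-South-Asian ancestry. The sketch below does this; taking the non-South-Asian total as 100 minus the South Asian percentage is a simplifying assumption, since two rows sum to 100.1% after rounding.

```python
# (South Asian %, Balkan / SE European %) per population, from the table.
COMPONENTS = {
    "Turkey / Balkans": (41.8, 45.8),
    "Bosnia-Herzegovina": (39.0, 58.9),
    "Czechia": (37.5, 41.6),
    "Granada": (35.8, 32.5),
    "Porto": (35.5, 45.7),
    "Bilbao": (34.5, 48.5),
    "Madrid": (33.6, 35.8),
    "Barcelona": (20.1, 27.1),
    "Serbia": (18.1, 55.2),
}

def balkan_share(south_asian, balkan):
    """Fraction of the non-South-Asian ancestry assigned to the Balkan layer."""
    return balkan / (100.0 - south_asian)

shares = {name: balkan_share(*values) for name, values in COMPONENTS.items()}

# Populations where the Balkan layer is less than half of the non-South-Asian part.
below_half = [name for name, share in shares.items() if share < 0.5]

for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {100 * share:.1f}% of non-South-Asian ancestry is Balkan / SE European")
print("Below one half:", below_half)
```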
The Iberian Gradient
Among Iberian Roma populations, a clear geographic gradient emerges. The Roma of Barcelona show the highest Iberian absorption (~49%), while those of Madrid (~22%), Granada (~22%), Bilbao (~17%), and Porto (~16%) show lower but still significant Iberian components. This variation may reflect differences in the degree of endogamy practiced by different local Gitano communities, as well as possible differences in when or how they settled into particular regions.
Interestingly, despite being based in Andalucia, the region of Spain most associated historically with Gitano culture, Roma from Granada do not show the highest Iberian proportion. The Granada sample retains a notably large Iranian/MENA component (~10%), possibly reflecting greater preservation of the older pre-Iberian ancestry in this specific community.
The Iranian / Armenian Passage Layer
A Middle Eastern / Caucasian layer ranging from approximately 2% to 12% is detectable in six of the nine Roma populations. This “passage layer” is consistently captured by Armenian or Iranian Zoroastrian proxies, both of which genetically approximate the populations ancestral Roma would have encountered during their transit through the Caucasus and Persia. This component is most visible in Turkey/Balkans (12.4%), Granada (10.1%), Madrid (8.9%), and Czechia (8.3%).
The survival of this signal after so many generations of European residence suggests either that it was large enough in the founding Roma population to remain detectable, or that some ongoing admixture with MENA-related populations (including Ottoman Turks and Middle Eastern traders) reinforced it during the Balkan centuries.
Common Misconception
“Roma have lived in Europe so long that they are genetically European; their South Asian ancestry is negligible.”
Genetic Reality
Even after 600 years in Iberia, Gitanos retain 20–36% South Asian ancestry, more South Asian ancestry than any non-Roma population anywhere in Western Europe. Their Indian heritage is not “negligible” in any genomic sense.
Common Misconception
“Gitanos must be closely related to Egyptian or North African populations, hence the old name ‘Gypsies’.”
Genetic Reality
The term “Gypsy” derives from a medieval European misidentification of Roma as migrants from “Little Egypt.” Genetically, Roma have no special affinity with North African or Egyptian populations. Their ancestry traces to India via the Caucasus and the Byzantine world.
VI. G25 Coordinates: Roma Diaspora Populations
The following G25 coordinates are drawn from the Moriopoulos 2025 Modern Population Collection. Paste them as target populations in Vahaduo or use them as reference points when running your own analysis on Calculator #186.
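For readers who prefer to work with the coordinates in a script, here is a minimal parsing sketch. It assumes each row is a population name followed by 25 comma-separated values, which is how G25 averages are commonly distributed; the two rows below are generated placeholders, not real coordinates from the collection.

```python
import math

def parse_g25_row(line):
    """Split one 'Name,v1,...,v25' row into (name, list of 25 floats)."""
    name, *values = line.strip().split(",")
    coords = [float(v) for v in values]
    if len(coords) != 25:
        raise ValueError(f"{name}: expected 25 values, got {len(coords)}")
    return name, coords

# Placeholder rows (hypothetical names and values, for illustration only).
row_a = "Example_Pop_A," + ",".join(f"{0.001 * i:.6f}" for i in range(25))
row_b = "Example_Pop_B," + ",".join(f"{0.001 * i + 0.002:.6f}" for i in range(25))

name_a, coords_a = parse_g25_row(row_a)
name_b, coords_b = parse_g25_row(row_b)

# Euclidean distance between the two coordinate vectors.
distance = math.dist(coords_a, coords_b)
print(f"{name_a} vs {name_b}: distance {distance:.5f}")
```

The length check is deliberate: a row with a missing or extra value would otherwise shift every later dimension and silently distort any distance or mixture fit computed from it.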
VII. Key Takeaways
The genetic data paint a consistent and striking picture of Roma ancestry. Across all nine populations examined, spanning the Balkans, Central Europe, and the Iberian Peninsula, a set of converging signals emerges that is entirely coherent with the historical narrative of migration from South Asia through the Middle East into Byzantine Europe and then westward.
The South Asian component is the most diagnostic: present in every Roma population, ranging from ~18% in a single Serbian individual to ~42% in the single Turkey/Balkans individual (and 39% in the three-sample Bosnia-Herzegovina group), and pointing consistently to a Northwest Indian source population with affinities to modern Gujarati, Punjabi, and Sindhi communities. This is entirely in line with what haplogroup studies have shown for decades: Y-DNA H-M82 and mtDNA M5a1 leave no room for ambiguity about the Indian origin of the Roma people.
The Balkan / Byzantine layer, consistently the largest non-South-Asian component in almost every population, tells the story of a long pause in the Byzantine world before the westward dispersal. Roma arrived in Western Europe already carrying the genetic imprint of Greece and the Balkans, not as pristine descendants of the Indian subcontinent but as a population that had already absorbed significant Balkan admixture over several centuries.
Finally, the Iberian variation, from ~16% in Porto to ~49% in Barcelona, documents the ongoing, variable process of local admixture that continues to differentiate Roma communities even within a single country. Yet even at the high end of this range, the South Asian signal never disappears: Barcelona Roma are still ~20% South Asian, a proportion that exceeds what any admixture calculator would ever return for any non-Roma Western European population.
References
- Morar B. et al. (2004). Mutation History of the Roma/Gypsies. The American Journal of Human Genetics, 75(4), 596–609. DOI:10.1086/424759
- Mendizabal I. et al. (2012). Reconstructing the Population History of European Romani from Genome-wide Data. Current Biology, 22(24), 2342–2349. DOI:10.1016/j.cub.2012.10.039
- Rai N. et al. (2012). The Phylogeography of Y-Chromosome Haplogroup H1a1a-M82 Reveals the Likely Indian Origin of the European Romani Populations. PLOS ONE, 7(11), e48477. DOI:10.1371/journal.pone.0048477
- Moorjani P. et al. (2013). Genetic Evidence for Recent Population Mixture in India. The American Journal of Human Genetics, 93(3), 422–438. DOI:10.1016/j.ajhg.2013.07.006
- Olivieri A. et al. (2021). Mitogenome Diversity of Present-Day Romani People Points to a Single Founder Population from the Northwestern Indian Subcontinent. European Journal of Human Genetics, 29, 1835–1843. DOI:10.1038/s41431-021-00924-y
- Haak W. et al. (2015). Massive Migration from the Steppe was a Source for Indo-European Languages in Europe. Nature, 522, 207–211. DOI:10.1038/nature14317
- Moriopoulos Modern Population Collection 2025 (no simulations, averages). Available at: moriopoulos.com
- Davidski. Global25 PCA Modern Population Averages (Scaled). Eurogenes Blog. Available via: Vahaduo G25 Download