India is home to over 1.4 billion people and one of the most complex genetic landscapes on Earth. When people from Chennai and Chandigarh compare their DNA results, they often find striking differences, and also surprising commonalities. Modern population genetics has given us the tools to understand both.
Using Davidski’s G25 Standard Calculator, one of the most widely used tools in the genetic genealogy community, we modeled 24 key Indian populations to reveal how ancient migration waves have shaped the genetic geography of the subcontinent. The results tell a coherent story of three overlapping ancestries, a dramatic north-south gradient, and a caste structure that adds yet another layer of complexity.
Table of Contents
- 1. The Three Ancestral Components
- 2. The Full Data: 24 Indian Populations
- 3. North India: The Steppe Gradient
- 4. South India: IVC Dominance and the AASI Thread
- 5. The Double Gradient: Region and Caste
- 6. The Bengalis: A Special Case
- 7. What Does “IVC Ancestry” Actually Mean?
- 8. G25 Coordinates: Key Indian Populations
1. The Three Ancestral Components
Before diving into the North/South comparison, it is essential to understand the three ancient ancestries that underlie virtually every South Asian genome. These are not populations that exist today; they are ancient groups reconstructed from ancient DNA recovered at archaeological sites, whose descendants mixed to produce modern Indians.
- IVC (Indus Valley Civilization)
- Steppe (Indo-Aryan / Yamnaya-related)
- AASI (Pre-Neolithic Hunter-Gatherers)
- Farmer-related (EEF + Iranian Neolithic)
Common Misconception
“North Indians are Aryan, South Indians are Dravidian”, implying a clean, binary genetic replacement of distinct founding populations.
What the Data Shows
All modern Indians are admixed descendants of the same three to four ancient groups. The difference is in proportions, not in kind. The “Aryan/Dravidian” framework is a linguistic categorization, not a genetic one; IVC-related ancestry is the largest component almost everywhere.
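In short: IVC-related ancestry is the largest component across most Indians, Steppe ancestry varies along a gradient (South to Northwest India), and AASI is absent in the Northwest but present in the South.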
2. The Full Data: 24 Indian Populations
The table below presents the G25 Standard Model results for 24 representative Indian populations, grouped by region. The key components are Steppe (Yamnaya), IVC (Shahr-i-Sokhta BA2), CHG (Ganj Dareh), Anatolian EEF (Barcin + Tepecik + Levant + Kura-Araxes, shown in the EEF/Levant column), AASI (Jarawa), and East Eurasian (Han + Nganasan).
| Population | Steppe | IVC | CHG | EEF/Levant | AASI | East Eurasian |
|---|---|---|---|---|---|---|
| Northwest India, Highest Steppe | ||||||
| Kashmiri Pandit | 21.8% | 47.0% | 10.0% | 15.9% | 0.0% | 5.3% |
| Punjabi Sikh | 25.5% | 45.7% | 12.0% | 12.2% | 0.0% | 3.4% |
| Brahmin (Punjab) | 25.4% | 34.3% | 8.6% | 17.7% | 0.0% | 9.1% |
| Khatri | 23.3% | 45.2% | 10.8% | 14.4% | 0.0% | 4.9% |
| Jat (Haryana) | 20.1% | 53.2% | 6.6% | 12.8% | 0.0% | 4.4% |
| North-Central India | ||||||
| Brahmin UP (Awadh) | 23.9% | 51.1% | 1.5% | 14.4% | 0.0% | 5.4% |
| Rajput (Rajasthan) | 17.3% | 55.6% | 4.0% | 15.0% | 0.7% | 3.9% |
| Gujarati | 9.8% | 76.6% | 0.0% | 8.3% | 0.0% | 3.2% |
| Bengali (India) | 6.1% | 59.8% | 0.0% | 11.6% | 11.3% | 9.0% |
| Lower Caste / Tribal North | ||||||
| Chamar (UP) | 5.0% | 69.8% | 0.0% | 8.4% | 6.1% | 4.5% |
| Gond (Tribal) | 1.8% | 48.1% | 0.0% | 13.1% | 14.5% | 12.9% |
| South India, Upper Caste | ||||||
| Brahmin Tamil (Iyer) | 12.2% | 72.0% | 0.2% | 10.8% | 0.0% | 3.2% |
| Brahmin Telugu (Vaidiki) | 16.2% | 58.0% | 0.0% | 14.8% | 0.0% | 6.1% |
| Vellalar | 2.4% | 76.4% | 0.0% | 8.5% | 8.0% | 2.1% |
| Pillai (Tamil) | 0.0% | 78.2% | 0.0% | 9.4% | 5.1% | 2.8% |
| Reddy (Telugu) | 6.2% | 64.5% | 0.0% | 11.4% | 8.7% | 4.9% |
| Kamma (Telugu) | 6.1% | 64.5% | 0.0% | 11.6% | 7.6% | 4.7% |
| South India, Mid to Lower Caste | ||||||
| Telugu (general) | 6.1% | 69.6% | 0.0% | 11.2% | 5.3% | 4.5% |
| Nair (Kerala) | 3.6% | 72.6% | 0.0% | 8.4% | 7.4% | 5.1% |
| Ezhava (Kerala) | 3.4% | 68.2% | 0.0% | 9.3% | 10.6% | 4.9% |
| Nadar (Tamil Nadu) | 6.5% | 56.9% | 0.0% | 14.8% | 4.2% | 8.6% |
| Tamil (Sri Lanka) | 2.4% | 71.8% | 0.0% | 10.4% | 8.6% | 3.5% |
| Pulaya (Kerala) | 11.0% | 53.2% | 0.0% | 15.4% | 7.4% | 7.7% |
| Vishwakarma (Kerala) | 8.3% | 43.8% | 0.0% | 19.9% | 10.6% | 8.5% |
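The percentages in the table come from fitting each population as a non-negative mixture of source populations. The sketch below illustrates that general idea only, and it makes several assumptions: the three source vectors are invented 3-D toy coordinates (real G25 samples have 25 dimensions), the names `SOURCES`, `TARGET`, `project_to_simplex`, and `fit_mixture` are hypothetical, and the solver is a simple projected gradient descent rather than the NNLS routine the calculator actually uses.

```python
# Toy illustration only: these source coordinates are invented 3-D vectors,
# not real G25 data.
SOURCES = {
    "Source_A": (1.0, 0.0, 0.2),
    "Source_B": (0.0, 1.0, 0.1),
    "Source_C": (0.3, 0.2, 1.0),
}
# Target constructed as exactly 60% A + 30% B + 10% C, so the fit has a known answer.
TARGET = (0.63, 0.32, 0.25)


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fit_mixture(sources, target, steps=2000, lr=0.2):
    """Minimise the squared distance between the weighted source mix and the target."""
    names = list(sources)
    w = [1.0 / len(names)] * len(names)  # start from equal proportions
    for _ in range(steps):
        mix = [
            sum(w[k] * sources[n][d] for k, n in enumerate(names))
            for d in range(len(target))
        ]
        resid = [m - t for m, t in zip(mix, target)]
        grad = [2.0 * sum(s * r for s, r in zip(sources[n], resid)) for n in names]
        # Gradient step, then force the weights back to valid proportions.
        w = project_to_simplex([wk - lr * g for wk, g in zip(w, grad)])
    return dict(zip(names, w))


weights = fit_mixture(SOURCES, TARGET)
for name, share in weights.items():
    print(f"{name}: {share:.1%}")
```

The non-negativity constraint is what makes the output readable as ancestry proportions: a source can contribute nothing, but never a negative share.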
3. North India: The Steppe Gradient
The most striking finding in the data is the steep decline in Steppe ancestry as you move from Northwest India toward the South and East. Punjabi Brahmins and Sikhs carry ~25% Steppe ancestry, a value comparable to many modern Northern Europeans. Kashmiri Pandits hover around 22%. Jats of Haryana reach 20%.
As we move southeast, the proportion drops rapidly. Rajputs of Rajasthan show 17%, Gujaratis drop to under 10%, and Bengalis of India sit at just 6%. This gradient directly reflects the trajectory of the Indo-Aryan expansion, which entered South Asia from the northwest around 2000-1500 BCE and spread eastward and southward over subsequent centuries.
North Indian Populations, Admixture Bar Chart
Gujaratis stand out with a remarkable 76.6% IVC-related ancestry, the highest among the northern populations in this dataset and second overall only to the Tamil Pillai (78.2%). This makes sense geographically: Gujarat borders what was the southern edge of the Indus Valley Civilization heartland, and the Harappan city of Dholavira is in Gujarat. Gujaratis appear to be among the most direct modern descendants of the IVC population, with relatively low Steppe ancestry despite speaking an Indo-Aryan language, suggesting the Indo-Aryan transition in Gujarat was primarily linguistic and cultural rather than demographic.
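The northwest-to-southeast decline in Steppe ancestry described in this section can be checked against the table values. The snippet below is a minimal sketch that ranks the northern populations cited above and prints a rough text bar for each; the values are copied from the table in section 2, and the variable names are only illustrative.

```python
# Steppe percentages copied from the table in section 2.
steppe = {
    "Punjabi Sikh": 25.5,
    "Brahmin (Punjab)": 25.4,
    "Kashmiri Pandit": 21.8,
    "Jat (Haryana)": 20.1,
    "Rajput (Rajasthan)": 17.3,
    "Gujarati": 9.8,
    "Bengali (India)": 6.1,
}

# Rank from highest to lowest Steppe share.
ranked = sorted(steppe.items(), key=lambda kv: kv[1], reverse=True)

for name, pct in ranked:
    bar = "#" * round(pct)  # one '#' per percentage point, approximately
    print(f"{name:<20} {pct:5.1f}% {bar}")
```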
4. South India: IVC Dominance and the AASI Thread
South Indian genetics present a different picture. The Steppe component drops dramatically, to under 7% in most non-Brahmin Dravidian-speaking communities, and to zero in some Tamil populations like the Pillai. Instead, IVC ancestry remains the dominant force, reaching 70-78% in many South Indian communities. This challenges the narrative that IVC ancestry is primarily “North Indian.”
What distinguishes South India genetically is the presence of a small but consistent AASI component (Ancient Ancestral South Indians, modeled by the Jarawa), which is largely absent in the Northwest. This ancestry represents the descendants of the pre-Neolithic hunter-gatherers who inhabited South Asia before any farming populations arrived. It appears at roughly 4-11% in non-Brahmin Tamil, Telugu, and Keralite communities.
South Indian Populations, Admixture Bar Chart
Tamil Brahmins (Iyer) show 12% Steppe ancestry, lower than their North Indian counterparts but well above non-Brahmin South Indians. Telugu Brahmins (Vaidiki) reach 16%. This gradient within South Indian Brahmins itself reflects the historical southward migration of Brahmin priestly communities. Importantly, South Indian Brahmins have essentially zero AASI ancestry despite living in the South for millennia, consistent with patterns of endogamy preserving earlier genetic profiles.
5. The Double Gradient: Region and Caste
The data reveals that Indian genetics follows not one but two overlapping gradients: a geographic one (Northwest to Southeast) and a caste-based one (upper to lower). These interact to produce the full pattern we observe.
Figure 1. Steppe ancestry (Yamnaya_RUS_Samara) across 21 Indian populations, ordered from highest (Northwest) to lowest (South India and tribal). G25 Standard Model, NNLS optimization.
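One rough way to see both gradients at once is to average the Steppe column within each group of the table in section 2. The sketch below does that with unweighted means; the grouping simply follows the table's own row headings, and the names `steppe_by_group` and `group_means` are illustrative.

```python
from statistics import mean

# Steppe percentages from the table in section 2, keyed by the table's group headings.
steppe_by_group = {
    "Northwest India": [21.8, 25.5, 25.4, 23.3, 20.1],
    "North-Central India": [23.9, 17.3, 9.8, 6.1],
    "Lower Caste / Tribal North": [5.0, 1.8],
    "South India, Upper Caste": [12.2, 16.2, 2.4, 0.0, 6.2, 6.1],
    "South India, Mid to Lower Caste": [6.1, 3.6, 3.4, 6.5, 2.4, 11.0, 8.3],
}

# Unweighted mean Steppe share per group.
group_means = {group: mean(values) for group, values in steppe_by_group.items()}

for group, avg in sorted(group_means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group:<34} {avg:5.1f}%")
```

These are averages over a handful of populations per group, so they are a sanity check rather than a statistical result. The regional drop (Northwest versus South) is much larger than the caste difference within the South, where individual communities vary widely.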
6. The Bengalis: A Special Case
Bengali Indians occupy a fascinating intermediate position. With only 6.1% Steppe ancestry, they resemble South Indians more than their North Indian neighbors. But they carry a significant AASI component (11.3%), higher than any South Indian community in this dataset, alongside a notable East Eurasian contribution (~9%), reflecting the geographic position of Bengal at the border between South and Southeast Asia and the historical influence of Austroasiatic-speaking populations.
This makes Bengalis genetically distinct from both the Steppe-heavy Northwest and the IVC-dominant South, forming a genuine Eastern cline in the Indian genetic landscape. The Gond tribal population shows the most extreme version of this eastern pattern, with 14.5% AASI and nearly 13% East Eurasian.
7. What Does “IVC Ancestry” Actually Mean?
A central result in this dataset is that the IVC-related component is not a “Northern” ancestry at all. It is the single largest component in virtually every Indian population, from Kashmiri Pandits (47%) to Gujaratis (77%) to Tamil Pillai (78%). It links all Indians, cutting across the linguistic and cultural divide.
The Indus Valley Civilization was not confined to the Northwest. After the IVC’s decline (c. 1900 BCE), this population dispersed across the subcontinent, carrying its genetic signature to both North and South India long before the Steppe migration arrived from the northwest.
If you are South Asian and using the Davidski Standard Calculator on Vahaduo, your results will typically show a large IVC (Shahr-i-Sokhta BA2) component, a variable Steppe (Yamnaya) proportion depending on your community and regional background, and possibly a small AASI (Jarawa) fraction. North Indians from the Punjab and UP typically land in the 20-26% Steppe range; most South Indians fall in the 0-10% range. Comparing your results against the populations in this article is the most reliable way to contextualise what you see.
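The comparison suggested above can be roughed out by measuring how far a set of calculator results sits from each reference row. The sketch below is a heuristic under stated assumptions: `my_result` is a made-up example, the reference rows are a subset copied from the table in section 2, and the distance is a plain Euclidean distance on the six percentages, which is not the same as the coordinate-based distances Vahaduo reports.

```python
from math import dist

# Column order: Steppe, IVC, CHG, EEF/Levant, AASI, East Eurasian.
reference = {
    "Punjabi Sikh": (25.5, 45.7, 12.0, 12.2, 0.0, 3.4),
    "Gujarati": (9.8, 76.6, 0.0, 8.3, 0.0, 3.2),
    "Bengali (India)": (6.1, 59.8, 0.0, 11.6, 11.3, 9.0),
    "Brahmin Tamil (Iyer)": (12.2, 72.0, 0.2, 10.8, 0.0, 3.2),
    "Reddy (Telugu)": (6.2, 64.5, 0.0, 11.4, 8.7, 4.9),
    "Nair (Kerala)": (3.6, 72.6, 0.0, 8.4, 7.4, 5.1),
}

# Hypothetical personal result, in the same column order (not real data).
my_result = (5.0, 70.0, 0.0, 9.0, 8.0, 5.0)


def closest(result, table, n=3):
    """Return the n reference populations with the smallest Euclidean distance."""
    return sorted(table, key=lambda name: dist(result, table[name]))[:n]


nearest = closest(my_result, reference)
for name in nearest:
    print(f"{name}: distance {dist(my_result, reference[name]):.2f}")
```

Because IVC is by far the largest column, it dominates this kind of distance; the ranking is a starting point for reading the table, not a classification.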
8. G25 Coordinates: Key Indian Populations
Below are the G25 scaled coordinates for selected Indian populations, sourced from the Moriopoulos 2025 Collection and Davidski’s Global25 averages. These can be pasted directly into Vahaduo as targets.
Northwest / North India
South India
Conclusions
1. IVC ancestry is the pan-Indian unifier. Regardless of language, caste, or region, Indus Valley Civilization-related ancestry is the largest single component in every population modeled here and the majority in most, ranging from 34% in Punjabi Brahmins to over 78% in the Tamil Pillai. This ancestry is the genetic legacy of a sophisticated Bronze Age civilization that predates both the Steppe migration and, in the South, the AASI hunter-gatherers’ full absorption.
2. Steppe ancestry maps closely to the historical range of Indo-Aryan languages and caste hierarchy. The Northwest Corridor shows the highest Steppe proportions (20-26%), consistent with entry from the Central Asian steppe c. 2000-1500 BCE. As you move south and east, this component fades predictably. Upper castes generally retain more Steppe ancestry than lower castes in the same region.
3. AASI ancestry is a thread connecting South India and tribal populations to the subcontinent’s original inhabitants. The Jarawa proxy appears most strongly in tribal populations (Gond: 14.5%), South Indian communities (Ezhava: 10.6%, Tamil Sri Lanka: 8.6%), and Bengalis (11.3%). It is nearly absent in Northwest India and South Indian Brahmins.