New insights into the origin of the Indo-European languages

Linguistics and genetics combine to suggest a new hybrid hypothesis
A map shows the spread of the Indo-European languages from their original homeland immediately south of the Caucasus.
Picture: © P. Heggarty et al., Science (2023)

By: Russell Gray/Laura Weißert

An international team of linguists and geneticists led by researchers from the Max Planck Institute for Evolutionary Anthropology in Leipzig has achieved a significant breakthrough in our understanding of the origins of Indo-European, a family of languages spoken by nearly half of the world’s population. Also involved was a team from Indo-European Studies at the Faculty of Arts of the University of Jena.

For over two hundred years, the origin of the Indo-European languages has been disputed. Two main theories have recently dominated this debate: the "Steppe" hypothesis, which proposes an origin in the Pontic-Caspian Steppe around 6,000 years ago, and the "Anatolian" or "farming" hypothesis, suggesting an older origin tied to early agriculture around 9,000 years ago. Previous phylogenetic analyses of Indo-European languages have come to conflicting conclusions about the age of the family, due to the combined effects of inaccuracies and inconsistencies in the datasets they used and limitations in the way that phylogenetic methods analysed ancient languages.

To solve these problems, researchers from the Department of Linguistic and Cultural Evolution at the Max Planck Institute for Evolutionary Anthropology assembled an international team of over 80 language specialists to construct a new dataset of core vocabulary from 161 Indo-European languages, including 52 ancient or historical languages. This more comprehensive and balanced sampling, combined with rigorous protocols for coding lexical data, rectified the problems in the datasets used by previous studies.

The Jena researchers Prof. Martin Joachim Kümmel, Dr Matilde Serangeli and doctoral student Robert Tegethoff were responsible, along with many others, for reviewing, supplying and analysing a large body of language data, especially for the Indo-Iranian and Anatolian languages. From the point of view of historical linguistics, Kümmel says, the most important advantage of the new study is that the data quality is now at the highest level, because this is the only way to reliably identify methodological problems in the new computer-based procedures.

"In most previous phylogenetic studies of the same kind, the data quality was problematic, so that when unexpected problems occurred in the evaluation, it was difficult to clarify whether the methodology or the data basis was the cause," explains Kümmel. "This problem should now be eliminated so that the advantages and disadvantages of different modelling methods can be explored in more detail."

Indo-European estimated to be around 8,100 years old

The team used recently developed ancestry-enabled Bayesian phylogenetic analysis to test whether ancient written languages, such as Classical Latin and Vedic Sanskrit, were the direct ancestors of modern Romance and Indic languages, respectively. Russell Gray, Head of the Department of Linguistic and Cultural Evolution and senior author of the study, emphasized the care they had taken to ensure that their inferences were robust. "Our chronology is robust across a wide range of alternative phylogenetic models and sensitivity analyses," he stated. These analyses estimate the Indo-European family to be approximately 8,100 years old, with five main branches already split off by around 7,000 years ago.

These results are not entirely consistent with either the Steppe or the farming hypothesis. The first author of the study, Paul Heggarty, observed that "recent ancient DNA data suggest that the Anatolian branch of Indo-European did not emerge from the Steppe, but from further south, in or near the northern arc of the Fertile Crescent – as the earliest source of the Indo-European family. Our language family tree topology, and our lineage split dates, point to other early branches that may also have spread directly from there, not through the Steppe."

New insights from genetics and linguistics

The authors of the study therefore proposed a new hybrid hypothesis for the origin of the Indo-European languages, with an ultimate homeland south of the Caucasus and a subsequent branch northwards onto the Steppe, as a secondary homeland for some branches of Indo-European entering Europe with the later Yamnaya and Corded Ware-associated expansions. "Ancient DNA and language phylogenetics thus combine to suggest that the resolution to the 200-year-old Indo-European enigma lies in a hybrid of the farming and Steppe hypotheses," remarked Gray.

Wolfgang Haak, a Group Leader in the Department of Archaeogenetics at the Max Planck Institute for Evolutionary Anthropology, summarizes the implications of the new study by stating, "Aside from a refined time estimate for the overall language tree, the tree topology and branching order are most critical for the alignment with key archaeological events and shifting ancestry patterns seen in the ancient human genome data. This is a huge step forward from the mutually exclusive, previous scenarios, towards a more plausible model that integrates archaeological, anthropological, and genetic findings."


Original publication:
Paul Heggarty, Cormac Anderson, Matthew Scarborough, Benedict King, Remco Bouckaert, Lechosław Jocz, Martin Joachim Kümmel, Thomas Jügel, Britta Irslinger, Roland Pooth, Henrik Liljegren, Richard F. Strand, Geoffrey Haig, Martin Macák, Ronald I. Kim, Erik Anonby, Tijmen Pronk, Oleg Belyaev, Tonya Kim Dewey-Findell, Matthew Boutilier, Cassandra Freiberg, Robert Tegethoff, Matilde Serangeli, Nikos Liosis, Krzysztof Stronski, Kim Schulte, Ganesh Kumar Gupta, Wolfgang Haak, Johannes Krause, Quentin D. Atkinson, Simon J. Greenhill, Denise Kühnert, Russell D. Gray (2023): Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages. Science, DOI: 10.1126/science.abg0818