ELI5: How do we know what extinct languages sounded like if no one can speak them?

/u/daggerfont’s examples notwithstanding, though, we *don’t* always know what extinct languages sounded like. We’re making our best guesses based on the evidence available to us, and the less evidence we have, the less confident we are in those reconstructions.


There are a couple of ways. I’m not an expert in this, but as a history major I do study related things.

You can imagine the evolution of language as a family tree, just a little more complicated and interwoven, because a language isn’t necessarily influenced by only two “parent languages.” Knowing how we pronounce modern languages, and other ancient ones that we’ve already figured out, we can trace the pronunciation of vowel sounds along with spelling patterns, and that gives us an idea of what older languages sounded like.

Also, sometimes we get accounts from nearby cultures describing what the people who lived in a place sounded like. These usually concern not exactly extinct languages, but old ones that aren’t spoken the same way anymore. The Romans wrote a lot about what the Germanic and Gallic tribes were like, and I vaguely remember reading at least one passage about their language. It gets pretty much impossible when you go back to something like Linear A, which we haven’t even deciphered.

When it comes to languages with written records that include poetry, like Latin and ancient Greek (and others too), we can tell what words sounded good together. By analyzing literary devices like rhyming or repetition of sounds, we can get some information about how different letters were pronounced.

Also, sometimes words in one language are transcribed by speakers of a different language, for example Arabic or Greek words written in Latin characters. If you know the pronunciation conventions of the second language and find the corresponding word in the first, that tells you something about how it was pronounced, or at least how a foreign listener interpreted it.

Another way, which I personally find the most interesting, is through forensic archaeology. In certain cases, based on the condition of preserved human remains, forensic archaeologists can reconstruct a person’s voice box. From the way the voice box developed, it is possible to tell what kind of vowel sounds were used in the language they spoke, and which ones weren’t present. One example is Ötzi the Iceman. From his voice box (iirc) it seems his language didn’t have hard /a/ sounds like in “plane” or “fate,” but instead had vowel sounds more like the /a/ in “father,” the /e/ in “feather,” and the /u/ in “dune.”

There’s a lot more to it that I don’t understand; it’s a pretty complicated thing. If I got any of this wrong, someone please feel free to correct me!


To add to what others say, you will find ancient writings in the “kids these days don’t speak right anymore and say things like...” style. There are also contemporary accounts of plays and speeches, as well as graffiti, that tell us a lot.


We have a pretty good understanding of how spoken languages change to become easier to speak. We take “shortcuts” when speaking, and eventually these become the norm. But languages can develop in different ways, so by tracing back different languages we can reconstruct how a common root language was pronounced. In addition, we have school textbooks for a lot of these extinct languages that describe how to pronounce different letters. Some of these textbooks are older than the pyramids of Giza and contain detailed explanations of how to position your mouth to produce various letters.


This will be long, but the ELI5 version would be… **we compare the sounds of modern languages, then reconstruct earlier languages based on the shared (and different) sounds in the more recent languages.**

So the field of historical linguistics is directly relevant here. As an anthropologist (in the US, we include linguistic anthropology in the broader anthropological field), I can maybe shed a bit of light on this.

Consider the English word **father**. In French, “father” is *père*. In Spanish, it’s *padre*. In Italian, it’s also *padre*. Now, you would be right in looking at these and thinking they sound and look familiar in their structure. They descend from Latin, where the word for “father” is *pater*. We know this. The Romance languages are well-known daughter languages of Latin, basically having developed out of regional dialects.

Now, consider the word for “father” in Dutch: **vader**. Or in German: **Vater**. Or in Persian: **pedar**. Or in Hindi: **pita**.

You can start to see similarities in these words. That’s because, as it turns out, all of these languages are part of the Indo-European language family, a family of languages that seems to have a common ancestor spoken by one or more cultures roughly 5,000 to 6,000 years ago, possibly around the Black Sea. These people appear to have been relatively mobile / nomadic, and when they moved outward from the Black Sea, they brought their language with them. Between population increases and cultural hybridization with local and regional cultures, the language they spoke also spread, and diverged. As speakers of the daughter languages of the early tongue (what scholars call *proto-Indo-European*) also spread out and interacted with other populations, the languages diverged further, spread further… until you get the diverse languages of the Indo-European family today.

So… how / why do we know this? Originally, it was worked out by linguists.
(And today there’s archaeological and genetic evidence to support it.) The similarities between European languages and languages of the Indian subcontinent were recognized hundreds of years ago, and over time, the study of these similarities became more formalized. Scholars began to realize that if languages, even languages spoken by people living very far from each other, shared enough similarities, they were probably somehow related. As the field of linguistics (and of historical linguistics) developed, the reconstruction of *proto-languages* (that is, mother languages of multiple descendant “daughter” languages) became a topic of major interest.

**How are languages reconstructed?**

Look above at all of the various words for “father.” Look at the similarities, and look at the differences. Now, consider that there are many other languages in the Indo-European family that also share these similarities. Group them by geographic region. And then start comparing. Not just “father” but other words for key concepts that are probably *very* old. “Mother.” “Water.” “Daughter.” “Son.” And so on. This is what historical linguists call the “core vocabulary.” It’s presumed, with evidence to back this up, that there are certain concepts that most human cultures have had words for over our species’ history. Basic kinship terms. Words about the basic surrounding environment.

Pull these words out of the various languages in a given region. French, German, Spanish, Italian, Portuguese, Romanian, Dutch, English… And then start comparing. The **p** at the beginning of the word for “father” in Spanish, French, Italian, Portuguese (Romance languages) versus the **v** or **f** at the beginning of the word in the Germanic languages. So… mark that as a possible grouping. The Germanic on one hand, and the Romance on the other. Perhaps these two groups have been separated from each other longer than the languages within each group have been separated from one another. Then, do this with other words and sounds.
As you compare, and build your database of “older” sounds, what you’re really doing is reconstructing what the *proto* language sounds were. For the Romance languages, you have a way of checking this. You can look at Latin. By doing this extensively, you essentially reconstruct what the “original” sounds and words were. Is it perfect? No, but it’s actually pretty solid.

Then, you can do this for other regions. Remember, the **p** associated with the Latin daughter languages is also found in Hindi and Persian. So perhaps those languages are closer to the Romance languages in some respects than the Germanic languages are. And so on and so forth. What you end up with after decades and centuries of research is various *proto* languages, all the way back to something of a reconstructed *proto-Indo-European*.

Do we know *exactly* how all these proto languages sounded? Not really. *But* having done a lot of linguistic comparisons of sounds, word structures, grammar, etc., we can actually be reasonably confident about quite a bit of what the meaningful sounds were in these languages. This is a *major* part of how we *think* we know what ancient languages sounded like. We look at their various daughter languages, and as scientifically as we can, we extrapolate.
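
For anyone who thinks better in code, here is a toy sketch of the very first step described above: lining up the words for “father” and grouping languages by their opening sound. It is only an illustration, not how historical linguists actually work. The names `COGNATES` and `group_by_initial` are made up for this example, and it treats the first letter of the spelling as if it were the sound, which real comparisons (done on phonetic transcriptions across many words) would not do.

```python
from collections import defaultdict

# Words for "father" quoted in this thread.
# Assumption: the first written letter stands in for the first sound.
COGNATES = {
    "Latin": "pater",
    "French": "père",
    "Spanish": "padre",
    "Persian": "pedar",
    "Dutch": "vader",
    "German": "Vater",
    "English": "father",
}


def group_by_initial(words):
    """Group language names by the first letter of their word (hypothetical helper)."""
    groups = defaultdict(list)
    for language, word in words.items():
        groups[word[0].lower()].append(language)
    # Sort the language names so the result is stable and easy to read.
    return {sound: sorted(languages) for sound, languages in groups.items()}


groups = group_by_initial(COGNATES)
for sound, languages in sorted(groups.items()):
    print(sound, "->", ", ".join(languages))
```

Running it puts Latin, French, Spanish, and Persian in a **p** group, with the Germanic languages landing under **v** and **f**. Repeating that over the whole core vocabulary, and over every position in the word rather than just the first sound, is roughly the “database” of correspondences the comment talks about.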