Computer Program Can Reconstruct Lost Languages

Juan Naharro Gimenez / Getty Images

A replica of the Rosetta Stone is displayed as part of the 'Treasures of the World's Cultures' exhibition at Centro Exposiciones Arte Canal on January 12, 2010 in Madrid, Spain.

You say puh-tay-tow, I say puh-tah-tow, but how did our ancestors use language millennia ago? Typically you’d ask a linguist, but manually reconstructing protolanguages — hypothetical early languages from which extant ones evolved — can be a lengthy, arduous process. What if you could get reasonably close in a fraction of the time by using a computer?

Researchers in Canada and California have done exactly that, designing software that applies rules about how speech sounds change over time to essentially reverse-engineer the process and recreate the rudiments of lost root languages: a sort of linguistic time machine-meets-Rosetta Stone.

The idea that language changes over time is obvious enough on contemporary time-scales — just look at dialects. Today some people say “axed” (phonetically) instead of “asked” while others say “howdy” instead of “hello.” When I lived in Denton, Texas, folks said “y’all” instead of “you all,” and the colloquialism “ain’t” (instead of “am not” or “is not”) — the bane of English language formalists everywhere — was widespread by the 18th century.

(Conversely, the historian J.M. Roberts writes in The History of the World that a word like “alcohol” survives more or less in its original form from Sumerian, a language spoken in southern Mesopotamia since the 4th millennium B.C.; so, says Roberts, does the world’s first recipe for beer.)

But tracing back the origins of dialect changes is kid’s stuff compared to constructing entire progenitor languages that existed prior to the earliest extant ones. All we have are the descendant languages and ideas about how sounds change over time: kind of like playing Clue with half the game board and only some of the cards. Compare the features of two or more languages with common ancestors — a process known as the comparative method — and scholars would argue you can get close, but it can be a painstaking process.

Imagine instead taking over 600 existing languages spoken in Asia and the Pacific — precisely what these researchers did — and feeding them to a computer that quickly and accurately reconstructed likely protolanguages from which the modern cognates evolved. In this case, the computer program scanned a database of over 140,000 words, from which it managed to construct a protolanguage the researchers believe may have been spoken around 7,000 years ago. How accurate are we talking? According to the researchers: “Over 85% of the system’s reconstructions are within one character of the manual reconstruction provided by a linguist specializing in Austronesian languages.”
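To make that accuracy figure concrete, here is a minimal sketch of how such a score could be computed. It assumes "within one character" means an edit distance (single-character insertions, deletions or substitutions) of at most one, which is a reading of the quote rather than a confirmed detail of the researchers' method. The function names and the word pairs are invented for illustration and are not taken from the study's data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def share_within(pairs, max_distance=1):
    """Fraction of (automatic, manual) pairs whose edit distance
    is at most max_distance."""
    if not pairs:
        raise ValueError("need at least one pair")
    hits = sum(1 for auto, manual in pairs
               if edit_distance(auto, manual) <= max_distance)
    return hits / len(pairs)


# Hypothetical (automatic, manual) reconstructions, for illustration only.
pairs = [("mata", "mata"), ("lima", "rima"), ("tolu", "telu"), ("apuy", "hapi")]
print(share_within(pairs))  # 0.75: three of the four pairs differ by at most one character
```

On real data, the same comparison would be run over every reconstructed word for which a linguist's manual reconstruction exists.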

That’s kind of remarkable, even if Alex Bouchard-Côté, one of the researchers and a co-author of the related paper, “Automated reconstruction of ancient languages using probabilistic models of sound change,” published in the journal Proceedings of the National Academy of Sciences, admits the algorithm is still “doing a basic job right now” (via TechNewsDaily).

Outside geeky academic circles and obscure scholarly journals populated with articles no one generally reads, what’s the practical purpose of reconstituting a dead language?

For starters, knowing how languages changed over time can help us better organize history, say the order in which key events happened. According to Bouchard-Côté, for instance, we might be able to refine our understanding of how Europe was settled: “If you can figure out if the language of the settling population had a word for wheel, then you can get some idea of the order in which things occurred, because you would have some records that show you when the wheel was invented.”

“It’s very time consuming for humans to look at all the data,” says fellow researcher and paper author Dan Klein (via BBC). “There are thousands of languages in the world, with thousands of words each, not to mention all of those languages’ ancestors. It would take hundreds of lifetimes to pore over all those languages, cross-referencing all the different changes that happened across such an expanse of space – and of time. But this is where computers shine.”

But okay, here’s the question we’re all burning to ask: What happens if you take Vulcan and Romulan (the constructed languages of two Star Trek races with common ancestry) and feed them to this program?

Time.com
