What Deep Learning Gets Wrong About Language

The competition between Google and Amazon to be your digital assistant has moved into automated translation, and over the last 15 months that rivalry has quietly ushered in a new era. Google Translate, long a source of laughs for its real and apocryphal fails, raised the bar in 2016 with a new translation engine that delivers increasingly accurate translations between 103 languages. In October 2017, Google pushed into real-time spoken translation with Pixel Buds, a set of wireless earphones. Amazon’s digital assistant, Alexa, can already translate simple sentences in 36 languages. The company is trying to catch up with Google by developing a more sophisticated translation engine for its voice recognition system, one capable of handling multiple conversations and some degree of cultural recognition.

Even with these improvements, accuracy remains a concern. During the recent Winter Olympic Games, a Google Translate mix-up resulted in the delivery of 15,000 eggs to Norway’s delegation of 109 athletes. Last January, a town in northern Spain was widely mocked after rolling out a new website. It invited tourists to visit the “Looting Center for the Arts” before enjoying a stroll through the town’s picturesque “historic helmet.” (The Spanish word “casco” can mean both “helmet” and “neighborhood.” The arts center is named after the Botín family, owners of the Santander banking conglomerate, rather than after “botín,” the Spanish cognate of “booty.”) Although the town spent the equivalent of roughly $7,500 on website design, nobody thought to double-check Google’s translations, much less consult a professional translator. These are more than anecdotes. The egg order was translated into Korean, and errors like it suggest that accuracy for non-European languages may be far lower.

It may seem unfair to criticize Google Translate if the worst misunderstanding to befall its more than 500 million users is the occasional embarrassment. And yet these mishaps point to something that linguists and language teachers know well and that AI does not: language is more than complex strings of data. Nuance, bias, humor, playfulness, cultural difference, and even incompetence and deception are all layered into human interaction. Deep learning is the idea that computers can process and learn from data in ways that mimic how the human brain processes and learns from input. Is it up to the task? And, more importantly, what are the consequences if it is not?

To be fair, today’s Google Translate is not your older sibling’s Google Translate. Until November 2016, it relied on statistical machine translation. Google’s computers analyzed a large corpus of transcripts from the United Nations and the European Parliament to build statistical models from which to generate translations. To translate from Spanish to French, for example, the engine would first look for patterns in its Spanish-English reference library and return a statistically likely guess. It would then repeat the process from English to French.
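Routing everything through English has a cost, and a few lines of Python can sketch it. This is a deliberately tiny toy, not how Google’s system actually worked. The word tables and function names are hypothetical, and a real statistical engine scores whole phrases rather than looking up single words. The sketch shows one thing a pivot can lose. Spanish, like French, distinguishes the familiar “tú” from the formal “usted.” English collapses both into “you,” so the formal register cannot survive the trip through English.

```python
# Toy illustration of pivot translation. These word tables are hypothetical
# stand-ins for the statistical models a real engine would build.
ES_TO_EN = {"tú": "you", "usted": "you"}   # Spanish -> English: formality merged
EN_TO_FR = {"you": "tu"}                   # English -> French: must pick one form
ES_TO_FR = {"tú": "tu", "usted": "vous"}   # Spanish -> French, direct

def translate(words, table):
    """Word-by-word lookup, standing in for a single translation step."""
    return [table[word] for word in words]

def pivot_translate(words, to_english, from_english):
    """Translate in two steps, passing through English in the middle."""
    return translate(translate(words, to_english), from_english)

if __name__ == "__main__":
    sentence = ["usted"]  # the formal "you" in Spanish
    print(pivot_translate(sentence, ES_TO_EN, EN_TO_FR))  # ['tu']: formality lost
    print(translate(sentence, ES_TO_FR))                  # ['vous']: formality kept
```

Through English, the polite “usted” comes out as the familiar “tu.” A direct Spanish-to-French mapping can keep the distinction as “vous.” The toy English-to-French table could just as well have picked “vous.” Either way, one of the two Spanish forms is mistranslated, because the information needed to choose was discarded in the middle step.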

English and French have historically been the languages of diplomacy, and English is more closely related to German or Spanish than to Arabic, Japanese, or Turkish. It is easy to see why earlier versions of Google Translate were criticized for their Eurocentric bias. Since then, however, Google has moved to a more sophisticated and accurate method called neural machine translation. The new system processes whole sentences rather than isolated words or phrases. It uses deep learning so that, like a brain, it improves over time. And it translates directly from one language to another without adding a layer of English in the middle.

As more public agencies use Google Translate to convey important information to a diverse public, reliance on automated translation raises questions about the links between inequality and access to information. What criteria does Google use when adding new languages to its core service? Luxembourgish, with fewer than 400,000 speakers, is one of the 103 languages available. Why is Quechua, an indigenous language spoken by more than nine million people in the Andes, not among them? What happens to languages without a strong digital presence when they cannot compete with English-language services such as Siri, Alexa, or your automated GPS co-pilot? And what might be the consequences of inaccurate translations in contexts where the stakes are much higher than a few thousand extra eggs?

Last year a British court heard the testimony of a Chinese national whose only available interpreter was Google Translate. Unsurprisingly, the hearing had to be adjourned because communication broke down. A growing number of studies examine the patient-safety risks of language barriers in hospital care. In a recent Swiss study conducted at the Geneva University Hospital, Google’s French-Arabic translations scored poorly in accuracy and comprehensibility. To be sure, humans make communication mistakes all the time, too. But there are good reasons why legal and medical interpreters and translators undergo thorough training, and why so many pre-med students study a language in college: miscommunication in these fields can have life-changing consequences.

We like to think of software and algorithms as neutral and unbiased. In recent months, however, technology’s gender, racial, and cultural biases have come under increasing scrutiny. ProPublica reported on bias against African-Americans in software used at parole hearings to assess the likelihood of reoffending. Facial recognition algorithms have been shown to be less accurate for people of color. Google Translate itself has been criticized for gender bias. When translating from Turkish to English, for instance, the gender-neutral “o bir doktor” (“[he or she] is a doctor”) automatically becomes “he is a doctor.” Soldiers and entrepreneurs were likewise assumed to be men; nurses and teachers, women.

Deep learning cannot be neutral because the data it learns from is not. Machine translation also treats text as complex strings of data without contextual or cultural cues. This means it can be unintentionally neutral in the worst sort of way, by failing to recognize that a sentence, a phrase, or even a single word can mean completely different things in different contexts. As a quick experiment, try saying the word “Great!” in as many different tones as possible: congratulatory, surprised, sarcastic, impatient, mocking…

Undeniably, Google Translate is a highly cost-effective and useful tool. But relying on it uncritically carries hidden social and economic costs. The American Council on the Teaching of Foreign Languages commissioned a recent survey for its language-education advocacy campaign, #LeadwithLanguages. According to that survey, 90% of U.S. businesses rely on employees with language skills. While demand for advanced language and intercultural skills in the labor force grows, investment in language education lags painfully behind. Only 20% of K-12 students are enrolled in a world language class, and the share among college students is even lower: just 8%. By all means, let’s devote resources to STEM education. But let’s also make a comparable investment in linguistic immersion for primary, secondary, and higher education students. Let’s invest, too, in the kind of critical thinking that languages and the other humanities disciplines teach us. Don’t just take my word for it: there is a growing sense in the tech industry and the business community that a degree in the humanities is becoming a hot commodity.

At Dartmouth, where I teach, the Spanish and Portuguese department no longer allows students to use Google Translate in class. Here is why: for all its deep learning, Google Translate’s approach to language remains superficial. Communicating across languages is more than a box with Text A and a box with Text B. There is nuance, connection, emotion, creativity. Learning a language is deep learning on a human scale, a transformative process of personal and intellectual growth. When we trust automated translation to communicate for us, we mistake the tool, and a questionable one at that, for its purpose.

Languages and the impulse to communicate make us human. True communication across languages and cultures does not typically result in 15,000 eggs delivered to your door. If you want an example of what communicating across languages looks like, look up from your phone. Consider Chloe Kim, the very human, proudly bilingual and bicultural gold-medalist snowboarder. Or consider Wes Studi, the Native American actor who closed his remarks at the Oscar ceremony in Cherokee.