
Amazon explains how Alexa learns new languages

Image: Amazon Alexa (Credit: Khari Johnson / VentureBeat)



Amazon’s Alexa assistant recently learned to speak several new languages: Hindi, U.S. Spanish, and Brazilian Portuguese. Synthetic data helped substantially, Amazon senior manager for research science Janet Slifka explained in a post on the Alexa blog this morning, but it wasn’t the whole solution; the new languages also required new bootstrapping tools.

One of the tools in question was developed by Amazon’s Alexa AI Applied Modeling and Data Science group and uses a technique called “grammar induction” to analyze “golden utterances” (i.e., canonical examples of customer requests proposed by Alexa feature teams) and produce a series of expressions that can generate similar sentences. The other — “guided resampling” — creates novel sentences by recombining words and phrases from examples in the available data, with an emphasis on optimizing the volume and distribution of the sentence types.

Slifka notes that when a new-language version of Alexa is under active development, teams compile training data for the systems that suss out customers’ intents. A portion comes from existing languages translated by AI models, while the rest is typically drawn from crowd workers and Cleo, an Alexa voice app that tasks customers with supplying answers to prompts.

The grammar induction system taps a technique known as Bayesian model merging to generate a representative grammar: a set of rewrite rules for varying basic template sentences through word insertions, deletions, and substitutions. Given 50 golden utterances, the process might normally take a computational linguist a day, but the tool shortens it to seconds by identifying patterns in lists of utterances and using them to produce upwards of 100 candidate rules for thousands of templates. For instance, if two words (say, “pop” and “rock”) consistently occur in similar syntactic positions while the phrasing around them varies, it might propose a candidate rule that “pop” and “rock” are interchangeable in some contexts.
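
To make the idea concrete, here is a minimal Python sketch of that pattern-finding step, not Amazon’s actual system: it treats two words as candidates for interchangeability when they occur in identical surrounding contexts. The function name and toy utterances are illustrative assumptions, and full Bayesian model merging additionally scores candidate merges, which this sketch omits.

```python
from collections import defaultdict

def propose_candidate_rules(utterances):
    """Group words by surrounding context; words sharing a context
    become a candidate 'these are interchangeable' rule."""
    contexts = defaultdict(set)
    for utt in utterances:
        tokens = utt.lower().split()
        for i, tok in enumerate(tokens):
            # Context = everything to the left and right of position i.
            ctx = (tuple(tokens[:i]), tuple(tokens[i + 1:]))
            contexts[ctx].add(tok)
    # Keep only contexts in which at least two distinct words appeared.
    return [words for words in contexts.values() if len(words) > 1]

golden = ["play some pop music", "play some rock music"]
print(propose_candidate_rules(golden))
# [{'pop', 'rock'}] -> candidate rule: "pop" and "rock" are interchangeable
```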


Helpfully, the grammar system can automatically determine which rules account for the most variance in the sample data (without overgeneralizing), which become eligible variables in further iterations of the process. As an added bonus, it’s able to take advantage of existing Alexa catalogs of frequently occurring terms or phrases. For example, if the golden utterances were sports-related and it determined that the words “Celtics” and “Lakers” were interchangeable, it would conclude that they were also interchangeable with “Warriors,” “Spurs,” “Knicks,” and all the other names of NBA teams known to Alexa.
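
Continuing the hypothetical sketch above, catalog-based generalization might look like the following; the catalog contents and the subset test are assumptions for illustration, not Alexa’s actual logic.

```python
# Hypothetical stand-in for an Alexa catalog of NBA team names.
NBA_TEAMS = {"celtics", "lakers", "warriors", "spurs", "knicks"}

def expand_with_catalog(candidate_words, catalog):
    """If every word in a candidate rule belongs to a known catalog,
    generalize the rule to cover the entire catalog."""
    if candidate_words <= catalog:  # subset test
        return set(catalog)
    return candidate_words

print(sorted(expand_with_catalog({"celtics", "lakers"}, NBA_TEAMS)))
# ['celtics', 'knicks', 'lakers', 'spurs', 'warriors']
```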

As for the guided-resampling tool, it similarly uses catalogs and existing examples to augment natural language understanding training data. Specifically, it generates additional training samples by swapping out elements in an utterance — for instance, “play Justin Bieber” and “can you play a song by Camila Cabello?” — using what’s known as the Jaccard index to evaluate pairwise similarity between the contents. (The Jaccard index measures the overlap between two sets — in this case, contents in different types of requests.) The result is a system that produces proportionally larger training sets for more complex utterance data patterns, which Slifka notes helps AI models achieve higher performance.
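
A rough sketch of that resampling step, under stated assumptions: the slot-filled templates, the observed entity sets, and the 0.5 similarity threshold are all invented for illustration; only the Jaccard formula itself (intersection size divided by union size) comes from the article.

```python
def jaccard(a, b):
    """Jaccard index: |A intersect B| / |A union B|, ranging from 0 to 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Entities observed in the {artist} slot of two request templates.
slot_contents = {
    "play {artist}": {"justin bieber", "camila cabello", "drake"},
    "can you play a song by {artist}": {"camila cabello", "drake", "adele"},
}

(t1, vals1), (t2, vals2) = slot_contents.items()
if jaccard(vals1, vals2) >= 0.5:  # slots look compatible
    # Resample: fill each template with the union of observed values.
    for artist in sorted(vals1 | vals2):
        print(t1.replace("{artist}", artist))
        print(t2.replace("{artist}", artist))
```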

“Alexa is always getting smarter, and these and other innovations from AMDS researchers help ensure the best experience possible when Alexa launches in a new locale,” she wrote.