As any HomePod, Google Home, or Echo owner can tell you, getting a smart speaker to understand you — much less suss out the topic of a conversation — is usually a crapshoot. But encouragingly, researchers at Amazon are progressing toward more responsive, contextually aware voice experiences, in part thanks to “topic modeling” — i.e., identifying topics to help route requests more accurately.
In new research, they developed a prototype system that can boost Alexa’s topic recognition by up to 35 percent. It’s described in a paper that’ll be presented at the IEEE Spoken Language Technology (SLT) workshop in Athens, Greece, in late December.
“[Our] system uses two additional sources of information to determine the topic of a given utterance: the utterances that immediately preceded it and its classification as a ‘dialogue act’,” Behnam Hedayatnia, an applied speech scientist at Amazon, wrote in a blog post.
To validate the AI system, the researchers used more than 100,000 annotated voice requests collected during the 2017 Alexa Prize competition, which tasked 15 teams with deploying Alexa chatbot systems. Annotators labeled the training data with one of 14 dialogue acts and one of 12 topics — such as Politics, Entertainment/Movies, Fashion, and Entertainment/Books — and noted the keywords that helped them identify the topic of each command. (For instance, “brand” and “Italy” in “Gucci is a famous brand from Italy.”)
The topic-modeling system comprises three AI architectures: (1) a deep averaging network (DAN), (2) a variation on the DAN (the ADAN) that also learns to predict the keywords annotators flagged as topic indicators, and (3) a bidirectional long short-term memory (LSTM) network. Bidirectional LSTMs are a category of recurrent neural network capable of learning long-term dependencies; they read a sequence both forward and backward, letting the network combine its memory of earlier and later words to improve prediction accuracy.
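For readers unfamiliar with the third architecture, here is a minimal sketch of a bidirectional LSTM utterance classifier in PyTorch. The layer sizes, vocabulary size, and pooling choice are illustrative placeholders, not the configuration described in the paper.

```python
import torch
import torch.nn as nn

class BiLSTMTopicClassifier(nn.Module):
    """Toy bidirectional-LSTM encoder for topic classification.
    Sizes are illustrative placeholders, not the paper's configuration."""
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_topics=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True runs the sequence forward and backward,
        # so each position sees both left and right context.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_topics)

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        embedded = self.embed(token_ids)        # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embedded)        # (batch, seq_len, 2 * hidden_dim)
        utterance_vec = outputs.mean(dim=1)     # pool over time steps
        return self.classifier(utterance_vec)   # topic logits
```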
Inputs to all three networks consist of a voice command, a dialogue act classification, and a conversational context — in other words, the last five turns in a conversation, where a turn is a combination of a speaker’s request and a chatbot’s response.
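Concretely, each training example can be pictured as a simple record like the sketch below. The field names and dialogue-act strings are hypothetical, chosen for illustration rather than taken from Amazon's data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TopicModelInput:
    """Illustrative container for one classification example (names are hypothetical)."""
    utterance: str                  # the current voice command
    dialogue_act: str               # its dialogue-act label, e.g. "Statement"
    context: List[Tuple[str, str]]  # up to five previous (user request, chatbot response) turns

example = TopicModelInput(
    utterance="Gucci is a famous brand from Italy.",
    dialogue_act="Statement",
    context=[("Let's talk about fashion.", "Sure, what would you like to know?")],
)
```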
The DAN produces embeddings — mathematical representations — of words, and subsequently of sentences, by averaging the word embeddings. Those sentence embeddings are averaged together again to produce a single summary embedding, which is appended to the embedding of the current voice command and passed to a neural network that learns to correlate embeddings with topic classifications.
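As a rough sketch of that averaging scheme (not the authors' implementation), the DAN's forward pass could be written like this in PyTorch; the embedding and hidden dimensions are placeholders.

```python
import torch
import torch.nn as nn

class DANTopicClassifier(nn.Module):
    """Sketch of a deep averaging network: average word embeddings into sentence
    embeddings, average those into a context summary, append the summary to the
    current command's embedding, and classify. Dimensions are illustrative."""
    def __init__(self, vocab_size=10000, embed_dim=100, num_topics=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.classifier = nn.Sequential(
            nn.Linear(2 * embed_dim, 256),  # summary embedding + current-command embedding
            nn.ReLU(),
            nn.Linear(256, num_topics),
        )

    def sentence_embedding(self, token_ids):      # token_ids: (seq_len,)
        return self.embed(token_ids).mean(dim=0)  # average the word embeddings

    def forward(self, current_ids, context_ids_list):
        current_vec = self.sentence_embedding(current_ids)
        # Average the sentence embeddings of the previous turns into one summary.
        context_vecs = torch.stack([self.sentence_embedding(ids) for ids in context_ids_list])
        summary_vec = context_vecs.mean(dim=0)
        # Append the summary to the current command's embedding and classify.
        combined = torch.cat([current_vec, summary_vec], dim=-1)
        return self.classifier(combined)
```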
The ADAN, meanwhile, builds a matrix that maps every word it encounters against each of the 12 topics it’s asked to recognize, recording how often annotators correlated a particular word with a particular topic. It simultaneously embeds words from the current voice command and from past turns, averaging the word embeddings within each turn and then averaging those per-turn averages into a single representation.
In the end, each word has 12 numbers associated with it — a 12-dimensional vector — indicating its relevance to each topic. The topic vectors derived from the current voice command are combined with those from past turns and passed to the neural network for classification.
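A back-of-the-envelope sketch of that word-to-topic bookkeeping, greatly simplified from the paper's actual attention mechanism, might look like the following. The topic list is truncated to four entries and the function names are hypothetical.

```python
from collections import defaultdict
import numpy as np

TOPICS = ["Politics", "Entertainment_Movies", "Fashion", "Entertainment_Books"]  # 4 of the 12, for brevity

# word -> count of annotator "keyword" flags per topic
word_topic_counts = defaultdict(lambda: np.zeros(len(TOPICS)))

def record_annotation(keyword, topic):
    """Tally how often annotators flagged this word as indicative of the topic."""
    word_topic_counts[keyword][TOPICS.index(topic)] += 1

def topic_vector(word):
    """Return the word's topic-relevance vector, normalized if the word was ever flagged."""
    counts = word_topic_counts[word]
    total = counts.sum()
    return counts / total if total > 0 else counts

def utterance_topic_vector(words):
    """Average the per-word topic vectors into one vector for the whole utterance."""
    return np.mean([topic_vector(w) for w in words], axis=0)

# Example based on the article's annotation scheme
record_annotation("brand", "Fashion")
record_annotation("italy", "Fashion")
print(utterance_topic_vector(["gucci", "is", "a", "famous", "brand", "from", "italy"]))
```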
In testing, four versions of the system improved topic identification accuracy over the baseline. The best configuration achieved 74 percent accuracy, up from 55 percent for the baseline.