How does Amazon’s Alexa know whether Mary Poppins refers to the book, soundtrack, or movie? With the help of artificial intelligence (AI), of course, and specifically the subfield of natural language processing. In a blog post today, Chengwei Su, a senior applied scientist in the Alexa AI Natural Understanding group, detailed a system that allows the AI models responsible for Alexa’s core domains (such as books, movies, and videos) to improve in accuracy independently of the others.
The research will be presented at the IEEE Spoken Language Technology (SLT) workshop in Athens, Greece, later this month, and Su says the work is already in production.
“[Domain] models are trained on different data, so there’s no guarantee that their probability estimates are compatible,” he wrote. “Should a 70 percent estimate from Music be given priority over a 68 percent estimate from Books, or is it possible that, when it comes to Mary Poppins, the Music model is slightly overconfident?”
Here’s how it works: First, each domain’s models rank hypotheses according to their confidence scores, and those hypotheses are then reranked using weights, values that signal how much each input should count. Weights are learned not only for domain classification but also for the classification of utterances (i.e., voice commands) by the action the user wants to perform (intent) and the data item the intent is supposed to act upon (slot), and they factor in contextual information such as device type. (On the Fire TV, for instance, more precedence is given to Video domain hypotheses than on voice-only speakers.)
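To make the mechanics concrete, here is a minimal sketch of context-aware weighted reranking in Python. It is not Amazon’s production code: the domain names, weights, scores, and the `context_boost` table are all hypothetical, chosen only to show how learned weights and device context can reorder hypotheses.

```python
# Hypothetical sketch of context-aware reranking across Alexa-style domains.
# All names, weights, and scores are illustrative assumptions, not Amazon's
# actual models or values.

# Raw confidence scores from independently trained domain classifiers.
domain_scores = {"Music": 0.70, "Books": 0.68, "Video": 0.55}

# Learned per-domain weights that calibrate otherwise incompatible
# probability estimates (e.g., a slightly overconfident Music model
# is down-weighted).
domain_weights = {"Music": 0.90, "Books": 1.05, "Video": 1.00}

# Context-dependent boosts: on a Fire TV, Video hypotheses get precedence.
context_boost = {
    "fire_tv": {"Video": 1.2},
    "voice_only_speaker": {},  # no boost on screenless devices
}

def rerank(scores, device_type):
    """Rerank domain hypotheses by weighted, context-adjusted confidence."""
    boosts = context_boost.get(device_type, {})
    adjusted = {
        domain: score * domain_weights[domain] * boosts.get(domain, 1.0)
        for domain, score in scores.items()
    }
    # Highest adjusted score first.
    return sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)

print(rerank(domain_scores, "voice_only_speaker"))  # Books edges out Music
print(rerank(domain_scores, "fire_tv"))             # Video rises above Music
```

In this toy run, Books (0.68 raw) outranks Music (0.70 raw) once the weights are applied, which is exactly the kind of cross-domain calibration question Su raises above.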
“Within the Music domain, for instance, the utterance ‘play Thriller’ would probably call up the PlayMusic intent (as opposed to, say, the CreateList intent),” Su explained, “but the slot classifier might assign similar probabilities to the classifications of the word ‘Thriller’ as AlbumName and SongName.”
Su and colleagues fed the domain-specific reranking models both the domain classification probability for each utterance and the most probable intent and slot hypotheses, to account for cases where intent confidence might matter more than domain confidence. With sufficient training, the rerankers learned to produce separate weights for domain, intent, and slot.
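A rough sketch of that combination step is below: a hypothesis is scored as a weighted sum of its domain, intent, and slot confidences, with each domain owning its own weight vector. The weights and probabilities are invented for illustration; in the real system the weights are learned from training data.

```python
# Illustrative per-domain combination of domain, intent, and slot
# confidences. The weights below are made up for the example; the
# production system learns them per domain.

per_domain_weights = {
    # (w_domain, w_intent, w_slot): each domain owns its own weights,
    # so domains can update them independently and in parallel.
    "Music": (0.50, 0.30, 0.20),
    "Books": (0.60, 0.25, 0.15),
}

def hypothesis_score(domain, p_domain, p_intent, p_slot):
    """Combine the three confidences using the domain's own weights."""
    w_d, w_i, w_s = per_domain_weights[domain]
    return w_d * p_domain + w_i * p_intent + w_s * p_slot

# "play Thriller": Music is confident about the PlayMusic intent, even
# though its slot classifier is torn between AlbumName and SongName.
music = hypothesis_score("Music", p_domain=0.70, p_intent=0.95, p_slot=0.51)
books = hypothesis_score("Books", p_domain=0.68, p_intent=0.40, p_slot=0.45)
print(f"Music: {music:.2f}, Books: {books:.2f}")  # Music wins, 0.74 vs 0.58
```

Because each domain keeps its own (hypothetical) weight tuple, retraining the Music weights never touches the Books weights, which is what allows domains to update in parallel, as Su notes next.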
“The advantage of this approach is that each domain can update its own weighting system whenever required, and multiple domains can perform updates in parallel, which is more efficient,” Su said.
The update comes a day after Alexa gained a bevy of new features, including the ability to set location-based routines and reminders, discover and call local businesses and restaurants via voice requests, sift through multiple email inboxes for important messages, and more. Just last week, Amazon’s Alexa team launched a self-learning system that “detects the defects in Alexa’s understanding and automatically recovers from these errors” without the need for human intervention, and a dialogue-driven music playlist feature that allows users to find new playlists through voice.
Also last week, Amazon debuted Alexa Answers, a feature that lets customers tackle uncommon questions by submitting answers that may be distributed to millions of Alexa users around the world.