Distinguishing between the relevant and irrelevant bits of a conversation is a good life skill in general, but for voice assistants like Amazon’s Alexa, it’s indispensable. In order to respond appropriately to what’s being said — about anything from the weather to a nearby restaurant or a package in transit — they need to know whether the subject at hand falls outside the scope of their knowledge.
Researchers at Amazon tackled the problem with a natural language understanding (NLU) system that simultaneously recognizes in-domain (known) and out-of-domain (unknown) topics. The results will be presented at this year’s Interspeech conference in Hyderabad, India, in early September.
“Sometimes … an Alexa customer might say something that doesn’t fit into any domain,” Young-Bum Kim, a scientist within Amazon’s Alexa team and a lead author on the paper, wrote in a blog post. “It may be an honest request for a service that doesn’t exist yet, or it might be a case of the customer’s thinking out loud: ‘Oh wait, that’s not what I wanted.’ If a natural-language-understanding (NLU) system tries to assign a domain to an out-of-domain utterance, the result is likely to be a nonsensical response.”
The team began by assembling two datasets comprising utterances (i.e., voice commands): one covering 21 different domains and the other sampled from 1,500 frequently used Alexa skills.
When it came to choosing a model, they settled on a bidirectional long short-term memory (Bi-LSTM) architecture, which (1) factored in the order of the words within each utterance and (2) processed each sequence both forward and backward. They fed it both word-level and character-level information: embeddings (points in a 100-dimensional space that represent words) and the words’ constituent characters.
The neural network produced a vector summary of useful individual character features, which the team combined with the aforementioned word embeddings before passing them to a second Bi-LSTM. This one learned to produce a summary of the entire input.
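For a concrete picture of that pipeline, the sketch below shows one plausible way to wire up such a two-level Bi-LSTM in PyTorch. The layer sizes, vocabulary sizes, and the 22-way output (21 domains plus an out-of-domain class) are illustrative assumptions for clarity, not details taken from the paper.

```python
# Illustrative sketch (not the paper's code): a character-level Bi-LSTM summarizes
# each word's characters, the summary is concatenated with a 100-dimensional word
# embedding, and a second Bi-LSTM summarizes the whole utterance for classification.
import torch
import torch.nn as nn


class CharWordBiLSTM(nn.Module):
    def __init__(self, word_vocab=10000, char_vocab=100, n_classes=22,
                 word_dim=100, char_dim=25, char_hidden=25, utt_hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)      # word-level embeddings
        self.char_emb = nn.Embedding(char_vocab, char_dim)      # character embeddings
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        self.utt_lstm = nn.LSTM(word_dim + 2 * char_hidden, utt_hidden,
                                bidirectional=True, batch_first=True)
        # Assumed output: 21 in-domain classes plus one out-of-domain class.
        self.classifier = nn.Linear(2 * utt_hidden, n_classes)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, n_words); char_ids: (batch, n_words, n_chars)
        b, w, c = char_ids.shape
        chars = self.char_emb(char_ids.view(b * w, c))
        _, (h_char, _) = self.char_lstm(chars)                  # final states, both directions
        char_summary = torch.cat([h_char[0], h_char[1]], dim=-1).view(b, w, -1)
        words = torch.cat([self.word_emb(word_ids), char_summary], dim=-1)
        _, (h_utt, _) = self.utt_lstm(words)
        utt_summary = torch.cat([h_utt[0], h_utt[1]], dim=-1)   # summary of the entire input
        return self.classifier(utt_summary)                     # logits over domains + OOD


model = CharWordBiLSTM()
logits = model(torch.randint(0, 10000, (2, 6)), torch.randint(0, 100, (2, 6, 8)))
print(logits.shape)  # torch.Size([2, 22])
```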
On average, the researchers’ system improved classification accuracy by 6 percent for a given target, and it achieved dramatically better results when trained on the 21-domain dataset: 90.4 percent accuracy, compared with the existing system’s 83.7 percent.
“By using a training mechanism that iteratively attempts to optimize the trade-off between those two goals, we significantly improve on the performance of a system that features a separately trained domain classifier and out-of-domain classifier,” Kim wrote. “[The] domain classification makes … determinations [such as the actions that a customer wants executed] much more efficient … by narrowing the range of possible interpretations.”
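The article doesn’t spell out that training mechanism, but the trade-off Kim describes can be pictured as a weighted combination of two losses over a shared encoder: one for classifying among the known domains and one for flagging out-of-domain utterances. The sketch below is a minimal, hypothetical illustration of that idea, not the paper’s actual objective; the weighting and loss forms are assumptions.

```python
# Hypothetical joint objective balancing in-domain classification and
# out-of-domain detection; alpha trades off the two goals.
import torch
import torch.nn.functional as F


def joint_loss(domain_logits, ood_logits, domain_labels, ood_labels, alpha=0.5):
    # Domain cross-entropy is computed only on in-domain examples (ood_labels == 0).
    in_domain = ood_labels == 0
    domain_term = (F.cross_entropy(domain_logits[in_domain], domain_labels[in_domain])
                   if in_domain.any() else domain_logits.new_zeros(()))
    # Binary loss for the out-of-domain detector, over all examples.
    ood_term = F.binary_cross_entropy_with_logits(ood_logits, ood_labels.float())
    # In practice, alpha could be fixed, tuned, or adjusted iteratively during training.
    return alpha * domain_term + (1 - alpha) * ood_term
```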