
Alexa gains multilingual mode, celebrity voices, and frustration detection

Amazon VP David Limp onstage
Image Credit: Khari Johnson / VentureBeat



Alexa is gaining more humanlike speech — and learning to understand multiple languages at once. Starting today, Amazon’s launching a multilingual mode that can automatically detect both Spanish and English in the U.S., French and English in Canada, and Hindi and English in India. Alexa will respond with content and skills appropriate for the recognized language.

It’s akin to the multilingual mode Google rolled out for Google Assistant in August. Currently, select Google Assistant devices can speak a combination of English, French, German, Italian, Japanese, and Spanish, with more languages to come.

Additionally, Amazon is rolling out celebrity voices generated by an AI model — starting with Samuel L. Jackson. He’ll tell you jokes, let you know if it’s raining, set timers and alarms, play music, and more, according to the tech giant. There are two versions of his voice — explicit and non-explicit — both of which will roll out later this year and cost $0.99 for a limited time.

Neural Text-To-Speech is to thank for the celebrity voices. The feature came to Alexa several months ago and, more recently, to Polly, AWS’ cloud service that converts text into speech. In short, it improves speech quality by making output sound more natural and expressive.
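In Polly, developers opt into Neural Text-To-Speech through the `Engine` parameter of the `SynthesizeSpeech` API. The sketch below only builds the request payload (the voice name and default output format are illustrative choices, not from the article); actually sending it would require an AWS SDK such as boto3 plus credentials.

```python
import json

def build_polly_request(text, voice_id="Joanna", engine="neural"):
    """Build a parameter payload for Polly's SynthesizeSpeech API.

    Setting Engine to "neural" selects the Neural Text-To-Speech
    models; "standard" selects the older concatenative voices.
    """
    return {
        "Text": text,
        "VoiceId": voice_id,       # illustrative; Polly offers many voices
        "Engine": engine,          # "neural" or "standard"
        "OutputFormat": "mp3",
    }

params = build_polly_request("It's going to rain today.")
print(json.dumps(params, indent=2))
```

With boto3 installed and AWS credentials configured, a payload like this could be passed as `boto3.client("polly").synthesize_speech(**params)`, which returns an audio stream in the requested format.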




Amazon detailed its work on Neural Text-To-Speech in a research paper late last year (“Effect of data reduction on sequence-to-sequence neural TTS”), in which researchers described a system that can learn to adopt a new speaking style from just a few hours of training — as opposed to the dozens of hours it might take a voice actor to read in a target style. The end result? An AI system capable of distinguishing elements of speech both independent of a speaking style and unique to that style.

Those aren’t the only improvements in the pipeline. The Alexa wake word engine — the engine available to all of Amazon’s partners — is now 50% more accurate. Concretely, it mishears its wake word less frequently.

Limited emotion detection is another addition to Alexa’s raft of features. Starting early next year in the music domain, when Amazon’s voice assistant detects frustration in a customer’s voice as a result of a mistake it made, it’ll apologetically offer an alternative action (e.g., offer to play a different song).

“As customers continue to use Alexa more often, they want her to be more conversational and can get frustrated when Alexa gets something wrong,” wrote Amazon in a press release. “To help with this, we developed a deep learning model to detect when customers are frustrated, not with the world around them, but with Alexa. And when she recognizes you’re frustrated with her, Alexa can now try to adjust, just like you or I would do.”

Alexa’s emotional intelligence project has been in the works for years. Rohit Prasad, chief scientist for Amazon’s Alexa AI, told VentureBeat in 2017 that Amazon was beginning to explore emotion recognition AI — but only to sense frustration in users’ voices — and this appears to be the fruit of that labor.