Alexa can now deliver the news with the tenor and tone of a professional newscaster, thanks to a new artificial intelligence (AI) technique. Starting today for customers in the U.S., as first spotted by TechCrunch, Alexa will brief you on the day’s events and narrate snippets from Wikipedia with a “more natural,” contextually sensitive voice that emphasizes words and phrases in a human-like way.
To hear the new “newscaster” voice, try asking: “Alexa, what’s the latest?” And to listen to the voice read a snippet from a Wikipedia article, say a command like: “Alexa, Wikipedia Nick Jonas.”
“Just the way humans vary their way of speaking based on the situation, our new … technology enables Alexa to deliver the day’s news by adapting a different speaking style as compared to how she would sound when, for example, providing information from Wikipedia,” Amazon wrote in a blog post published this morning.
The underlying tech behind the improved voices is a text-to-speech (TTS) system that can learn to adopt a new speaking style from just a few hours of training. Traditional methods require hiring a voice actor to read in the target style for tens of hours in total.
Amazon’s neural TTS model (or NTTS for short), which was first described in a paper published late last year, consists of two components. The first is a generative neural network that converts a sequence of phonemes — perceptually distinct units of sound that distinguish one word from another, such as the p, b, d, and t in pad and pat — into a sequence of spectrograms, a visual representation of the spectrum of frequencies of sound as they vary with time. The second is a vocoder that converts those spectrograms into a continuous audio signal.
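To make that two-part pipeline concrete, here is a minimal, hedged sketch in Python (PyTorch) of the layout described above, not Amazon's actual NTTS code: one network maps phoneme IDs to spectrogram frames, and a separate vocoder turns those frames into a waveform. All module names, layer choices, and sizes are illustrative assumptions.

```python
# Illustrative sketch of a two-component neural TTS pipeline (not Amazon's NTTS code).
import torch
import torch.nn as nn

class PhonemeToSpectrogram(nn.Module):
    """Generative network: phoneme sequence -> spectrogram frames."""
    def __init__(self, num_phonemes=70, embed_dim=256, hidden_dim=512, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(num_phonemes, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.to_mel = nn.Linear(2 * hidden_dim, n_mels)

    def forward(self, phoneme_ids):
        x = self.embed(phoneme_ids)          # (batch, seq_len, embed_dim)
        h, _ = self.encoder(x)               # (batch, seq_len, 2*hidden_dim)
        return self.to_mel(h)                # (batch, seq_len, n_mels)

class Vocoder(nn.Module):
    """Vocoder: spectrogram frames -> continuous audio signal."""
    def __init__(self, n_mels=80, upsample=256):
        super().__init__()
        # A transposed convolution expands each spectrogram frame into `upsample` audio samples.
        self.upsampler = nn.ConvTranspose1d(n_mels, 1, kernel_size=upsample, stride=upsample)

    def forward(self, mel):
        # mel: (batch, seq_len, n_mels) -> (batch, n_mels, seq_len) for the 1D convolution
        audio = self.upsampler(mel.transpose(1, 2))
        return audio.squeeze(1)              # (batch, seq_len * upsample) waveform samples

if __name__ == "__main__":
    phonemes = torch.randint(0, 70, (1, 30))     # a 30-phoneme utterance
    mel = PhonemeToSpectrogram()(phonemes)       # (1, 30, 80) spectrogram frames
    wav = Vocoder()(mel)                         # (1, 7680) audio samples
    print(mel.shape, wav.shape)
```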
The end result? A model-training method that combines a large amount of neutral-style speech data with only a few hours of supplementary data in the desired style, and a system capable of separating the elements of speech that are independent of speaking style from those unique to that style.
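One way to picture that mixed-data approach, again as a hedged sketch rather than the paper's exact method, is a single model conditioned on a style ID: the bulk of neutral-style recordings teaches the style-independent aspects of speech, while the few hours of newscaster-style data only has to teach what is unique to that style. The names, dataset proportions, and training loop below are illustrative assumptions.

```python
# Illustrative sketch of style-conditioned TTS training on mostly neutral data
# plus a small amount of styled data (not the published NTTS training recipe).
import torch
import torch.nn as nn

NEUTRAL, NEWSCASTER = 0, 1

class StyleConditionedTTS(nn.Module):
    def __init__(self, num_phonemes=70, num_styles=2, embed_dim=256, n_mels=80):
        super().__init__()
        self.phoneme_embed = nn.Embedding(num_phonemes, embed_dim)   # shared, style-independent
        self.style_embed = nn.Embedding(num_styles, embed_dim)       # small style-specific component
        self.decoder = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.to_mel = nn.Linear(embed_dim, n_mels)

    def forward(self, phoneme_ids, style_id):
        x = self.phoneme_embed(phoneme_ids)
        x = x + self.style_embed(style_id).unsqueeze(1)              # inject the speaking style
        h, _ = self.decoder(x)
        return self.to_mel(h)

model = StyleConditionedTTS()
loss_fn = nn.L1Loss()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training mixes many neutral-style batches with occasional newscaster-style batches.
for step in range(3):
    style = NEWSCASTER if step % 3 == 2 else NEUTRAL                 # mostly neutral data
    phonemes = torch.randint(0, 70, (4, 30))                         # placeholder phoneme batch
    target_mel = torch.randn(4, 30, 80)                              # placeholder target spectrograms
    pred = model(phonemes, torch.full((4,), style, dtype=torch.long))
    loss = loss_fn(pred, target_mel)
    optim.zero_grad()
    loss.backward()
    optim.step()
```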
“The ability to teach Alexa to adapt her speaking style based on the context of the customer’s request opens the possibility to deliver new and delightful experiences that were previously unthinkable,” said Andrew Breen, senior manager with the TTS Research team at Amazon. “We’re thrilled that our customers will get to listen to news and Wikipedia information from Alexa in this new way.”
The debut of Alexa’s new voice comes months after Amazon rolled out Whisper Mode on compatible smart home appliances and speakers. When it’s enabled, speaking to Alexa in a hushed tone triggers the assistant to whisper back.