Artificial intelligence and machine learning now touch our lives in more ways than we can possibly imagine, but one particularly tangible, ubiquitous example is the digital assistant that lives in your smartphone, smart speakers, tablet, and computer. Whether you say “Hey Google,” “Hey Siri,” or “Alexa,” you’re conjuring up an advanced collection of AI and ML tools designed to listen to you, understand you, and do what you ask.
None of these digital assistants is perfect, and even after years of interaction with people, they each have some non-trivial issues to address this year. This week, three VentureBeat writers are spotlighting 10 important issues with the digital assistant they use most. Read about Google Assistant here, Apple’s Siri here, and Amazon’s Alexa below.
1. Conversational music requests
The fine folks at the Ambient point out that Alexa is much less forgiving when it comes to music-queuing commands than Google Assistant, and they make a compelling point. Of all the tasks smart speakers deftly handle, playing tunes should be the one at which they excel. (According to a recent survey published by Voicebot, 38.2% of smart speaker owners say they stream music on their device.)
But shout “Alexa, play the newest Linkin Park album,” and the assistant will promptly (but politely) inform you that One More Light isn’t available in Amazon Music without an Amazon Music Unlimited subscription — even if you’ve selected Spotify as the default service. It’s pretty frustrating.
To be fair, Amazon has made quite a few improvements to the music discovery and playback experience on Alexa in recent months. Earlier this year, it rolled out Song ID, which lets you ask Alexa to announce the title and artist name before each song plays. In December, Alexa gained a dialogue-driven Amazon Music search feature that lets you find new playlists through voice. Coinciding with that rollout, Alexa became better tailored to individual tastes, now considering play count, era, favorite genres, and other factors when you ask to hear a track. You can also tell Amazon’s voice assistant to never play a particular song again.
But personally, we’re holding out hope for greater customizability.
2. Podcast management
Last year, Google rolled out a bunch of back-end and front-end features to beef up Google Assistant’s podcast management, like a new API and the Google Podcasts app. Amazon has lagged behind it in this regard. While Alexa will happily recommend a podcast from radio app TuneIn’s library, asking it to play something specific — even if the thing in question is reasonably popular — usually yields a disappointing “Sorry, I don’t know that one.”
Sure, nothing’s preventing you from diving into the Music & Books section of the Alexa companion app’s menu, selecting your podcast library of choice, and searching for the series you’d like to subscribe to. Third-party skills like AnyPod and PocketCasts are another great alternative; the former can even list your podcast subscriptions and play specific episodes. But that shouldn’t excuse Alexa’s bare-bones built-in podcast support, which frequently has trouble recognizing podcast titles, doesn’t offer a way to subscribe to podcasts, and tends to play serialized series out of order.
Amazon is evidently aware of Alexa’s shortcomings here — it recently added support for podcasts to Alexa Routines (albeit through the TuneIn skill). But it’s got a long way to go to catch up with the competition.
3. IFTTT actions
Ever heard of IFTTT? It’s a nifty web service that lets you chain cross-platform services, devices, and apps together with conditional statements called applets. It’s pretty easy to get the hang of — and largely free! — but IFTTT is weirdly constricted on Alexa. Only Alexa devices can be used as a trigger, meaning that you can’t flip on an internet-connected light switch and have Alexa play a sound bite, for example, or have an automatic garage door opener notify folks at home via Alexa that someone has arrived.
There’s a workaround for notifications in the form of Notify Me (a third-party skill), but it would be nice if Alexa’s native IFTTT integration were a tad more robust.
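For readers who think in code, the trigger-only restriction described above can be sketched roughly as follows. This is a toy model, not IFTTT's actual API: the `CAPABILITIES` table, the `Applet` class, the `is_supported` helper, and the service names are all hypothetical and exist only to illustrate the "if this, then that" shape of an applet.

```python
from dataclasses import dataclass

# Hypothetical capability table: which roles each service may play in an applet.
# Per the article, Alexa can be used as a trigger but not as an action target.
CAPABILITIES = {
    "alexa": {"trigger"},
    "smart_light": {"trigger", "action"},
    "garage_door": {"trigger", "action"},
}


@dataclass(frozen=True)
class Applet:
    trigger_service: str  # the "if this" side
    action_service: str   # the "then that" side


def is_supported(applet: Applet) -> bool:
    """Return True when both services are allowed in the role they are given."""
    can_trigger = "trigger" in CAPABILITIES.get(applet.trigger_service, set())
    can_act = "action" in CAPABILITIES.get(applet.action_service, set())
    return can_trigger and can_act


# A voice command that switches on a light: Alexa is the trigger.
voice_to_light = Applet("alexa", "smart_light")
# A garage door that makes Alexa announce an arrival: Alexa would be the action.
door_to_voice = Applet("garage_door", "alexa")

print(is_supported(voice_to_light))
print(is_supported(door_to_voice))
```

In this sketch the first applet passes the check and the second does not, which mirrors the garage-door scenario that Alexa's integration can't handle natively.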
4. Command shortcuts
Last year, Google rolled out a nifty feature to Google Home. Called Shortcuts, it lets you shorten complicated commands to a memorable, succinct phrase. For instance, “OK, Google, show me photos of my dad from 2017 in New York City” could become “OK, Google, show me NYC trip photos.” It’s a massive time-saver.
Amazon’s voice assistant offers no such functionality, unfortunately, short of programming an Alexa Routine with a custom phrase. Another partial workaround is to program an IFTTT phrase to trigger actions, but those actions are limited to third-party devices and services, so things like queuing up songs from Amazon Music or beaming pictures to a Fire TV Stick are a no-go.
5. Peer-to-peer payments
Why can’t you pay your pals with Alexa? It’s probably not a question that’s top of mind for most folks, but during those late nights when pulling out your phone and launching Venmo seems like too much of a hassle, it would be a godsend. The framework is in place, since Echo speakers already let you purchase things from Amazon’s sprawling marketplace.
Interestingly, the Wall Street Journal last year reported that Amazon planned to dabble in peer-to-peer payments with a service that would rival offerings from Visa, PayPal, and Mastercard. Users would be able to connect their bank accounts and send payments to each other via Alexa — and perhaps even pay for things like gas with their voice in cars equipped with Alexa.
Those plans never came to fruition, but hey — better late than never, right?
6. Realistic voices
Wonder why Google Assistant sounds so realistic (on the spectrum of voice assistants, at least)? It’s principally because it taps cutting-edge, AI-driven text-to-speech (TTS) systems like Tacotron 2, which builds voice synthesis models based on spectrograms, and WaveNet, which builds models based on waveforms. Thanks to this and other behind-the-scenes techniques, Google is able to generate relatively human-sounding voices comparatively cheaply and quickly.
Amazon is not quite there yet, but it’s making progress. In November, Alexa speech scientists described in a series of papers a TTS system that can learn to adopt a new speaking style, such as that of a newscaster, after just a few hours. The fruit of their labor rolled out in January, when Amazon introduced a “more natural,” contextually sensitive speaking style for its news briefings and Wikipedia snippets feature. (To hear it, try asking: “Alexa, what’s the latest?” or say: “Alexa, Wikipedia Nick Jonas.”) Also worth noting? Alexa has a Whisper Mode that, when enabled, triggers it to whisper back when commanded in a hushed tone.
But unlike the six new languages Google recently debuted for the Assistant, Alexa’s accents and dialects sound just as robotic as its default voice, at least for now.
7. Support more languages
Alexa might be available on over 150 products in 41 countries, but it understands the fewest languages of any voice assistant. Currently, it knows five: English (Australia, Canada, India, U.K., and U.S.), French (Canada, France), German, Japanese (Japan), and Spanish (Mexico, Spain). That’s compared with Cortana’s roughly eight languages and dialects, Siri’s 21, and Google Assistant’s more than 30.
The situation has improved somewhat. When Alexa came to India last year, Amazon launched it with an “all-new English voice” that understood and could converse in local pronunciations, and the company is bootstrapping expanded language support through crowdsourcing. Last year, it released Cleo, a gamified skill that rewards users for repeating phrases in local languages and dialects like Mandarin Chinese, Hindi, Tamil, Marathi, Kannada, Bengali, Telugu, and Gujarati.
Alexa is nowhere close to its rivals in the language department, though. And that’s a shame.
8. Texting and group calls
Alexa does have a messaging feature in Alexa Messaging, which lets you exchange transcribed (or typed) texts among friends through Alexa-enabled devices and the Alexa companion app for Android and iOS. And Amazon rolled out support for SMS last January. But good luck sending an MMS or group message: Alexa doesn’t play nicely with either format yet. Neither do other voice assistants, to be fair, but with the popularity of smart displays on the rise (the market is expected to grow 119% in 2019, according to Strategy Analytics), more robust visual messaging seems like a natural extension of Alexa’s current capabilities.
While the Alexa team is at it, why not add support for conference calls? Amazon’s Echo speaker lineup facilitates voice messaging and calls between Alexa-enabled devices, as well as free landline and mobile calls to the U.S., Canada, and Mexico (with the exception of emergency numbers, dial-by-letter numbers, and premium-rate numbers), but it can’t conference more than one Alexa-enabled device or phone at the same time. Juggling a conference call with voice commands probably wouldn’t be the easiest thing in the world, but it’d be nice to have the option.
9. General knowledge
Alexa can rattle off a few esoteric facts about people, places, and things, but not as many as the competition. In an experiment conducted by Loup Ventures last year, Google Assistant was able to answer 88% of 800 questions correctly, versus Apple’s Siri at 75%, Alexa at 73%, and Cortana at 63%. (Alexa did, however, show an improvement of 11.6 percentage points from a July test.) Alexa needn’t be an all-knowing oracle, but we’d welcome a more knowledgeable voice assistant.
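To put those percentages in terms of actual questions, here is a quick back-of-the-envelope calculation. It assumes the rounded percentages quoted above apply to all 800 questions, so the resulting counts are approximations rather than figures reported by Loup Ventures; the variable names are ours.

```python
QUESTIONS = 800  # size of the question set cited above

# Rounded accuracy figures as quoted in the article (percent answered correctly).
accuracy_percent = {
    "Google Assistant": 88,
    "Siri": 75,
    "Alexa": 73,
    "Cortana": 63,
}

# Integer arithmetic avoids floating-point rounding noise.
correct = {name: QUESTIONS * pct // 100 for name, pct in accuracy_percent.items()}

# How many more questions Google Assistant got right than Alexa.
gap = correct["Google Assistant"] - correct["Alexa"]

for name, count in correct.items():
    print(f"{name}: about {count} of {QUESTIONS}")
print(f"Google Assistant vs. Alexa: about {gap} more correct answers")
```

Under that assumption, Alexa’s 15-point deficit works out to roughly 120 questions that Google Assistant answered correctly and Alexa did not.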
Fortunately, Amazon has made a concerted effort to supply Alexa with reputable new sources of information. This past summer, Amazon began sourcing hours of operation, descriptions, and addresses from Yext, a digital knowledge management platform that counts Taco Bell, Arby’s, Marriott, and Rite Aid among its clients. December 2018 saw the launch of Alexa Answers, an invitation-only program that allows people to submit answers to questions that, if approved, are distributed to the millions of Alexa users around the world. And last summer Amazon introduced Answer Updates, a feature that notifies users who stump the assistant when Alexa “learns” the answer.
10. Offline answers
Alexa has been accused of spying on households countless times, and while none of the reports so far have had serious merit, the stories’ virality speaks to a growing anxiety concerning the data companies like Amazon collect about their customers. That’s why an offline mode — a basic, no-frills voice assistant mode that wouldn’t require an internet connection or rely on server-side processing — would seem a ripe fit for Alexa’s next upgrade.
Not possible, you say? Au contraire. Paris-based Snips, which recently raised $13 million in venture capital, is developing a “decentralized” voice assistant that prioritizes privacy above all else. It carries out machine learning at the edge and can perform tasks like checking a calendar, playing locally stored music, and controlling smart home devices.
Perhaps tellingly, Amazon itself has experimented with offline voice recognition. Last August, it detailed a complex voice recognition model that works offline.
It’s a given, of course, that any offline incarnation of Alexa wouldn’t be quite as fully featured as its internet-connected counterpart. But providing the option would build a lot of goodwill.