The idea of talking conversationally to computers has been a long time in the works. Science fiction is so often a self-fulfilling prophecy, as it provides a vision for humans to chase after with technological innovation. Those of us who have watched voice-based computer interaction evolve have seen it go through many manifestations as it grew up. We now find ourselves in a world where using voice to interface with a computer is commonplace for the masses. While I’m not quite confident we have reached an inflection point, I am confident we are at least on the cusp of one for voice-based user interfaces and for the vision of HAL 9000 (the AI of Arthur C. Clarke’s Space Odyssey series) and Jarvis (the voice-based AI assistant of Iron Man).
In anticipation of this and the many other “voice first” products and experiences we believe will come to market in 2016-2017, we sought to do a quantitative study of Amazon’s Echo, Apple’s Siri, and Google’s Ok Google. We conducted two separate studies in early May, since our intuition told us voice would be a major theme of Google I/O and Apple’s upcoming WWDC. We focused the Amazon Echo study on our early adopter panel since we knew we would not get a statistically significant number of Echo owners in our mainstream representative US consumer panel. We collaborated on the Amazon Echo study with my friend Aaron Suplizio (@aaronsuplizio) from Experian. Experian is also studying how the Echo is being used, specifically in the context of conversational commerce. (Experian didn’t pay us to do the study but did cover the costs for the raffle where two respondents won a $100 gift card.)
The second study focused on our mainstream consumer panel, to learn how they use Siri and Ok Google (or any Google voice-based search technique) and how they perceive each. I’ll start by sharing what we learned about the Amazon Echo.
Amazon Echo and the Voice First User Interface
By spreading our study across 1300 early adopters, we found 13.86% of the panel owned an Amazon Echo. It came as no surprise to us that the overwhelming majority of Echo owners also owned an iPhone (83.72%), as iPhone owners at large tend to show more early adopter tendencies than Android owners. What was most enlightening, in contrast to the Siri and Google voice study, was how different usage of the Echo was from usage of Siri and OK Google. This was true both in terms of location of usage and the most common tasks.
We first wanted to understand where the Echo is used in most consumers’ homes (we had a hunch it was either the kitchen or the living room). As you can see, the kitchen has the edge on the living room with 51% of consumers saying they have their Echo in the kitchen.
Given the type of things the Echo does, and perhaps in alignment with Amazon’s goals in delivering services to consumers via the Echo, knowing the primary usage room is important. Particularly because it is likely that the things we ask of our voice assistants may vary based on the context of the room or physical location we are in. For example, asking the Echo to turn the TV on is less relevant as a primary task unless the Echo is in the living room. We can certainly make the case someday voice assistants will be available at all times in all rooms.
We followed this question by asking respondents to choose the top two things they do most often with their Echo. The three most common use cases were: play a song (34%), control smart lights, i.e. turn lights on or off (30%), and set a timer (24%). A few quick thoughts on Echo usage.
Playing a song as the top use case is not surprising given the product is positioned as a smart speaker. Bluetooth speakers have actually sold well at retail. The idea of having portable sound around the house is compelling for consumers. It also makes sense as the entry point for a smart voice assistant given the need for a speaker, microphone and accompanying components for microphone arrays and noise cancelling tech for better speech recognition. Controlling the lights is, in my opinion, a solid indicator of voice controlled smart home technologies which will someday become commonplace. As our homes get smarter, it makes sense that the way we will interact with our smart objects is through voice. It may be the catalyst to drive the true smart automated home into the masses.
In terms of overall satisfaction, most Echo owners were satisfied with the product as a whole, but satisfaction ranked highest when we asked specifically about the Echo’s voice recognition capabilities. Owners felt it delivered on recognizing what they were saying and performing the task they asked of it. This has a lot to do with the Echo’s microphone tech and noise cancelling capabilities, as well as its persistent connection to good broadband. That connection is often where Siri and Ok Google break down, when they are used while driving or in areas with poor mobile broadband service.
Only 13% of Echo owners said their usage had declined since they acquired it. The top reason listed by those using it less was “the novelty of using my voice is wearing off”.
Understanding how Siri and Ok Google Are Used
Perhaps the most important observation we came away with from our study was that Siri is the most used voice-based user interface. In our mainstream panel of 518 consumers (44% iPhone owners, 40% Android owners, 2% Windows Phone or Blackberry, 13% don’t own a smartphone), 65% indicated they had used either Siri, Google’s “Ok Google or voice search,” or Microsoft’s Cortana. Only 21% had never used Siri, compared with 34.8% who had never used Google’s voice solution and 72% who had never used Microsoft’s Cortana. More consumers across the spectrum of operating systems (iOS, Android, Windows) have used Siri than any other voice UI. I credit the success of Apple’s iPad as assisting with this observation since many Android phone owners, non-smartphone owners, and Windows PC owners have iPads as well.
Looking at how they used each voice UI, we see that for the most part people use Siri and OK Google/Voice search in the same ways on their smartphones. Contrasting these common usages against those of the Echo, we see distinct differences between a voice user interface on a personal computing and communications device like a smartphone, PC, or tablet and one on a stationary device in the home that is positioned as a smart hub.
Search is the most common task done on smartphones or tablets using Siri or OK Google/Google Voice. Google announced at Google I/O that 20% of all Google search queries are now done by voice. Looking at the data, we can conclude more voice search queries are done with Siri than with Google’s voice-based search. When I look at these most common tasks, they strike me as fairly basic, which is an important observation given where the market is today. These tasks may be the most common simply because the products are still somewhat limited in their capabilities, but it could also be because they are the ones that work best and most consistently.
Overall satisfaction with the voice recognition of Siri and OK Google/Google Voice search was relatively similar, and differed only slightly from the grades iPhone owners gave Siri and Android owners gave OK Google/Google voice search. Both were also below 80%, which is not bad for where these technologies are today. The Echo’s voice recognition capabilities did yield higher satisfaction rates than both Siri and OK Google/Google voice search, but I attribute that to the technological advantages of being stationary, having better noise cancellation, and having a persistent high-bandwidth connection to the internet. All of these are variables that impact the experience of voice-enabled user interfaces.
Finally, context of location usage for voice-based user interfaces is another important factor to understand. For those who use Siri or OK Google/Google voice search most regularly, the primary location is the car with 51% of consumers saying this is their primary location to use voice-enabled actions. The home was second with 39%. From a cultural perspective, it should come as no surprise that both these locations offer an element of privacy which is why only 6% of respondents said they commonly use Siri or OK Google/Google Voice in public.
Going Forward
I walked away from this study with confidence that the voice user interface has gone mainstream. What’s more, mainstream consumers seem to recognize its value and convenience. Consider these statements from consumers:
- It does not always work but when it does it is very useful – 55% strongly agree
- I would use my device’s voice capabilities more if I could speak to it more naturally – 43% strongly agree
- If it worked more often, I would use my device’s voice assistant more – 48% strongly agree
- I want my device’s voice interface to integrate better with more devices and apps that I use regularly – 66% strongly agree
- I am not comfortable speaking to my technology – 41% strongly DISAGREE
It is encouraging, from a sentiment perspective, that voice looks to be a natural extension of our keyboard/mouse/touch-based input and output methods. Consumers seem to recognize the value and want it to work in more ways. I’ve long said the true test of a great feature very early in its life cycle is when it combines both delight and frustration. Once you use it, you’re hooked, but you want it to be great all the time because you can see the potential. This is why we snuck a question on this into the sentiment segment to see if consumers agreed. They did: 47% strongly agree and 38% somewhat agree that, when their voice assistant works, it is great and, when it doesn’t, they get irritated.
The battle for the voice-based assistant is on. This is another area where the one with the biggest ecosystem built around their Voice UI/Voice OS has the best shot of being “hired” by the masses.
We appreciate all our panelists and their willingness to share their thoughts on consumer technology products. If you are interested in participating in our consumer studies, please click here.
Given that your findings show Siri is the “most used voice-based user interface”, and that Siri gets similar marks on satisfaction compared to other voice UIs, it would seem Apple is well positioned for the next stage of voice UI (especially with the rumors of an upgraded Siri coming soon, and Apple’s recent AI acquisitions).
Color me surprised, few around me seem to use that regularly, not even occasionally, most not at all.
Interesting though, thanks. I’ll give it another shake myself (I’ve got a terrible French accent though, don’t know if I can keep my phone in English w/ a French voice recog ^^).
Well since most use it at home or in car it stands to reason you wouldn’t see usage.
I noticed you’ve got a typo here “only 6% of respondents said they commonly use Siri or OK Google/Google Voice in pubilc.” Should be “public”.
I mean people I’m close to, I’d know about it. I’ll double-check.
Well, again, there were people who have not used it yet. But I’d say 66% of people who have/do use it counts as mainstream.
Fully agreed with that conclusion, and I don’t doubt your figures.
I’m just wondering if there’s some geographical factor. Siri et al. took a while to be translated to French, and lifestyles vary… Or my inner circle may be deeply unrepresentative (it is, but I love using it for in-depth observation anyway ^^)
No question there will be cultural and regional differences. That part is harder since we can’t go deep on every country but generically we are seeing the same high-level patterns.
Perhaps you missed this part of the article: “only 6% of respondents said they commonly use Siri or OK Google/Google Voice in pubilc.” You’re not going to see much usage ‘in the wild’ so to speak.
I did the same survey within the people I know in India…70% people (age < 18) are not using either siri or Google "ok Google"…
I use dictation (the microphone button on the virtual keyboard) reasonably often. I use Siri hardly at all, except for a lark. The reason lies mainly with the extremely limited number of queries that Siri can understand. It doesn’t help that many of the queries that it does understand (Baseball, skiing conditions, etc) are of absolutely zero interest to me.
A second reason that siri seldom gets used around here is that the data that it taps to provide answers for those few queries that it does understand, is seldom the best data. For instance, when I ask it what the weather forecast is, it shows me results from Yahoo weather. But that seems to be based on weather at Pearson Airport (ie, the western suburbs), which is quite far from here, rather than the (far more relevant) weather for downtown Toronto. It’s also quite coarse grained compared to the forecasts provided by Environment Canada or by theWeatherNetwork.ca.
In short, I find the “local” weather forecast for Toronto selected by some American intern at yahoo to be extremely un-useful to me compared to the weather forecasts provided by Canadian sources. It’s almost certain that the same problems would exist for whatever weather data Google provides on their phone in response to queries.
It would be interesting to compare what people use some of these assistants for (versus what they can do). I want to use Siri more around skipping music tracks or some other basic commands, but half of the time I get an error message. In a lot of senses I have seen that Apple’s own lock-down of the functions is more limiting than what the user wants
I am still not convinced that we are even close to a tipping point. If this was targeted at early adopters (i.e. not the mainstream) and even still 59% of users are not comfortable speaking to their devices (the opposite of your last data point), then it makes me think it still may end up as a niche area (other than perhaps the car, where there are serious safety issues, and the occasional shortcut at home where it takes fewer steps).
The guys at voice.io were doing some fine stuff with voice processing and recognition, last time I checked. They were a startup with lots of seed capital, but as an older guy I am not much into talking to computers. I can’t stand it when I have to speak “yes” and “no” or state my account number to a telephone system for customer service. I could see the kids using voice as a user interface these days but us older folks – probably not so much.