A great deal has been written recently on the growing importance of voice-driven computing devices, such as Amazon’s Echo, Google Home and others like them. At the same time, there’s been a long-held belief by many in the industry that software innovations will be the key drivers in moving the tech industry forward (“software eats the world,” as VC Marc Andreessen famously touted over 5 years ago).
The combination of these two—software for voice-based computing—would, therefore, seem to be at the very apex of tech industry developments. Indeed, there are many companies now doing cutting-edge work to create new types of software for these very different kinds of computing devices.
The problem is, expectations for this kind of software seem to be quickly surpassing reality. Just this week, in fact, there were several intriguing stories related to a new study which found that usage and retention rates were very low for add-on Alexa “skills” and for similar voice-based apps for the Google Assistant platform running inside Google Home.
Essentially, the takeaway from the study was that outside of the core functionality of what was included in the device, very few new add-on apps showed much potential. The implication, of course, is that maybe voice-based computing isn’t such a great opportunity after all.
While it’s easy to see how people could come to that conclusion, I believe it’s based on an incorrect way of looking at the results and thinking about the potential for these devices. The real problem is that people are trying to apply the business model and perspective of writing apps for mobile phones to these new kinds of devices. In this new world of voice-driven computing, that model will not work.
Of course, it’s common for people to apply old rules to new situations; that’s the easy way to do it. Remember, there was a time in the early days of smartphones when people didn’t really grasp the idea of mobile apps, because they were used to the large, monolithic applications that were found on PCs. Running big applications on tiny screens with what, at the time, were very underpowered mobile CPUs, didn’t make much sense.
In a conceptually similar way, we need to realize that smart speakers and other voice-driven computing devices are not just smartphones without a screen—they are very different animals with very different types of software requirements. Not all of these requirements are entirely clear yet—that’s the fun of trying to figure out what a new type of computing paradigm brings with it—but it shouldn’t be surprising to anyone that people aren’t going to proactively seek out software add-ons that don’t offer incredibly obvious value.
Plus, without the benefit of a screen, people can’t remember too wide a range of keywords to “trigger” these applications. Common sense suggests that the total possible number of “skills” that can be added to a device is going to be extremely limited. Finally, and probably most importantly, the whole idea of adding applications to a voice-based personal assistant is a difficult thing for many people to grasp. After all, the whole concept of an intelligent assistant is that you should be able to converse with it and it should understand what you request. The concept of “filling in holes” in its understanding (or even personality!) is going to be a tough one to overcome. People want a voice-based interaction to be natural and to work. Period. The company that can best succeed on that front will have a clear advantage.
These concerns don’t mean the opportunity for voice-based computing devices will be small, but they probably do mean there won’t be a very large “skills” economy. Most of the capability is going to have to be delivered by the core device provider and most of the opportunity for revenue-generating services will likely come from the same company. In other words, owning the platform is going to be much more important for these devices than it was for smartphones, and companies need to think (and plan) accordingly.
That doesn’t mean there isn’t any opportunity for add-ons, however. Key services like music streaming, on-demand economy requests, and voice-based usage or configuration of key smart home hardware add-ons, for example, all seem like clearly valuable and viable capabilities that customers will be willing to add on to their devices. In each of those cases, it’s also important to realize that the software is unlikely to represent a revenue opportunity of its own; it is simply a means of accessing an existing service or piece of hardware.
New types of computing models take years to really enter the mainstream, and we’re arguably still in the early innings when it comes to voice-driven interfaces. But it’s important to realize that existing business models and existing means for understanding the role that technologies play don’t always transfer to new environments, and new rules for voice-based computing still need to be developed.
7 thoughts on “Voice Drives New Software Paradigm”
Long term I don’t see non-mobile voice solutions beating mobile voice solutions. The play for Amazon will be to put Alexa in lots of other things, gaining a sort of mobility. But I think it is obvious that mobility has won the day, and voice tech needs to walk that path.
Looking at the current mobile app economy, App Annie for example has estimated that the mobile app economy is essentially equivalent to the mobile game economy. Games constitute the vast majority of App Store revenue. Demand for non-game mobile app developers is very high, but their salary is paid from a business for which the mobile app is just a customer interface and not the whole widget. Google, Facebook, Uber, Snapchat, Spotify, etc. all use their mobile apps as a customer interface, but the apps themselves do not earn revenue.
Therefore unless Voice UI is good for doing games, the chance of a self-sustaining “skills” economy is very slim, regardless of whether AI becomes good at filling in holes in comprehension.
Also, I think regardless of games, Voice probably needs to be proactive (not totally, but in good part), which is a whole level more complex than smartphones’ reactive model: not only generating proactive stuff, but not crossing the line from proactive to pestering (huh, darling ? ;-p)
I wouldn’t mind pestering from a personal assistant, but not from a shared one. You know how in the movie “Her” there are two types of AI assistants: one impersonates the main character’s girlfriend (Samantha) and ends up being shared, and the other is a personal game assistant in the videogame he plays. The problem I see is that the main character accepted pestering from a shared assistant, and that allowed her to copy his personality and eventually take over his habits and personal affairs. Pestering from the game assistant, on the other hand, sounded completely natural.
Thinking more on it, I’m wondering if smartphones are that relevant an analogy to jump from. Maybe “enhanced radio/TV” is a more appropriate start: 99% passive content-broadcasting, but with 1% notifications/interactions/actions.
That puts a whole lot of emphasis on device quality. I know one of my issues with getting an Alexa or Google Speaker is that there’s no way the loudspeaker is any good for music, and I hate duplication.
That is certainly an interesting possibility that the pundits who are thinking that voice UI will replace GUIs seem to be ignoring. This would suggest that consumers are not buying Amazon Echo for its AI, but as an alternative to Bluetooth speakers.
I think the critical question for these voice agents is whether AI can be made self-aware. I experimented with Zo – Microsoft’s AI assistant – and found it quite rudimentary. Its novelty wore off after a day or two. It is very inconsistent and doesn’t learn from experience. Sometimes I wonder if, for now at least, the goal for the (voice) assistants is to collect large amounts of conversational data for further experiments with AI. Which is OK, although it diminishes the usefulness of getting a voice assistant for myself at the moment.