The Voice UI

on June 29, 2015
I hear the phrase “voice is the new UI” often during meetings around Silicon Valley. This is nothing new. I’ve been involved in many industry discussions over the past 15 years where the “voice as the user-interface” vision has been well articulated. Science fiction stories have long portrayed humans interacting with machines via voice and, to the astonishment of the audience, the machines talk back. Consumer technology is unquestionably headed in this direction. We will have to explain to future generations what it was like to live in a world where we couldn’t operate our electronics with our voice and our electronics were not smart enough to understand us.

If we step back and take a look at a general theme in consumer technology today, we notice a pattern emerging – the elimination of friction. The success of messaging apps as platforms all over the world is based on the simple premise of eliminating friction. The move to contactless mobile payments is a move to eliminate friction. Google with Now on Tap is moving in that direction as is Apple with Proactive in iOS 9. The examples are countless and the trend is clear. Convenience trumps nearly everything in consumer electronics and things that eliminate friction are convenient.

Being able to pay with my smartphone or smartwatch is convenient and eliminates friction. Amazon’s brilliant idea for one-click purchases was to eliminate friction, making it easier and faster for me to buy things from Amazon. Voice as a user-interface layer eliminates friction for many tasks that are possible by typing on my smart device but, for such small interactions, are often much more convenient by voice. To text my wife a short message I could pull out my phone, use Touch ID to log in, pull up iMessage, click on her contact info, and start typing. Or I could lift my Apple Watch and say “Hey Siri, text Jen I’ll be home in 30 minutes.” If we believe we are on the grand path to eliminate as much friction as possible from the world of technology, we have to believe voice truly is the new UI. Honestly, it can’t get here fast enough.

I look at this in two ways: the first world viewpoint and the third world viewpoint.

First World Problems

I was having a discussion with a family member about the future and he said “I want to be able to talk to my oven and tell it to turn on to 450 degrees.” Voice as a UI layer applies to all kinds of household appliances. “Refrigerator, how much milk do I have left?” Or “how many eggs do I have?” “Do I have everything I need to make waffles?” In this vision and many like it, the appliances talk back, making sure we get what we need. Your refrigerator may tell you that you need more eggs and ask if they should be added to your grocery list. Once your shopping list is complete, you can send the request to have everything you need delivered by the end of day.

Interestingly, Amazon’s Echo is presenting this vision and trying to make it mainstream via a singular household appliance. If you have never seen this video on the Amazon Echo I recommend it. I’ve yet to try an Amazon Echo but one is on the way. This product is a great example of the potential of voice UI and what can happen when more and more of our appliances become “smart.” When you watch videos like this or have experiences of your own using voice to control and interact with appliances, you conclude this is the direction we are heading. The challenge is all the innovation surrounding this vision that still needs to happen.

Echo is great, but it’s only one product. While Amazon is touting integration with smart home products so you can control them through Echo, most appliance companies will be slow to adopt any standard and integrate with a product like Echo. It’s more likely, at least in the beginning, that each appliance manufacturer will want to build the smarts into their appliances rather than work through an aggregator. This is a debate I hear frequently in industry circles.

It is true voice recognition has come a long way. However, we still need the artificial intelligence layers in the cloud to mature well beyond where they are today. The cost of components and sensors also needs to come down before we can see this expand to everyday appliances at price points the masses can afford. As much as I want this vision to become a reality sooner rather than later, it seems we still have a bit of a wait ahead.

Third World Problems

As interesting as voice is as an interaction layer to most of us in the developed world, it may evolve to become central to those in the third world, particularly with things like smartphones. One of the primary problems, besides economics, in connecting the next billion humans to the internet is a lack of technical literacy and often a lack of any literacy at all. There are massive pockets of humans who live in villages with maybe one TV and radio. This raises the interesting question of how they would use a smartphone even if they could afford one and the data plan attached to it. This is where things like voice as a user-interface may provide a solution.

This specific use case is still a long way from commercialization, and voice as UI will have to become fully mature and established in developed markets first. But if we can bring natural interfaces like voice to the masses and include the ability to understand the many languages and dialects spoken today, we could be one step closer to connecting the next billion and the several billion after that to the internet.