Within my circles of analyst colleagues, industry executives, and venture capitalists, the idea of next-generation computer interfaces comes up frequently and conversational UI is a main theme. You are going to hear quite a bit about this topic so I thought it would be useful to establish a big-picture foundation.
I’ve been thinking about computer interaction models for the past year and have concluded the easiest way to simplify how we interact with computers is to bring it down to workflows. Every interaction we have with a computer comes down to a task or set of tasks. Prior to smartphones, our workflows were defined by a mouse and keyboard. They were our only input mechanisms for interacting with a computer. Smartphones brought about touch as an input mechanism and now voice is being added. Gestures have existed in pockets of experiences like video gaming but remain a much less common computer interface than typing, touching, pointing (finger/mouse), and speaking.
If we distill our computer interaction models down, it helps us better frame how different input and output mechanisms can vary based on things like situation, context, physical location, etc. For example, voice is a slam dunk inside the home for workflows like turning on lights or adjusting the thermostat, specifically because, more often than not, the object you want to interact with has no screen or you are not close enough to the screen to touch it. Saying, “Turn the AC to 65 degrees” from any location in the home is an easier and more efficient workflow than walking to the thermostat or pulling out your smartphone and opening an app to adjust it. Similarly, in a car, voice is ideal because your hands are occupied and, for safety reasons, you shouldn’t spend a lot of time fidgeting with a screen to play music, look up directions, find nearby points of interest, etc. Voice adds value in contexts where there was previously no practical way to interact with a computer, or where the existing process was less efficient than simply speaking.
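To make the workflow framing concrete, here is a minimal sketch of how an utterance like “Turn the AC to 65 degrees” might be reduced to a structured intent a home automation system could act on. This is my own illustration, not any shipping assistant’s implementation; the `Intent` fields and the pattern matching are hypothetical stand-ins for a real natural language model.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Intent:
    """A structured representation of what the user asked for."""
    action: str   # e.g. "set_temperature"
    device: str   # e.g. "thermostat"
    value: int    # e.g. 65

def parse_utterance(utterance: str) -> Optional[Intent]:
    """Naive pattern matching; a real assistant would use a trained language model."""
    match = re.search(r"(?:turn|set) the ac to (\d+) degrees", utterance.lower())
    if match:
        return Intent(action="set_temperature", device="thermostat",
                      value=int(match.group(1)))
    return None

intent = parse_utterance("Turn the AC to 65 degrees")
if intent:
    # A real system would dispatch this intent to the device's API.
    print(f"{intent.device} -> {intent.action}({intent.value})")
```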
However, voice is not, and will likely never be, the primary computer interface. It will be one of many, each extending new capabilities and efficiencies. All of our computer interaction models will need to work harmoniously together to give us the widest range of workflow possibilities. This brings us to the conversational element.
Describing this computer interaction as a conversation is interesting because it is natural. Humans are used to this type of communication, whether by voice or text. I’d offer that the way humans use technology is already largely conversational: a healthy portion of our time on all our devices is spent in text message or email conversations. So why not add this element at the computer interaction level? Deeper engagement in things like searching, commerce, and automation, and even new workflows which don’t exist yet, are likely to come from this interaction model.
When we really drill down to the underlying meaning of the conversational interface, what surfaces is the common theme of intent and context. The belief is that advancements in machine learning, deep learning, and artificial intelligence overall will deepen our interactions with computers through their ability to truly understand us: not just understand what we say, but know our likes, dislikes, and preferences, in details as intimate as we allow them to know, in order to be more helpful to us.
It has been said that “A computer should never ask a question it should know the answer to.” Currently, computers don’t truly have any context on us, so they continually ask for information they conceivably should already know. This is ultimately what the entire concept seeks to solve.
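As a toy sketch of that principle, imagine an assistant that consults a store of facts it has been permitted to remember before ever prompting the user. The `ContextStore` class and its methods below are hypothetical, purely to illustrate the idea.

```python
class ContextStore:
    """Toy store of facts the assistant has been allowed to learn."""
    def __init__(self):
        self._facts = {}

    def remember(self, key, value):
        self._facts[key] = value

    def recall(self, key):
        return self._facts.get(key)

def answer_or_ask(store: ContextStore, key: str, prompt: str) -> str:
    """Only ask the user when the answer isn't already known."""
    known = store.recall(key)
    if known is not None:
        return known           # the computer already knows: don't ask
    return input(prompt)       # fall back to asking the user

store = ContextStore()
store.remember("home_city", "San Jose")
city = answer_or_ask(store, "home_city", "What city do you live in? ")
print(f"Finding restaurants near {city}...")
```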
Viv, a new voice startup from folks who had a role in creating Apple’s Siri, demonstrated the power of voice when context and third-party APIs are integrated into such a platform. An example I found particularly interesting was a voice transaction for paying someone back. You could say, “Pay John back $15” and, with the API tied to Venmo in this case, the entire process of paying a friend back was completed by voice. You can watch the whole demo of the Viv launch here for a deeper look at the concept.
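To give a sense of the plumbing such an integration implies, here is a hypothetical sketch of how a platform might route a parsed payment request to a registered third-party handler. None of this is Viv’s or Venmo’s actual API; the registry, the handler, and the parsing are illustrative assumptions.

```python
import re

# Hypothetical registry mapping intent types to third-party handlers.
HANDLERS = {}

def register(intent_type):
    """Decorator that registers a function as the handler for an intent type."""
    def decorator(fn):
        HANDLERS[intent_type] = fn
        return fn
    return decorator

@register("send_payment")
def venmo_handler(recipient, amount):
    # Placeholder: a real integration would call the payment provider's API.
    print(f"[venmo] sending ${amount:.2f} to {recipient}")

def handle(utterance: str):
    """Naive parse of 'Pay <name> back $<amount>'; a real NLU model goes here."""
    m = re.match(r"pay (\w+) back \$(\d+(?:\.\d{2})?)", utterance.lower())
    if m:
        HANDLERS["send_payment"](m.group(1).capitalize(), float(m.group(2)))
    else:
        print("Sorry, I didn't understand that.")

handle("Pay John back $15")  # -> [venmo] sending $15.00 to John
```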
All of this sets the stage for the next few months when, at both Google I/O and Apple’s WWDC, I expect the voice interactions/APIs for Siri and Google Now to be highlighted in some capacity. While we are still extremely early, the groundwork for this new interaction layer is being built right now.
What has changed in the past few years is humans’ willingness to speak with computers. I expect these types of technologies to be adopted quickly and to add significant value to how we interface with computers and more easily automate workflows in the future.