Last week, I attended Microsoft’s BUILD conference in Anaheim, where, among other things, Windows 8 details were rolled out to the Microsoft ecosystem. One of the most talked-about items was the Metro User Interface (UI), the end-user face of future versions of Windows. Over the last few days, I have been thinking about the implications of Metro for user interfaces beyond the obvious physical touch and gestures. I believe Metro UI has as much to do with voice control and air gestures as it does with physical touch.
Voice command and control has been a part of Windows for many generations. So why do I think Metro has anything to do with enabling widespread voice use in the future, and why do I think people would actually use this version? It’s quite simple. First, only a few voice command and control implementations and usage scenarios have been successful, and they all adopt a similar methodology and all come from the same company. Microsoft Auto voice solutions have found their way into Ford and Lincoln automobiles, branded SYNC, and drivers are actually using them. Fiat uses MS Auto technology as well. Microsoft Kinect delivers very accurate voice recognition in the living room using some amazing audio beamforming algorithms and a four-microphone hardware array.
None of these implementations would be successful without establishing an in-context and limited dictionary. Let’s use Kinect as an example. Kinect allows you to “say what you see” on the TV screen, limiting the dictionary of words it has to recognize. That is key. Pattern matching is a lot easier when you are matching against hundreds of objects rather than 100,000. Windows 8 Metro UI limits what users see on the screen, compared with previous versions of Windows, making that voice pattern matching all the easier. One final, interesting clue comes from the developer tablets distributed at BUILD. The tablets had dual microphones, which greatly assist with audio beamforming.
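To make the “say what you see” idea concrete, here is a minimal sketch of matching a noisy recognizer result against only the labels currently on screen. It is not how Kinect or Windows works internally: text similarity from Python's standard `difflib` module stands in for acoustic matching, and the function name `match_command`, the tile labels, and the 0.6 cutoff are all assumptions made for illustration.

```python
import difflib
from typing import List, Optional


def match_command(heard: str, visible_labels: List[str],
                  cutoff: float = 0.6) -> Optional[str]:
    """Return the on-screen label closest to what was heard, or None.

    Only the labels visible right now are candidates, so the search
    space is a handful of words instead of a full dictionary.
    """
    # Compare case-insensitively, but return the label as displayed.
    by_lower = {label.lower(): label for label in visible_labels}
    hits = difflib.get_close_matches(heard.lower(), list(by_lower),
                                     n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


# Hypothetical tiles on a Metro-style start screen.
tiles = ["Music", "Video", "Games", "Settings"]
print(match_command("musik", tiles))   # a slightly garbled "music"
print(match_command("xyzzy", tiles))   # nothing on screen is close
```

With only four candidates, a garbled result still lands on the right tile, and anything that matches nothing on screen can be rejected outright. The same lookup against a 100,000-word dictionary would have many near neighbors competing for every utterance.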
Air gestures are essentially what Kinect users do with their hands and arms instead of using the XBOX controller. When players want to click on a “tile” in the XBOX environment, they hold a hand in the air, hover over the tile for a few seconds, and Kinect selects it. Kinect uses a camera array and an IR sensor to detect what the player’s “limbs” are doing and associates their position with a tile location on the screen. Note that no more than 8 tiles are shown on the screen at one time, increasing user accuracy.
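The hover-to-select behavior described above can be sketched as a simple dwell timer. This is only an illustration under stated assumptions, not Kinect's actual algorithm: the 4 x 2 grid (8 tiles), the 2-second dwell time, the normalized 0-to-1 screen coordinates, and the names `tile_at` and `dwell_select` are all hypothetical.

```python
from typing import Iterable, Optional, Tuple

COLS, ROWS = 4, 2        # assumed layout: 8 large tiles on screen
DWELL_SECONDS = 2.0      # assumed hover time before a tile is selected


def tile_at(x: float, y: float) -> Optional[int]:
    """Map a normalized hand position (0..1) to a tile index, or None."""
    if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
        return None  # hand is pointing off screen
    return int(y * ROWS) * COLS + int(x * COLS)


def dwell_select(samples: Iterable[Tuple[float, float, float]],
                 dwell: float = DWELL_SECONDS) -> Optional[int]:
    """Return the first tile hovered continuously for `dwell` seconds.

    `samples` is a stream of (timestamp_seconds, x, y) hand positions.
    """
    current: Optional[int] = None
    since = 0.0
    for t, x, y in samples:
        tile = tile_at(x, y)
        if tile != current:
            # Hand moved to a different tile (or off screen): restart timer.
            current, since = tile, t
        elif tile is not None and t - since >= dwell:
            return tile
    return None


# A slightly shaky hand that stays over the top-left tile for two seconds.
steady = [(0.0, 0.10, 0.10), (1.0, 0.12, 0.10), (2.0, 0.11, 0.12)]
print(dwell_select(steady))
```

Large tiles matter here for the same reason a small dictionary matters for voice: small jitters in the tracked hand position stay inside the same tile, so the dwell timer is not reset by noise.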
Hypothetically, air gestures on Metro could take a few forms, and they could be guided by form factor. In “stand-up” environments with large displays, they would take an approach similar to Kinect’s. In the future, displays will be everywhere in the house, and air gestures would be used when touching the display just isn’t convenient or desired. I would like this functionality today in my kitchen as I am cooking. I have food all over my hands and I want to turn the cookbook page or even start up Pandora. I can’t touch the display, so I’d much rather use an accurate air gesture.
In desk environments, I’d like to ditch the trackpad and mouse and just use my hand as the input device. It’s a virtual trackpad or gesture mouse. I make all the standard Metro gestures on a flat surface, and a camera tracks exactly what my hand is doing and translates that into a physical touch gesture.
Microsoft introduced Metro as the next-generation user interface, primarily for physical touch gestures and secondarily for keyboard and mouse. Metro changes the interface from a navigation-centric environment with hundreds of elements on the screen to a content-first one with a very clean interface. Large tiles replace multitudes of icons and applets, and the number of words on screen, or the dictionary, is drastically reduced. Sure, this is great for physical touch, but it also makes voice control and even air gestures significantly easier to get right. Microsoft is a leader in voice and air gestures with MS Auto and Kinect, and it certainly could enable both in Windows 8 for the right user environments.