This article is exclusively for subscribers to the Think.Tank.
As Ben Bajarin pointed out in his post here yesterday, Microsoft’s Xbox One is a whole lot more than a game console. Of course, the Xbox has long been the leading edge of Microsoft’s effort to dominate digital home entertainment. But a combination of clever new hardware and Microsoft’s unique positioning with respect to the entertainment industry could propel it to victory and reverse its faltering fortunes in consumer businesses.
Of course, the hardware still has a lot to prove. The ultimate goal of the digital living room is a single box that can deliver all your entertainment desires. On paper, at least, the Xbox One comes closer than anything we have seen before. But features on paper, or even in a demo, are one thing and real life is another. Even Google TV looked sort of good in a demo before flopping with consumers.
The biggest challenge facing the Xbox One is the promised integration with cable set top boxes. Success will depend on the new Xbox’s ability to control the set top box through an easily set up HDMI connection. It needs to banish the cable box to irrelevancy for everything except accessing and decoding content, ultimately becoming your DVR and your gateway to video on demand. That would make it a huge breakthrough. But if it needs IR blasters to control cable, it will go the way of Google TV. Microsoft is so far silent on which boxes from which cable operators the Xbox will integrate with.
It also remains to be seen how well the gesture and voice control will work to replace traditional remotes or controllers. Again, these are technologies that often demo better than they work, but successful elimination of the need to use hardware to control the box would also be a huge step forward.
So it looks like Microsoft will have a hardware edge when the Xbox One ships “later this year.” The real challenge is to build on what already appears to be a slim lead in the availability of content. Here Microsoft can build on two advantages. One is that it has been a technology partner of both studios and cable and satellite operators for years. For example, AT&T U-verse service runs on Mediaroom IPTV technology developed by Microsoft (the division was recently sold to Ericsson).
But a more important reason, and an odd one given Microsoft’s history as the big bully of the tech industry, is that Microsoft is the company that Hollywood is not afraid of. Microsoft’s leading rivals in the living room are Apple, Amazon, and Google (Sony could claw back into contention, but it has fallen a long way behind). Each of these competitors inspires fear and loathing in the studios. Apple is the company that ate the music business. Amazon is the company that seems to destroy value in every market it enters: good for consumers, but torture for producers. And Google is a company whose ambitions are scarily unbounded. Apple’s and Google’s TV efforts have been hobbled by lack of cooperation from content owners and distributors. Google so far has restricted itself to selling and streaming downloads to other companies’ devices, though it is rumored to be contemplating a set top box of its own. In this company, Microsoft can position itself as an honest broker, a neutral player with no dog in the fight.
The only entertainment content deal that Microsoft announced at the Xbox launch was an exclusive with the National Football League that will bring a lot of “second screen” content, such as stats and highlights, while watching a game on your Xbox. But there was no word about making the games available outside of the NFL’s existing deals with CBS, Fox, NBC, and ESPN. (Microsoft will also get branding on the hoods of replay stations; let’s hope that works out better for them than Motorola branding on coaches’ intercom systems.)
In the end, it is Microsoft’s ability to strike content deals with studios, networks, and sports leagues, and to get cable operators to support deep integration of Xbox with their services, that will determine its success in the living room. At a minimum, though, it seems that if Apple ever announces that unicorn of tech unicorns, an Apple television, it will have to get over a bar that has been raised by Microsoft. It’s been a long time since we could say that about any product.
For a run of at least 30 years, the “classic” consumer electronics industry successfully transitioned from one technology to another. TVs are a good example. TVs went from big color tubes to rear projection to flat panels and HD projection to HD panels. We can’t forget laser disc to Beta to VHS to DVD either. Consumers ate it up, too, and were pleased to roll the old iron out of the living room into another room and roll the new gear in.
Then things changed with 3D TVs, which were an unmitigated disaster for the industry. I call it a disaster because, for the most part, consumers were not willing to pay more for 3D and in some cases flat out didn’t want it. HDTV margins collapsed and are still in a funk in many CE markets. 4K TV and Smart TV are NOT the solution either, as research I have seen indicates that mainstream consumers won’t pay a premium for them. There are a few things going on here. First, already-installed generic 1080P flat panels at 10” will be a very good solution for many years to come. No one quite knows how long the installed base of displays will last, but it could be 10 years.
Smart TVs, while valued more than 3D by consumers, aren’t valued very highly either. Consumers are also being conditioned to know that you can add “smarts” for as little as $50 with an external add-on like a Roku, Apple TV, or DVD player. So what is the answer to revitalizing the “classic” CE industry? You really need to understand the problem first, and the problem is a lack of immersiveness and too many constraints.
Certainly, 3D HDTV was more immersive than HDTV, but not enough for us to spend hundreds more to replace our current 1080P TVs. 3D TV also had too many constraints, or what I like to call “if-thens.” Everyone in the room had to wear 3D glasses to enjoy the content, and without them the content is a blur. 3D glasses aren’t cheap, either, as active glasses were $50-$100 a pair. Then there is the hassle of charging them and making sure every pair is ready for the big movie. Then there is the nausea some people feel when watching 3D video: a Google search for “3D” and “nausea” returns 2.6M results. Passive 3D, like LG showed at this and last year’s CES, will significantly lower the cost of glasses, and a few manufacturers showed prototypes of glasses-free 3D TVs, but those are many years off and their image quality is not very high. 3D may not be the answer, but what is?
Consumers are looking for affordable immersiveness without constraints. One example of this is a concept AMD showed off at CES: its “SurRoundhouse” proof of concept, which is quite expensive and complex now but takes the industry in the right general direction. The SurRoundhouse is a “theater” room with ten 55” HDTVs arranged to look like the windows of a house, 32 speakers, and four subwoofers. The ten LG 1080P HDTVs displayed more than 600 megapixels per second at 10,800 x 1,920 resolution, about 2.5X the pixel count of 4K (UltraHD), albeit spread around the room. Driving the video and audio was one PC with an AMD FX 8150 8-core Black Edition processor and three FirePro 8000 graphics cards with Digital Multipoint Audio, which was amplified by eight AV receivers.
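The pixel figures above can be checked with a little arithmetic. This is a minimal sketch under some assumptions. Each of the ten panels is taken to be a standard 1920 x 1080 panel mounted in portrait, which is the only layout that yields a 10,800 x 1,920 canvas. 4K UltraHD is taken to be 3840 x 2160. The frame rates are assumptions, because the demo's actual refresh rate isn't stated. At 30 frames per second, the wall works out to roughly 622 megapixels per second, which is consistent with the "more than 600" figure.

```python
# Pixel arithmetic for a wall of identical displays placed side by side.
# Assumptions: ten 1920x1080 panels in portrait (1080 wide, 1920 tall),
# compared against consumer 4K "UltraHD" at 3840x2160.
PANEL_W, PANEL_H = 1080, 1920  # portrait orientation
NUM_PANELS = 10
UHD_W, UHD_H = 3840, 2160


def canvas_pixels(panels: int, w: int, h: int) -> int:
    """Total pixels when `panels` identical displays sit side by side."""
    return panels * w * h


def pixel_rate(pixels_per_frame: int, fps: int) -> int:
    """Pixels pushed per second at a given (assumed) frame rate."""
    return pixels_per_frame * fps


wall = canvas_pixels(NUM_PANELS, PANEL_W, PANEL_H)
uhd = UHD_W * UHD_H

if __name__ == "__main__":
    print(f"Wall: {NUM_PANELS * PANEL_W} x {PANEL_H} = {wall:,} pixels")
    print(f"Ratio to 4K UHD: {wall / uhd:.2f}x")
    for fps in (30, 60):  # hypothetical frame rates
        print(f"{fps} fps -> {pixel_rate(wall, fps) / 1e6:.0f} Mpixels/s")
```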
AMD plays what looks like a hostage rescue scene from a video game and shifts the audio from stereo to 32 speakers to show the value of high quality, multi-channel, positional audio. Each shift of the audio draws your eye to a different window of the house, and as helicopters fly and crash and machine gun melees erupt, you really feel like you are in a different and very real place. The content was entirely custom, and it takes work to get games and movies to take advantage of a setup like this. This is a different class of entertainment, one that could actually motivate consumers to invest, maybe even over-invest, in new CE gear.
Here is a smartphone video I took of AMD’s SurRoundhouse. Of course you don’t get the same experience as the real thing, but you can get some sense of it below. Make sure you select 1080P and full screen:
AMD could have improved the experience even more by improving the quality of the graphics in the scenes. They looked more cartoony than life-like. AMD says that the goal of the demonstration was to show the experiential difference in the audio, but I’d still like to see max graphics to turn it into a reality show.
So how is a $35,000, 10 display, 8 receiver, 36 speaker setup requiring custom content “without constraints” and “affordable”? It’s not right now, but if you look ahead to new technologies, the cost curve, and the need for CE and entertainment businesses to create radically different experiences, it could very well become affordable and relatively simple. Let me explain.
The first challenge is content. The entertainment industry has shown that it will make changes if it sees potential extinction or at least a major depression in business. The film industry started shooting in 4K well before 1080P had mass adoption, so the big question is whether studios see the opportunity to shoot in multiple “frames” and “angles” for surround, or at least convex, displays. The next challenge is cabling, but the industry may already have a video solution in multi-channel, 60GHz wireless display technology. Lower frequency wireless speakers are already available, but the challenge would be to solve amplification at the current frequencies. The great thing about wireless audio is that you wouldn’t need eight receivers to send the right audio to the right speaker. Theoretically, you would only need one, with a bunch of broadcast antennas.
Then there are the displays. The current monitor sweet spot this year will be around 30”, priced around $300. I can imagine that in 5 years, that $300 display becomes 40-50”, putting a full-room build-out of ten displays at around $3,000. This seems reasonable when you consider that LG sells its 89” 4K TV today for $22,000. Yes, 4K displays will come down in price, but how many years before one gets down to $3,000?
AMD’s SurRoundhouse gives the industry a potential scenario for the entertainment or theater room of the future. While it doesn’t pass the tests for mass adoption in media rooms today, it could, and it is certainly more interesting than the same boring, flat experience. Neither 4K nor Smart TV is the solution to the woes of the traditional CE market, and I hope the industry is looking at AMD’s glimpse of the future.
At Google’s I/O developer conference in June, Google boldly announced the Nexus Q, a $299 streaming music sphere. Last week, Google stopped selling the Nexus Q and cancelled all consumer pre-orders, saying buyers would get the unit for free. It appeared obvious to just about everyone except Google that the Nexus Q value proposition was incredibly weak. So how does one of the most powerful companies in the world allow a product so obviously not ready to get to market? It happens for many reasons, and more often than you might think. I’d like to characterize the many reasons products like this make it this far, as I experienced them over my 20 years of product management and product marketing.
The Google Nexus Q Value Proposition
I detail the Nexus Q value proposition here, but let me provide a small sample. For $299, consumers get a cool looking black sphere for music and video streaming that can only pull content from the Google Play cloud. To control it, you must have an Android phone or tablet, and if your friends want to add tracks to the music list, they must have an Android device too. The Q also came with a high quality, built-in amp and speaker jacks. Net-net, if you look at it as a content streamer, it cost $200 more than an Apple TV or Roku device, roughly three times the price. If you are an audiophile, you will be comparing it to a Sonos, which, while comparably priced, can also stream from multiple cloud services and be controlled by iOS and Android devices.
The press gave it the expected response.
The Press Reaction
The press reaction was brutal. The best way to show just how bad it was is with the headlines:
- The Google Nexus Q Is Baffling
- Google Nexus Q Review: An Unfinished Orb of Mystery
- Google Nexus Q Hands-On Review: The Buggy Streaming Story
- The Nexus Q Review: Magical Mystery Sphere Or $299 Gutter Ball?
Google I/O was a smashing success. They launched a great Nexus 7 tablet with Project Butter and Google Now, some cool social and search features, and who will ever forget the Google Glass demo? Google didn’t need to launch the Q to have what I would characterize as one of the more perfect developer conferences.
So why and how exactly do products like the Nexus Q make it all the way through the ideation, planning, execution, and launch phases without someone putting on the brakes? There are many ways and reasons this happens, which I outline below, and product managers need to heed all of these potential pitfalls. I am speaking about products generically, not specifically about the Nexus Q.
Don’t Solicit Outside Opinions
Some companies don’t solicit outside opinions by design. Outside opinions can come from market research, consultants, tech analysts, etc. The project is so secret that the team members are put in another building with limited access and have to sign NDAs. This is obviously done for security, and there are hundreds of examples of this that the industry finds out about after the fact. This week in the Apple-Samsung trial, we all heard that iPhone development was treated like this. This secret method obviously works well for Apple and a handful of companies, but for other companies, not so well. Other companies don’t solicit outside input because they just don’t see value in it. Some say it’s too time consuming or too expensive. That’s just code for “I don’t see the value.” They figure they are the experts, have all the answers, and any outside inputs could lead to the project being derailed.
Living in an Alternate Universe
Sometimes, companies will solicit external input on a concept or product but don’t heed it. Many times you will hear the phrase, “well, that’s only one piece of input we’ll incorporate.” This usually means the outside input won’t be used at all because, quite frankly, the company “knows” more than the outsider giving the input. Or so they think. These kinds of folks get outside input because the rule book says they need to get it, and once they do, it becomes a filled-in check box rather than valued input. The product planner may believe the input but just not act on it. The end result is the same as not soliciting input.
Underestimating the Downstream Impact
Many times, a product will successfully make it through the gate process only for the PM to be surprised down the line by a cost overrun or an internal team missing a delivery date. This could very well have happened to the Nexus Q. I can imagine the Q had a very long list of required and nice-to-have features. Judging by what launched, even the “must-have” features didn’t ship. I can see just a few features that could have been added to make the Q the must-have party device. What happened next was where the damage was done. So what is a product leader’s reaction if their product is hit with a significant cost overrun or schedule slip? In the case of the Q, the impact was most likely minimized. Look at how many Google products ship as “Beta.” Most of them do. Google News and Shopping were betas for years and cost the consumer nothing but time to use. The difference is that the Nexus Q cost $299, and it was more like alpha stage if you judge by feature completeness.
The Emperor Has No Clothes
When I have been privy to post-mortems of train-wreck projects, many times it comes down to a lack of leadership. The entire team had a decent vision, knew the customer and what they wanted, solicited external input, listened to it, and knew the downstream impact of the existing issues if they trudged forward. But the leader in charge felt they had to meet the date at all costs or didn’t listen to the team members. The leader made a commitment to someone, whether their VP, SVP, or board of directors, to deliver something by a certain date, and they were going to do it come “hell or high water.” Leaders are trained to listen, be decisive, and stick to their guns, sometimes at great cost.
What’s Next for the Nexus Q?
I believe the team managing the Nexus Q had a deadline to hit with Google I/O, got in over their heads on features, and just couldn’t deliver. And no one stopped the project. The Nexus Q name and the Google brand have been damaged, but I think the damage is recoverable. To maintain the Nexus Q price, Google will need to add a ton of features. I am envisioning the “ultimate party device” where everyone who comes to the party can play videos, music, and games from their own device and the cloud. I can also see a “party mode” where every picture, video clip, and social media post taken at the party is shown on a large HDTV. If something like that is a bridge too far, then Google will need to pull the price lever. The BOM cost cannot be over $100 with the built-in 25W amp, so they have a lot to work with. Whatever Google does, they need to make some decisive moves if they hope to be successful with living room devices.
Yesterday, NVIDIA launched VGX and the GeForce Grid, which, among many things, could render future game consoles obsolete. This may sound very far-fetched right now, but as I dig into the details of the GeForce Grid’s capabilities and map them against future consumer needs, it appears that unless future consoles can demonstrably deliver something unique and different, they will just be an unnecessary expense and a hassle to the end consumer.
Problems with Cloud Gaming Today
Cloud gaming services like OnLive and Gaikai exist today. They have received a lot of press, but it’s uncertain whether their business models and experiences will survive years from now if they stay with their current approaches and implementations.
Scalability is one issue. Services need to directly match one cloud game session with one graphics card, so if you have 1,000 gamers, you need 1,000 graphics cards. You can just imagine the challenges in scaling that experience out to millions of users. You would need millions of graphics cards, which in a data center environment doesn’t make a lot of sense logistically or financially.
Latency is another issue. Cloud game services need to maintain servers within a few hundred miles of users to keep game-play latency acceptable. Latency is the lag between when a user does something and when they get a response. Imagine if there were a one second delay between the time you pull the trigger in Battlefield 3 and the time something happens. That would render the cloud game absolutely unplayable. Latency is acceptable in social media apps like Facebook, but not in games. Having to provide “edge servers” close to end users, as the industry does today, is very inefficient: you cannot leverage these same servers during off-peak times, and it’s difficult to even share servers across time zones. Therefore, servers sit idle with nothing to do. This places another immense financial burden on the cloud game provider. NVIDIA and its partners are attempting to solve these problems.
NVIDIA VGX and the GeForce Grid
With VGX and the GeForce Grid, NVIDIA is attempting to solve the scalability and latency problems associated with today’s cloud gaming services like Gaikai and OnLive. NVIDIA VGX is the set of technologies addressing current virtual display issues, and the GeForce Grid is the specific implementation that attacks issues in cloud gaming. They address the problems with two very distinct but related technologies: GPU virtualization and low latency remote display.
Virtualization of the GPU enables more than one user to share the resources of a graphics card, so the one-to-one ratio between user gaming sessions and graphics cards goes away. With NVIDIA VGX, multiple users can share a single, monster-sized graphics card. This provides much better scalability for the cloud game data center, reducing costs and increasing flexibility.
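The scaling argument comes down to simple division. The sketch below assumes that a single number, `sessions_per_gpu`, captures how many game sessions one card can host. That parameter is hypothetical: the article gives no density figure, and real density would depend on the game and the card. A value of 1 models today's one-to-one setup.

```python
import math


def gpus_needed(concurrent_sessions: int, sessions_per_gpu: int) -> int:
    """GPUs required when each physical card hosts `sessions_per_gpu`
    game sessions (a hypothetical density). sessions_per_gpu=1 models the
    one-session-per-card setup described for today's services."""
    if sessions_per_gpu < 1:
        raise ValueError("sessions_per_gpu must be at least 1")
    # Round up: a partially used card still has to be installed.
    return math.ceil(concurrent_sessions / sessions_per_gpu)


if __name__ == "__main__":
    for density in (1, 4, 8):  # illustrative densities only
        print(density, gpus_needed(1_000_000, density))
```

The function only models the card count. Power, rack space, and the cost of the larger "monster-sized" cards are left out.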
Lower latency remote display significantly speeds up how quickly the remote image is sent to the end client device. In this cloud gaming scenario, the game frames are converted into an H.264 video stream and sent to the user. NVIDIA improves this process by eliminating many of its steps. A game frame no longer needs to touch the CPU or main memory; it is encoded directly on the NVIDIA VGX card and sent directly over PCI Express to the network card. Bypassing these components and removing steps speeds up the process immensely, which delivers a few benefits. First, all things being equal, it can deliver a much faster experience than gamers have had before. The experience just feels more like it is happening locally. Combined with GPU virtualization, the reduced latency also enables cloud gaming data centers to be located farther away from users, which increases data center utilization and efficiency. It also enables entire geographies to be served that could never be served before, as “edge servers” can be consolidated.
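One way to see why shaving server-side latency lets data centers move farther away is a rough propagation model. This sketch assumes that signals cross optical fiber at about 200,000 km/s, roughly two-thirds of the speed of light in a vacuum. It also ignores routing detours, switching, and queuing, so the results are lower bounds. The `saved_ms` input is hypothetical, because the article does not quantify how much time NVIDIA's pipeline saves.

```python
# Rough propagation-delay model for cloud gaming.
# Assumption: ~200,000 km/s in optical fiber, i.e. ~200 km per millisecond
# one way. Real routes are longer and add switching/queuing delay.
FIBER_KM_PER_MS = 200.0
KM_PER_MILE = 1.609344


def round_trip_ms(distance_miles: float) -> float:
    """Minimum round-trip propagation delay to a server `distance_miles` away."""
    km = distance_miles * KM_PER_MILE
    return 2 * km / FIBER_KM_PER_MS


def extra_reach_miles(saved_ms: float) -> float:
    """How much farther away a server could sit, for the same end-to-end
    latency, if the server-side pipeline became `saved_ms` faster.
    The saved time is split across the outbound and return legs."""
    extra_km = saved_ms * FIBER_KM_PER_MS / 2
    return extra_km / KM_PER_MILE


if __name__ == "__main__":
    print(f"500 miles: at least {round_trip_ms(500):.1f} ms round trip")
    print(f"10 ms saved: about {extra_reach_miles(10):.0f} more miles of reach")
```

Under these assumptions, each millisecond saved in the pipeline buys about 100 km (roughly 62 miles) of extra distance. That is the mechanism behind consolidating edge servers.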
Wither Future Game Consoles?
If NVIDIA and its partners can execute on the technology and the experience, it would essentially enable any device that can currently play back YouTube video well to be a virtual game device. Gamers could play any game, any time, immediately. What kinds of devices do that today? They are all around us: smartphones, Smart TVs, and even tablets. There’s no loading games off of a disc and no downloading 500MB onto a PC; it’s just pick the game and play. Once the gamer is done playing on the TV, they can just take their tablet into the bedroom and pick up where they left off.
This kind of usage model is quite common when you think about it. Many consumers already consume books, movies, and even music this way, so why not games? For many consumers, convenience trumps quality, and that’s one of the issues I see for future consoles. There is no doubt that console visual detail and user interfaces will be much more sophisticated than cloud gaming’s. Look at how well the iPod did despite its “inferior” music quality: consumers chose convenience over quality. Look at Netflix on a phone or tablet. Consumers can get much higher quality from their local cable service, but a growing number of them choose convenience over quality.
Device makers and service providers who currently make no money from games will very aggressively adopt this approach. TV makers, for instance, see no revenue from any game played on their devices. Gaikai, for example, is cutting deals with TV manufacturers like LG to build this service into every Smart TV in the future. Telcos and cable companies are also very motivated to tap into the huge gaming revenue stream.
I believe that consoles will adopt cloud gaming capabilities in addition to physical media, or they will be viewed as lacking the features gamers want. I also believe that cloud gaming will seriously cannibalize future game consoles. Many consumers who would otherwise have purchased a new game console will not buy one because cloud gaming with NVIDIA VGX and the GeForce Grid exists. That premise raises the question of whether future game consoles have a bright future. If game console makers don’t do something aggressive, their future looks dim.
If you would like a deeper dive on NVIDIA VGX and the GeForce Grid, you can download my whitepaper here.
(originally published at Forbes)
Back in September, I wrote an analysis on why Apple should build an HDTV. The premise was that there are huge experiential issues Apple could solve, and that it could strike a deal with the MSOs and satellite companies. That was a big premise, but ironically, what Samsung showed at CES makes it apparent that Samsung will accelerate the likelihood of Apple launching an “iTV.”
Samsung 2012 Smart TVs at CES
At this year’s CES, Samsung made a very impressive showing in consumer electronics. It showed off an array of devices, from intelligent refrigerators to thin, energy-sipping OLED displays to phones to Smart TVs. Two major themes came out of the HDTV launches: smart interfaces, and apps delivering cable and satellite content.
Smart Interaction, Kind Of
In controlled demonstrations, Samsung showed its next generation of TV interfaces. Samsung calls it Smart Interaction: the ability to control the TV through voice commands and far-field air gestures. The voice commands and air gestures work in a similar fashion to Microsoft’s Kinect. Get the TV’s attention with your voice and tell it to change channels, turn the volume up or down, go to apps, etc. Air gestures let consumers use a hand as a virtual mouse to click on an icon, or use it the way they would use a finger on a tablet, by swiping or grabbing.
All of this is great in theory, but one of the challenges I saw at CES was that it just didn’t work well. The demoer had a very hard time getting the system to work. I talked to others at the show to see whether this was an anomaly, and it wasn’t: Smart Interaction didn’t work well for those I talked to either. This was a public demo in a controlled environment, so I expected a better response, especially because everyone will compare it to Microsoft Kinect and Apple’s Siri.
To be clear, what Samsung showed was a glimpse into its 2012 product line and not shipping platforms, but it was still concerning, because perfecting these interfaces takes years, not months. Apple is proof of this: Siri, the voice-control mechanism on the iPhone 4S, is still in beta three months after public launch.
Samsung Cable and Satellite Content Deals
Samsung also launched an impressive amount of U.S. content deals with Comcast, DIRECTV, Verizon, and Time Warner Cable. The vision is classic IP-TV, or removing the set top box and just plugging the Ethernet cable into the TV. In theory, this provides the consumer with a much more integrated TV-content experience.
Comcast will provide its Xfinity services directly to a new Samsung TV without the need for an STB. DIRECTV will give the new Samsung TVs access to live and stored content from the satellite provider. Verizon said it will provide Samsung with the Verizon FiOS TV app, which gives users access to 26 live TV channels and to VOD titles through Verizon Flex View. Time Warner Cable and Samsung did show a demo of a user accessing stored content from a set top box in the home and said apps would be available “later this year.” These announcements are complex and not as simple as saying “all STB content is now available on the new 2012 Samsung TV.” Still, they were a step forward from last year, when cable companies weren’t all that excited about an IPTV premise in a world where they are just an icon next to Netflix and Hulu.
Samsung’s Smart Interaction Accelerates Apple iTV
Samsung demonstrated two things at CES 2012 related to Smart TVs in general. First, it showed how not to demo the next generation of TV user interface. Messing with the TV interface is dangerous because it is the primary pathway to content. Users blame themselves when they lose the remote, but when they get an error with voice control or air gestures, they will blame Samsung and stop using it. Then they will tell 10 friends about it. Yes, it will improve over time, but from what I saw, there is a lot of improving to do. This gives Apple room to perfect the user interface with an iTV. Apple would undoubtedly leverage Siri for voice control, using local iOS devices to do it. Leveraging the huge base of iPhones, iPads, and iPods would make voice control better, because the microphone is 10 inches away from you, not 10 feet. This helps block out more noise and could generally provide a much better experience. I believe it will work much better than Siri does today, given that the “dictionary” is smaller. The smaller the “dictionary,” which in this case will be content, the higher the likelihood that the system does what you want it to do.
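The “smaller dictionary” point can be illustrated with a toy example. In this sketch, a noisy transcript is snapped to the closest entry in a short, fixed list using Python's standard `difflib.get_close_matches`. The example does not describe how Siri, Kinect, or any TV maker's recognizer actually works. The show titles and the 0.6 similarity cutoff are hypothetical choices made for illustration.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical, deliberately small vocabulary of content titles.
SMALL_VOCAB = ["revenge", "modern family", "the voice", "sports center"]


def match_command(transcript: str, vocabulary: Sequence[str],
                  cutoff: float = 0.6) -> Optional[str]:
    """Return the vocabulary entry most similar to `transcript`,
    or None if nothing clears the similarity cutoff."""
    matches = difflib.get_close_matches(
        transcript.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for heard in ("reveng", "The Vois", "xyzzy"):
        print(heard, "->", match_command(heard, SMALL_VOCAB))
```

With only a handful of entries that differ clearly from one another, a slightly misheard title still lands on the right one. Garbage input matches nothing instead of triggering the wrong function. As the list grows and entries start to resemble each other, near-misses are more likely to be resolved to the wrong title.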
Envision how this looks at a Best Buy. You will have a Samsung TV on one side of the store and an Apple iTV in the Apple store within a store. Based on what was shown at CES, the Samsung voice control may not demo well, while the Apple voice control will “just work.” Net-net, by launching Smart Interaction before it’s ready, Samsung provides a clear and demonstrable pivot point for Apple to differentiate from. This is similar to how Apple’s capacitive touch screen interface “just worked” when the iPhone first launched, while other phones’ interfaces didn’t work well.
Samsung’s Content Deals Accelerate Apple iTV
The second thing Samsung demonstrated, and demonstrated well, was that it could cut deals with the cable and satellite guys. This breakthrough is important because it shows that there is a deal to be done. A TV that can blend cable, satellite, and OTT content is the “holy grail.” Even better is when the user has one program guide or one database to find the content they want, with a precise, per-user recommendation engine like those of Netflix and Amazon.
Because Samsung broke some new ground with Comcast, DIRECTV, Verizon, and Time Warner Cable, Apple at least has the most concrete idea yet of what it would take to do a deal. Yes, Apple has been trying to cut deals with them forever, but Apple certainly doesn’t want Samsung to get too entrenched, as that could dull some of the differentiation of an Apple iTV. Just as the iPod and iTunes got credit for aligning the music industry, Apple wants to get credit for aligning the cable and satellite providers and, in turn, delivering a great experience to users.
While Apple was not at CES 2012, its influence, and Samsung’s reaction to it, will help accelerate development and launch of an Apple iTV. Samsung has provided Apple with an experience to pivot and differentiate from, and has helped provide a reference point for Apple’s own deal with the cable and satellite companies. Samsung has helped accelerate Apple’s iTV. Ironic, yes?
As I wrote last week, Samsung and LG are following Microsoft’s lead in future interfaces for the living room. Both Samsung and LG showed off future voice control and in Samsung’s case, far-field air gestures. Given what Samsung and LG showed at CES, I believe that Sony could actually beat both of them for ease of interaction and satisfaction.
I have been researching HCI, in one way or another, for over 20 years as an OEM, technologist, and now analyst. I’ve conducted in-context, in-home testing and have sat behind the glass watching consumers struggle with, and in many cases breeze through, intuitive tasks. Human Computer Interaction (HCI) is just the fancy trade name for how humans interact with electronic devices. Don’t be confused by the word “computer,” as it also applies to TVs, set top boxes, and even remote controls.
Microsoft recently started using the term “natural user interface” (NUI), and many in the industry have been using it a lot lately. Whether it’s HCI or NUI doesn’t matter. What does matter is its fundamental, game-changing impact on markets, brands, and products. Look no further than the iPhone’s direct-touch model and Microsoft Kinect’s far-field air gestures and voice control. I have been very critical of Siri’s quality but am confident Apple will wring out those issues over time.
At CES 2012 last week, Samsung, Sony, and LG showed three different approaches to advanced TV user interfaces, or HCI.
Samsung took the riskiest approach, integrating a camera and microphone array into each Smart TV. Samsung Smart Interaction can do far-field air gestures and voice control. The CES demo I saw did not go well at all; speech had to be repeated multiple times, and it performed incorrect functions. The air gestures performed even worse: they were slow and misfired often. The presenter kept repeating that the feature was optional and consumers could fall back to a standard remote. While I expect Smart Interaction to improve before shipment, there's only so much that can be done.
LG used its Magic Motion Remote for voice commands and search and as a virtual mouse pointer. Pointing at icons went well, but using the pointer for keyboard input didn't go well at all. Imagine clicking, button by button, "r-e-v-e-n-g-e". Yes, it's that hard. Voice search worked better than Samsung's, but not as well as Siri, which has its own issues. It was smart to place the mic on the remote for now, as it is closer to the user and the system knows whom to listen to.
Sony, ironically, took the safe route, pairing its smart TVs with a remote that reminded me of the Boxee Box remote, which has a full keypad on one side. Sony implemented a QWERTY keyboard on one side and a trackpad on the other that can be used with a thumb, similar to a smartphone. This approach was reliable in the demo, and consumers will still be using it well after they have abandoned the Samsung and LG approaches. The Sony remote has a microphone, too, which I believe will be enabled for the smart TV once voice recognition becomes more reliable. Today the microphone works with a Blu-ray player using a limited command dictionary, a positive for speech control. This is similar to Microsoft Kinect, where you "say what you see."
I believe that Sony will win the 2012 smart TV interface battle due to simplicity. Consumers will be much happier with this more straightforward and reliable approach. I expect Sony to add voice control and far-field gestures once the technology works the way it should. Sony hopes consumers will thank it, as they have thanked Apple, for shipping fully completed products. Samsung's and LG's latest interaction models, as demonstrated at CES, are not ready to be unleashed on consumers; they are clearly at the alpha or beta stage. I want to stress that winning the interface battle doesn't mean winning the war. Apple, your move.
Microsoft launched Kinect back in November 2010 in a move to change the man-to-machine interface between consumers and their living room content. While incredibly risky, the gamble paid off: Kinect became the fastest-selling consumer electronics device ever. I saw the potential after analyzing the usage models and technology for a few months after Kinect's launch, and predicted that, at a minimum, all DMAs would gain the capability.
The Kinect launch sent shock waves through the industry because the titans of the living room, like Sony, Samsung, and Toshiba, hadn't even come close to duplicating or leading with voice and air-gesture techniques. With Samsung and LG announcing future TVs with this capability at CES, Microsoft's living room interaction strategy has officially been affirmed at CES and, most importantly, by the CE industry.
Samsung launched what it calls "Smart Interaction," which lets users control and interact with their HDTVs using their voice and air gestures, and passively with their face. The voice and air gestures operate much like Microsoft's, in that pre-defined gestures exist for different interactions. For instance, users can select an item by grabbing it, which is equivalent to clicking an icon with a remote. Facial recognition essentially "logs you in" to your profile, as a PC would, giving you your personal TV settings and the virtual remote.
A Step Further Than Microsoft?
Samsung has one-upped Microsoft on one front, at least publicly: its application development model. Samsung has broadly opened its APIs via an SDK, which could pull in tens of thousands of developers. If this gains traction, we could see platforms compete on the number of apps, in the same way Apple initially trumped everyone in smartphones. The iPhone's initial lure was its design, but also its apps, the hundreds of thousands of them that were developed. The app count made Google Android look very weak until it caught up, still makes BlackBerry and Windows Phone appear weaker, and arguably dealt the death blow to HP's webOS. I believe that Microsoft is gearing up for a major "opening" of the Kinect ecosystem in the Windows 8 timeframe, where Windows 8 Metro apps could run inside the Kinect environment.
Challenges for Samsung and LG
Advanced HCI like voice and air-gesture control is a monumental undertaking and risk. Changing anything that stands between a CE user and the content is risky, because if it's not perfect, and I mean perfect, users will stop using it. Look at version 1 of Apple's Siri. Everyone who bought the phone tried it, and most stopped using it because it wasn't reliable or consistent. Microsoft Kinect has many, many conditions that must be met for it to work well, including standing in a specific "zone" for air gestures to work correctly. Voice control works only in certain modes, not in all interactions.
Apple's fallback is that users don't have to use Siri; it's optional, and it can be very personal, in that most people use Siri when others aren't looking or listening. The Kinect fallback is a painful one, in that you have wasted that cool-looking $149 peripheral. Similarly, Samsung "Smart Interaction" users can fall back to the remote, and most will initially, until it's perfected.
There are meaningful differences among the consumer audiences of Siri, Kinect, and Samsung "Smart Interaction." I argue that Siri and Kinect users are "pathfinders" and "explorers" in that they enjoy the challenge of trying new things. The traditional HDTV buyer doesn't want any pathfinding or exploring; they want to watch content, and if they're feeling adventurous, they'll go out on a limb and check sports scores. This means that Samsung's customers won't appreciate anything that doesn't work, and won't admire a "good try" or a beta product like Siri.
One often-overlooked challenge in this space is content, or the amount of content you can actually control with voice and air gestures. Over-the-top services like Netflix and Hulu are fine if the app resides in the TV, but what if you have a cable or satellite box, as most households do? What if you want to record something on the PVR or play specific content saved on it? This is solvable if the TV has a perfect channel guide for the STB and service provider, plus IR-blasting capabilities to talk to the box. That didn't work out too well for Google TV V1, its end users, or its partners.
This is the Future, Embrace It
The CE industry won't get this right initially with a broad base of consumers, but that won't kill the interaction model. Hardware and software developers will keep improving it until they get it right and it truly becomes natural, consistent, and reliable. At some point in the very near future, most consumers will be able to control their HDTVs with their voice and air gestures. Many won't want to, particularly those who are tech-phobic or late adopters.
In terms of industry investment, the good news is that other devices, like phones, tablets, PCs, and even washing machines, leverage the same interactions and technologies, so there is a lot of investment and shared risk. The biggest question is: will a company other than Microsoft lead the future of the living room? Your move, Apple.
Speculating on what Apple will do next seems to have become routine in high-tech journalism and social media. The latest and greatest rumor is that Apple will develop an HDTV set. I wrote back in September that Apple should build a TV, given today's lousy experience and Apple's ability to fix big user problems. What hasn't been talked about much is why voice command and control makes so much sense in home electronics and why it will dominate the living room. It's all about the content.
History of U.S. TV Content
For many growing up in the U.S., there were four or five stations on TV: ABC, NBC, CBS, PBS, and an independent UHF channel. If you wanted to know what was on, you just looked in the daily newspaper that was dropped off every morning on the front porch. Then, in the early '80s, cable started rolling out, and TV grew to around 10-20 channels, including ESPN, MTV, CNN, and HBO. The next step was an explosion in channels brought by analog cable, digital cable, and satellite. My cable company, Time Warner, offers 512 different channels. Add to that the virtually unlimited over-the-top "channels" or titles available on Netflix and Boxee, and you can easily see the challenge.
The Consumer Problem
With an unlimited number of things to watch, record, and interact with, finding what you want to watch becomes a huge issue. Paper guides are worthless, and the integrated TV guides on cable and satellite boxes are slow and cumbersome. Given the flat, long-tail character of the choices, multivariate, unstructured "search" is the way to find the right content; directories aren't the answer. The question then becomes: what's the best way to search?
The Right Kind of Search
If search is the answer, what kind of search? The answer lies in how people want to find things, and consumers look for things in many ways.
Some like to search surgically, with exact terms. They ask for "The Matrix Revolutions." Others have a concept or idea of what they are looking for, but not exactly: "find the car movie with Will Ferrell and John Reilly," and back come a few movies like Step Brothers and Talladega Nights. Others may search by any number of "mental genres," categories the user creates on the fly. They may ask for "all Emmy Award winning movies between 2005 and 2010." You get the point: the consumer is best served by answers to natural language search, and the call to action is then to get that person to the content immediately.
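To make the three kinds of queries concrete, here is a minimal sketch, assuming the spoken request has already been turned into structured criteria (the "what did they mean" step) and a hypothetical in-memory catalog. The `Title` type, `CATALOG`, and `search` function are illustrative names, not any real guide's or service's API. The point it shows: one multi-attribute filter serves exact, concept, and ad hoc "mental genre" queries alike, where a directory tree would need a fixed path for each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    name: str
    year: int
    cast: frozenset[str]
    genres: frozenset[str]


# A tiny, hypothetical catalog; a real guide would hold many thousands of entries.
CATALOG = [
    Title("The Matrix Revolutions", 2003,
          frozenset({"Keanu Reeves", "Laurence Fishburne"}),
          frozenset({"science fiction", "action"})),
    Title("Talladega Nights", 2006,
          frozenset({"Will Ferrell", "John C. Reilly"}),
          frozenset({"comedy", "sports"})),
    Title("Step Brothers", 2008,
          frozenset({"Will Ferrell", "John C. Reilly"}),
          frozenset({"comedy"})),
]


def search(catalog, name=None, cast=(), genre=None, years=None):
    """Return titles matching every criterion supplied; all criteria are optional."""
    results = []
    for t in catalog:
        # Surgical search: case-insensitive exact title match.
        if name is not None and t.name.lower() != name.lower():
            continue
        # Concept search: every named actor must appear in the cast.
        if not set(cast) <= t.cast:
            continue
        if genre is not None and genre not in t.genres:
            continue
        # "Mental genre" search: ad hoc filters such as an inclusive year range.
        if years is not None and not (years[0] <= t.year <= years[1]):
            continue
        results.append(t)
    return results


if __name__ == "__main__":
    print([t.name for t in search(CATALOG, name="the matrix revolutions")])
    # ['The Matrix Revolutions']
    print([t.name for t in search(CATALOG, cast={"Will Ferrell", "John C. Reilly"})])
    # ['Talladega Nights', 'Step Brothers']
    print([t.name for t in search(CATALOG, genre="comedy", years=(2005, 2010))])
    # ['Talladega Nights', 'Step Brothers']
```

The hard parts this sketch skips are exactly the ones discussed here: recognizing the speech reliably and mapping a loose request onto criteria like these.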
Natural Language Voice Search and Control
The answer to the content search challenge is natural language voice search and control. That's a mouthful, but basically, you tell the TV what you want to watch and it guides you there from thousands of entry points. Two popular implementations of voice search exist today. There are others, like Dragon NaturallySpeaking, but those are niche commercial plays.
Microsoft has done more to enhance the living room than any other company, including Apple, Roku, Boxee, and Sony. Microsoft is a leader in IPTV and the innovation leader in entertainment game consoles. With Kinect, a user can use Bing to search for and find content. It works well in specific circumstances and at certain points in the experience, but it needs a lot of improvement. Bing needs to find content anywhere in the menu structure, not just at the top level. It also needs to work better in a living room full of viewers. Its beam-forming is awesome but needs to get good enough to serve as a virtual remote.
Finally, it needs to support natural language search and the ability to narrow down the choices. I have full confidence that Microsoft will add these features, but a big question is the hardware. The hardware is seven years old. The software gymnastics and the offloading of some processing to the Kinect module have been brilliant, but at some point, hardware runs out of gas.
While certainly not the first to bring voice command and dictation to phones, Apple was the first to bring natural language to the phone. The problem with the current Siri is that it's not connected to an entertainment database, its logic can't narrow down choices, and it isn't connected to a TV, so once you find what you are looking for, you can't immediately switch the TV to it.
As I wrote in September (before the iPhone 4S and Siri), Apple "could master controlling the TV's content via voice primarily." If Apple were to build a TV, it could hypothetically leverage iPhones, iPads, and iPods to improve the voice results. While Kinect has a full microphone array and operates best at 6-8 feet, an iPhone microphone could be 6 inches away, which would certainly help with the "who owns the remote" problem and with voice recognition. Even better would be if multiple iOS devices could leverage each other's sensors. That would be powerful.
While I am skeptical of driving voice control and cognition from the cloud, Apple, if it built a TV, could do more local processing and increase the speed of results. Anyone who has used Siri extensively knows what I am talking about here. The first few times a Siri for TV fails to bring back results or says "system unavailable," it gets shelved and never used again by many in the household. Part of the entertainment database needs to be local until the cloud can be 99% accurate.
What about Sony, Samsung, LG, and Toshiba?
I believe that all major CE manufacturers are working on advanced HCI techniques to control CE devices with voice and air gestures. The big question is: do they have the IP and time to "perfect" the interface before Apple and Microsoft dominate the space? There are two parts to natural language control: the "what did they say" and the "what did they mean." Apple licenses the first part from Nuance, but the back end is Siri. Competitors could license the Nuance front end, but would need to buy or build the "what did they mean" part.
Now that HDTV sales are slowing down, it is even harder to differentiate between HDTVs. Consumers haven't been willing to spend more for 3D but have been willing to spend more for LED and Smart TV. Once every HDTV is LED, 3D, and "smart," the key differentiator could become voice and air gestures. If Sony, Samsung, LG, and Toshiba aren't prepared, their world could change dramatically, and Microsoft and Apple could have the edge.
Although I don't believe it, to many it appears that Apple has already won the smartphone and tablet wars, so the next logical question is "what's next?" Many articles about the rumors of Apple entering the TV business (not to be confused with the "hobbyist" Apple TV) focus on what a lousy business TVs are, or question whether Apple could add enough incremental value, given that cable and content companies hold the power position. These are good and pragmatic reasons, but then again, when has Apple been pragmatic? There was nothing pragmatic about MP3 players at twice the price of others, paid music downloads, or app stores 10 years ago. I personally would like to see Apple enter the TV market.
TVs and STBs Have Big Issues
Let's face it, TVs aren't very easy to use, especially when they are connected to a set top box (STB). Most of us tech-heads forget just how literate we are with technology. Just ask a less tech-literate person to change inputs on the TV to go from the set top box to the DVD player. Many times they have "Channel 3" written down somewhere so they remember. Ever lost that remote? Sure you have, and it really pissed you off. How about a set top box from a cable company? Mine takes almost a second to change the channel. And why do I keep running out of DVR storage when I have terabytes of secure storage in other parts of my house? I know what you are thinking: too complex, too many companies involved with too many conflicting agendas. Well, I've heard that same short-term thinking before with digital music.
Big Problems Need a Fearless Company like Apple
Apple has a solid track record of fixing issues that have plagued users for years. Apple has significantly moved the industry in:
- Simple digital content downloads
- Application purchasing and updating
- UI simplicity
- Computer boot time and wake-from-sleep time
- Reliability and dependability
So Apple fixes huge issues, and TVs and STBs have big issues. It sounds like the perfect match.
A Bold Assumption on Content and the Distributors
My assumption is that Apple will find a business model the content providers find advantageous or tempting enough to cross the cable and satellite companies. If not, I would expect Apple to declare war and do everything in its power to get around them by investing in the "pipe" or in content companies themselves. This market is too big an opportunity for the most valuable company on the planet to pass up. I know, this sounds impossible, but when Napster arrived on the scene, how plausible did iTunes sound? How plausible did paid movie downloads sound with BitTorrent around?
So why should Apple make a TV? Because there is so much they could improve and people will pay a hefty premium to have a superior experience in a few different areas.
Finding Content via Advanced HCI
Controlling a device with thousands of "channels" makes absolutely no sense with a physical remote like today's, with its up and down buttons and number keys. It would be as if, instead of Google web search, we were stuck with Yahoo directories and no search. Directories made sense until the options exploded, as they have today with content.
Apple is one of a few companies that could master controlling the TV's content primarily via voice, then secondarily via air gestures for finer-grained control. First, the TV needs to be smart enough to determine who in the room has "control" and who doesn't. It's the future version of today's "who has the remote" issue. Then it needs to distinguish background noise from real people's voices to deliver the best voice control. After you have found what you want to watch, you can fine-tune with the flick of a finger. Pulling this off takes technologies even more advanced than Kinect's, including the right sensors and parallel compute power delivered by OpenCL™ frameworks.
Apple Device Integration
If Apple developed a TV, it could conceivably guarantee that the iPhone, iPad, Time Capsule, and Macs seamlessly share content with one another. We have seen from the issues Android and webOS had getting Netflix and Hulu+ that content providers are more apt to license to more closed systems.
While I am watching my NFL football game, I want perfect, real-time sync of stats on my iPad, and I want to be able to carry the game from the media room into the kitchen on my iPhone. I'd like overflow content storage to go to my Mac, PC, or Time Capsule. Finally, I would also expect to see sharing of basic sensors like cameras, microphones, gyroscopes, proximity sensors, and accelerometers to extend and facilitate security, monitoring, and gaming applications.
I would want the same basic positive characteristics in my iTV that I get in my iPhone and iPad. I would expect it to be very responsive and reliable, with a sense of awesome style. Neither my set top box nor my TV is any of these. I would know that every differentiating feature would work well or it wouldn't be included. I would also expect some key 10-foot UI apps.
I believe Apple can and will arrive at a business model with content providers and cable/satellite companies. Either that, or it will get very ugly for everyone. The most valuable company in the world, with a huge pile of cash, no debt, and a historic track record of pioneering breakthrough content deals, can do this or, if forced to, will go around it. Apple has been a company that fixes nagging problems, and the TV and STB have a lot of them. Our basic method of finding content is broken. STBs crash, are slow, and don't work with other devices in the home. I'd like to see Apple fix these issues. How about you?