Siri – Tech.pinions

Siri’s Bing Moment

There were many interesting nuggets that came out of WWDC 2013. For our insiders, I plan to share the few that I don’t think are getting enough attention but yet are more significant than I believe people realize. But perhaps the most awkward part of the keynote was when Apple announced that the new and updated version of Siri will run on Microsoft’s Bing search engine.

This move is clearly one that is up for interpretation. I’m sure many will speculate that this move is nothing more than Apple doing what it can to eliminate any dependency on Google for core services. Or that Apple does not want to give Google any more valuable data than it already has.

We have opined and written much on our thoughts that Apple clearly wants to usurp the search experience from Google. Siri is a way that this is happening as it functions as an interface layer, which Apple controls, for a search paradigm. Realistically, for a Siri user, it is irrelevant which search engine it uses so long as the data is accurate.

So I decided to put Bing to the test. Microsoft has a challenge called Bing It On in which you submit five search queries, then vote on a side-by-side results screen for the side you thought was most relevant to you. You don’t know which engine you are choosing; you simply pick the side that you think presented the best results. So I decided to try this as an experiment.

Here are the search queries I used.

– How to identify a queen bee cell
– How to play bluegrass guitar
– Schedule for Wimbledon 2013
– Omelet recipe ideas
– Grammar resources

I decided which side of each side-by-side screenshot won by how close to the top its most relevant result appeared, judged against the reason behind my search. Interestingly, Bing won 4 out of 5 times. The only query Google won was the Wimbledon schedule.

I was actually surprised at this, and it has inspired me to try changing the default search engine on all my devices from Google to Bing as a longer-term experiment.

As I pointed out before, Siri running Bing may be up for interpretation in terms of Apple’s intentions. However, what matters is that the results are relevant and actionable.

The last thing I want to point out, and I plan to flesh this out more in the future, is that I will not be surprised if we see Apple and Microsoft become closer partners on things in the future. It appears they both now believe they have a common enemy in Google. (I’m not sure Apple believed this until the last few years.) What’s more, in my opinion Google’s enemy is not Apple but Microsoft. I firmly believe that Google prefers Apple in the world but wants to eliminate Microsoft from the face of the planet.

Microsoft knows this and I believe will find ways to strategically partner with Apple in this fight. One could be bringing Office to iPad only and never to Android. Bing is just the first of many strategic moves I think Apple and Microsoft will take to make sure the Google dictatorship does not rule the world.

Why Siri Won’t Go Beyond the iPhone–For Now

Since Apple launched the Siri app on the iPhone 4S last fall, there has been a widespread assumption that Siri’s voice-driven semantic search might soon find its way to other Apple products. At the top of everyone’s list was the still notional Apple television, bolstered by the belief that Steve Jobs’s deathbed claim to have “cracked” TV was based on the development of a voice interface.

Don’t get too excited. I think Siri will continue to improve on the iPhone and might well migrate to the iPad, but it’s not likely to go anywhere beyond these handheld devices for some time to come. Both the technology and the psychology have to be right, and both are far from ready.

Siri on the iPhone is a big step forward, but it is very far from perfect. Mostly it understands me, sometimes it doesn’t. Sometimes it has a useful answer to a question, sometimes it doesn’t. It’s a lot better than any previous voice/natural language effort, but I still rely on the keyboard or other touch interface elements most of the time. Actually, the iPhone makes a natural Siri development platform for Apple because even iPhone users are inured to mobile phones that fall well short of perfection. For example, calls drop, voice quality is often awful, messages arrive hours after they were sent. So we’re prepared to put up with a personal assistant who doesn’t always understand us. Apple, with its sharp focus on user experience, will be reluctant to push Siri into territory where customers may be disappointed by the performance.

Our expectations for television and cars, the logical targets for voice control, are much higher than for mobile phones. At the same time making voice control work is much harder for engineering reasons. Cars are actually the easier challenge. Apple has avoided the automotive market, but others are in the game and Microsoft is the clear leader, especially with its partnership with Ford.

Natural language understanding is a big computer science challenge for voice systems, but there are also considerable audio engineering issues to solve. Speech recognition requires a high quality audio signal, and the more free-form the speech, the better the audio has to be. An airline reservation system can understand me over a poor cellphone connection (most of the time) largely because the vocabulary and syntax of airline reservations are very constrained. But a Siri-like system is supposed to understand anything.

Siri on the iPhone works as well as it does because the phone starts with a decent microphone system that is close to the speaker and filters out extraneous noise. Cars are a pretty good environment as well. Voice systems usually are activated by pressing a button on the steering wheel that can also mute the audio system. There are lots of good places to put microphone arrays close to the driver. And while the sounds of driving create a lot of ambient noise, it is of the predictable sort that noise-cancellation systems handle well. I expect to see car systems get a lot better, but I don’t see Apple becoming a player. Apple likes to be top dog, and that would not be the case in a relationship with auto makers, who are quite insistent that car buyers are their customers, not those of third-party vendors. (Microsoft may do the software and Nuance the speech recognition, but Sync is a Ford product through and through.)

The living room is far tougher, but here too Microsoft may well have the edge, this time because of Kinect sensor technology. Pure voice control of a television is extremely difficult. Unlike a car, you don’t know where the speaker is going to be, so you need a sophisticated microphone array that can find and focus on the speaker, who might be 10 feet away. Such systems exist, but they are mostly still in the lab and, at least initially, are likely to be quite expensive.

You also need the equivalent of a push-to-talk button, or the voice recognition system is going to be saddled with the near impossible task of hearing anything over the sound of its own audio. Here’s where Kinect might come in very handy. Its ability to recognize gestures and to combine gestures with speech might yield a much better interface, much faster than voice alone. This, plus an enormous research investment in speech and natural language understanding, which admittedly has yet to yield much in the way of products, might give Microsoft a considerable edge in the battle of the living room.

Of course, the big TV challenge for Apple, Microsoft, or anyone else is striking the deals needed with content owners that will permit a viewing experience that unifies internet video with cable and broadcast TV. Difficult as the technical issues are, this business challenge may prove tougher to crack.

Future iPads Will Cannibalize TVs

As ZDNet’s Adrian Kingsley-Hughes points out, we know absolutely “nothing” about the iPad 3 right now. While pontificating about future Apple products is a lot of fun, drives many page views and makes web site editors very happy, it’s just a pontification factory. At some point in the near future, the iPad will have a better display and will be lighter than its predecessors, which drives me to the conclusion that the next generations of the iPad will start to rapidly cannibalize HDTVs, particularly second or third sets. I’d like to share my thoughts on why this will happen.

Consumers Radically Changing TV Viewing Habits

I have done a lot of consumer research and have been tracking PC use in the living room for years. Around 10 years ago, outside the very tech savvy, users started augmenting their TV viewing experience with a notebook PC. The early majority pecked away at their notebooks as the rest of the family watched something the early majority somewhat ignored. This model then transitioned into family members watching shows that were on the major broadcasters’ web sites. Remember when Big Brother started providing live PC feeds? This model quickly was augmented by Hulu and Netflix diving into the market as intermediaries. The iPad followed, providing very simple, manicured apps that gave access to rich “TV” content from Netflix, Hulu, and even cable companies like Time Warner and Comcast.

These viewing habits drove a wedge between two distinct usages: personal and group viewing. Mobile devices like the iPad and their services enabled growth of personal viewing, and consumers could watch finer slices of what they wanted to watch, when and where they wanted.

Group viewing isn’t going away any time soon, but as more people spend time on personal viewing, group viewing declines. The only exception is “crossover” viewing, where a family member is wearing headphones watching another show on a mobile device while other family/dorm members are watching on the HDTV. Regardless of the viewing model, it drives the need for more personal viewing devices and fewer group devices, or certainly drives the behavior to prioritize personal over group.

Consumers are changing their viewing habits from group to personal, but will future iPads be up to the task in terms of video and weight?

Future iPad Display as Good as Watching a 75″ HDTV?

No one publicly knows the iPad 3 resolution for sure, but let’s assume that the lines are doubled horizontally and vertically to provide a “2K” (2,048×1,536) resolution, which gives 4X the pixel count of the current model. I will also assume that content will come in three flavors: 1) upressed to 2K by the iPad, 2) upressed 2K content provided by services, and 3) in special cases, native 2K content provided by iTunes. Net-net there will be video content that can take advantage of the new and higher resolution.

Most people watch iPad video content between 12″ and 16″ from their eyes depending on whether they’re in bed or sitting on a couch. Assuming the iPad 3 is 2K, the visual experience would be similar to watching a 75″ HDTV at 10′. Users vary in terms of visual acuity and even neck length, but mathematically the numbers are accurate and make sense. The farther the TV is from the user, the larger it must be to compensate for the distance. Future iPads will provide a video experience similar to that of a huge HDTV.
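To make the geometry behind that comparison concrete, here is a rough sketch of the arithmetic. It assumes a 9.7″ iPad diagonal (not stated above), compares screen diagonals only, and ignores the difference between the iPad's 4:3 shape and a 16:9 HDTV, so treat it as a back-of-the-envelope check rather than a precise model. The function names are mine.

```python
import math

IPAD_DIAGONAL_IN = 9.7    # assumption: iPad display diagonal in inches
TV_DISTANCE_IN = 10 * 12  # 10 feet expressed in inches


def diagonal_angle_deg(diagonal_in, distance_in):
    """Angle in degrees that a screen diagonal fills at the viewer's eye."""
    return math.degrees(2 * math.atan((diagonal_in / 2) / distance_in))


def equivalent_diagonal_in(diagonal_in, distance_in, new_distance_in):
    """Diagonal needed at new_distance_in to fill the same angle.

    The angle depends only on the ratio of size to distance,
    so the screen scales linearly with distance.
    """
    return diagonal_in * new_distance_in / distance_in


def pixel_ratio(new_res, old_res):
    """How many times more pixels new_res has than old_res."""
    return (new_res[0] * new_res[1]) / (old_res[0] * old_res[1])


if __name__ == "__main__":
    for viewing_distance in (12, 16):
        angle = diagonal_angle_deg(IPAD_DIAGONAL_IN, viewing_distance)
        tv = equivalent_diagonal_in(IPAD_DIAGONAL_IN, viewing_distance, TV_DISTANCE_IN)
        print(f'iPad at {viewing_distance}": {angle:.1f} degrees, '
              f'like a {tv:.0f}" screen at 10 feet')
    print("2K vs 1,024x768 pixel ratio:", pixel_ratio((2048, 1536), (1024, 768)))
```

Under these assumptions, a 9.7″ screen held 16″ away fills about the same angle as a 73″ screen at 10′, and at 12″ it is closer to a 97″ screen, so the 75″ figure sits near the far end of the 12″ to 16″ range.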

Weight is a Deal-breaker

Many I discuss this with argue that the iPad 2 is light enough to replace much of the TV viewing. There aren’t “standards” that dictate this, but from doing primary research with consumers, I know there are products that make it easier to hold a device and stare at it comfortably for hours. The Amazon Kindle DX2 at 18.9 ounces (1.18 lbs.) is closer to the right form factor and weight to be held comfortably in bed or on a couch. If you have ever watched a show in bed on an iPad 2 at 1.325 lbs., you know exactly what I am talking about. After a while, you wish the iPad would just float so you didn’t have to touch it. Or you find a way to rig the stand so that you can lie on your side without touching it. To effectively replace personal TV viewing, future iPads must be significantly lighter.

5 iPads for the Price of One HDTV

Consumers have an amazing way of rationalizing cool electronics they want to buy. Most consumers use their heart over their head when making an electronics purchase and I can see users rationalizing buying multiple iPads instead of placing their investment into a TV. If Apple were to go after the secondary TV market with a vengeance, I could see the consumer rationalization going like this…. “Hmmmm…. I can buy 5 iPads for the price of one high quality 60″ TV. And everyone in the family would have them. I will be the hero of the family in that everyone gets an iPad where they watch what they want to watch and do all the other great iPad things. And, when we want to watch the big game together, we can watch the older, but good enough 50″ we bought 5 years ago.” This is human rationalization at work and happens every day in consumer electronics.

Siri Will Push Consumers Over the Edge

I’m not going out on a limb when I say I believe Apple will integrate Siri into future iPads. The stronger commitment I think they will make is to an entertainment dictionary and natural language capability. As I wrote last September on the fabled “iTV”, instead of popping between Netflix, Hulu+, YouTube, iTunes and TWC TV, I believe Apple will aggregate and index this “channel” content into Siri to provide a one-stop touch and voice enabled experience. In this way, users can say “find Revenge” and Siri will scan across all of the registered sources and look for re-runs, live or taped versions of “Revenge”, regardless of the source. This is the ultimate remote control in a world where there are 1,000s of “channels” available.
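For readers who think in code, the “find Revenge” idea can be sketched roughly as below. This is purely my illustration of scanning several registered sources for one title; it is not how Siri or any Apple service works, and the catalog contents, the `Listing` type and the function names are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    source: str  # which registered service carries the show
    title: str
    kind: str    # "live", "rerun", or "recorded"


# Invented sample data: each registered source maps to (title, kind) pairs.
CATALOGS = {
    "Netflix": [("Revenge", "recorded"), ("Mad Men", "recorded")],
    "Hulu+": [("Revenge", "rerun")],
    "TWC TV": [("Big Game", "live")],
}


def find_title(query, catalogs):
    """Scan every registered source for an exact, case-insensitive title match."""
    wanted = " ".join(query.lower().split())
    hits = [
        Listing(source, title, kind)
        for source, listings in catalogs.items()
        for title, kind in listings
        if title.lower() == wanted
    ]
    # Stable, predictable ordering: by kind, then by source name.
    return sorted(hits, key=lambda hit: (hit.kind, hit.source))


def handle_command(spoken, catalogs):
    """Handle only the hypothetical 'find <title>' command; ignore anything else."""
    verb, _, rest = spoken.strip().partition(" ")
    if verb.lower() != "find" or not rest.strip():
        return []
    return find_title(rest, catalogs)


if __name__ == "__main__":
    for hit in handle_command("find Revenge", CATALOGS):
        print(f"{hit.title}: {hit.kind} on {hit.source}")
```

A real system would need fuzzy matching and a far richer entertainment database; the point here is only that the viewer asks once and the lookup fans out across every source.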

This will serve as the final rationalization point consumers need to make the tradeoff between a new TV and an iPad.

Holiday 2012 Will Provide a Directional Indicator

With 3D an unmitigated flop in TVs and flat panel saturation becoming a reality, the TV industry is banking on “Smart TV” to pull it out of the hole in 2012. As I’ve written previously, TVs won’t be very smart in 2012 when it comes to advanced user interface and they aren’t bringing anything else to the table to motivate consumers to replace their old HDTV. What is new is a shiny new iPad with much higher resolution than their TV and the best entertainment remote control interface at a dramatically lower price than a new TV. I believe the future iPad will take a big chunk of the secondary TV market and even delay new primary TV purchases through consumer rationalization that it can serve as the primary “personal TV” device, and I expect to start to see the effects in the holiday 2012 selling season.

Voice Control Will Disrupt Living Room Electronics

What seems to be routine in high-tech journalism and social media now is speculating on what Apple will do next. The latest and greatest rumor is that Apple will develop an HDTV set. I wrote back in September that Apple should build a TV given the lousy experience and Apple’s ability to fix big user challenges. What hasn’t been talked about a lot is why voice command and control makes so much sense in home electronics and why it will dominate the living room. It’s all about the content.

History of U.S. TV Content

For many growing up in the U.S., there were 4-5 stations on TV: ABC, NBC, CBS, PBS and an independent UHF channel. If you ever wanted to know what was on, you just looked into the daily newspaper that was dropped off every morning on the front porch. Then around the early 80’s cable started rolling out and TV moved to around 10-20 channels and included ESPN, MTV, CNN, and HBO. The next step was an explosion in channels brought by analog cable, digital cable and satellite. My satellite company, Time Warner, offers 512 different channels. Add that to the unlimited number of over-the-top “channels” or titles available on Netflix and Boxee, and you can easily see the challenge.

The Consumer Problem

With an unlimited number of things to watch, record, and interact with, finding what you want to watch becomes a huge issue. Paper guides are worthless and integrated TV guides from the cable or satellite boxes are slow and cumbersome. Given the flat and long tail characteristic of choices, multi-variate and unstructured “search” is the answer to find the right content. That is, directories aren’t the answer. The question then becomes, what’s the best way to search?

The Right Kind of Search

If search is the answer, what kind of search? The answer lies in how people would want to find something. Consumers have many ways they look for things.

Some like to do surgical searching where they know exactly what they want. They ask for “The Matrix Revolutions.” Others have a concept or idea of what they are looking for but not the exact title; they ask to “find the car movie with Will Ferrell and John Reilly” and back come a few movies like Step Brothers and Talladega Nights. Others may search by an unlimited number of “mental genres”, or those which are created by the user. They may ask for “all Emmy Award winning movies between 2005 and 2010”. You get the point; the consumer is best served with answers to natural language search, and then the call to action is to get that person to the content immediately.

Natural Language Voice Search and Control

The answer to the content search challenge is natural language voice search and control. That’s a mouthful, but basically, tell the TV what you want to watch and it guides you there from thousands of entry points. Two popular implementations exist today for voice search. There are others, like Dragon NaturallySpeaking, but those are niche commercial plays.

Microsoft Kinect

Microsoft has done more to enhance the living room than any other company including Apple, Roku, Boxee and Sony. Microsoft is a leader in IPTV and the innovation leader in entertainment game consoles. With Kinect, a user can use Bing to search and find content. It works well in specific circumstances and at certain points in the experience, but it needs a lot of improvement. Bing needs to find content anywhere in the menu structure, not just at the top level. It also needs to improve upon its ability to work well in a living room full of viewers. Its beam-forming is awesome but needs to get better to the point that it serves as a virtual remote.

Finally, it needs to support natural language search and the ability to narrow down the choices. I have full confidence that they will add these features, but a big question is the hardware. The hardware is seven years old. Software gymnastics and offloading some processing to the Kinect module has been brilliant, but at some point, hardware runs out of gas.

Apple Siri

While certainly not the first to bring voice command and dictation to phones, Apple was the first to bring natural language to the phone. The problem with the current Siri is that it’s not connected to an entertainment database, its logic isn’t there to narrow down choices, and it isn’t connected to a TV so that once you find what you are looking for you can immediately switch the TV.

As I wrote in September (before the iPhone 4S and Siri), Apple “could master controlling the TV’s content via voice primarily.” If Apple were to build a TV, they could hypothetically leverage iPhones, iPads, and iPods to improve the voice results. While Kinect has a full microphone array and operates best at 6-8 feet, an iPhone microphone could be 6 inches away and would certainly help with the “who owns the remote” problem and with voice recognition. Even better would be if multiple iOS devices could leverage each other’s sensors. That would be powerful.

While I am skeptical about driving voice control and cognition from the cloud, Apple, if they built a TV, could do more local processing and increase the speed of results. Anyone who has ever used Siri extensively knows what I am talking about here. The first few times Siri for TV fails to bring back results or says “system unavailable”, it gets shelved and never gets used again by many in the household. Part of the entertainment database needs to be local until the cloud can be 99% accurate.

What about Sony, Samsung, LG, and Toshiba?

I believe that all major CE manufacturers are working on advanced HCI techniques to control CE devices with voice and air gestures. The big question is, do they have the IP and time to “perfect” the interface before Apple and Microsoft dominate the space? There are two parts to natural language control: the “what did they say” and the “what did they mean”. Apple licenses the first part from Nuance but the back end is Siri. Competitors could license the Nuance front end, but would need to buy or build the “what did they mean” part.

Now that HDTV sales are slowing down, it is even harder to differentiate between HDTVs. Consumers haven’t been willing to spend more for 3D but have been willing to spend more for LED and Smart TV. Once every HDTV is LED, 3D and “smart”, the key differentiator could become voice and air gestures. If Sony, Samsung, LG and Toshiba aren’t prepared, their world could change dramatically and Microsoft and Apple could have the edge.

Why Apple Had To Release Siri Half-Baked

Siri has been having a bad week. Gizmodo’s Mat Honan called Apple’s voice-response service “a lie.” Daring Fireball’s John Gruber, who rarely has bad things to say about Apple efforts, said it “isn’t up to Apple’s usual level of fit and finish, not by a long shot.” And my colleague Patrick Moorhead tweeted that inconsistency was leading him to reduce his use of the service.

Hang in there, Pat, Siri needs you. I share the frustrations and annoyances of Siri users, but using her is the only way she’s going to get better.

Here’s what I think is going on, with the usual caveat that Apple only shares its thinking with people it can legally keep from talking about it, leaving the rest of us free to speculate. Apple doesn’t much like public beta testing. Before a major release, Microsoft will typically make a new version of Windows or Office available to tens of thousands of users for months, allowing developers to find and fix most of the bugs. Apple limits beta testing mostly to internal users and selected developers. It can get away with this because the real-world combinations of Mac or iOS hardware and software are orders of magnitude simpler than in the Windows world.

Siri is very different. The artificial intelligence engine behind the service lacks any inherent understanding of language. It has to be trained to make connections, to extract meaning from a semantic jumble. To even get to the databases and search tools Siri uses to answer questions, it first must construct a query from the free-form natural language that humans begin mastering long before they can talk, but which machines find daunting. (See Danny Sullivan’s Search Engine Land post for an excellent analysis of Siri’s struggles with queries about abortion clinics.)

The secret to machine learning is feedback. I expect that Siri carefully logs every failed query, along with what the user does next. And algorithmic analysis of those logs, combined perhaps with some human intervention, means that every mistake contributes to the process of correction. In other words, Siri learns from its errors and the more people use it, the faster it will get better. Benoit Maison has a good explanation of how this works on his blog.
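As a toy illustration of the kind of log analysis I am speculating about, the sketch below counts which queries fail most often so they can be looked at first. Nothing here reflects how Apple’s servers actually work; the log format, the sample entries and the function names are all assumptions made up for the example.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivially different queries match."""
    return " ".join(query.lower().split())


def top_failures(log, n=3):
    """Return the n most frequent failed queries as (query, count) pairs.

    log is an iterable of (query, succeeded, next_action) tuples, where
    next_action records what the user did after the response.
    """
    failures = Counter(
        normalize(query) for query, succeeded, _next_action in log if not succeeded
    )
    return failures.most_common(n)


if __name__ == "__main__":
    # Invented sample log entries, for illustration only.
    sample_log = [
        ("find pizza", True, None),
        ("Play  Revenge", False, "typed a search instead"),
        ("play revenge", False, "gave up"),
        ("weather tomorow", False, "repeated the question"),
    ]
    for query, count in top_failures(sample_log):
        print(f"{count} failure(s): {query}")
```

The interesting part in practice would be the next-action field: what the user does after a miss is the hint that tells the system what the right answer should have been.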

The server-based learning creates a very different situation from the troubled handwriting recognition that helped doom Apple’s Newton 15 years ago (and to which some critics have compared Siri’s troubles). Newtons were products of a preconnected age, so there was no way for the community of MessagePads to learn from each other’s mistakes. And the extremely limited processing power and memory on the Newton itself made the claim that it would learn from its errors an empty promise. The Newton could never get past “egg freckles.”

Now, all of this said, Apple’s approach to Siri is a distinct departure from its usual practice of under-promising and over-delivering. It properly labeled Siri as a “beta” product. But, at the same time, it is using the half-baked feature as a major selling point for the iPhone 4S, hitting it hard in commercials. This is a disservice to customers, who have learned to expect a high polish on Apple products, and has saddled Siri with unreasonably high expectations that now are inspiring a backlash. Apple had to release Siri prematurely to let the learning process go forward. Let’s hope that Apple did not do the service permanent damage with its hype.

When Siri Becomes A Member of the Family

Siri is much more than just a useful feature for Apple’s iPhone 4S. Siri is also incredibly strategic for Apple. I have written quite a bit on the subject of how software platforms become sticky. The point I continually emphasize is that we who study the industry need to understand “ecosystems” more than products. What I mean by that is that consumers, when they buy technology products, are moving from a product buying mentality to an ecosystem buying mentality – they just don’t know it yet.

Products by themselves are not sticky and have very little consumer loyalty. In terms of products, brand or lowest price is what keeps consumers coming back. But as we transition from personal to personalized computing, consumers will stay loyal to ecosystems more than simply brand or product, even though those play into the ecosystem. Think of these products as screens which allow consumers to tap into a rich ecosystem driven by software and services.

As I evaluate products, platforms, and companies’ strategies, I am looking for things that invite consumers into an ecosystem and then encourage ecosystem loyalty. This is essentially the root of differentiation going forward.

If we look at the platform as the basis for an ecosystem, then right now the companies with ecosystems are Apple, Microsoft, Google (with Android) and RIM. Some ecosystems are more fleshed-out than others, but as a baseline those are the four — for now.

The key to any of these companies’ long-term success is to continue developing innovations that keep consumers loyal to their ecosystem. When this happens, consumers are less likely to switch from one platform to another. For example, consumers who have invested time, money, and energy in Apple’s ecosystem are less likely to jump to Android for their next phone due to the high cost and inconvenience of switching.

This is why I think Siri is so incredibly strategic for Apple. Siri in my opinion is the first step in moving computing from personal to personalized — something that happens when your personal electronics learn and understand things about you without you having to personalize it yourself.

When you use Siri, even though it is in beta and in a very early stage of its life-cycle, you observe how it learns and remembers certain key things about you. Inevitably over time as Siri learns more about you and hits her groove as a true personal assistant, this feature will keep you loyal to Apple’s ecosystem.

Imagine if over the period of a year or two, Siri has developed into a true personal assistant adding value all the way through task automation, discovery of places and events based on personal preference, geo-location assistance and more. After all the time you have spent living with Siri, who is learning quite a bit about you in order to be valuable, would you really fire her and go buy a different smart phone just because it is cheaper?

I don’t know a single executive with a personal assistant (who they often consider as a member of their family) who would fire that assistant just because he or she can find another one who’s cheaper. Rather, when you find a good assistant you hold on to them and stay loyal.

In fact, ask any executive what they hate most about hiring a new assistant and they will tell you it is the initial training process.

The same will be true with Siri as the technology evolves and gets better and even more useful. Given the number of hours put into training Siri to understand critical elements of your life, preferences, habits and more, starting over with another device would be quite an undertaking and a headache, assuming another device has such a feature, of course.

This is why Siri is strategic for Apple. Siri is another piece of the Apple ecosystem that will command consumer loyalty. This is why Apple competitors should be concerned. The more people Apple gets into their ecosystem, the less likely they will consider competitors’ products year in and year out.

Ecosystem loyalty will be the battleground of the future and companies who do not build a healthy ecosystem that drives consumer loyalty will be in for an uphill battle.

Why Google and Microsoft Hate Siri

As I watched Andy Rubin’s interview at the WSJ D Asia conference I became highly intrigued by the comments he made about Apple’s Siri. Rubin told Walt Mossberg, “I don’t believe your phone should be an assistant…Your phone is a tool for communicating,” he said, “You shouldn’t be communicating with the phone; you should be communicating with somebody on the other side of the phone.”

Here is a link to the interview if you haven’t seen it.

And then Microsoft’s Andy Lees, when questioned about Siri said it “isn’t super useful.” At the same time, he noted that Windows Phone 7 has a degree of voice interactivity in the way it connects to Bing, and thus harnesses “the full power of the internet, rather than a certain subset.”

What are these two guys smoking? They both seem to miss the fact that Apple has just introduced voice as a major user interface and that its use of voice coupled with AI on a consumer product like the iPhone is going to change the way consumers think about man-machine interfaces in the future. I wrote about its impact on future UI’s last week and believe that it is just the start of something big.

I have two theories about their response. One is based on jealousy and the other is future driven, based on what Siri really will become very soon and its ultimate threat to their businesses. The first has to do with the fact that both companies have had major voice UI technology in the works in their labs for a long time. In the case of Microsoft I was first shown some of their voice research back in 1992. In Google’s case people in the know have told me that they have had a similar project in development for over 7 years. And in both cases they are way–way behind Apple–especially in Siri’s AI capabilities and speech comprehension technology.

Interestingly, even for Apple it has taken a long time to get their voice technology working correctly. In fact, in the early 1990’s, I spent some time with Kai-Fu Lee when he was at Apple working on a speech and voice recognition technology called PlainTalk. At the time, he was considered one of the major minds on this subject and when, after a short stint at Silicon Graphics, he joined Microsoft, one of his key projects was working on speech technology for them. Of course, if you know about Kai-Fu Lee, you know that he left Microsoft to go to Google and was the subject of a major lawsuit between Microsoft and Google because Microsoft thought he would disclose to Google too much of what Microsoft was doing when he joined Google.

Microsoft and Google, especially since they had the mind of Kai-Fu Lee working on various projects while he was at these companies, cannot be too pleased that Apple was the one to actually harness voice and speech comprehension ahead of them, since both have been working on similar technologies for quite some time. You can bet that if they were the ones announcing a breakthrough voice technology they would be touting it as loudly as possible. Instead they are downplaying it and, to be honest, making real fools of themselves and their companies in the process.

But the real reason these two companies hate Siri is what it will become in the very near future. In case you haven’t noticed yet, Siri’s voice technology is actually a front end to some major databases, such as Yelp, Wolfram Alpha and Siri’s own very broad database. What it is really doing is serving as the entry point for searching these databases. So, I can ask Siri to find me the closest pizza joint and it quickly links me to Yelp, then to Google Maps. On the surface this might look good for Google and Yelp, since Siri sends the search to their sites and they get the advertising revenue from it. But what if Apple owned its own restaurant recommendation service and mapping system? It could divert all of these ad revenues to itself. If that is the case, here is an obvious prediction: how long do you think it will be before Apple buys Yelp or OpenTable, and MapQuest or a similar available mapping service?

How about searching for autos? Ask Siri where the closest BMW dealers are. It comes back and shows you the three or four BMW dealers within a 25-mile radius on a Google Map. But what if it could also tie you to the Edmunds database and instantly give you ratings of their cars, and dealers running specials? Or perhaps you are looking for an apartment in Hoboken? Ask Siri about available apartments in Hoboken and someday it could perhaps link you to Apartment Finder. While Apple might not need to own this database, Apartment Finder would be Siri’s preferred first site to “search” for apartments, and Apple would get a share of the ad revenue from these searches.

Indeed, it is pretty clear to me that Apple has just scratched the surface of the role Siri will play for them in driving future revenue. At the moment, we are enamored with its ability to enhance the man-machine interface. But that is just the start. Siri is actually on track to become the first point of entrance to “search” engines of all types tied to major databases throughout the world. And it will become the gatekeeper to all types of searches and in the end control what search engine it goes to for its answers.

For this to work, Apple needs to start acquiring, or at least developing tighter revenue-related partnerships with, existing databases for all types of products and services. It then needs to make Google or Bing the search engine of last resort, which Siri uses only if it can’t find an answer in its own or its partners’ databases. Oh yeah, and it needs to tie all of these searches to its own ad engine and drive as much of Siri’s “search” as possible to the ones it owns or has a revenue share deal with.
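To make that ordering concrete, here is a minimal sketch in Python of the kind of tiered routing I am describing: owned databases first, partner databases second, and a general search engine only as the last resort. Everything in it is hypothetical. The function names, the sample sources and their canned answers are made up for illustration and say nothing about how Siri is actually built.

```python
from typing import Callable, List, Optional, Tuple

# A source takes a query and returns an answer, or None if it has nothing.
Source = Callable[[str], Optional[str]]


def route_query(
    query: str,
    owned: List[Source],
    partners: List[Source],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (tier, answer), trying owned sources, then partners, then the fallback."""
    for tier, sources in (("owned", owned), ("partner", partners)):
        for source in sources:
            answer = source(query)
            if answer is not None:
                return tier, answer
    # Nothing owned or partnered could answer, so use the engine of last resort.
    return "fallback", fallback(query)


# Hypothetical stand-ins for an owned restaurant service, a partner
# apartment listing service, and a general web search engine.
def restaurant_db(query: str) -> Optional[str]:
    return "Luigi's Pizza" if "pizza" in query.lower() else None


def apartment_partner(query: str) -> Optional[str]:
    return "3 listings in Hoboken" if "apartment" in query.lower() else None


def web_search(query: str) -> str:
    return f"web results for: {query}"
```

In this sketch the tier label returned with each answer is what an ad engine would key on, since it records whether the search stayed in an owned or partner database or leaked out to a general search engine.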

Yes, Siri is an important product for enhancing our user interface with the iPhone. But Siri is in its infancy. When it grows up, it will be the front end to all types of searches conducted on iPhones, iPads, Macs and even Apple TV. And, if I were Google or Microsoft, perhaps I too would be playing down the impact of Siri, since they know full well that it is not just a threat to their product platforms, but to their core businesses of search as well. In fact, they should be quaking in their boots, since Apple is taking aim at their cash cow search businesses with this technology and could very well impact their fortunes dramatically in the future.

For Apple’s investors, the call for the company to start paying dividends on its cash hoard is too short-sighted. Instead, they should be encouraging Apple to start buying up as many databases and services as it can and begin the process of entrenching Siri’s role as the first line of offense when searching for a product or service, so that Apple gets the search ad revenue for itself. I believe that if Apple does this, it could probably add another $3-$5 billion in quarterly revenue to its already healthy business model within three years, as search becomes another profit center for Apple.

So, don’t think of Siri as just a voice UI. Rather, think of it as the gatekeeper to natural language searching of diverse databases and search engines, one that Apple will link to an ad model that I believe will eventually make Apple the third major search company in the world.

The Era of Personal Computing

I have adopted a philosophy in my analysis over the past few years where I distinguish between personal computing and personalized computing.

In a post a few months ago, I wrote about these differences and pointed out that because of the differences in personal and personalized computing the Post PC Era will happen in two different stages.

The first stage is personalized computing. In this era, the one we are currently in, all of our personal computing devices are personalized by us. What I mean by this is that we take the time to personalize the devices with our personal content, apps, preferences, interests, etc. In reality, however, how personal are these devices? They don’t actually know anything about us; we simply use them to get jobs done. We customize them and they contain our personal content, but they really aren’t that personal.

However, in this next phase, the era of personal computing, things may actually get very interesting. In this era our devices will actually start to learn things about us and in the process become truly personal. Our most personal devices will learn our interests, schedule, preferences, habits, personality, etc. I know it sounds a bit scary, but that is where we will inevitably end up.

I believe Apple’s latest feature, Siri, demonstrates this future reality of personal computing. As Tim pointed out in his article yesterday, Siri and the underlying artificial intelligence engine will learn key things about our unique tastes, interests, and more, and over time become even more useful as a personal assistant.

What is absolutely central for this personal computing era to become reality is that we have to allow our devices to get to know us. Perhaps more specifically, we have to trust our devices, or the underlying company providing us the personal computing experience.

John Gruber makes this very point in a post with some comments from Ed Wrenbeck, former lead developer of Siri.

In an interview with VectorForm Labs, Ed Wrenbeck states:

“For Siri to be really effective, it has to learn a great deal about the user. If it knows where you work and where you live and what kind of places you like to go, it can really start to tailor itself as it becomes an expert on you. This requires a great deal of trust in the institution collecting this data. Siri didn’t have this, but Apple has earned a very high level of trust from its customers.”

In the era of personal computing we will get beyond personalizing our devices and instead enter the era where they truly become personal to us because of their ability to know, learn, and be trained about who we are and our unique interests and needs.

There are many great examples of this in sci-fi movies and novels, but perhaps my favorite, because it is fresh, is how Tony Stark interacted with Jarvis in the Iron Man movies. Jarvis is what Tony Stark named his personal computer, and as you can tell from their interactions in the movies, Jarvis knew quite a few intimate details about Tony Stark.

Jarvis was a personal computer, one that took on an entirely new way to be useful because of the artificial intelligence that was built on top of incredible computing power.

Of course, this all sounds extremely futuristic but it will be the basis of what takes us from having to manually personalize our devices, to a future where our devices truly become personal and indispensable parts of our lives.

Why Siri is Strategic for Apple

Now that I have had some time to work with the new iPhone, and especially the new Siri voice technology, I have been able to form a couple of opinions about this product’s market impact.

As I mentioned in a previous post, from a big picture standpoint, Apple’s use of voice and speech as a form of input marks the third time Apple has influenced the market when it comes to UI design and navigation. The first time they did it with the mouse and its integration into the Mac, and then with touch by making it the key input for the iPhone. Now comes voice, which I believe will usher in the era of voice input and will start to dramatically impact the future of man-machine interface.

While voice input is a significant part of Siri’s feature set within the new iPhone 4S, it is its AI and speech comprehension technology that really makes it unique. More importantly, the more I use it, the more it gets to know who I am, where I live, what I like and who I am related to, and the more info it gets on me, the better it gets as well. For example, within a few searches for Italian restaurants it now knows that this is a type of ethnic cuisine I like and remembers that. So, the next time I ask it to find me an Italian restaurant, it becomes more accurate in its recommendations. It now knows my home address and office address, and I can give it commands that play off these locations. For example, I can say, “remind me to call my wife when I get to the office” and as I walk in the door of my office complex it reminds me to call her.

There are hundreds of ways that, once it begins to learn more about me, it can be quite useful and helpful. And as Apple has said, they will continue to link it to more powerful databases over time, giving it even greater reach to the information that I might need in my daily life. That linked with its continuing ability to learn about me makes Siri perhaps the stickiest application I have ever used. In the short time I have used it, it has become almost indispensable in a couple of areas.

First, I now mostly speak my tweets and messages instead of typing them in. Second, I use it to input short emails as well. Having the Siri microphone integrated into the keyboard makes it so simple to use and this is now my first line for data entry.

But the third way I use it is related to my business. As a market researcher, I have to do a lot of percentage comparisons when I look at various numbers. Over the years I have become pretty good at working out this math in my mind, but this method is not very precise. I normally come within one to three points of the correct answer, and in a lot of cases that may be all I need for our predictions, since these are based on known data and are informed projections. In the past, if I wanted precise percentages, I would bring out the old calculator. But now when I want this number I just ask Siri, and she does not guess. Her answers are always exact, and fast.

The other thing it does extremely well is deal with appointments. I just tell it to schedule an appointment and it is done. And if there is a conflict it tells me that as well. Think of it as a smart personal assistant.

BTW, this is not Apple’s first stab at this voice, speech and AI concept. In fact, they pretty much highlighted it in the Knowledge Navigator concept video they made in 1987. The video shows a professor interacting with a computer, asking it questions and getting direct answers from it in ways that Siri does now. Ironically, this video and its futuristic thinking were the brainchild of former CEO John Sculley and former Apple Fellow Alan Kay, one of the most futuristic thinkers we have in the world today. But at the time, the technology was not there to do what was projected in the Knowledge Navigator. Even more impressive is the fact that while the Knowledge Navigator was apparently connected to a very large computer, Siri is being done on a pocket computer.

Now, as Siri develops a strong database about me and my likes and dislikes, it is quickly becoming indispensable as a mobile assistant. I suspect that the closer Siri and I become and the better it gets to know me, the less likely I am to use something else on another platform. Thus, the stickiness: it makes it very likely that I will stay within the Apple ecosystem as long as they continue to innovate and make Siri smarter and even more useful.

Nuance Exec on iPhone 4S, Siri, and the Future of Speech

Though the iPhone 4S appears nearly identical to the current iPhone 4, it is, as my colleague Tim Bajarin points out, a revolutionary device because of its voice-based Siri interface. For the past 20 years, we humans have learned to point and click, but this has never been a natural way to interact with our environment. Touch and speech, on the other hand, have been around since we were living in caves.

Photo: Nuance CTO Vlad Sejnoha

“Speech is no longer an add-on,” says Vladimir Sejnoha, chief technical officer of Nuance, probably the world’s leading speech technology company. “It is a fundamental building block when designing the next generation of user interfaces.”

Sejnoha is faithful to the code of omerta that Apple imposes on its vendors. Although Nuance has supplied technology both to Apple and to Siri before its 2010 acquisition by Apple, he declined to discuss Nuance’s role in the iPhone 4S: “We have a great relationship with Apple. We license technology to them for a number of products. I am not able to go into greater detail. But we are very excited by what they have done. It’s a huge validation of the maturity of the speech market.”

But Sejnoha made no effort to hide his enthusiasm for the Siri approach. “It allows you to find functionality or content that is not even visible,” he says. “It provides a new dimension to smartphone interfaces, which have been sophisticated but shrunken-down desktop metaphors.”

It has been a long, hard slog for speech to become a core user interface technology. It took a good thirty years, from the late 60s to the late 90s, for speech recognition (the ability to turn spoken words into text) to become practical. “Speech recognition is not completely solved,” says Sejnoha. “We have made great strides over the generations and the environment has changed in our favor. We now have connected systems that can send data through the clouds and update the speech models on devices.”

Recognition alone is a necessary but hardly sufficient tool for building a speech interface. For years, speech input systems have let users do little, and sometimes nothing, more than speak menu commands. This made speech very useful in situations where hands-free operation was desirable or necessary, but left speech as a poor second choice where point-and-click or touch controls were available.

The big change embodied by Siri is the marriage of speech recognition with advanced natural language processing. The artificial intelligence, which required both advances in the underlying algorithms and leaps in processing power on mobile devices and on the servers that share the workload, allows software to understand not just words but the intentions behind them. “Set up an appointment with Scott Forstall for 3 pm next Wednesday” requires a program to integrate calendar, contact list, and email apps, create and send an invitation, and come back with an appropriate spoken response.
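To give a feel for the gap between recognizing words and extracting an intention, here is a toy sketch in Python. It handles only that one sentence shape with a regular expression, which is nothing like the statistical language processing a real assistant uses, and it makes no claim about how Siri works. The function name, the output fields, and the reading of “next Wednesday” as the first Wednesday strictly after today are all assumptions made for illustration.

```python
import re
from datetime import date, datetime, time, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Matches one narrow phrasing only; anything else is treated as not understood.
PATTERN = re.compile(
    r"set up an appointment with (?P<person>.+?) for "
    r"(?P<hour>\d{1,2}) ?(?P<ampm>am|pm) next (?P<weekday>\w+)",
    re.IGNORECASE,
)


def parse_appointment(utterance, today):
    """Turn recognized text into a structured request, or None if it does not fit."""
    match = PATTERN.search(utterance)
    if match is None:
        return None
    weekday = match.group("weekday").lower()
    if weekday not in WEEKDAYS:
        return None
    # Convert a 12-hour clock reading to 24-hour time.
    hour = int(match.group("hour")) % 12
    if match.group("ampm").lower() == "pm":
        hour += 12
    # Assumption: "next <weekday>" is the first such day strictly after today.
    days_ahead = (WEEKDAYS.index(weekday) - today.weekday() - 1) % 7 + 1
    start = datetime.combine(today + timedelta(days=days_ahead), time(hour))
    return {
        "intent": "create_appointment",
        "attendee": match.group("person"),
        "start": start,
    }
```

The structured result is the part that matters: once the sentence has been reduced to an intent, an attendee and a start time, the calendar, contact list and email apps each have something concrete to act on.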

Sejnoha sees Siri in the iPhone as just a beginning. “Lots of handset OEMs are working on it,” he says. “There is a deep need for differentiation in Android and Apple will only light a fire under that. Our model is to work closely with customers and build unique systems tailored to their visions.” And while a speech interface can drive search, it can also become an alternative to it: “One consequence of using natural language in the user interface is direct access to information. We can figure out what you are looking for and take you directly there. You don’t always have to go through a traditional search portal. It will change some business models.”

Nor do the opportunities stop at handsets. “Speech is a big theme for in-car apps because that is a hands busy, eyes busy environment,” Sejnoha says. “All the automotive OEMs are working on next-generation connected systems. The industry is undergoing revolutionary change.”

The health care market is another hot spot. “Natural language is taking center stage in health care,” Sejnoha says. “We are mining data and using the results to populate electronic health records.” Nuance recently signed a deal with IBM to provide technology for a speech front-end to the health care implementation of its Watson question-answering system.

The key to the next breakthroughs in speech technology, Sejnoha says, is making effective use of the vast amount of speech data that now exists, a challenge that has also attracted Nuance competitors Google and Microsoft. “Most algorithms use machine learning and are very data-hungry,” he says. “No one knows yet what to do with tens of thousands of hours of speech data. The race to do that is on. We are doing fundamental research and have a relationship with IBM Research as well. It requires a broad array of techniques to model speech in a robust way and to learn the long tail statistically and then build techniques that can benefit from large amounts of data. It’s a very exciting time.”

Why We Witnessed History at the iPhone 4S Launch

While some people were disappointed that Apple did not introduce the iPhone 5, most pretty much missed the significance of the event and the fact that they were witnessing history.

In 1984, when Steve Jobs introduced the Mac, he did something quite historic. He introduced the Mac’s graphical user interface. But he actually topped himself with the introduction of another technology: the mouse. In essence, he introduced the user input device that has been at the heart of personal computing for more than two decades.

What’s interesting about this is that he did not invent the GUI. That came from Xerox PARC. And he did not invent the mouse. Douglas Engelbart invented the mouse. But by marrying them to his OS he reinvented the GUI and OS and gave us a completely new way to deliver the man-machine interface through the mouse. Until that time all computer input was done by typing text.

Then, in 2007, with the introduction of the iPhone, Jobs and team did it again. He created the touch user interface and this time married it to iOS. He did not invent touch computing. That technology has been around for 20 years via pen input, or minimally within desktop touch UIs such as those used in HP’s TouchSmart desktops. But he integrated it within iOS and gave the world a completely new way to interact with small, handheld computers. By making touch gestures part of their laptop trackpad designs, Apple has even extended it to their core Mac portable computing platform as well. In essence, Jobs’s second UI act was to bring touch UIs to mainstream computing.

Now, with the introduction of Siri, integrated into iOS and a core part of the new iPhone OS, he and the Apple team have given the world what we will look back on and realize is the next major user input technology: voice and speech. As reader Hari Seldon points out, the real breakthrough we will come to realize is in Siri’s “applied artificial intelligence.” It is its speech comprehension that will be its greatest advancement.

Again, he did not invent this technology. But Apple’s genius is to keep trying to make the man-machine interface easier to use and with each form, be it the mouse, touch, or voice, Apple has been the main company to popularize these new inputs and thus help advance the overall way man communicates with machines.

I have personally witnessed all three of these historical technology introductions. When the Mac was introduced in 1984, I was sitting third row center at the Foothill Community College’s auditorium. Then in 2007, I was at Moscone West, fourth row center, when Jobs and team introduced the iPhone with its touch UI. And most recently, I was at their campus auditorium, Building 4 of Infinite Loop, fifth row center, when Tim Cook and his team introduced the iPhone 4S and the new Siri voice and speech interface, making this their third major contribution to the advancement of computer input. (I make a habit of remembering exactly where I am when I watch history being made.)

Now here is another interesting point. Although Apple has had this touch UI in place and integrated into iOS since 2007 and into Mac OS X since last year, only now is the Windows world starting to get serious about integrating touch into their phone and computer operating systems. Although Apple will continue to advance their various touch UIs, they can rightfully say: been there, done that.

It is time to take it up a notch, and for them the next user input mountain to scale will be the use of voice and speech as part of their future man-machine interface. It may start with iOS but, like touch, I expect this UI to be on the Mac in short order as well.

Yes folks, those of us at the iPhone 4S launch witnessed history being made. Unfortunately, a lot of people at that event missed it.
