As long-time readers will know, I’ve long been a fan of Siri. As I’ve often noted, it’s my primary means of interacting with my iPhone (part of the reason I don’t need a larger screen). I dictate most of my messages, and if it’s possible to ask Siri to do something for me rather than doing it myself, I do.
But Siri does have one major failing: it has no access to third-party apps. There are countless apps where I’d love to be able to get Siri to do the heavy lifting, as I wrote last year in a Feature Request:
What I can’t yet do is ask the time of my next train home, despite having an app on my phone that can answer that question. I can’t ask it to show me today’s Timehop, nor can I ask it to post that to Facebook. I can’t ask it to post something to a Hipchat or Slack chatroom. I can’t ask it to call an Uber car. I can’t ask it to translate ‘Where is the nearest pharmacy’ into Mandarin. I could name many other examples, but you get the idea.
If Apple offered an API to allow third-party developers to take advantage of Siri, I’m confident that many would do so. And I’m certainly not alone in wanting that – in our poll, 95% of you agreed with me.
But it turns out that Siri’s original developers wanted to take things a step further …
Rather than simply have Siri call on third-party apps to carry out tasks, they wanted to cut out the middleman and integrate directly with the underlying services themselves. For example, if you told Siri you wanted a car to collect six of you from your office, it would talk directly to Uber’s servers to make the booking for you. The team’s goal was ‘to reinvent mobile commerce itself.’
Apple, for whatever reason, disagreed. When it bought Siri, it had the team strip out support for all the third-party apps originally integrated with the service – some 45 in total – and launch without them. Since then, the team has grown increasingly frustrated at the widening gulf between its ambitions for the service and the far more modest capabilities Apple allowed it to introduce.
The result was that a full third of the team left Apple to create a brand new intelligent assistant that would do all of the things they weren’t allowed to do with Siri: Viv. Yesterday, we got our first look at the result to date – and it’s incredibly impressive. If you haven’t yet watched the video, I highly recommend doing so.
What most distinguishes Viv from Siri is that all of Siri’s queries and responses are hard-coded. Someone had to sit down and think of all the different questions Siri might be asked, and all the different phrasings that might be used, and provide a response to each of them.
Sure, Siri gives the impression of being a little cleverer than that: it often has multiple responses for the same query, so it feels more human-like and less robotic. But at its heart it’s a simple database of queries and responses.
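To make that concrete, here’s a toy sketch – purely illustrative, nothing like Apple’s actual implementation – of what a hard-coded query-and-response assistant amounts to: a lookup table, with a few variants per query to feel less robotic, and a web-search fallback for anything it doesn’t recognise.

```python
import random

# Toy illustration (NOT Apple's actual implementation): a hard-coded
# assistant is essentially a lookup table mapping recognised phrasings
# to canned responses, with several variants per query.
RESPONSES = {
    "tell me a joke": [
        "Two iPhones walk into a bar...",
        "I'd tell you, but my timing chip is off.",
    ],
    "what is your name": [
        "You can call me Siri.",
        "Siri, at your service.",
    ],
}

def hard_coded_assistant(query: str) -> str:
    """Return a canned response, or fall back to a web search."""
    variants = RESPONSES.get(query.strip().lower().rstrip("?"))
    if variants is None:
        return "Here's what I found on the web for that."
    return random.choice(variants)
```

The multiple variants per query are what give the ‘more human’ feel; anything not in the table falls through to the generic web-search response.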
Some of those responses of course contain variables. If you ask Siri whether you’ll need an umbrella in London tonight (you will, by the way), Siri queries a weather database to determine the answer. But there’s relatively limited intelligence in the way it works. This is why Siri co-founders Dag Kittlaus and Adam Cheyer refer to Apple’s implementation of Siri rather dismissively as merely ‘a clever AI chatbot.’
What Viv does is far more sophisticated.
Viv starts by trying to determine the intention of your request. It parses all the different elements of the query and reduces it to something it can understand. For example, in one convoluted example in the video, Kittlaus asked Viv ‘Will it be warmer than 70 degrees near the Golden Gate Bridge after 5pm the day after tomorrow?’
I just tested that on Siri, and all it could do was a web search for my question. Viv, in contrast, broke the query down into its component parts. It identified that ‘warmer than 70 degrees’ was a weather question. It recognised ‘near the Golden Gate Bridge’ as a place. It knew that ‘the day after tomorrow’ was (when the query was made) May 11th. And it knew that a specific time meant that it would require hour-by-hour forecasts. It very quickly answered the question.
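For illustration, here’s a rough sketch of that kind of decomposition. The function and its regexes are entirely hypothetical – Viv’s real parser is vastly more sophisticated – but it shows the idea of reducing a sentence to structured slots a service could act on.

```python
import datetime as dt
import re

# Hypothetical sketch of query decomposition -- the real thing uses far
# more sophisticated NLP. We reduce a sentence to structured 'slots'.
def parse_weather_query(query: str, today: dt.date) -> dict:
    slots = {}
    m = re.search(r"warmer than (\d+)[ -]?degrees", query, re.IGNORECASE)
    if m:
        slots["domain"] = "weather"            # temperature -> weather question
        slots["threshold_f"] = int(m.group(1))
    m = re.search(r"near (?:the )?([A-Z][\w ]+?)(?: after| before|\?|$)", query)
    if m:
        slots["place"] = m.group(1).strip()    # a named location
    if "the day after tomorrow" in query.lower():
        slots["date"] = today + dt.timedelta(days=2)
    m = re.search(r"after (\d+)\s?(am|pm)", query, re.IGNORECASE)
    if m:
        hour = int(m.group(1)) % 12 + (12 if m.group(2).lower() == "pm" else 0)
        slots["after_hour"] = hour
        slots["needs_hourly"] = True           # specific time -> hourly forecast
    return slots
```

Fed the bridge question (with ‘today’ set to May 9th), this yields the same slots Viv extracted: a weather question, the Golden Gate Bridge, May 11th, and an hourly forecast from 5pm.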
But it’s how Viv answers the question that’s the amazing part. Rather than consult a database, Viv generates code. That code is a small program with the capability to answer the question. And it does all of that – parse the query, write the code, run it and deliver the answer – every bit as quickly as Siri.
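One way to picture ‘generating code’ is composing a pipeline of service calls at runtime, tailored to this particular question. The sketch below is mine, not Viv’s architecture, and the geocoding and weather services are hypothetical stubs – but it captures the difference between looking up a canned answer and assembling a program to compute one.

```python
# Loose sketch of 'generating the program' at runtime: given parsed
# slots, compose a pipeline of steps and execute it. The stub services
# below are hypothetical stand-ins for real location/weather back ends.
def geocode(place):                    # stub location service
    return {"Golden Gate Bridge": (37.82, -122.48)}.get(place)

def hourly_forecast(coords, date):     # stub weather service: hour -> temp (F)
    return {17: 71, 18: 69, 19: 66}

def build_plan(slots):
    """Compose the steps needed to answer this particular query."""
    return [
        lambda ctx: ctx.update(coords=geocode(slots["place"])),
        lambda ctx: ctx.update(forecast=hourly_forecast(ctx["coords"], slots["date"])),
        lambda ctx: ctx.update(answer=any(
            temp > slots["threshold_f"]
            for hour, temp in ctx["forecast"].items()
            if hour >= slots["after_hour"])),
    ]

def run_plan(steps):
    ctx = {}                           # shared context threaded through steps
    for step in steps:
        step(ctx)
    return ctx["answer"]
```

Given slots like `{"place": "Golden Gate Bridge", "date": "2016-05-11", "threshold_f": 70, "after_hour": 17}`, `run_plan(build_plan(slots))` chains location lookup, forecast retrieval and the temperature comparison into a single answer.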
Of course, Kittlaus knew what questions Viv could and couldn’t answer, but he had sufficient confidence that the entire demo was live. There were no recorded responses.
Kittlaus and his team have been working on this for a while now. Wired got a heads-up back in 2014.
Viv breaks through those constraints by generating its own code on the fly, no programmers required. Take a complicated command like “Give me a flight to Dallas with a seat that Shaq could fit in.” Viv will parse the sentence and then it will perform its best trick: automatically generating a quick, efficient program to link third-party sources of information together—say, Kayak, SeatGuru, and the NBA media guide—so it can identify available flights with lots of legroom. And it can do all of this in a fraction of a second.
I remember reading back then the half-joking example of the ultimate goal of a truly intelligent assistant.
Kittlaus says the end result will be a digital assistant who knows what you want before you ask for it. He envisions someone unsteadily holding a phone to his mouth outside a dive bar at 2 am and saying, “I’m drunk.” Without any elaboration, Viv would contact the user’s preferred car service, dispatch it to the address where he’s half passed out, and direct the driver to take him home. No further consciousness required.
The example is amusing, but that’s a powerful expression of how a truly personal intelligent assistant ought to work. It seamlessly and effortlessly combines things it knows about us – where we are, where we live, which car service we normally use – to carry out a task with only minimal prompting.
It also means that the type of top-level task I envisaged in my earlier piece appears eminently feasible.
Hey Siri, arrange lunch with Sam next week
Working – I’ll get back to you shortly …
Ok, I arranged lunch with Sam for 1pm next Wednesday at Bistro Union at Clapham Park
That uses what it knows about me, what it knows about Sam, and access (with permission) to Sam’s calendar at a free/busy and location level to figure out all the details that might otherwise have taken the two of us ten minutes of proposal and counter-proposal.
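The calendar-matching part, at least, is easy to sketch. Here a hypothetical `free_lunch_slot` helper (my own invention, not any real calendar API) intersects two parties’ busy times to find the first shared free lunch hour:

```python
import datetime as dt

# Hypothetical sketch of the free/busy matching an assistant would need.
# Busy times are sets of (date, hour) pairs, as a real calendar API might
# expose at a free/busy permission level.
def free_lunch_slot(my_busy, their_busy, days, lunch_hours=(12, 15)):
    """Return the first (day, hour) in the lunch window free for both."""
    for day in days:
        for hour in range(*lunch_hours):
            slot = (day, hour)
            if slot not in my_busy and slot not in their_busy:
                return slot
    return None
```

Scanning a week of candidate days this way is trivial; the hard part an assistant like Viv adds is the negotiation around it – permissions, locations, preferences – without any back-and-forth from the humans.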
Why Apple turned down this kind of power defeats me. Perhaps it’s Apple’s penchant for control. If you use a database of queries and responses, then you have complete control over everything Siri says and does. Viv’s open-ended ‘solve it on the fly’ approach is less predictable, and perhaps, from Apple’s perspective, too risky.
But one thing I do know: loyal as I have been to Siri so far, if Viv really does live up to the hype, then the moment it becomes available on my iPhone, Siri will be history.
Where do you stand on this? Take our poll, and please share your thoughts – and especially examples of the types of queries you’d love to see Viv handle – in the comments.