Quote:
Originally Posted by
muppetry 
But I think the point is that the speech recognition and intent interpretation is local, and the online transactions are just the same ones that you might execute if you were doing it yourself (information lookup, email, text etc.), only maybe done more effectively (in terms of query construction etc.).
I am curious about those claimed efficiency gains though. If they are proprietary transactions with Apple servers, then you could imagine more efficient protocols than http, but if they are dealing with 3rd party web servers then they have no other choices.
I suspect it is similar to the Amazon Fire's Silk browser implementation:
1) A single, efficient, relatively low-speed connection from the device to the company's proprietary servers -- as opposed to multiple, inefficient low-speed connections to target servers.
2) The single request can minimize HTTP and XML overhead in the transmitted packet -- it does not need to be human-readable or conform to any but the most basic TCP/IP protocols.
A single page request to a target server usually involves multiple connections -- to download JavaScript, CSS definitions, images, ad banners, animations, Flash content, etc. Say each image results in a single request-response connection that takes 1/2 second or more to turn around -- over and above the time to download the requested image. A web page with 10 images can easily waste several seconds just turning around request/response connections.
3) The company's servers communicate with the target servers via high-speed/bandwidth backbones.
4) The company's servers can cache frequently-requested data and avoid many requests of target servers.
5) The company's servers can aggregate the information from their caches and from requests to the target servers, then create and send an efficient data packet to the device. Again, there is no need for XML or HTML overhead.
So the device and the company's servers have a single, very efficient request/response connection.
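To put rough numbers on that, here is a back-of-the-envelope sketch of the 10-image example above. The 1/2 second turnaround is the figure I guessed at earlier, the function names are made up for illustration, and it assumes the device fetches resources one after another (real browsers open several connections in parallel, so treat the first number as a worst case):

```python
def direct_overhead(resources: int, turnaround_s: float) -> float:
    """Turnaround time spent when the device requests each resource itself,
    one after another (assumed worst case, no parallel connections)."""
    return resources * turnaround_s


def proxied_overhead(turnaround_s: float) -> float:
    """Turnaround time on the device's slow link when one aggregated
    request/response replaces all the per-resource requests."""
    return turnaround_s


images = 10
turnaround = 0.5  # seconds per request/response, as guessed above

print(direct_overhead(images, turnaround))   # 5.0
print(proxied_overhead(turnaround))          # 0.5
```

The proxied figure ignores the time the company's servers spend talking to the target servers, which should be small if they really do sit on fast backbones.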
The company's servers do all the heavy lifting (as efficiently as possible) gathering, caching and aggregating data from multiple target servers.
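Here is a minimal sketch of that gather/cache/aggregate idea. To be clear, this is my guess at the general shape, not how Apple or Amazon actually do it: `AggregatingProxy`, `fake_fetch` and the example URLs are all hypothetical, and a real service would add cache expiry, compression and a compact wire format.

```python
from typing import Callable, Dict, List


class AggregatingProxy:
    """Hypothetical stand-in for the company's servers."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # how the proxy reaches a target server
        self._cache: Dict[str, bytes] = {}   # frequently-requested data (point 4)
        self.upstream_requests = 0           # counts real requests to target servers

    def _get(self, url: str) -> bytes:
        # Only go to the target server when the data is not already cached.
        if url not in self._cache:
            self.upstream_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]

    def handle(self, urls: List[str]) -> Dict[str, bytes]:
        # One device request in, one aggregated response out (point 5).
        return {url: self._get(url) for url in urls}


def fake_fetch(url: str) -> bytes:
    """Pretend target server, so the sketch runs without a network."""
    return f"content of {url}".encode()


proxy = AggregatingProxy(fake_fetch)
first = proxy.handle(["a.example/page", "a.example/logo.png"])
second = proxy.handle(["a.example/page", "b.example/news"])

print(proxy.upstream_requests)  # 3: the second request reuses the cached page
```

The device makes two requests in total, while the target servers see three instead of four because the repeated page comes out of the cache.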
The net result to the user is that sometimes Siri takes a moment or so longer to do what you ask. But it is not significant enough to degrade the UX.
In fact, Siri usually warns/notifies you of the extra effort: "Let me think about that", "I think I have an answer for you", etc.
You could say that Siri, Silk and the like give the user the "Best of the Web" (speed and content) and eliminate the "Worst of the Web" (slowness, ads, distractions, click-bait, etc.).
It will be interesting to see how monetization of the web changes as a result of this shift.
And I suspect that people who wanted to could still ask Siri: "Show me the New York Times web page" (in all its current glory).