I'm not sure you grasped the real point of what he was saying.
You're flipping between talking about Apple and Apple's core OS development team.
Sure, but the speech-to-text part could feasibly be on-device, and then the comprehension engine could handle local actions, such as "play music", "send an iMessage", "open app", etc., without recourse to the internet.
Aside from the fact that you've just made up a bunch of functionality to justify the conclusion, none of those things make the watch strap key. Beyond that, the watch strap is a pliable piece of material that takes some punishment, so Apple shouldn't embed critical components in it or make it a necessary part of the watch structure; doing so introduces vulnerability. There's also the fact that Apple just aren't very good at that sort of thing, their cables...
In earlier previews, dark mode also affected in-app controls. It was hidden at the time, but it set an expectation: http://www.cultofmac.com/282685/enabling-yosemites-hidden-dark-mode-feature/
Are you suggesting that when Apple ship a preview with "no known issues" that their Radar backlog is totally clear?  Because that's demonstrably untrue. Many developers have many issues with Apple's responsiveness to bug reporting.
Does Apple do this for TV shows bought through iTunes?
Possibly, but a phone's battery is part of its function, whereas a watch strap is both aesthetic and dictates how comfortable the watch is. I think making the strap non-replaceable would be seen as a significant failing of any such product.
I'm not sure I understand the top row of the table. Google Now: 93% correctly heard × 86% answered correctly when correctly heard = 79.98% answered correctly overall. Apple Siri: 96% correctly heard × 84% answered correctly when correctly heard = 80.64% answered correctly overall. So Siri, despite being a bit flaky on answering questions correctly, still wins overall. And Siri's advantage is bigger if you only use the uncontrolled environment...
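A quick sketch to check the arithmetic above. The overall rate is the chance a query is heard correctly multiplied by the chance it is answered correctly *given* it was heard correctly. The function name `overall_accuracy` is hypothetical, and the percentages are just the figures quoted in the comment.

```python
def overall_accuracy(heard_rate: float, answered_given_heard: float) -> float:
    """P(heard and answered correctly) = P(heard) * P(answered correctly | heard)."""
    return heard_rate * answered_given_heard


# Figures quoted from the report card's top row.
google_now = overall_accuracy(0.93, 0.86)
siri = overall_accuracy(0.96, 0.84)

print(f"Google Now: {google_now:.2%}")  # Google Now: 79.98%
print(f"Siri:       {siri:.2%}")        # Siri:       80.64%
```

Because the answer rate is conditional on the query being heard, a better "heard" rate can outweigh a slightly worse answer rate, which is why Siri edges ahead overall.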
I think the Google Now report card needs a correction: Heard Incorrectly in a Controlled Environment 100% of the time?!
