You are correct. It's not the device but rather the solution or experience. The question is whether millions of people really want that experience. It's a big and unverified question. I'm skeptical.
The question of whether millions of people want an immersive virtual experience is answered by millions of people already owning VR headsets. What millions of people don't want to do is spend $3,500 on one (the Microsoft HoloLens is this price), but millions are happy to buy <$500 VR headsets.
That's not the only question to be answered, though. Will millions more buy one? And are current owners actually using these things, or did they buy them, use them a few times, and then toss them in a drawer?
What I'm trying to get to here is whether there's a viable long-term market for Apple Vision to play in.
P.S. What's more is whether Apple's vision for Apple Vision—"Spatial Computing"—is the thing people want to do with such devices. Is that the problem people want to solve? Games? Sure. Entertainment? To some degree, yes. Computing like that? For some, sure. For millions? Doubtful.
Most VR users aren't using the hardware much, there isn't much VR software available, and current headsets aren't designed for long-duration use cases like movies and browsing. Apple Vision Pro is designed for exactly those use cases.
The appeal of a personal cinema alone is enough for millions of sales at a reasonable price (~$1,000). Standard VR headsets aren't all that great for movies: the resolution isn't high enough, they are bulky to wear, and the UI is clunky.
There's some recent 3D technology (Gaussian splatting) that allows capturing volumetric photographic scenes. The demos below run best in Chrome; native performance would be several times better due to browser overhead:
https://gsplat.tech
https://gsplat.tech/hillside-swisslake/ (90MB)
Move the mouse around the Swiss Lake scene (right/middle-click to pan) and it's clear that this kind of volumetric photo will only really be worthwhile inside a VR headset, and that movies could eventually play back like this. It can capture people too:
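For intuition about how these volumetric scenes get drawn, here is a toy sketch of the core rendering idea behind Gaussian splatting: project each 3D Gaussian onto the image plane and alpha-composite them front to back along a camera ray. This is not how gsplat.tech is actually implemented (real scenes use anisotropic covariances, spherical-harmonic colours, and GPU sorting); the scene data below is made up for illustration.

```python
import numpy as np

# Toy scene: each "splat" is an isotropic 3D Gaussian with a centre,
# radius, RGB colour, and opacity (real scenes hold millions of these).
splats = [
    (np.array([0.0, 0.0, 2.0]), 0.3, np.array([0.9, 0.2, 0.2]), 0.8),  # red, near
    (np.array([0.1, 0.0, 3.0]), 0.5, np.array([0.2, 0.9, 0.2]), 0.6),  # green, far
]

def render_pixel(ray_x, ray_y, splats, focal=1.0):
    """Front-to-back alpha compositing of Gaussians along one camera ray."""
    # Sort by depth so the nearest splats are composited first.
    ordered = sorted(splats, key=lambda s: s[0][2])
    colour = np.zeros(3)
    transmittance = 1.0  # how much light still passes through
    for centre, radius, rgb, opacity in ordered:
        # Perspective-project the Gaussian's centre onto the image plane.
        px, py = focal * centre[0] / centre[2], focal * centre[1] / centre[2]
        # Screen-space falloff; the projected footprint shrinks with depth.
        sigma = focal * radius / centre[2]
        weight = np.exp(-((ray_x - px) ** 2 + (ray_y - py) ** 2) / (2 * sigma ** 2))
        alpha = opacity * weight
        colour += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
    return colour

print(render_pixel(0.0, 0.0, splats))  # red dominates: the near red splat occludes the green one
```

Because each splat is a soft, semi-transparent blob rather than a hard triangle, millions of them blend into a photographic-looking scene, which is why these captures look so much more "real" than traditional 3D models.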
There are a bunch of videos of people trying AR glasses for long periods of time; they are mostly sponsored videos, but they give some insight into the use cases:
At 4:25, the glasses are being used like a phone. The resolution of current low-cost VR/AR hardware isn't good enough for text or productivity, it still needs hardware input like a keyboard, and the software integration from low-end providers is poor. Apple's setup fixes these issues.
The use cases are compelling for the hardware, both for long-duration use and for reasonably high unit volume, but there's no way it can appeal to millions of buyers at a $3,500 price point with hardware this bulky. It's a stretch for a couple in a living room to spend $7k+ to look like the image below rather than buying a ~80" OLED TV for under $3k ( https://www.bhphotovideo.com/c/product/1757501-REG/lg_oled77c3pua_c3_77_4k_hdr.html )
Once the form factor is less bulky and the price is lower, lots of people will have these kinds of devices. It won't be iPhone-level unit volume, but it will be entertainment-level unit volume, like a games console or TV: tens of millions once the form factor is right. And while people will primarily use them for entertainment, the same is true of the smartphone.
When Apple introduced the iPhone, they compared different user inputs: the mouse, the iPod click wheel, and multi-touch. Although multi-touch input can be productive, iPhones and iPads are primarily used for entertainment, and the same will be true here with Spatial Computing. It's just an intuitive way of controlling virtual environments, which nobody else has done properly so far. I also don't see why someone wouldn't control their Mac with the headset; it's just showing a display. In the keynote, the example with the three huge virtual monitors looks like a viable setup:
https://www.youtube.com/watch?v=GYkq9Rgoj8E&t=7253s
I don't think most desktop/laptop users will do this, but it's better than buying multiple high-end monitors, and the virtual ones can be huge. Apps like Photoshop are always cramped, with the UI tucked away in tabs; here, the app can expand all around the space. The end goal of a compact, immersive wearable will be much better than these early versions. Right now this isn't iPhone 1 level; it's more like where the early iPods were.
It took around 15 years to reach iPhone X:
Apple Vision Pro, and VR headsets generally, are at the beginning of a roadmap that will need a few years to reach something great. It will likely move faster than the decade of mobile improvements did, but it will take a while.