Canon: No camera can truly capture video for Apple Vision Pro


Canon and other camera companies have been exploring 3D, VR, and AR for a while now, but the Apple Vision Pro represents a very different challenge.

Apple Vision Pro

Executives from camera company Canon see a new business opportunity and potential market for a camera system that can create immersive video content for Apple's Vision Pro. However, none of its cameras can yet handle the resolution and refresh rate Apple's headset requires.

Speaking to camera site PetaPixel at the CP+ camera show in Yokohama, Japan, last week, Canon officials said they believe they already have part of the puzzle -- a 5.2mm f/2.8 L lens designed specifically for producing VR content. The challenge is that the company doesn't yet have a camera with the refresh speed needed to match the Vision Pro's high-resolution screens.

Some of the immersive environments Apple has already supplied for the Apple Vision Pro have moving elements in them, but they are believed to consist of a mix of computer-generated high-resolution static imagery and footage from what was likely an 8K video system from camera maker RED.

Other companies would like to produce a camera system that could create images capturing real-world environments at the Vision Pro's resolution and refresh rate without resorting to computer graphics. They foresee market demand for tools that can quickly create such environments.

Canon's Yasuhiko Shiomi believes this would require a camera capable of a "100-megapixel resolution at 60 frames per second." The refresh rate is the difficult part to achieve in combination with a resolution that high. It would amount to roughly 14K video, 3.5 times the horizontal resolution of the current 4K standard.
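As a rough sanity check of those figures (assuming a 16:9 frame and the 3,840-pixel width of the 4K UHD standard):

```latex
% 3.5x the width of 4K UHD, at a 16:9 aspect ratio
\[
3.5 \times 3840 = 13{,}440 \text{ px wide} \;(\approx 14\text{K})
\]
\[
13{,}440 \times 7{,}560 \approx 1.02 \times 10^{8} \approx 100\ \text{MP per frame, 60 times per second}
\]
```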

One system that can produce video to these specs does already exist -- the Sphere in Las Vegas has a "Big Sky" camera that is, in fact, an 18K video system. However, it costs millions of dollars and requires 12 people to operate, making it impractical for the emerging VR production market.

"At the moment, Canon can't cater to that level of a requirement," Shiomi said. However, Senior Managing Executive Officer and Deputy Head of the Imaging Group at Canon, Go Takura, noted that "technically, theoretically, we can do that."

"The problem is whether we can come up with the products that can be commercially viable and a price can be affordable enough for the customers to be able to buy them," Takura added.

Canon does already have a sensor capable of 100-megapixel resolution, but at present it cannot reach the required 60 frames per second.

"We are polishing our technology so that we can provide the high resolution for the VR purposes," Shiomi said. "So we will continue trying to improve our technology so that we can improve both resolution and speed with a good balance."




Comments

  • Reply 1 of 12
    Thanks for reporting on this.  I’ve been watching to see what Canon will be doing in this space, as they seem to have a strong interest in producing something for it.  And it makes me wonder if the other companies see the same opportunities, and what they are working on — better yet, does Apple have plans to release a standalone camera system focused on Spatial Video?
  • Reply 2 of 12
    chasm Posts: 3,311 member
    Technically, the Apple Vision Pro (and the iPhone 15 line) are themselves Spatial Video cameras -- but what Canon is talking about here is a camera system for 3D immersive VR video, which is a considerable step up (from a resolution and refresh standpoint) from Spatial Video.

    I wouldn't be surprised if Canon or other camera companies come out with handheld camera models that can do Spatial Video in line with Apple's guidelines by, let's say, the holiday season. :wink: 
  • Reply 3 of 12
    netrox Posts: 1,422 member
    I'm not understanding this. Where does Canon get the impression that 100MP at 60fps is required for the AVP? The AVP uses a 4K display for each eye, and there's nothing that suggests the video has to be at that resolution/refresh rate.

    What am I missing? 
  • Reply 4 of 12
    HeyJP23 Posts: 5 member
    Yeah, I'm missing something here too. The Canon virtual 3D lens they talk about, when coupled with an R5, will do four megapixels for each eye at 30 frames per second. What happens if you show 30 frames per second on an Apple Vision Pro that's rendering at 60? Seems like it would still just be fine.

  • Reply 5 of 12
    netrox said:
    I'm not understanding this. Where does Canon get the impression that 100MP at 60fps is required for the AVP? The AVP uses a 4K display for each eye, and there's nothing that suggests the video has to be at that resolution/refresh rate.

    What am I missing? 
    There are for sure already many cameras that will produce video that can be played on the AVP. The article was not super clear, but there is one snippet that sort of explains what Canon means: "Some of the immersive environments Apple has already supplied for the Apple Vision Pro have moving elements in them, but they are believed to consist of a mix of computer-generated high-resolution static imagery and footage from what was likely an 8K video system from camera maker RED."

    If you just put the AVP on and look around, the environments, for instance the one with the mountains and a lake, look super realistic, almost like you are actually there. Producing that same feeling with a camera means capturing a 360-degree view at that same resolution and refresh speed. The lowest refresh rate is 90Hz. Now consider the sphere required to cover not only 360 degrees around you but also looking up and down -- how many 4K screens would that be? My estimate, based on how much of your view spatial videos currently take up, is something like 10, give or take 2. And since you need one for each eye, that's about 20 screens, or "80K." 80K at 90fps is an incredible amount of storage/bitrate.
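    A quick back-of-the-envelope sketch of that estimate (the assumptions are mine: each "screen" is 4K spanning roughly 100 degrees of view, and the stream is uncompressed 8-bit RGB):

    ```python
    # Rough check of the "how many 4K screens tile the full sphere" guess.
    # The 100-degree per-screen FoV and uncompressed RGB are assumptions.
    FOV_DEG = 100.0
    SCREEN_W, SCREEN_H = 3840, 2160          # one 4K screen
    PPD = SCREEN_W / FOV_DEG                 # ~38 pixels per degree

    # Full sphere as an equirectangular frame: 360 x 180 degrees, one eye.
    sphere_px = (360 * PPD) * (180 * PPD)
    screens_per_eye = sphere_px / (SCREEN_W * SCREEN_H)

    FPS = 90
    BITS_PER_PX = 24                         # uncompressed 8-bit RGB
    raw_gbps = 2 * sphere_px * BITS_PER_PX * FPS / 1e9   # both eyes

    print(f"~{screens_per_eye:.0f} 4K screens per eye")            # ~12
    print(f"~{raw_gbps:.0f} Gbit/s raw for two eyes at {FPS}fps")  # ~413
    ```

    Even before any compression, that is hundreds of gigabits per second, which is why storage/bitrate is the sticking point.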

    For comparison, the best out there right now are the 8K 360 videos, but IMHO they look low-resolution when spread across the full immersive sphere required for looking everywhere. 16K feels like it would be at an acceptable level. So I think it will be years, if not decades, before we get a video format and cameras that can support what the AVP is capable of, but having 16K would, I think, be pretty amazing.

    Cheers,
    Damon
  • Reply 6 of 12
    7omr Posts: 6 member
    The Canon dual fisheye on an EOS R5C produces two images on an 8K frame. The two images render to a single image which is half of 8K. This suggests two synced 8K cameras could work, and that it doesn't all have to occur on a single sensor as is suggested in that statement.

    Regarding 60fps, my understanding is that there is meaningful evidence and experience to show that 60fps is a better experience in VR. 


  • Reply 7 of 12
    sloaah Posts: 24 member
    HeyJP23 said:
    Yeah, I'm missing something here too. The Canon virtual 3D lens they talk about, when coupled with an R5, will do four megapixels for each eye at 30 frames per second. What happens if you show 30 frames per second on an Apple Vision Pro that's rendering at 60? Seems like it would still just be fine.

    I'm a filmmaker and have worked in VR in the past, so I can give some insight.

    The resolution is apparently so high because this is for 180VR films. The videos occupy half of a sphere (180º). Though the Apple displays are 3.6k horizontally, that covers roughly a 105º FoV; so 3.6k / 105º × 180º ≈ 6.2k of resolution per eye.

    If you're recording both frames on one sensor -- which is how it's done with the Canon Dual Fisheye lens, and which is the easiest way to keep the lenses at an inter-pupillary distance of 60mm (roughly the distance between our eyes) -- then you need a resolution of 12.4k (horizontal) x 6.2k (vertical) = 77MP. There is also some resolution loss given that the fisheyes are not projecting onto the full sensor -- they project just two circles side by side on a rectangular sensor -- so I would imagine 100MP would be roughly right to retain resolution across the scene.
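    A worked version of that arithmetic (the 3.6k-per-eye and ~105º FoV figures are the estimates above, not published Apple specs):

    ```latex
    \[
    \frac{3600\ \text{px}}{105^\circ} \times 180^\circ \approx 6{,}200\ \text{px per eye (horizontal)}
    \]
    \[
    \underbrace{(2 \times 6{,}200)}_{\text{both eyes, side by side}} \times 6{,}200 \approx 76.9\ \text{MP} \approx 77\ \text{MP}
    \]
    ```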

    As to frame rate: cinema is 23.98fps with a 180º shutter, which means the shutter is actually closed half the time and open the other half. It leads to a certain strobing which we subconsciously associate with the world of cinema. Nobody really knows why this is so powerful, but maybe it helps remove us a bit from the scene, so our brains treat it more as something we're observing rather than something we're part of. Tbh I'm not really sure.
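    For reference, a 180º shutter means the exposure lasts half of each frame interval:

    ```latex
    \[
    t_{\text{exposure}} = \frac{180^\circ}{360^\circ} \times \frac{1}{23.98\ \text{fps}} \approx \frac{1}{48}\ \text{s}
    \]
    ```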

    But with immersive video, we want to do the opposite. Rather than emphasise detachment, we want to emphasise immersion. And so we want to shoot at a frame-rate which is roughly at the upper end of what the human eye can discern, removing as much strobing as possible. That means roughly 60fps. The fact that there are two frames being shown, one for each eye, doesn't alter this equation. It still needs to be 60fps per eye.

    7omr said:
    The Canon dual fisheye on an EOS R5C produces two images on an 8K frame. The two images render to a single image which is half of 8K. This suggests two synced 8K cameras could work, and that it doesn't all have to occur on a single sensor as is suggested in that statement.
    That is true, but it is difficult to get the lens spacing to match the 60mm inter-pupillary distance that I mentioned. If you remain constrained to this distance, then a single sensor is the most effective way to achieve this, because you don't have any dead space between the sensors and thus you can maximise sensor size. It can also ensure that you don't have any sync drifting between left and right eyes, which can be a tricky problem to solve.

    In theory you could presumably also create some sort of periscope system so that the two sensors can be entirely detached; but I imagine this would be very costly.

    Looking at the BTS shots of the Apple cameras, they interestingly don't follow this inter-pupillary distance rule. Nor does the iPhone 15 Pro, for that matter. The Vision Pro isn't available in my region, so I haven't had a chance to see what these spatial videos look like, but I wonder if there is some interesting computational work happening to correct for this. That sort of computational photography work -- which essentially repositions the lenses in software by combining the image data with depth data -- is definitely implemented in how the Vision Pro does its video pass-through, where the perspective of the cameras at the front of the headset is projected back to where the user's eyes are.
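    To make that concrete, here is a toy sketch (with made-up intrinsics and eye offset -- not Apple's actual pipeline) of the kind of depth-based reprojection being described: unproject each pixel to 3D using its depth, shift to the virtual eye position, and project back:

    ```python
    import numpy as np

    H, W = 480, 640
    fx = fy = 500.0            # focal length in pixels (made up)
    cx, cy = W / 2, H / 2      # principal point

    def reproject(image, depth, eye_offset):
        """Warp image (H,W,3) with depth (H,W, metres) to a virtual
        camera translated by eye_offset (3-vector, metres)."""
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        # Unproject every pixel into a 3D camera-space point.
        z = depth
        x = (u - cx) / fx * z
        y = (v - cy) / fy * z
        # Shift into the virtual eye's frame (rotation assumed identity).
        x, y, z = x - eye_offset[0], y - eye_offset[1], z - eye_offset[2]
        # Project back to pixel coordinates.
        u2 = np.round(fx * x / z + cx).astype(int)
        v2 = np.round(fy * y / z + cy).astype(int)
        out = np.zeros_like(image)
        ok = (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H) & (z > 0)
        # Naive forward splat; a real system also fills holes and
        # handles occlusion properly.
        out[v2[ok], u2[ok]] = image[v[ok], u[ok]]
        return out

    # Example: shift the viewpoint 32mm right (half a 64mm IPD).
    img = np.random.randint(0, 255, (H, W, 3), dtype=np.uint8)
    dep = np.full((H, W), 2.0)     # flat scene 2m away
    shifted = reproject(img, dep, np.array([0.032, 0.0, 0.0]))
    ```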

    If there is a computational element going on here, then that's hugely interesting, because a) it effectively solves the issue of needing to use one sensor, and b) it opens up intriguing possibilities of allowing a little bit of head movement, with corresponding perspective changes (i.e. true spatial video rather than just 3D video – what is called 6DoF in the industry).
  • Reply 8 of 12
    badmonk Posts: 1,299 member
    I wonder how many of these immersive cameras Apple has in use, how complex and costly they are, how quickly they are being ramped up, and who makes them?
  • Reply 9 of 12
    7omr said:
    The Canon dual fisheye on an EOS R5C produces two images on an 8K frame. The two images render to a single image which is half of 8K. This suggests two synced 8K cameras could work, and that it doesn't all have to occur on a single sensor as is suggested in that statement.

    Regarding 60fps, my understanding is that there is meaningful evidence and experience to show that 60fps is a better experience in VR. 


    That would be a bit of a surprise. On the Oculus Quest, for instance, the minimum requirement for apps is 72fps, and 90 produces a more natural feel and reduces motion sickness. If there is evidence to the contrary, I would be very interested to see it.

    Cheers!
    Damon
  • Reply 10 of 12
    ailooped Posts: 26 member
    I think 100MP stereoscopic capture would be enough to fill the entire 360º space at good resolution, and that is what Canon's people are talking about in the article -- 360VR, if you will. Most 360VR capture is not in stereo, and when it is, it looks very low-resolution.

    Is 4K capture enough? If you think about it… 4K is just one locked view, a 4K-sized portion of the whole 360º view. The fact that you can look around everywhere means it actually takes closer to 100MP to capture the entire scene.
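    A rough check of that intuition, assuming the 4K "locked view" spans about 100º of your view:

    ```latex
    \[
    \frac{3840\ \text{px}}{100^\circ} \approx 38.4\ \text{px/deg}
    \;\Rightarrow\;
    (360^\circ \times 38.4) \times (180^\circ \times 38.4) \approx 13{,}824 \times 6{,}912 \approx 95.6\ \text{MP}
    \]
    ```

    which is indeed in the neighborhood of the 100MP figure.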

    I suspect AI will be quite good at making older movies 3D, and it can likely upscale OK from lower-res capture too. I see some advancement in that, but no definite solution just yet.



    All in all, AR/VR video is still very much in its infancy. No great professional systems are available, but I do think that the AVP entering the market has sent the likes of Canon into the labs, which is great to know.

  • Reply 11 of 12
    ailooped Posts: 26 member
    Another point is that once headset resolution goes up to, say, 8K per eye, capture will have to jump to something like 200MP to do 360VR at a good resolution for that new tech. If I were a movie production house, I would be looking at capturing in the highest possible resolution for 3D, to be future-proof.

    180VR could prove to be THE format for movies, though; it seems like it might be hard to do narrative action if your audience is more interested in something happening in the completely opposite direction, and there is also the problem of the entire production staff ending up in the picture…

    It does seem like the tech has some hurdles to jump, which is cool.

  • Reply 12 of 12
    danox Posts: 2,878 member
    Canon and Sony don't have the OS or application software know-how to merge that information into an SoC at the speed Apple can, and the R1 chip will only get better. Canon and Sony can't avoid the fact that they are going to need to raise their game from now on.

    Sony and Canon probably need to do a lot more than just hardware going forward. If mere humans are under AI pressure, then companies like Sony, Canon, Intel, AMD, Nvidia, and Qualcomm are under pressure from lacking an in-house OS and software ecosystem combined with their hardware. The top of the pyramid -- an OS and software ecosystem combined with chip design and engineering ability -- is the only place to be in the future, and that includes AI.

    The breakthroughs needed to get all the software and hardware into a pair of glasses will squeeze out several companies. Intel, AMD, and Nvidia are already out, and Qualcomm, without an in-house OS/ecosystem, is at the mercy of third parties like Samsung, Google, and Meta.

    The only reason Apple is still in the so-called AI game is its in-house OS and software ecosystem combined with in-house chip design and engineering ability.