Marvin

About

Username: Marvin
Joined: February 2006
Visits: 131
Last Active: July 24
Roles: moderator
Points: 7,013
Badges: 2
Posts: 15,587

Reactions

Like: 2.7K, Dislike: 19, Informative: 859

What Apple's three GPU enhancements in A17 Pro and M3 actually do

Marvin

November 2023

blastdoor said:

Does the GPU support 64 bit floating point?

Someone made a library to support this here:

https://github.com/philipturner/metal-float64

It's going to be pretty slow compared to dedicated hardware. High-end consumer GPU double precision is around 1TFLOPs:

https://www.techpowerup.com/gpu-specs/geforce-rtx-4090.c3889

Dedicated GPUs for this are over 30-40TFLOPs:

https://www.techpowerup.com/gpu-specs/h100-pcie-96-gb.c4164
https://www.techpowerup.com/gpu-specs/radeon-instinct-mi300.c4019

but the Nvidia one costs over $40k:

https://www.newegg.com/p/1VK-0066-00022

It's probably most cost-effective to use cloud services:

https://evp.cloud/pricing

The Metal float64 library above was written to help add 64-bit atomics support for Unreal Engine's Nanite feature:

https://github.com/philipturner/metal-benchmarks#nanite-atomics

"The Apple GPU architecture only supports 32-bit atomics on pointer values, while other architectures support texture atomics or 64-bit atomics. The latter two are required to run the current implementation of Nanite in Unreal Engine 5 (UE5). Nanite is a very novel rendering algorithm that removes the need for static LOD on vertex meshes. Rendering infinitely detailed meshes requires subpixel resolution and rasterizing pixels entirely in software. To implement a software-rasterized depth buffer, UE5 performs 64-bit atomic comparisons. The depth value is the upper 32 bits; the color is the lower 32. This algorithm is an example of a larger trend toward using GPGPU in rendering.

There was a recent discovery that Nanite can run entirely on 32-bit buffer atomics, at a 2.5x bandwidth/5x latency cost. However, Apple added hardware acceleration to the M2 series of GPUs for Nanite atomics. This includes a single instruction for non-returning UInt64 min or max. It does not include the wider set of atomic instructions typically useful for GPGPU, although such instructions were effectively emulated in the prototypical metal-float64. The A15 and A16, part of the same GPU family as M2, do not support Nanite atomics. Hopefully the A17 will gain support in the next series of chips."
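The packed depth/color trick described in the quote can be sketched in plain Python. This is only an illustration, not Unreal's or Apple's implementation. The function names (`pack`, `unpack`, `rasterize`) are hypothetical, and the "larger depth value wins" convention is an assumption. A real renderer may use min or a different depth encoding. Python has no GPU atomics, so an ordinary `max` stands in for the single 64-bit atomic max instruction.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF


def pack(depth: int, color: int) -> int:
    """Pack a 32-bit depth (upper bits) and a 32-bit color (lower bits) into one 64-bit value."""
    assert 0 <= depth <= MASK32 and 0 <= color <= MASK32
    return (depth << 32) | color


def unpack(packed: int) -> Tuple[int, int]:
    """Split a packed 64-bit value back into (depth, color)."""
    return packed >> 32, packed & MASK32


def rasterize(fragments: List[Tuple[int, int, int]], width: int) -> List[int]:
    """Hypothetical 1D depth buffer holding one packed value per pixel.

    Each fragment is (x, depth, color). Because depth sits in the upper bits,
    comparing packed values compares depth first, so a single max both runs
    the depth test and carries the winning color along with it.
    """
    buffer = [0] * width
    for x, depth, color in fragments:
        # On a GPU this line would be one non-returning 64-bit atomic max.
        buffer[x] = max(buffer[x], pack(depth, color))
    return buffer
```

For example, `rasterize([(0, 10, 0xFF0000), (0, 20, 0x00FF00)], 1)` leaves pixel 0 holding the fragment with depth 20. Depth and color travel in one value, so no second write is needed to keep them in sync. That is why the algorithm wants 64-bit atomics rather than two separate 32-bit ones.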

Apple added 64-bit atomics for M2 (Apple8), M3 and A17 (Apple9):

https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf

Consumer GPUs aren't well suited for double precision computing.

https://www.tomshardware.com/news/intel-arc-will-not-support-fp64-hardware
https://www.techpowerup.com/forums/threads/nerfed-fp64-performance-in-consumer-gpu-cards.272732/

These are features they could add to Mac Pro versions of an M3 Extreme. Such models would be priced near $10k, but if they could do 10-20TFLOPs FP64, they would be useful to some people. I doubt the volume of buyers justifies the manufacturing though, which is why the Nvidia one is over $40k.

M3 Ultra could have up to 80 graphics cores

Marvin

November 2023

discountopinion said:
Does anyone feel there is any merit to the rumours of a further interconnect between 2 M?Ultras into something even more extreme?

Would there even be an idea to package up many M3 Ultras into compute nodes like Nvidia is doing with their chips? The power draw from the M3 Ultra is nothing compared to their chips. Maybe this is something for Apple’s iCloud.

The chip images for M1 and M3 show a similar layout. The bottom part of M1 Pro (the GPU) is doubled in the Max chip, while the CPU is the same 10-core. The GPU in M1 Pro is 16-core and in the Max is 32-core:

In the M3 image, the Pro chip is upside-down but the same GPU part is roughly doubled: M3 Pro has an 18-core GPU, M3 Max has a 40-core GPU. However, the Max chip is wider now and has a faster CPU (16-core vs 12-core):

The Ultra chip joins two Max chips along the bottom edge:

It's likely they will do the same with M3. But there could be an option to put another GPU component in the middle to make a 3x GPU.

M3 Max is 17TFLOPs (40-core), 2x would be 34TFLOPs (80-core), 3x would be 51TFLOPs (120-core). It would still be short of a 4090 with 3x but basically the same as a maxed out 2019 Mac Pro and around the same as a laptop 4090.
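The scaling arithmetic above can be written out as a short Python sketch. It assumes perfectly linear scaling from the 17TFLOPs, 40-core M3 Max figure quoted in the post, which real multi-die chips may not achieve. The `scaled_gpu` helper is a hypothetical name used only for illustration.

```python
M3_MAX_TFLOPS = 17.0  # figure quoted in the post for the 40-core M3 Max
M3_MAX_CORES = 40


def scaled_gpu(multiplier: int):
    """Return (cores, tflops) for a GPU built from `multiplier` M3 Max GPU blocks.

    Assumes linear scaling with no interconnect or clock-speed penalty.
    """
    return M3_MAX_CORES * multiplier, M3_MAX_TFLOPS * multiplier


for m in (1, 2, 3):
    cores, tflops = scaled_gpu(m)
    print(f"{m}x: {cores}-core, {tflops:.0f}TFLOPs")
```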

I expect the Mac Studio will top out at a dual chip. The Mac Pro would have more room for an extreme version with extra GPU cores and could add more memory (384GB). They might not feel that investing in a custom chip is worthwhile for such a small shipment volume though (<10k units).

Apple confirms that there is no Apple Silicon 27-inch iMac in the works

Marvin

November 2023

rob53 said:

Apple keeps pushing laptops because they feel portability is what everyone wants.

Apple's not pushing people to laptops; they just sell what people want. The trend towards laptops and mobile has happened everywhere.

A laptop also doesn't mean being confined to a small display. A desktop confines people to a large display and a fixed location. Many people opt for the best of both (Apple employees use this setup in their offices):

This allows for a larger display than the iMac, as well as options like OLED and matching dual displays. Should the need arise to take the laptop on vacation, to bed, to the lounge, to work or in for repair, just unplug it and it's portable.

Apple is 'very pleased' with its movie box office, says theater chain

Marvin

November 2023

mpantone said:
The film industry continues to use this benchmark because it's a pretty good indication of a film's popularity relative to the most important film consumer market in the world: the USA.

Seems like there are too many variables for it to be a reliable measure. The opening weekend total puts it behind Cocaine Bear:

https://www.boxofficemojo.com/year/2023/?grossesOption=totalGrosses&sort=openingWeekendGross&ref_=bo_yld__resort#table

But worldwide revenue for Cocaine Bear was $88m:

https://www.boxofficemojo.com/releasegroup/gr445207045/

Even though they are in the same range, Killers of the Flower Moon is already above this:

https://www.boxofficemojo.com/releasegroup/gr3466220037/

There's a big variation in the split between domestic and international from one movie to another because some movies mainly appeal to US audiences. Taylor Swift's movie, like Cocaine Bear, has a roughly 75:25 domestic/international split:

https://www.boxofficemojo.com/releasegroup/gr4238103045/

Oppenheimer is 35:65. Killers of the Flower Moon so far is around 50:50.

Oppenheimer's opening weekend was just below Taylor Swift's, but the film eventually made 2x domestically and 4.5x internationally.
Fast X's domestic total was below Taylor Swift's, but it made 80% of its revenue internationally.

Killers of the Flower Moon is 38th worldwide this year:

https://www.boxofficemojo.com/year/world/?ref_=bo_nb_yld_tab

The top 20 mostly makes sense. This movie feels like it should at least be in position 20-30, but that would need another $40m. If it has enough time left in the theaters, maybe it can make it.

It has good reviews at least, so it should measure well on streaming. The more exclusive content of this quality, the better the Apple TV service will be.

https://www.rottentomatoes.com/m/killers_of_the_flower_moon
https://www.rottentomatoes.com/m/oppenheimer_2023

M3 Max benchmarks show Mac Pro performance in a MacBook

Marvin

November 2023

dewme said:

The final round of Intel Macs are still relevant for users who need to run multiple virtual machines targeting Intel deployments on Windows and Linux and who want to do it from a Mac host. If I were still doing that kind of development I would hang on to that type of Mac until the wheels fell off. But that window is closing fast and the only way forward is to use a beefy Intel Windows box.

Otherwise, Apple Silicon is unbeatable.

Xed said:

Why not run Windows and Linux in a VM in an M-series Mac? I do and it's a screamer.

If anything, it's the option of being able to boot directly into Windows with BootCamp that I miss about the Intel Macs.

netrox said:

Can you run x86 Linux on M-series Macs?!? Is there a virtual x86 emulator already?

Xed said:

I don't know of an emulator for x86 OSes, but it's definitely not impossible and would probably seem pretty compared to the average PC out there, but I was talking about running Linux ARM64 in a VM.

netrox said:

That's what I thought, running ARM version of Ubuntu, as it has done for a while just like Windows for ARM.

I wonder if x86 emulator has come out yet for OS's that aren't compiled for ARM.

Marvin said:

There's an x86 emulator:

but the native ARM version will run 5-10x faster than the emulated x86 one.

Xed said:

If you had to run an emulator in an M3 Max MBP, I wonder how much slower it would be than using one of the last gen Intel MBP with a VM.

Much slower, it turns out. Emulated is barely usable: left is x86 Linux, right is ARM native. Native is nearly 30x faster:

On M3 Max, it wouldn't even be as fast as a VM on a Mac from over 10 years ago. There may be ways to tune an x86 version to run better but it would be best to keep an old x86 machine around for x86 VMs.

For most use cases, a native ARM VM is the way to go, potentially with x86 apps emulated separately, the way Windows on ARM and Rosetta work.

Native VMs run at about 70-90% of native system performance.