New Mac Pro may not support PCI-E GPUs


Comments

  • Reply 41 of 55
    Marvin Posts: 15,322 moderator
    tht said:
    cgWerks said:
    tht said:
    … The big issue, how do they get a class competitive GPU? Will be interesting to see what they do. The current GPU performance in the M2 Max is intriguing and if they can get 170k GB5 Metal scores with an M2 Ultra, that's probably enough. But, it's probably going to be something like 130k GB5 Metal. Perhaps they will crank the clocks up to ensure it is more performant than the Radeon Pro 6900X …
    The devil is in the details. I’ve seen the M1 Pro/Max do some fairly incredible things in certain 3D apps that match high-end AMD/Nvidia, while at the same time there are things the top M1 Ultra fails at so miserably that it isn’t usable, and is bested by a low-to-mid-end AMD/Nvidia PC.

    I suppose if they keep scaling everything up, they’ll kind of get there for the most part. But remember, the previous Mac Pro could have 4x or more of those fast GPUs. Most people don’t need that, so maybe they have no intention of going back there again. But I hope they have some plan to be realistically competitive with more common mid-to-high-end PCs with single GPUs. If they can’t even pull that off, they may as well just throw in the towel and abandon GPU-dependent professional markets.
    If only they can keep scaling up. Scaling GPU compute performance with more GPU cores has been the Achilles heel of Apple Silicon. I bet not being able to scale GPU performance is the primary reason why the M1 Mac Pro was not shipped or never even got to the validation stage. On a per-core basis, the 64 GPU cores in the M1 Ultra perform at a little over half (GB5 Metal 1.5k points per core) of what a GPU core does in an 8-GPU-core M1 (2.6k per core). It's basically half if you compare the Ultra to the A14 GPU core performance. And you can see the scaling efficiency get worse and worse when comparing 4, 8, 14, 16, 24, 32, 48 and 64 cores.

    The GPU team inside Apple is not doing a good job with their predictions of performance. They have done a great job at the smartphone, tablet and even laptop level, but getting the GPU architecture to scale to desktops and workstations has been a failure. Apple was convinced that the Ultra and Extreme models would provide competitive GPU performance. This type of decision isn't based on some GPU lead blustering that this architecture would work. It should have been based on modeled chip simulations showing that it would work and what potential it would have. After that, a multi-billion-dollar decision would be made. So, something is up in the GPU architecture team inside Apple imo. Hopefully they will recover and fix the scaling by the time the M3 architecture ships. The M2 versions have improved GPU core scaling efficiency, but not quite enough to make a 144-GPU-core model worthwhile, if the rumors of the Extreme model being canceled are true (I really hope not).

    If the GPU scaling for the M1 Ultra were, say, 75% efficient, it would have scored about 125k in GB5 Metal. About the performance of a Radeon Pro 6800. An Extreme version with 128 GPU cores at 60% efficiency would be 200k in GB5 Metal. That's Nvidia 3080 territory, maybe even 3090. Both would have been suitable for a Mac Pro, but alas no. The devil is in the details. The Apple Silicon GPU team fucked up imo.
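
    (A quick sketch of the arithmetic in the quoted post, just to make the projections explicit. The 2.6k-per-core baseline and the 75%/60% efficiency figures are the hypotheticals tht gives above, not measurements:)

        # Rough GB5 Metal projections from per-core scaling efficiency.
        # Baseline: the 8-core M1 manages roughly 2.6k GB5 Metal points per GPU core.
        PER_CORE_BASELINE = 2600

        def projected_score(cores, efficiency):
            """Projected GB5 Metal score if 'cores' GPU cores scale at 'efficiency'."""
            return cores * PER_CORE_BASELINE * efficiency

        print(projected_score(64, 1500 / 2600))   # M1 Ultra as shipped (~1.5k/core): ~96k
        print(projected_score(64, 0.75))          # hypothetical 75%-efficient Ultra: ~125k
        print(projected_score(128, 0.60))         # hypothetical 128-core Extreme at 60%: ~200k
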
    There are some tests where it scales better. Here they test 3DMark at 7:25 (embedded video):

    The Ultra gets 34k, the Max gets 20k, so a 70% increase.

    Later in the video, at 20:00, they test TensorFlow Metal, and the Ultra GPU scales to as much as 93% faster than the Max.

    https://github.com/tlkh/tf-metal-experiments

    Some GPU software won't scale well due to the CPU holding it back, especially if it's Rosetta-translated.

    It's hard to tell if it's a hardware issue or software/OS/drivers, likely a combination. If the performance gain is more consistent with M2 Ultra and doesn't get fixed with the same software on M1 Ultra, it will be clearer that it's been a hardware issue. The good thing is they have software like Blender now that they can test against a 4090 and figure out what they need to do.

    The strange part is that the CPU is scaling very well, almost double in most tests. The AMD W6800X Duo scales well for compute too, and those separate GPU chips are connected together in a slower way than Apple's chips are. There are the same kinds of drops in some tests, but OctaneX is near double:

    https://barefeats.com/pro-w6800x-duo-other-gpus.html

    As you say, the GPU scaling issue could have been the reason for not doing an M1 Extreme. M2 Ultra and the next Mac Pro will answer a lot of questions.

    An Extreme model wouldn't have to be 4x Ultra chips; the edges of the Max chips could both be connected to a special GPU chip that only has GPU cores. Given that the Max is around 70W, they could fit 4x the Max's GPU cores on the extra chip alone (152), plus 76 on the others, for 228 cores. That would be 90 TFLOPs, which is what would be needed to rival a 4090, and this extra chip could have hardware RT cores. This should scale better, as 2/3 of the GPU cores would be on the same chip.
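
    (For what it's worth, a back-of-the-envelope check on that 90 TFLOPs figure. I'm assuming 128 FP32 ALUs per Apple GPU core, 2 FLOPs per ALU per clock for fused multiply-add, and roughly a 1.5 GHz GPU clock; the clock in particular is my guess, not a published spec:)

        # Back-of-envelope FP32 throughput for the hypothetical 228-core configuration.
        ALUS_PER_CORE = 128     # FP32 lanes per Apple GPU core (assumed)
        FLOPS_PER_CLOCK = 2     # a fused multiply-add counts as 2 FLOPs
        CLOCK_HZ = 1.5e9        # assumed ~1.5 GHz GPU clock

        def tflops(cores):
            return cores * ALUS_PER_CORE * FLOPS_PER_CLOCK * CLOCK_HZ / 1e12

        print(tflops(38))    # a single 38-core M2 Max: ~14.6 TFLOPs
        print(tflops(228))   # 2x Max GPUs plus the GPU-only die (228 cores): ~87.6 TFLOPs
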
  • Reply 42 of 55
    Marvin said:
    … An Extreme model wouldn't have to be 4x Ultra chips; the edges of the Max chips could both be connected to a special GPU chip that only has GPU cores. … This should scale better, as 2/3 of the GPU cores would be on the same chip.
    I wonder if the AMD Duo MPX modules, which use Infinity Fabric internally (to join two GPUs) in conjunction with the external "Infinity Fabric Link" Mac Pro interconnect to bypass PCIe (although PCIe remains) and create an Infinity Fabric quad (a total of four GPUs, despite the "duo" name), might serve as a model for something Apple could do, using UltraFusion (or the next generation of it) to link the base SoC to MPX GPUs. Imagine what Apple could do with that.

    My understanding is that Infinity Fabric and UltraFusion are similar technologies, and if Apple and AMD can collaborate to create the Infinity Fabric Link, they can also coordinate the next generation(s). I guess that's really the thing that doesn't make sense to me -- that AMD and Apple would walk away from their alliance just as things start getting interesting, with RDNA 3 architecture on the AMD side, and Apple's own silicon with next-generation UltraFusion on the other. Metal 3 supports advanced AMD GPUs (i.e., those in the iMac Pro and the lattice Mac Pro), so until I hear otherwise, I'll continue to believe that Apple is going to leave the door open to AMD modules (even if only via PCIe, the status quo)...
    edited January 2023
  • Reply 43 of 55
    cgWerks Posts: 2,952 member
    tenthousandthings said:
    I wonder if the AMD Duo MPX modules, which use Infinity Fabric internally (to join two GPUs) in conjunction with the external "Infinity Fabric Link" Mac Pro interconnect to bypass PCIe (although PCIe remains) and create an Infinity Fabric quad (a total of four GPUs, despite the "duo" name), might serve as a model for something Apple could do, using UltraFusion (or the next generation of it) to link the base SoC to MPX GPUs. Imagine what Apple could do with that.

    My understanding is that Infinity Fabric and UltraFusion are similar technologies, and if Apple and AMD can collaborate to create the Infinity Fabric Link, they can also coordinate the next generation(s). I guess that's really the thing that doesn't make sense to me -- that AMD and Apple would walk away from their alliance just as things start getting interesting, with RDNA 3 architecture on the AMD side, and Apple's own silicon with next-generation UltraFusion on the other. Metal 3 supports advanced AMD GPUs (i.e., those in the iMac Pro and the lattice Mac Pro), so until I hear otherwise, I'll continue to believe that Apple is going to leave the door open to AMD modules (even if only via PCIe, the status quo)...
    I really hope you’re right. Unless Apple has some super-secret, blow us out of the water, GPU plan in the works, I fear that without something like this, we Mac users are going to constantly be stuck in catch-up mode and begging developers to try and rig us a solution.

    As much as I was/am excited about Apple Silicon, if that is the ultimate outcome, I’d rather just retain the Intel compatibility. I’m sure most average users disagree, as Apple Silicon is vastly superior in many ways (mostly to do with power), but for prosumers and real pros with heavier computing needs, the power-advantages of Apple Silicon aren’t nearly as important if the rest isn’t there along with that.
  • Reply 44 of 55
    danox Posts: 2,848 member
    nubus said:
    Apple Silicon is designed as fixed bundles of CPU, GPU, ML, and RAM. It doesn't play well with the modularity that has been with us since the Mac II. And why buy fixed performance when you can scale using the cloud? Workstations made sense 30 years ago with SGI, NeXT, and Sun Microsystems. Apple should partner with AWS (or make a Mac cloud), improve the dull non-design of the Studio, and kill the Pro.
    Cool concept for cloud computing. The downside: if the internet connection is down, the computer is down. A centralized workstation makes sense if everyone is back at the office to work. Post-pandemic, all that has changed.
    That, and the lack of trust in a third party handling your information: I don’t trust Microsoft, Google, or Amazon as far as I can throw an elephant.
  • Reply 45 of 55
    tht Posts: 5,442 member
    Marvin said:
    … An Extreme model wouldn't have to be 4x Ultra chips; the edges of the Max chips could both be connected to a special GPU chip that only has GPU cores. … This should scale better, as 2/3 of the GPU cores would be on the same chip.
    I wonder if the AMD Duo MPX modules, which use Infinity Fabric internally (to join two GPUs) in conjunction with the external "Infinity Fabric Link" Mac Pro interconnect to bypass PCIe (although PCIe remains) and create an Infinity Fabric quad (a total of four GPUs, despite the "duo" name), might serve as a model for something Apple could do, using UltraFusion (or the next generation of it) to link the base SoC to MPX GPUs. Imagine what Apple could do with that.

    My understanding is that Infinity Fabric and UltraFusion are similar technologies, and if Apple and AMD can collaborate to create the Infinity Fabric Link, they can also coordinate the next generation(s). I guess that's really the thing that doesn't make sense to me -- that AMD and Apple would walk away from their alliance just as things start getting interesting, with RDNA 3 architecture on the AMD side, and Apple's own silicon with next-generation UltraFusion on the other. Metal 3 supports advanced AMD GPUs (i.e., those in the iMac Pro and the lattice Mac Pro), so until I hear otherwise, I'll continue to believe that Apple is going to leave the door open to AMD modules (even if only via PCIe, the status quo)...
    Yes, my best guess for what Apple is going to do for more compute performance in the Mac Pro is essentially an Ultra or an Extreme in an MPX module. The machine would have a master Ultra or Extreme SoC that boots the machine and controls the MPX modules. You could add 3 or 4 2-slot-wide MPX modules, or 2 4-slot-wide ones. Most software will allow you to select which MPX module, or modules, to use. I don't think an additional connection, like an Infinity Fabric, over and above 16 lanes of PCIe is necessary, as most of the problems that can make use of these types of architectures are embarrassingly parallel.

    UltraFusion is something a bit different from AMD's Infinity Fabric (ex-Apple exec Mark Papermaster has a patent on IF!). Infinity Fabric is basically a serial connection between chips with about 400 GB/s of bandwidth. UltraFusion is a silicon bridge: it's a piece of silicon, not wire traces in the PCB with connection ports on the chip like AMD's Infinity Fabric, and it is overlaid on top of the edges of the Max chip silicon.

    Apple's chips have an on-die (read as silicon) fabric bus that connects the core components of the chips: CPU complexes, memory interfaces, GPU complexes, ML and media engines, SLC, PCIe, etc. The UltraFusion silicon bridge or interposer basically extends that fabric bus to a second Max chip, at 2500 GB/s, with low enough latency that 2 GPUs can appear as one. I don't know if there is a hit to GPU core scaling performance because of this. I bet there probably is.

    For the Extreme SoC, I was thinking the "ExtremeFusion" silicon bridge would be a double-length, wider version of the UltraFusion one, with a 4-port fabric bus switch and 32 lanes of PCIe in it. So 4 Max chips would behave like one chip, and it would have 32 lanes of PCIe 4 coming out of it for 8 slots. Memory would be LPDDR, no DIMM slots. And they could stack the LPDDR 4-high. 16 stacks of 24 GB LPDDR5 dies stacked 4-high gets you 1.5 TB of memory. Hmm, just like the memory capacity of the 2019 Mac Pro.
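
    (The memory figure is just multiplication; reading it as 24 GB per die, stacked 4-high, across 16 stacks:)

        # Hypothetical LPDDR5 capacity for the "ExtremeFusion" layout described above
        stacks, dies_per_stack, gb_per_die = 16, 4, 24
        print(stacks * dies_per_stack * gb_per_die)  # 1536 GB, i.e. ~1.5 TB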

    Tremendous expense for a machine that sells what, 100k units per year? This level of integration doesn't come cheap. Heck, I'd like Apple to make an MBP16 with an Ultra to amortize the costs. ;)
  • Reply 46 of 55
    cgWerks Posts: 2,952 member
    tht said:
    … Tremendous expense for a machine that sells what, 100k units per year? This level of integration doesn't come cheap. Heck, I'd like Apple to make an MBP16 with an Ultra to amortize the costs. ;)
    I like what you’re talking about, but it does sound expensive. I’m more trying to figure out what Apple’s solution will be to compete with the $2k PC. (Though I get that’s a bit OT for this particular article.)
  • Reply 47 of 55
    tht Posts: 5,442 member
    Marvin said:
    … The good thing is they have software like Blender now that they can test against a 4090 and figure out what they need to do. … An Extreme model wouldn't have to be 4x Ultra chips; the edges of the Max chips could both be connected to a special GPU chip that only has GPU cores. … This should scale better, as 2/3 of the GPU cores would be on the same chip.
    It goes as it always does: the hardware has to accelerate ill-tuned software, because software cycles are like 5x longer than hardware cycles, and they're not in Apple's control to boot. So, they have to design a GPU architecture that does well for everything, because developers won't be tuning their software for it. Reminder that tuning is not the same thing as a recompile. Blender is a drop in the bucket. Heck, Apple has enough trouble just optimizing their own software; I'm not sure FCP has even been updated for the Ultra yet. Don't get me wrong: they need to spend resources getting software tuned for Metal and ARM, and it's good to see they are doing it with Blender, but there is a lot of software out there.

    If Apple were developing a separate GPU chip, I think we would have heard about it by now. Their current path is basically the cheapest possible path. They are just designing 1 Mac chip: the Max chip. (The M1 and M2 are just like the A12X and A10X chips they have been doing for iPads for a long while.) The Pro chip is a cut-down version of the Max chip, the Ultra is 2 Max chips with a silicon interposer, and hypothetically, the Extreme will be or would have been 4 Max chips connected with a larger silicon interposer. At a high level, if GPU performance scaling were about 60% from 1 core to 128 cores, they would have been done and had performance competitive with Nvidia. At the 10k-ft manager level, that probably sounded great! But the devil is in the details. Srouji had and has his work cut out for him. They had to know GPU performance scaling wasn't going well in 2020 at least.

    Well, they may change plans and design 2 distinct chips, a CPU chip and a GPU chip, so that they can scale both ways in the future. But today, it's really just the one chip.
  • Reply 48 of 55
    mattinoz Posts: 2,316 member
    Marvin said:
    … An Extreme model wouldn't have to be 4x Ultra chips; the edges of the Max chips could both be connected to a special GPU chip that only has GPU cores. … This should scale better, as 2/3 of the GPU cores would be on the same chip.
    I wonder if the AMD Duo MPX modules, which use Infinity Fabric internally (to join two GPUs) in conjunction with the external "Infinity Fabric Link" Mac Pro interconnect to bypass PCIe (although PCIe remains) and create an Infinity Fabric quad (a total of four GPUs, despite the "duo" name), might serve as a model for something Apple could do, using UltraFusion (or the next generation of it) to link the base SoC to MPX GPUs. Imagine what Apple could do with that.

    My understanding is that Infinity Fabric and UltraFusion are similar technologies, and if Apple and AMD can collaborate to create the Infinity Fabric Link, they can also coordinate the next generation(s). I guess that's really the thing that doesn't make sense to me -- that AMD and Apple would walk away from their alliance just as things start getting interesting, with RDNA 3 architecture on the AMD side, and Apple's own silicon with next-generation UltraFusion on the other. Metal 3 supports advanced AMD GPUs (i.e., those in the iMac Pro and the lattice Mac Pro), so until I hear otherwise, I'll continue to believe that Apple is going to leave the door open to AMD modules (even if only via PCIe, the status quo)...
    Yes, it seems strange to design a modular case knowing Apple Silicon was coming, not upgrade it for the Intel chips that were in it, and then basically throw out everything. Why even bother making the MPX modules rather than just using standard GPU cards, unless they are a test bed for something?

    Seems to me they'd be perfect for Apple Silicon MPX units, which run much cooler than the GPU chips, even with an Ultra.

    Following the same idea as the AMD GPU MPX modules, it would seem they could basically rework the top end of the motherboard and add 2 more MPX slots (4-slot-wide spacing with a double connector for bandwidth, no need for the intermediate slots), the lower one aligned to the top port opening in the back of the case currently used for the I/O board. That way, the first Apple Silicon MPX module in that slot provides the I/O directly on board, and a second could fit above it, with either a blind slot or the case back reworked slightly to add an extra mail-slot opening.

    Then they'd have a mainboard that only needs updating for PCIe/CXL standards, as it's just a big distribution system.

    Does AMD's Infinity Fabric have a hub chip that sits between 2 GPU chips/chiplets?
    Could Apple do something similar that sits between 2 M2 Max chips, making the M2 Ultra with lots of PCIe lanes on one edge of the interposer and a fabric connector on the other to bridge 2 Ultras?

    It would get super interesting if Apple's UltraFusion fabric could be compatible with AMD's Infinity Fabric, to the point you could bridge over the top of an Apple Silicon MPX module and an AMD MPX module and have them share. You could then maybe have 2 independent systems in a rack case. If that works, though, Apple could do very interesting things with the Mac Studio.
  • Reply 49 of 55
    programmer Posts: 3,458 member
    I haven't seen it mentioned here that there is a fundamental architectural difference between the Apple Silicon GPU and pretty much every other GPU architecture (except Imagination/PowerVR):  it is a tile-based architecture.  If you look at the Metal programming manual, there are numerous differences in how the two types of architectures need to be dealt with by the application.  Those differences will be even more severe at the OS/driver level.  Apple is quite aggressive at dropping old hardware in order to reduce their software burden (given, as mentioned above, software lifecycles are actually longer than hardware delivery cycles), and I would imagine they want to get to a place where (in Metal 4 or 5) they can completely focus on their hardware's tile-based architecture, and not have to accommodate AMD/nVidia designs.  Same applies to the rumoured "ray tracing" support -- they'll want to do it their way, and will ship it once they get the design working (maybe M3, as it clearly missed the 2022 shipping schedule).

    The problem, of course, is that most of their market is in mobile devices and consumer computers.  So scaling to the workstation level is going to take a back seat, or at least have to make do with however well it scales.  In the M1 generation, it obviously didn't scale well enough to ship a Mac Pro.  Enough time has passed now that the market pressure for a Mac Pro means they're going to have to give it a go on the M2.  It is unlikely to live up to everyone's expectations.

    As others have observed above, I expect to see PCIe slots in the Mac Pro.  Apple Silicon already supports PCIe for Thunderbolt 4.  What kinds of devices it can support will depend on which version they implement, how much power delivery they provide, and (most importantly) the device drivers.  Due to the architectural differences described above, I am not expecting to see GPUs from other vendors supported as display devices.  The market to use GPU cards as non-display devices (e.g. ML or other compute acceleration) is small and not interesting for either Apple or nVidia to support, so I doubt anyone will do the necessary driver development... but the hardware ought to function, so conceivably someone could do an open source driver (for non-display uses).  That's very niche though, so not holding my breath except as a cool demo.  Other kinds of PCIe cards will mostly be up to the card vendors to supply drivers for, and that will be similar to past Mac Pro support levels (porting those to Apple Silicon shouldn't be too painful for the devs).

    Memory expandability is another tricky issue.  While the chips they're embedding in their M1/M2 packaging are standard memory chips, they've not had to deal with the challenges of longer traces that go out to the motherboard and to DIMMs of varying quality.  So I'm doubtful they will switch to motherboard-based DIMMs in the Mac Pro.  Other strategies to get to large memory sizes are possible, particularly with OS support, but my guess is that they'll just live with whatever their M2 Ultra can manage -- not likely to exceed 128GB.  A PCIe RAM board as VM swap store, with minimal software support required, could offer an option to users who need huge memory footprints, but there is obviously an efficiency loss there.  Without a more sophisticated chip interconnect though, there aren't really many options open to Apple Silicon (yet)...

    The Ultra's edge interconnect allowed 2 M1 Max chips to connect.  Scaling that to more than 2 chips is non-trivial because high performance interconnects like that are very, very sensitive to trace lengths, cross-talk noise between wires, etc.  Who knows what they've done in terms of the protocol across this interconnect, though; their approach may support a different hardware implementation that could handle an interposer or perhaps 2 edges on a single chip.  Chip edge real estate is precious though -- you only get 4 edges, and in the M1/M2 two of those edges are very busy providing the memory and I/O connections.  So perhaps they could have connectors on either end and go to 3 chips?  Or accept the inevitable latencies involved in an interposer/switch of some kind (not to mention how hard it is to build such a thing) and enable going to more "chiplets" including 3D stacking options?  Cramming all this physically in the same package gets tough and the manufacturing tech for doing this is still really expensive, so is Apple going to try and pioneer such a thing only for a very high-end, low-volume product like the Mac Pro?  I'm doubtful they will at this stage... right now I think they need to get a low-risk Apple Silicon Mac Pro into the market with as much expandability as they can manage.  That won't be enough for everyone (nothing ever is), but it'll satisfy some of the market.  Going forward, this sort of chiplet-based fabric approach will become more mainstream and will make more sense for Apple to adopt across more of its product line, and that may give them better scaling opportunities at the Mac Pro end of the product line.

    So I think that, while the upcoming Mac Pro may not hit all the check boxes you'd like, that doesn't mean it is doomed going forward, and doesn't mean it won't find a worthwhile enough market.  Apple went through the 2013 Mac Pro fiasco, so they're aware of what the Mac Pro demand really is, they just may not be able to fully achieve it on this iteration.  They're late on their "two years to all Apple Silicon" promise (partly blamed on the pandemic), so no doubt they want to get something out there so they can stop supporting Intel chips ASAP.  

  • Reply 50 of 55
    fastasleep Posts: 6,417 member
    tht said:
    This level of integration doesn't come cheap. Heck, I'd like Apple to make an MBP16 with an Ultra to amortize the costs. ;)
    Yes please. As if my $6K MBP wasn't expensive enough, I might go for half the storage in exchange for an Ultra. :)
  • Reply 51 of 55
    I haven't seen it mentioned here that there is a fundamental architectural difference between the Apple Silicon GPU and pretty much every other GPU architecture (except Imagination/PowerVR):  it is a tile-based architecture.  If you look at the Metal programming manual, there are numerous differences in how the two types of architectures need to be dealt with by the application.  Those differences will be even more severe at the OS/driver level.  Apple is quite aggressive at dropping old hardware in order to reduce their software burden (given, as mentioned above, software lifecycles are actually longer than hardware delivery cycles), and I would imagine they want to get to a place where (in Metal 4 or 5) they can completely focus on their hardware's tile-based architecture, and not have to accommodate AMD/nVidia designs.  Same applies to the rumoured "ray tracing" support [...]
    Great post, really helpful, thank you for taking the time to write it. Pardon me if I'm wrong about this, but isn't everyone moving toward tile-based/chiplet-based architectures, not just Apple? I know AMD's RDNA 3 architecture uses "chiplets" (AMD's term for tiles). So even while Apple is moving toward dropping support in Metal 4 or 5 for older architectures, the rest of the industry is moving along with them. So accommodating AMD's current, chiplet-based 7000-series designs (and future designs) in the Apple Silicon lattice Mac Pro wouldn't necessarily be too much of a burden?
  • Reply 52 of 55
    Marvin Posts: 15,322 moderator
    I haven't seen it mentioned here that there is a fundamental architectural difference between the Apple Silicon GPU and pretty much every other GPU architecture (except Imagination/PowerVR):  it is a tile-based architecture.  If you look at the Metal programming manual, there are numerous differences in how the two types of architectures need to be dealt with by the application.  Those differences will be even more severe at the OS/driver level.  Apple is quite aggressive at dropping old hardware in order to reduce their software burden (given, as mentioned above, software lifecycles are actually longer than hardware delivery cycles), and I would imagine they want to get to a place where (in Metal 4 or 5) they can completely focus on their hardware's tile-based architecture, and not have to accommodate AMD/nVidia designs.  Same applies to the rumoured "ray tracing" support [...]
    Great post, really helpful, thank you for taking the time to write it. Pardon me if I'm wrong about this, but isn't everyone moving toward tile-based/chiplet-based architectures, not just Apple? I know AMD's RDNA 3 architecture uses "chiplets" (AMD's term for tiles). So even while Apple is moving toward dropping support in Metal 4 or 5 for older architectures, the rest of the industry is moving along with them. So accommodating AMD's current, chiplet-based 7000-series designs (and future designs) in the Apple Silicon lattice Mac Pro wouldn't necessarily be too much of a burden?
    Apple's tile-based GPU refers to the way it renders images, not the way the physical chips are structured:

    https://developer.apple.com/documentation/metal/tailor_your_apps_for_apple_gpus_and_tile-based_deferred_rendering
    https://www.rastergrid.com/blog/gpu-tech/2021/07/gpu-architecture-types-explained/

    AMD GPUs use physical chiplets but they use immediate mode rendering rather than tile-based rendering.
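
    To make the distinction concrete, here is a toy sketch (my own illustration in Python, nothing to do with Metal's or AMD's actual implementations) of the two rendering orders: immediate mode shades each primitive against the whole framebuffer as it arrives, while a tile-based renderer first bins geometry into screen tiles and then shades one tile at a time out of fast on-chip tile memory.

        # Toy contrast of immediate-mode vs tile-based rendering order.
        # "Shading" here just records which primitive last touched each pixel.
        TILE = 4

        def pixels_in(box):
            """Crude coverage: yield the pixels inside an axis-aligned box."""
            (x0, y0), (x1, y1) = box
            for y in range(y0, y1):
                for x in range(x0, x1):
                    yield x, y

        def immediate_mode(prims):
            framebuffer = {}                      # whole-screen buffer in (slow) memory
            for i, prim in enumerate(prims):      # shade each primitive as it arrives
                for px in pixels_in(prim):
                    framebuffer[px] = i
            return framebuffer

        def tile_based(prims):
            framebuffer = {}
            bins = {}                             # pass 1: bin primitives into screen tiles
            for i, prim in enumerate(prims):
                for x, y in pixels_in(prim):
                    bins.setdefault((x // TILE, y // TILE), set()).add(i)
            for tile, ids in bins.items():        # pass 2: shade one tile at a time
                tile_mem = {}                     # stands in for fast on-chip tile memory
                for i in sorted(ids):             # preserve submission order within the tile
                    for x, y in pixels_in(prims[i]):
                        if (x // TILE, y // TILE) == tile:
                            tile_mem[(x, y)] = i
                framebuffer.update(tile_mem)      # single write-out per tile
            return framebuffer

        prims = [((0, 0), (5, 5)), ((2, 2), (8, 8))]  # boxes standing in for triangles
        assert immediate_mode(prims) == tile_based(prims)
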

    This means that GPU drivers have to accommodate each rendering architecture. If Apple writes the drivers, they have to maintain both.

    They've already done this as Metal software runs ok on AMD GPUs. In the Blender Metal features, Apple added support for AMD and Intel:

    https://developer.blender.org/T92212

    They have an active install base of over 100 million users who still have Intel/AMD hardware, and this will be the case for a few years.
    edited February 2023
  • Reply 53 of 55
    programmer Posts: 3,458 member
    Marvin said:
    Apple's tile-based GPU refers to the way it renders images, not the way the physical chips are structured:

    https://developer.apple.com/documentation/metal/tailor_your_apps_for_apple_gpus_and_tile-based_deferred_rendering
    https://www.rastergrid.com/blog/gpu-tech/2021/07/gpu-architecture-types-explained/

    AMD GPUs use physical chiplets but they use immediate mode rendering rather than tile-based rendering.

    This means that GPU drivers have to accommodate each rendering architecture. If Apple writes the drivers, they have to maintain both.

    They've already done this as Metal software runs ok on AMD GPUs. In the Blender Metal features, Apple added support for AMD and Intel:

    https://developer.blender.org/T92212

    They have an active install base of over 100 million users who still have Intel/AMD hardware, and this will be the case for a few years.

    Marvin beat me to it.  Great answer, with reference links to back it up!  Thanks.

    While Apple is currently stuck supporting immediate mode rendering architecture, eventually they won't be... if they stop building new hardware that uses it.  And they can start adding new features which only support their preferred (tile-based) rendering architecture much sooner than that.  As soon as all the Intel and AMD chips are gone from their models currently on sale, their software people can move to a model where new graphics functionality might be "Apple Silicon only".  That's a harder sell if they're still selling Intel/AMD based hardware. 
  • Reply 54 of 55
    cgWerks Posts: 2,952 member
    programmer said:
    Marvin beat me to it.  Great answer, with reference links to back it up!  Thanks.

    While Apple is currently stuck supporting immediate mode rendering architecture, eventually they won't be... if they stop building new hardware that uses it.  And they can start adding new features which only support their preferred (tile-based) rendering architecture much sooner than that.  As soon as all the Intel and AMD chips are gone from their models currently on sale, their software people can move to a model where new graphics functionality might be "Apple Silicon only".  That's a harder sell if they're still selling Intel/AMD based hardware. 
    Yeah, that TBR vs IMR article is great. I haven’t read it thoroughly yet, just skimmed it quickly. I guess I’m wondering, though: once they get to 100% TBR, won’t they still be facing the downside trade-offs of that approach, compared to systems that stay IMR (i.e., PCs with Nvidia/AMD)?

    Of course, as the article points out, there are benefits as well. Maybe Apple’s memory architecture reduces the downsides. But unless the industry shifts, the benchmark will be the PC systems, and Apple’s systems will be criticized in situations where they can’t keep up (the article talked about high-complexity scenes, for example, where TBR isn’t as good).
    edited February 2023
  • Reply 55 of 55
    programmer Posts: 3,458 member
    cgWerks said:
    Of course, as the article points out, there are benefits as well. Maybe Apple’s memory architecture reduces the downsides. But, unless the industry shifts, the benchmark will be the PC systems, and Apple’s systems will be criticized on situations where they can’t keep up. (ex: the article talked about high-complexity scenes, for example, where TBR isn’t as good.)
    This is always a risk of doing things differently.  Particularly if the different approach requires different software.  Apple's approach may be as good as (or even better than) everyone else's, but if it requires software devs to "think different" in order to realize that advantage, then Apple is going to get criticized over it.  This may or may not matter to Apple, depending on exactly what the issues are, or how much of an advantage they extract from it.  The TBR approach is typically well suited to mobile implementations, and Apple may be happy to have their non-mobile solutions suffer in a few applications (especially niche) in order to have their mobile solutions be significantly better.  And it may also be a generational thing -- with increased degrees of SoC integration and improved processes in the future, the TBR architecture may be able to extract more value from that than IMR architectures.  It's a very complex problem, and it's nigh impossible to predict how things will play out.  Having its own silicon allows Apple to be "different"... that doesn't always mean "better" (and "better" depends on the lens being used).
