tht said: The GPU really is a platform issue for Apple. Virtually all GPU compute code is optimized for CUDA, and hardly any for Metal. It's a long road ahead. They need to get developers to optimize their GPU compute code for Metal. They need to sell enough units for developers to make money. Those units have to be competitive with the competition. So, a Mac Pro with Apple Silicon needs to be rack mountable, be able to take 4 or 5 of those 128-core GPUs, consume half the power of the Nvidia and AMD equivalents, and Apple needs to have their own developers optimize code for Metal, like they did with Blender, but for a lot of other open source and closed source software too. Otherwise, it is the same old same old: Apple's high-end hardware is for content creation (FCP, Logic, etc.), and not much else.
There are advantages of having the GPU right there on-chip, but given how well eGPUs competed with internal GPUs for so many things, how much of a benefit will it ultimately be? I wish Apple would go a BOTH/AND route.
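To make that porting effort concrete, here is roughly what a trivial GPU compute job looks like when written against Metal from Swift. It's only a minimal sketch; the kernel and buffer names are illustrative rather than taken from any real CUDA port.

```swift
import Metal

// A minimal compute dispatch against Metal, roughly the moral equivalent of a
// trivial CUDA kernel launch. Kernel and buffer names here are illustrative.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;
kernel void add_arrays(device const float *a   [[buffer(0)]],
                       device const float *b   [[buffer(1)]],
                       device float       *out [[buffer(2)]],
                       uint id [[thread_position_in_grid]]) {
    out[id] = a[id] + b[id];
}
"""

let device   = MTLCreateSystemDefaultDevice()!
let library  = try! device.makeLibrary(source: kernelSource, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "add_arrays")!)

let n = 1_000_000
let a = [Float](repeating: 1, count: n)
let b = [Float](repeating: 2, count: n)
let size = n * MemoryLayout<Float>.stride
let bufA   = device.makeBuffer(bytes: a, length: size, options: [])!
let bufB   = device.makeBuffer(bytes: b, length: size, options: [])!
let bufOut = device.makeBuffer(length: size, options: [])!

let queue   = device.makeCommandQueue()!
let cmd     = queue.makeCommandBuffer()!
let encoder = cmd.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(bufA,   offset: 0, index: 0)
encoder.setBuffer(bufB,   offset: 0, index: 1)
encoder.setBuffer(bufOut, offset: 0, index: 2)
// dispatchThreads handles the ragged edge on GPUs that support non-uniform
// threadgroup sizes (all Apple Silicon GPUs do).
encoder.dispatchThreads(MTLSize(width: n, height: 1, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: pipeline.threadExecutionWidth,
                                                       height: 1, depth: 1))
encoder.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()
```

The shape is much the same as a CUDA launch; the long road tht describes is rewriting and re-tuning every kernel a package ships, not the boilerplate.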
Some of the answers as to why performance doesn't scale more linearly with the Studio Ultra may be found by looking at the history of AMD's Threadripper.
When Threadripper was first released it was hailed for bringing huge numbers of CPU cores to the masses, and in tests like Cinebench it excelled. But testers and benchmarkers soon discovered an inherent design flaw: if a piece of software used threads on separate CPU chiplets, there was a latency penalty to be paid. IIRC AMD added a 'Gaming' mode which switched off software access to all but one chiplet. While Threadripper presents itself to the system as a single CPU, at the hardware level it is of course 4 mini CPUs flying in close formation, sharing a data bus to keep them all in sync. Depending on which cores were communicating, this led to tangible performance degradation. Software developers soon learned to keep all their threads on one chiplet if possible. AMD has obviously done a lot of work on subsequent versions of Threadripper and Ryzen, which has reduced these latency issues to virtually nil.
With the M1 architecture the situation is exacerbated, as the M1 SoCs don't just have CPU cores but GPU cores and a whole host of other goodies. The lack of linear performance with the Ultra is probably down to inter-chip latency, either hitting performance directly or being so great that developers simply aren't using the GPU or codec cores in the second SoC. It's early days, and developers will have to do what Threadripper developers learned to do: work out where the inefficiencies lie, work around them, limit the amount of core-to-core communication for a given process, and keep certain threads on one SoC. I can see this being particularly necessary with GPU-based tasks, which may be heavily impacted by the SoC-to-SoC bus.
Tasks like 3D rendering, whether on CPU or GPU, are less likely to be impacted and should show near-linear performance, because rendering can easily be split evenly across cores.
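As a rough illustration of why that kind of workload scales, here is a minimal tile-rendering sketch in Swift. shadePixel is a hypothetical stand-in for a real renderer; the point is simply that each core works on its own tiles, so there is essentially no cross-core (or cross-die) communication to pay for.

```swift
import Dispatch

// Sketch of why offline rendering scales almost linearly: the frame splits
// into independent tiles and each core shades its own tiles.
// shadePixel is a stand-in for a real path tracer or rasteriser.
func shadePixel(x: Int, y: Int) -> Float {
    return Float(x ^ y)   // placeholder work
}

let width = 1920, height = 1080, tileSize = 64
let tilesX = (width  + tileSize - 1) / tileSize
let tilesY = (height + tileSize - 1) / tileSize
var image = [Float](repeating: 0, count: width * height)

image.withUnsafeMutableBufferPointer { pixels in
    // One iteration per tile; GCD spreads the tiles across all CPU cores.
    DispatchQueue.concurrentPerform(iterations: tilesX * tilesY) { tile in
        let tx = (tile % tilesX) * tileSize
        let ty = (tile / tilesX) * tileSize
        for y in ty..<min(ty + tileSize, height) {
            for x in tx..<min(tx + tileSize, width) {
                // Each tile writes only its own pixels, so no locking is needed.
                pixels[y * width + x] = shadePixel(x: x, y: y)
            }
        }
    }
}
```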
I think the Mac Pro, if it has the expected 4x SoCs, will show even worse performance scaling, just as the Threadrippers initially did. I expect the Mac Pro to have an Apple discrete GPU, but if that GPU is PCIe based I think its function will be akin to a built-in eGPU: the latencies of the PCIe bus will mean developers choose the integrated GPUs for most interactive workloads like video editing, and the discrete GPU for heavy compute operations like scientific simulations and rendering.
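If that split did happen, the routing might look something like the sketch below from an application's point of view. The device-enumeration calls are real macOS Metal API, but the policy itself (integrated GPU for interactive work, discrete or external GPU for batch compute) is speculative, not anything Apple documents.

```swift
import Metal

// Speculative sketch of role-based GPU routing on a multi-GPU Mac.
let devices = MTLCopyAllDevices()   // every GPU visible to macOS

// On an Apple Silicon Mac this list currently has one entry, so both roles
// below resolve to the same on-package GPU. On an Intel Mac Pro, or with an
// eGPU attached, there are typically several entries to choose from.
let external   = devices.first { $0.isRemovable }                      // eGPU over Thunderbolt
let integrated = devices.first { $0.isLowPower }                       // Intel iGPU reports low power
let discrete   = devices.first { !$0.isLowPower && !$0.isRemovable }   // internal card (or Apple SoC GPU)

let interactiveDevice = integrated ?? MTLCreateSystemDefaultDevice()!
let computeDevice     = discrete ?? external ?? interactiveDevice

print("UI / timeline scrubbing on:", interactiveDevice.name)
print("Long renders / simulations on:", computeDevice.name)
```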
The M1 architecture is a whole new beast and we have to temper our expectations and not believe all the marketing blurb Apple throws at us. In ideal situations the Studio Ultra will give near 2x performance, but the problem is that much of the software we run works in ways far from ideal for M1, which is why we're seeing anything from a modest uplift with the Studio Ultra over the Studio Max to no uplift at all. It's all going to be application dependent. I've seen several benchmarks of video exports showing no improvement; in that case I expect it's simply not worth the latency penalty of using the second bank of codecs.
If I were buying a Studio it would be the Max version; right now it offers the best value for money, as you're getting the performance you're paying for. With the Ultra you're getting some of the performance some of the time.
If I'm right, in time Apple will reduce inter-SoC communication latency just like AMD did with Threadripper, so the scaling will improve. Future M-series generations will scale much better and end up being better value for money.
I could be completely wrong, but if I am, explain why.
You realise that all (modern) discrete GPUs use PCIe right? "Built in eGPU" isn't a thing, that's just a graphics card.
You missed the point. Well done.
I used the term eGPU to differentiate it from typical discrete GPU usage. I think most people who are using eGPUs are not using them as frontline GPUs but for compute applications; it's common for Mac 3D artists to use eGPUs for rendering. Did you trigger yourself before you finished reading the paragraph?
The M1 architecture uses integrated GPUs, which makes communication with the CPU vastly more efficient because they sit on the same package and use the same unified memory. A discrete GPU sat on a PCIe bus, no matter how powerful, can perform worse than these integrated GPUs for certain tasks, i.e. it's akin to an eGPU in the previous Intel generation of Macs. I thought the readership here was a bit more savvy, to be honest.
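To put the unified-memory point in concrete terms, here is a minimal sketch of the difference as Metal expresses it: a .storageModeShared buffer is visible to the CPU and the on-package GPU with no copy at all, while a discrete card's fast local memory is .storageModePrivate and data has to be blitted across the bus first. Buffer size and contents are arbitrary.

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let n = 1 << 20
let size = n * MemoryLayout<Float>.stride

// Unified memory: one .storageModeShared allocation is visible to the CPU and
// the on-package GPU alike, so the CPU can fill it and a kernel can read it
// with no copy at all.
let shared = device.makeBuffer(length: size, options: .storageModeShared)!
let ptr = shared.contents().bindMemory(to: Float.self, capacity: n)
for i in 0..<n { ptr[i] = 1.0 }

// Discrete-GPU pattern: the card's fast local memory is .storageModePrivate,
// so data has to be staged through a CPU-visible buffer and blitted across the
// bus (PCIe for a graphics card, Thunderbolt for an eGPU) before the GPU sees it.
let vram  = device.makeBuffer(length: size, options: .storageModePrivate)!
let queue = device.makeCommandQueue()!
let cmd   = queue.makeCommandBuffer()!
let blit  = cmd.makeBlitCommandEncoder()!
blit.copy(from: shared, sourceOffset: 0, to: vram, destinationOffset: 0, size: size)
blit.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()
```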
That's not the common usage of eGPU. Try using the same language as everyone else and a bit less of the tossing out of insults.
I don't disagree with the rest of your post.
I'll try and use more Fisher-Price level language that leaves less for you to interpret incorrectly in the future.
I didn't interpret incorrectly, I understood you perfectly; I pointed out your incorrect use of a word.
Are you always this salty when someone finds fault with you? Maturity, it's a good look.
Clearly you STILL don't understand. I was talking about its function and not its type; I even prefaced the term with 'akin to'. More intelligent people will have understood what I was saying; sorry you didn't and still don't.
Dude, you confused your point when you said built-in eGPU (a contradiction in terms, built-in != external) when you simply meant a graphics card. Accept it and we can move on. Throwing a strop and calling other people stupid is just making you look unhinged.
As I said, I think you're on the money in every other regard: a discrete graphics card has an interface bottleneck in PCIe (just as eGPUs have with Thunderbolt) that Apple can negate with an integrated GPU by engineering their own direct interface. They seem to be very good at high-speed interfaces.
Again, I know there is no such thing as a built-in eGPU. I also expect the reader to understand that there is no such thing as a built-in eGPU; I used the term as shorthand for the function I believe a discrete GPU in a future Mac Pro will have, namely an additional compute device.
I expect the Mac Pro will come with 2x and 4x SoC options with 64/128 GPU cores, but also with a PCIe discrete GPU option with 64, 128, or 256 GPU cores. I think the discrete GPU will be an additional compute device for heavy lifting rather than the frontline GPU, which will be handled by the integrated GPUs on the SoCs. I used the term built-in eGPU to describe the function of a discrete GPU in the Mac Pro: it would have a similar role to what people use eGPUs for today with Intel Macs, but it would also carry inefficiencies from being a PCIe device, just as eGPUs have inefficiencies associated with their usage.
When I wrote that I expected people to understand the nuance and not immediately get triggered and say, 'Bruv, you know a built-in eGPU ain't a fing.'
I might be wrong: Apple may be planning something completely different and may have found a way to give a discrete GPU none of the latency inefficiencies of sitting on a PCIe bus, the very inefficiencies they've gone out of their way to eradicate with the M1 design philosophy. Heck, they may not bring a discrete GPU to market at all, in which case the Mac Pro will be rather disappointing as far as raw compute performance is concerned.
I sure hope they bring this discrete GPU, at least in the Mac Pro. I'd be even more excited if they brought back the eGPU (so it could be used with non-Mac Pros). I'm sure (as you noted in some other threads) there are advantages we'll end up seeing from the integrated GPU and the huge amount of RAM available. But a lot of modern GPU work isn't so much about bus speed as about 'remote' processing power and special functionality (like hardware raytracing).
I guess the good news is that Apple seems quite good at this custom hardware, as we're seeing with video encoding performance. Hopefully they apply some of that to other GPU functions. Unfortunately, the more I think about it, the more I'd just go with the Studio Max version, as I think the Ultra is going to suffer first-gen regret, for some disciplines at least.
Yeah, I'd hope that Apple would have the ambition to try to take market share in 3D, CAD, and other GPU compute markets in an "all of the above" strategy, but they are so far behind on the software side right now. They haven't really demonstrated that they want to take market share. If the M2 generation has raytracing hardware, which is a required feature in 3D markets now, that would be a sign that they want to, but that is 6 months out still.
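The API side of that is at least easy to probe already. Here is a minimal sketch that checks whether a device exposes Metal's ray tracing support; note it only reports API availability and says nothing about dedicated ray tracing hardware behind it.

```swift
import Metal

// Quick capability probe: Metal already exposes a ray tracing API, so whether
// a given Mac can use it is easy to check. API support does not imply RT cores.
let device = MTLCreateSystemDefaultDevice()!
print(device.name)
print("Ray tracing API available:", device.supportsRaytracing)
print("Apple GPU family 7 or later (M1 class):", device.supportsFamily(.apple7))
```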
Without the software being available on macOS, let alone optimized, there won't be any advantages. I can see them going through all the same thoughts and deciding it is not worth the effort. They are caught in a chicken-and-egg situation with it. This type of situation is only solved by the OEM publishing and generating the software themselves. Developers aren't going to come just because the hardware is there. Same thing with 3D games.
Seems like a relatively simple solution, though. Why not just write the drivers and add AMD support back in, along with the GPUs? Don't we get the best of both worlds then? Lower-end Mac users get WAY more GPU power than the older Intel integrated solutions, mid-level users can take either approach, and higher-end users can put together boxes of GPUs with just the feature sets they need.
Yes, maybe I'll change my tune if I see Apple aggressively start adding hardware features like raytracing. I think if they can compete or take the lead, along with power savings and a fresher reputation these days (no longer an underdog brand), they could actually pull a lot of software development their way.
But you're also right to ask whether they care. They must care a bit, given they're helping app devs like Blender. Otherwise, though, they do seem mostly focused on video production. I get that; it's where they shine. But I hope they go after the rest as well. Gaming, though, might be a lost cause.