Compared: 14-inch MacBook Pro vs. 13-inch M1 MacBook Pro vs. Intel 13-inch MacBook Pro

Comments

  • Reply 21 of 44
I wonder what the actual dimensions of the M1 Max chip are, thinking about the square Mac mini form factor. The 14" laptop size is partially determined by the length of the keyboard, with speakers on either side and a very usable trackpad below.
  • Reply 22 of 44
crowley Posts: 10,453 member
    Marvin said:
    tht said:
    So if the 4x claim is true, what is a plausible explanation? 
    It’s on the website:

    Prerelease Cinema 4D S25 real-time 3D performance tested using a 1.98GB scene. 

    Prerelease Cinema 4D S25 and prerelease Redshift v3.0.54 tested using a 1.32GB scene.

    Tested with prerelease Affinity Photo 1.10.2.263 using the built-in benchmark version 11021. 

    These all are 4x (M1 Max) and 2x (M1 Pro) over the 5600M.
Right, it's 4x faster than the 5600M, but what I'm trying to say is that we can't use the Geekbench numbers as a guide, which would place the M1 Max alongside a 20+ teraFLOPS AMD Radeon Pro W6900X. But the M1 Max clearly matches the top AMD or NVIDIA mobile chips while using 40% less power.

One more chip to go! The M1 Max Extreme, or whatever they call it. If it's a 64-core GPU it will match the top card, but to match the dual cards it will have to be 128 cores. That's going to be one big die. If the power savings scale up, it's going to be a game changer.
    Some tests will be affected by the memory setup. The 5600M has 8GB of video memory. If they load up a 3D scene with 4K/8K textures, it can run out of memory and then it has to keep swapping textures with system memory (similar to a system paging when it runs out of system memory). The M1 Max chip has up to 64GB of unified memory so it can keep everything in memory.

    There was an image posted about the chip designs where the higher end chips will be multiples of the M1 Max, at least 2x and 4x. Maybe they will have a 3x too but that wouldn't be needed. They could call them Extreme and Ultimate or they could even use Duo like they do for the Radeon Pro Duo. M1 Max Duo, M1 Max Quad.

    The largest would be 16x the size of the M1 chip, much like the Threadripper chips:



    The M1 Max has 57 billion transistors, the 3990x Threadripper has around 40 billion (just for CPU, no GPU). 4x the M1 Max will have 228 billion if they just take multiples of the chips. If they multiply the cores separately then it will be a bit less.

    It will be a very powerful chip and even crazier to think it would be able to fit into the 2013 Mac Pro cylinder enclosure. If Intel/AMD had kept up, it could have been with their chips but they didn't and Apple had to revert back to an old form factor to accommodate their inefficient hardware.
    Does the integration of the memory and GPU on the M-series SoCs not create issues for multiple CPU architectures?  Seems like it might (I claim no expertise here, just guessing).
  • Reply 23 of 44
tht Posts: 5,421 member
Marvin said:
[...]
Right, it's 4x faster than the 5600M, but what I'm trying to say is that we can't use the Geekbench numbers as a guide, which would place the M1 Max alongside a 20+ teraFLOPS AMD Radeon Pro W6900X. But the M1 Max clearly matches the top AMD or NVIDIA mobile chips while using 40% less power.

One more chip to go! The M1 Max Extreme, or whatever they call it. If it's a 64-core GPU it will match the top card, but to match the dual cards it will have to be 128 cores. That's going to be one big die. If the power savings scale up, it's going to be a game changer.
Well, we can't go either way, since GPUs at this level are quite task-specific. So buyers really should be looking for measures of performance based on their own workflow. Same story for multicore CPU performance.

I was just answering your question. The 2x and 4x over the 5600M are likely based on benchmark testing of Cinema 4D and Affinity Photo. For GB5 Metal compute, it's running a suite of processes that theoretically exercise different aspects of the GPU. So it's a kind of broadband measure of GPU performance, but it probably doesn't cover them all. Apple's unique advantage is having access to 32/64 GB of memory, and GB5 doesn't test for memory-constrained processes, which is where Apple has an edge. So YMMV; find tests that measure your own workflow.
  • Reply 24 of 44
tht Posts: 5,421 member
    ApplePoor said:
I wonder what the actual dimensions of the M1 Max chip are, thinking about the square Mac mini form factor. The 14" laptop size is partially determined by the length of the keyboard, with speakers on either side and a very usable trackpad below.
    Based on a 425 mm2 SoC size, the M1 Max MCM is 1676 mm2, or 16.76 cm2, or 2.6 in2. Seems smaller than it looks! Maybe my math is wrong.
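A quick sanity check of those conversions (taking the 1676 mm2 package estimate as given):

```python
# Sanity-check the MCM area conversions (1 inch = 25.4 mm exactly)
mcm_mm2 = 1676                    # estimated M1 Max MCM area in mm^2
mcm_cm2 = mcm_mm2 / 100           # 100 mm^2 per cm^2
mcm_in2 = mcm_mm2 / 25.4 ** 2     # 645.16 mm^2 per in^2

print(f"{mcm_cm2:.2f} cm^2, {mcm_in2:.2f} in^2")  # 16.76 cm^2, 2.60 in^2
```

So the math holds up: about the footprint of a large postage stamp stack.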

For the Mac mini, the M1 Max MCM won't be the problem; the cooling system will. The current Mac mini is big enough that it won't be an issue, even with its odd shape for cooling and all. The rumor is that the new Mac mini will be thinner and smaller. Maybe they only plan on using the M1 Pro in the Mac mini?
  • Reply 25 of 44
Marvin Posts: 15,310 moderator
    crowley said:
[...]
    Does the integration of the memory and GPU on the M-series SoCs not create issues for multiple CPU architectures?  Seems like it might (I claim no expertise here, just guessing).
    The image they showed at the event of the chip was this:



    People have x-rayed the M1 chip to see the layout:



    https://s3.i-micronews.com/uploads/2020/12/SP20608-Yole-Apple-M1-System-on-Chip-Flyer-1.pdf
    https://www.eetasia.com/teardown-identifying-apple-m1s-distinct-circuit-blocks/
    https://www.systemplus.fr/wp-content/uploads/2020/12/SP20608-Apple-M1-System-on-Chip-Sample.pdf

    They can scale the processing units separately from the memory. This would allow them to sell higher-end units with lower amounts of RAM.

I expect the 27" iMacs to offer M1 Pro, Max and Max Duo chips starting at 16GB RAM and going up to 128GB RAM. The Mac Pro, if there is one, would be Max Duo and Quad, likely starting at 32GB and going up to 256GB - this config could easily go in an iMac too.
  • Reply 26 of 44
    My order has changed to "preparing to ship" and my Apple Wallet shows there will be 6% cash back on both the extended warranty and the computer.
  • Reply 27 of 44
Xed Posts: 2,519 member
    ApplePoor said:
    My order has changed to "preparing to ship" and my Apple Wallet shows there will be 6% cash back on both the extended warranty and the computer.
    How are you getting 6%?
  • Reply 28 of 44
    Some recent computer orders seem to be getting that percentage.
  • Reply 29 of 44
    Marvin said:
[...]
    I expect the 27" iMacs to offer M1 Pro, Max and Max Duo chips starting at 16GB RAM and going up to 128GB RAM. Mac Pro if there is one would be Max Duo and Quad, likely starting at 32GB and going up to 256GB - this config could easily go in an iMac too.
I think you’re right, except that the M1 Extreme will be a single chip. 256GB of RAM sounds right, but I still have my doubts about big standard DIMMs.
  • Reply 30 of 44
    Marvin said:
[...]

    I expect the 27" iMacs to offer M1 Pro, Max and Max Duo chips starting at 16GB RAM and going up to 128GB RAM. Mac Pro if there is one would be Max Duo and Quad, likely starting at 32GB and going up to 256GB - this config could easily go in an iMac too.
Zero chance they put a Pro or Max in the M1. No fabled headless Mac for you. It will be the M2 with an 8-core CPU/GPU and up to 32GB or 64GB of RAM. They are simply not going to cannibalize iMac sales. My guess: 6 performance and 2 efficiency cores.
edited October 2021
  • Reply 31 of 44
    Marvin said:
    crowley said:
    Does the integration of the memory and GPU on the M-series SoCs not create issues for multiple CPU architectures?  Seems like it might (I claim no expertise here, just guessing).
    [...]
    They can scale the processing units separately from the memory. This would allow them to sell higher-end units with lower amounts of RAM.

    I expect the 27" iMacs to offer M1 Pro, Max and Max Duo chips starting at 16GB RAM and going up to 128GB RAM. Mac Pro if there is one would be Max Duo and Quad, likely starting at 32GB and going up to 256GB - this config could easily go in an iMac too.
    No, Crowley's right. There's a BIG issue here, and how they solve it is going to fascinate (and possibly terrify!) a lot of people.

The problem is that it's not easy to build massively parallel CPUs. One reason is the need for more memory bandwidth. Look at GPUs to see how this can be dealt with. Memory bandwidth is a huge factor in performance, which is why high-end ones all use GDDR5 (or 6) and they all have wide buses (like, 384 bits wide!)... except for the ones using HBM RAM, which is just that trend taken to an even further extreme (wider buses, though somewhat slower Gbps/pin). This is all very expensive, and so CPUs (with sometimes an order or two of magnitude more RAM) have been slower getting there - the largest have only 8 channels of DDR4 RAM.

    The other big challenge is just getting all the cores to be able to talk to each other and to the RAM behind memory controllers that are part of a remote core's cluster/complex. Dealing with cache coherency (or deciding not to), how many layers of cache, etc., is all part of the deepest wizardry. And whether you build a giant mesh or several rings, or do something else, power is a huge issue. It's been estimated that the uncore in the biggest EPYCs can consume ~50% of the entire power budget of the chip.

    Building a Mac Pro with 4x M1 Maxes would be *very* tricky. In fact, I'll say right out that you can't just do that. You could start with the M1 Max and change it in various ways to get to where you need to go, but each choice you make comes with constraints and trade-offs.

For example, say you decided to get as close as you can to just putting four M1 Maxes in a box. You already face an enormous challenge, which is to minimize latency between far cores (that is, a core on one Max chip trying to talk to another, or, more to the point, talking to the RAM managed by that other chip). There's also the issue of what you do about cache coherency. Do you build a giant LLC that sits in the middle which all the SLCs talk to? Does all memory access go through that? (Hint: almost certainly not.) This would look somewhat like a hypertrophied Threadripper or Epyc. The chances that they could make this work are pretty poor, unless they bring in some VERY new technology - which is entirely possible. In particular, if they go all-in on TSMC "3DFabric" tech (InFO, CoWoS, etc.) and possibly work hard on some integrated cooling tech, then maybe this could be possible, mostly due to the ridiculously low power consumption they're hitting.

    But this leaves out a very important question. What do you do about that RAM, actually? How do you even get room for enough traces to simply talk to all that RAM? And if you want anything even remotely like the memory capacity the current Intel chips have, how are you going to accomplish that? It may be that only HBM can even get you close to that, and that is *extremely* expensive. Like, prohibitively so except for high-end Pro buyers.

    Fundamentally, the biggest issue is that the integrated close RAM that's such a big part of their performance magic is just not scalable. There's physically not enough room for it or its traces, since you need a lot of that room for inter-CPU links (which would push the RAM to... where?). You can't just fill up a larger diameter as speed-of-light issues will start to affect latency.
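For a rough sense of the scale involved (the ~0.5c propagation speed is a ballpark assumption for on-package interconnect, and 10 cm is a hypothetical giant-package dimension):

```python
# Ballpark: one-way signal time across a very large package vs. one clock cycle
C = 3.0e8                       # speed of light in vacuum, m/s
signal_speed = 0.5 * C          # assumed propagation speed in interconnect
distance_m = 0.10               # 10 cm across a hypothetical giant package

one_way_ns = distance_m / signal_speed * 1e9
cycle_ns = 1 / 3.2e9 * 1e9      # one cycle of a 3.2 GHz clock

print(f"signal: {one_way_ns:.2f} ns one way, clock: {cycle_ns:.2f} ns/cycle")
```

Even with these rough numbers, a one-way trip costs about two clock cycles before any routing or SerDes overhead, so physical distance shows up directly as latency.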

    But they have a plan. They *are* going to solve this. And whether that involves 3D stacking of some sort, order-of-magnitude larger caches, HBM... we don't know yet. It might involve radical cooling solutions. And there's always the possibility of them doing something really new, which is of course the most exciting possibility of all.

    Really we don't even know if they're going to maintain the unified memory. It seems at least plausible that they won't given the call from some quarters for multiple GPUs. I wouldn't bet on it though.

    So... in a few months to a year, we'll learn what the answer is. Don't think for a second they won't have a good answer. But it's going to be a surprise, whatever it is, and you're going to see idiots all over the net going "that'll never work, it's lies!" until the benchmarks come out. And whatever it is, it will be *damn* impressive. And 99% of the people using them will never understand that. Oh well. :-)
edited October 2021
  • Reply 32 of 44
    Just to follow up to that last post - HBM won't get them even close to where they need to go, if they want large memory support. It's worse than DDRx. Where I was going with that suggestion, but left out, is the possibility that they might have two memory zones, near/fast and remote/slow. Say, up to 96 or 192GB of HBM3, or 256GB LPDDR5 very close to the chips and directly controlled by them, with another 4TB handled by more traditional controllers, linked by a large LLC combined with more memory channels/controllers. That however would require a huge investment in OS R&D, to get a good handle on how to deal with the memory zones.

This would look to some extent like NUMA in multichip Intel boxes (or maybe more so, first-gen EPYCs), and if there's one thing we've learned from that, it's that NUMA is a pain in the ass and often not handled well. Apple could probably handle that, but it might also require cooperation from apps to maximize performance, and that... is mostly not very likely to happen.
  • Reply 33 of 44
crowley Posts: 10,453 member
[...]
    So... in a few months to a year, we'll learn what the answer is. Don't think for a second they won't have a good answer. But it's going to be a surprise, whatever it is, and you're going to see idiots all over the net going "that'll never work, it's lies!" until the benchmarks come out. And whatever it is, it will be *damn* impressive. And 99% of the people using them will never understand that. Oh well. :-)
Thanks, I figured that something like that would be a problem; it makes sense that multiple systems-on-a-chip in the same computer would present challenges.  The integration of the SoC means that parallelisation would be more akin to Xgrid than Grand Central Dispatch, so they'd need some very beefy controller chips to manage everything without latency.
  • Reply 34 of 44
Marvin Posts: 15,310 moderator
    Zero chance they put a Pro or Max in the M1. No fabled headless Mac for you. It will be the M2 with an 8-core CPU/GPU and up to 32GB or 64GB of RAM. They are simply not going to cannibalize iMac sales. My guess 6 performance and 2 efficiency cores.
If you meant the mini there, they typically don't put good chips in it. I don't think it would be needed either. If people want this headless setup, they can just buy a 14" MBP with the Max chip and put it in a dock. It's a bit expensive with the best GPU at $3k+, but at least it's an option. A mini with a similar setup would likely be $2k+ anyway. Not everybody likes the idea of using a docked laptop over a desktop, but besides the price, it offers exactly what people ask for in a headless Mac with the good GPUs. I wouldn't rule out the possibility of the mini getting higher-end chips though; they can even ship the previous-generation chips in it when the M2 is out.
    What do you do about that RAM, actually? How do you even get room for enough traces to simply talk to all that RAM? And if you want anything even remotely like the memory capacity the current Intel chips have, how are you going to accomplish that? It may be that only HBM can even get you close to that, and that is *extremely* expensive. Like, prohibitively so except for high-end Pro buyers.

    Fundamentally, the biggest issue is that the integrated close RAM that's such a big part of their performance magic is just not scalable. There's physically not enough room for it or its traces, since you need a lot of that room for inter-CPU links (which would push the RAM to... where?). You can't just fill up a larger diameter as speed-of-light issues will start to affect latency.

    But they have a plan. They *are* going to solve this. And whether that involves 3D stacking of some sort, order-of-magnitude larger caches, HBM... we don't know yet. It might involve radical cooling solutions. And there's always the possibility of them doing something really new, which is of course the most exciting possibility of all.

    Really we don't even know if they're going to maintain the unified memory. It seems at least plausible that they won't given the call from some quarters for multiple GPUs. I wouldn't bet on it though.
HBM would be expensive for the iMac models but the price is OK at the Mac Pro level. Here it estimates 16GB of HBM2 at around $320:

    https://www.fudzilla.com/news/graphics/48019-radeon-vii-16gb-hbm-2-memory-cost-around-320
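Scaling that $320-per-16GB figure linearly (a simplification; real pricing varies with stack height and volume):

```python
# Linear scaling of the ~$320 / 16GB HBM2 cost estimate (simplifying assumption)
price_per_gb = 320 / 16     # about $20/GB

for capacity_gb in (64, 128):
    print(f"{capacity_gb}GB: ${capacity_gb * price_per_gb:.0f}")
```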

    $1280 for 64GB, $2560 for 128GB. That's not a lot of money for that much memory. An upcoming Radeon GPU is reported to offer up to 128GB HBM2E:

    https://www.tomshardware.com/news/amd-aldebaran-memory-subsystem-detailed

    Intel will use HBM too:

    https://www.tweaktown.com/news/80272/intel-confirms-sapphire-rapids-cpus-will-use-hbm-drops-in-late-2022/index.html

    It says here there will be a successor to HBM in late 2022:

    https://www.pcgamer.com/an-ultra-bandwidth-successor-to-hbm2e-memory-is-coming-but-not-until-2022/

    I don't think the links between chips are as important. Supercomputers are made up of separate machines. The separate GPUs in the current Mac Pro are connected by Infinity Fabric at 84GB/s. M1 Max has 400GB/s memory bandwidth.

A lot of tasks that work in parallel can just be moved to the other chips, for example processing separate frames in video software or separate render buckets in 3D software.
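That kind of embarrassingly parallel split needs almost no inter-chip traffic; a toy sketch of the idea with Python's multiprocessing (render_frame here is a hypothetical stand-in for real per-frame work):

```python
# Sketch: independent frames need no cross-worker communication,
# so they can be farmed out to separate dies/chips with minimal linking.
from multiprocessing import Pool

def render_frame(frame_number):
    # Stand-in for real per-frame work (encode, filter, render bucket, ...)
    return frame_number, sum(i * i for i in range(1000))

if __name__ == "__main__":
    with Pool(processes=4) as pool:     # e.g. one worker per chip
        results = pool.map(render_frame, range(16))
    print(f"rendered {len(results)} frames")
```

Each worker only needs its own inputs and returns its own output, which is why the 84GB/s-class links between chips matter much less for this class of workload.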

    But as you say, we can assume they've planned it out. They employ experts in their field so they'll have a solution to scale things up. If it's good enough for Intel to do this in server chips and AMD to do it in their GPUs, it should be fine for Apple to do it too:

    https://www.nextplatform.com/2021/08/19/intel-finally-gets-chiplet-religion-with-server-chips/




    edited October 2021
  • Reply 35 of 44
    Marvin said:
    [...]
    You're missing several key points.

    SK Hynix announced their HBM3 for shipment next year just a couple days ago. Max stack height is 12, total stack capacity is 24GB. That requires *1024* pins for data. *Per stack*. You just can't get enough capacity using HBM (as I pointed out right after the post you quoted). So HBM might be a component of their solution, but it can't be the entire answer, unless they're willing to completely give up on large-memory configurations. That seems unlikely, but... no way to know for now.
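The capacity and pin arithmetic is easy to check against those SK Hynix figures (24GB and 1024 data pins per stack; the capacity targets below are just examples, with 1.5TB being the current Intel Mac Pro maximum):

```python
STACK_CAPACITY_GB = 24      # max HBM3 stack capacity per the announcement
DATA_PINS_PER_STACK = 1024  # data pins per stack

def stacks_needed(total_gb: int) -> int:
    # Ceiling division: a partial stack still costs a full stack.
    return -(-total_gb // STACK_CAPACITY_GB)

for target_gb in (64, 256, 1536):  # example capacity targets
    n = stacks_needed(target_gb)
    print(f"{target_gb} GB -> {n} stacks, {n * DATA_PINS_PER_STACK} data pins")
    # 1536 GB would need 64 stacks and 65,536 data pins
```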

    Your larger error is in thinking interchip links aren't so important. They are fundamental to whatever solution Apple comes up with. The key to Apple's "secret sauce" right now is their unified memory. But if you think that the magic comes just from the CPU and GPU having equal access to the same memory pool, you're missing half the story. The other half is that memory bandwidth is huge. Oh, and there's a third "half" (heh), which is the astounding cache architecture (with very low latency, a big part of the overall picture). That sets them apart from everyone else.

    Interchip links are key not just because of bandwidth issues, but also latency. Your examples mix a bunch of different technologies that are appropriate for different applications, but not for linking cores in a large system. They also have dramatically different requirements than pure memory buses or supercomputer links (Ethernet or InfiniBand, most often). I glossed over a bunch of that briefly when I mentioned cache coherency issues.

    If you want to get a bit of a sense of why that all matters, Andrei over at Anandtech built a very cool benchmark that shows a grid with cross-core latency figures for large CPUs, which he uses when reviewing chips like EPYCs and Xeons. (I think he's used it in the Apple Ax reviews too.) You can see the impact the various technologies have.
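For a flavor of what that kind of benchmark measures, here's a toy thread ping-pong in the same spirit. A real core-to-core benchmark pins each thread to a specific core and bounces a shared atomic between them; this Python sketch only shows the round-trip-timing structure, and the absolute number it prints is dominated by interpreter overhead:

```python
# Toy "ping-pong" round-trip timer: two threads bounce a shared flag
# back and forth, which is the structure behind core-to-core latency grids.
import threading
import time

ROUNDS = 1_000
flag = [0]                  # shared state bounced between the threads
cond = threading.Condition()

def ponger():
    with cond:
        for _ in range(ROUNDS):
            while flag[0] != 1:
                cond.wait()
            flag[0] = 0     # bounce it back
            cond.notify()

t = threading.Thread(target=ponger)
t.start()

start = time.perf_counter()
with cond:
    for _ in range(ROUNDS):
        flag[0] = 1         # ping
        cond.notify()
        while flag[0] != 0:
            cond.wait()     # wait for the pong
elapsed = time.perf_counter() - start
t.join()
print(f"~{elapsed / ROUNDS * 1e6:.1f} us per round trip")
```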

    As for chiplets - yes, that's the obvious path forward generally for everyone. But it's not at all obvious how you combine that idea with unified RAM. This is what I was talking about above - if you build a chiplet setup, you're committed to either some sort of NUMA setup, or to a central memory controller. (That is, the architecture of the first EPYCs, versus the architecture of the most recent generation.) Both have benefits and drawbacks, and both are problematic if you're trying to do what Apple has done with their memory architecture. You run into a variety of issues with physical layout and density. Among other things, this produces huge heat engineering challenges, and grave difficulties physically routing enough traces between CPUs and RAM.
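As a toy illustration of that NUMA trade-off (the latency numbers here are made-up placeholders for local vs. remote access, not measurements of any real chip):

```python
# Hypothetical NUMA cost model: average latency degrades as more of the
# working set sits behind another chiplet's memory controller.
LOCAL_NS = 100   # placeholder latency to memory behind the local controller
REMOTE_NS = 180  # placeholder latency when crossing to a remote node

def avg_latency_ns(remote_fraction: float) -> float:
    # Weighted average of local and remote access latencies.
    return (1 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS

for f in (0.0, 0.25, 0.5):
    print(f"{f:.0%} remote accesses -> {avg_latency_ns(f):.0f} ns average")
    # 0% -> 100 ns, 25% -> 120 ns, 50% -> 140 ns
```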

    This is not something you can handwave away by pointing at other implementations. Apple is going to have to do something different. And when they do, it's going to be very exciting. Don't preemptively diminish it by likening it to existing chips. It won't be, unless they give up on close unified memory.
  • Reply 36 of 44
    crowley said:
    Marvin said:
    crowley said:
    Does the integration of the memory and GPU on the M-series SoCs not create issues for multiple CPU architectures?  Seems like it might (I claim no expertise here, just guessing).
    [...]
    They can scale the processing units separately from the memory. This would allow them to sell higher-end units with lower amounts of RAM.

    I expect the 27" iMacs to offer M1 Pro, Max and Max Duo chips starting at 16GB RAM and going up to 128GB RAM. Mac Pro if there is one would be Max Duo and Quad, likely starting at 32GB and going up to 256GB - this config could easily go in an iMac too.
    No, Crowley's right. There's a BIG issue here, and how they solve it is going to fascinate (and possibly terrify!) a lot of people.

    The problem is that it's not easy to build massively parallel CPUs. One reason is the need for more memory bandwidth. Look at GPUs to see how this can be dealt with. Memory bandwidth is a huge factor in performance, which is why high-end ones all use GDDR5 (or 6) and they all have wide buses (like, 384 bits wide!)... except for the ones using HBM RAM, which is just that trend taken to an even further extreme (wider buses, though somewhat slower gbps/pin). This is all very expensive, and so CPUs (with sometimes an order or two of magnitude more RAM) have been slower getting there: the largest have only 8 channels of DDR4 RAM.

    The other big challenge is just getting all the cores to be able to talk to each other and to the RAM behind memory controllers that are part of a remote core's cluster/complex. Dealing with cache coherency (or deciding not to), how many layers of cache, etc., is all part of the deepest wizardry. And whether you build a giant mesh or several rings, or do something else, power is a huge issue. It's been estimated that the uncore in the biggest EPYCs can consume ~50% of the entire power budget of the chip.

    Building a Mac Pro with 4x M1 Maxes would be *very* tricky. In fact, I'll say right out that you can't just do that. You could start with the M1 Max and change it in various ways to get to where you need to go, but each choice you make comes with constraints and trade-offs.

    For example, say you decided to get as close as you can to just putting four M1 Maxes in a box. You already face an enormous challenge, which is to minimize latency between far cores (that is, a core on one Max chip trying to talk to another, or, more to the point, talking to the RAM managed by that other core). There's also the issue of what you do about cache coherency. Do you build a giant LLC that sits in the middle which all the SLCs talk to? Does all memory access go through that? (Hint: almost certainly not.) This would look somewhat like a hypertrophied Threadripper or Epyc. The chances that they could make this work are pretty poor, unless they bring in some VERY new technology - which is entirely possible. In particular, if they go all-in on TSMC "3DFabric" tech (InFO, CoWoS, etc.) and possibly work hard on some integrated cooling tech, then maybe this could be possible, mostly due to the ridiculously low power consumption they're hitting.

    But this leaves out a very important question. What do you do about that RAM, actually? How do you even get room for enough traces to simply talk to all that RAM? And if you want anything even remotely like the memory capacity the current Intel chips have, how are you going to accomplish that? It may be that only HBM can even get you close to that, and that is *extremely* expensive. Like, prohibitively so except for high-end Pro buyers.

    Fundamentally, the biggest issue is that the integrated close RAM that's such a big part of their performance magic is just not scalable. There's physically not enough room for it or its traces, since you need a lot of that room for inter-CPU links (which would push the RAM to... where?). You can't just fill up a larger diameter as speed-of-light issues will start to affect latency.

    But they have a plan. They *are* going to solve this. And whether that involves 3D stacking of some sort, order-of-magnitude larger caches, HBM... we don't know yet. It might involve radical cooling solutions. And there's always the possibility of them doing something really new, which is of course the most exciting possibility of all.

    Really we don't even know if they're going to maintain the unified memory. It seems at least plausible that they won't given the call from some quarters for multiple GPUs. I wouldn't bet on it though.

    So... in a few months to a year, we'll learn what the answer is. Don't think for a second they won't have a good answer. But it's going to be a surprise, whatever it is, and you're going to see idiots all over the net going "that'll never work, it's lies!" until the benchmarks come out. And whatever it is, it will be *damn* impressive. And 99% of the people using them will never understand that. Oh well. :-)
    Thanks, I figured that something like that would be a problem; it makes sense that multiple systems-on-a-chip in the same computer would present challenges.  The integration of the SoC means that parallelisation would be more akin to Xgrid than Grand Central Dispatch, so they'd need some very beefy controller chips to manage everything without latency.
    It's not even that simple. "Beefy controllers" won't do the trick here, as you can't just add transistors and expect to cut latency.

    You could indeed build a system with multiple M1 Maxes. But performance in any task that spanned multiple M1s would be terrible, because interchip latency would be awful: they have no bus suitable for the task. It would be, as you said, much like four separate systems linked with Ethernet or InfiniBand. It would fail miserably compared to large chips like EPYCs or Xeons. No, Apple will be doing something else.

    At the very least, the chips (or chiplets) they use will have a high-bandwidth low-latency link just to talk to other chip(let)s. But that's just a start. There's some very serious engineering going on! Expect to be surprised.
  • Reply 37 of 44
    melgross Posts: 33,510 member
    saarek said:
    nicholfd said:
    saarek said:
    It's a shame that the switch to Apple silicon has resulted in a free-for-all price gouge by Apple. Bye-bye, dreams of Apple silicon meaning better pricing.
    You know this how?  Source?  No one has any idea what it costs to produce the M1 Pro & M1 Max.

    You're right of course, I don't know what the actual cost (R&D and manufacturing) is for the new Apple Silicon. What I do know is that they added $300 to the 2016 MacBook Pro line when they added the Touch Bar, a Touch Bar that is not on these new machines. They also don't have to pay a hefty fee to Intel and yet raised the cost of these MacBooks by what, $200?

    If the Touch Bar costs $300 and the new price is $200 more, we are paying $500 more for the MacBook Pro than the one that shipped back in 2015.

    So yeah, rightly or wrongly, I think Apple knows they have a good thing going with Apple Silicon, and they are asking us all to get lubed up and prepare for penetration with their pricing.
    There is no way the Touch Bar cost $300. Apple made other changes as well. It’s likely the chips in that machine cost somewhat more too. These new chips are pretty large, as large as Intel chips. They likely cost as much, or even more, to produce, with the custom RAM packages. Additionally, are you not thinking about the screen? My iPad Pro 12.9” cost an additional $100 because of the new mini-LED screen. That’s a 12.9” screen, while these are 14.2”.
  • Reply 38 of 44
    melgross Posts: 33,510 member
    crowley said:
    Marvin said:
    tht said:
    So if the 4x claim is true, what is a plausible explanation? 
    [...]

    The largest would be 16x the size of the M1 chip, much like the Threadripper chips.

    The M1 Max has 57 billion transistors; the 3990X Threadripper has around 40 billion (just for the CPU, no GPU). 4x the M1 Max will have 228 billion if they just take multiples of the chips. If they multiply the cores separately, then it will be a bit less.

    It will be a very powerful chip, and even crazier to think it would be able to fit into the 2013 Mac Pro cylinder enclosure. If Intel/AMD had kept up, it could have been with their chips, but they didn't and Apple had to revert to an old form factor to accommodate their inefficient hardware.
    Does the integration of the memory and GPU on the M-series SoCs not create issues for multiple CPU architectures?  Seems like it might (I claim no expertise here, just guessing).
    I don’t see why. Many multi-chip architectures have RAM allocated per chip.
  • Reply 39 of 44
    Come on now, get your facts correct. These are the specs on Apple's website:

    MacBook Pro 14"
    Up to 11 hours wireless web
    Up to 17 hours Apple TV app movie playback

    MacBook Pro 13"
    Up to 17 hours wireless web
    Up to 20 hours Apple TV app movie playback

    Despite having a bulkier case with more volume for batteries, the 14" MacBook Pro still can't beat the battery life of the 13". According to Linus on his YouTube channel, the 13" actually exceeded 20 hours of battery life. This is expected, as the M1 Pro/Max roughly doubles the power consumption. Besides price, battery life is a selling point of the 13".
  • Reply 40 of 44
    saarek Posts: 1,520 member
    melgross said:
    [...]
    There is no way the Touch Bar cost $300. Apple made other changes as well. It’s likely the chips in that machine cost somewhat more too. These new chips are pretty large, as large as Intel chips. They likely cost as much, or even more, to produce, with the custom RAM packages. Additionally, are you not thinking about the screen? My iPad Pro 12.9” cost an additional $100 because of the new mini-LED screen. That’s a 12.9” screen, while these are 14.2”.
    The Touch Bar obviously didn’t cost Apple $300, but that is the additional amount they charged. The key selling point of the 2016 MacBook Pro without the Touch Bar was that it cost $300 less.

    Every MacBook Pro is “the best we have ever shipped”, and it’s true. Screen technology, as well as everything else, moves on.

    Apple will charge whatever they think the market will bear. With Apple Silicon they have a huge advantage and have decided to increase their legendary margins accordingly.

    It’s a shame they didn’t create a 14” M1 MacBook Pro as the new base model.