Apple Silicon iMac & MacBook Pro expected in 2021, 32-core Mac Pro in 2022

Comments

  • Reply 21 of 69
    I presume this article is talking about 32 cores being on the same die. If true, that's good to hear. But don't see why Apple couldn't put multiple physical packages with N-cores each on the same system board. Moreover, Apple should do this as an expansion card for existing Intel Mac Pros so that Intel Mac Pros can run M1-apps natively.
    For anyone paying attention to Apple over the last few years... and for anyone paying attention at their most recent product redesigns... Apple has clearly declared war on end-user replaceable or upgradable modularity. Apple Silicon gives them the ability to make a giant leap towards turning the Mac into a personal computing appliance. They will get close, but, in my opinion, they will never really accomplish that goal because it will be impossible to turn what is currently a general computing operating system that people can run almost anything that they want on, into a locked-down firmware-like OS that can only run apps that Apple has deemed worthy to be allowed.

    Fewer and fewer Apple devices offer end-user upgradable storage, memory, CPU, GPU, etc. None of the latest Apple Silicon Macs offer any means for an end-user to upgrade or replace the CPU, Memory, Storage, GPU... or anything that was not originally ordered with the machine, for that matter. If you have an Intel Mac, you can install and run any recent version of macOS. On an Apple Silicon Mac, you can only install and run macOS Big Sur... macOS is likely to be the only end-user upgradable part of a Mac appliance in future.
  • Reply 22 of 69
MacPro Posts: 19,727 member
    rob53 said:
Why stop at 32 cores? The current TOP500 #1 supercomputer (Fugaku) is comprised of a ton of Fujitsu A64FX CPUs, each with 48 compute cores, based on ARMv8.2-A. Cray is also working on changing to the ARM architecture. The current implementation is only running at 2.2 GHz using a 7nm CMOS FinFET design. This is on one chip! (ref: https://en.wikipedia.org/wiki/Fujitsu_A64FX, also https://www.fujitsu.com/downloads/SUPER/a64fx/a64fx_datasheet.pdf)

    Apple is already fabricating at 5nm and there's no reason why they couldn't build something similar, although I expect it to be a larger SoC than the current M1. The Fujitsu CPU appears to be more or less a standard size package, although it's only the CPU. (ref: https://www.anandtech.com/show/15885/hpc-systems-special-offer-two-a64fx-nodes-in-a-2u-for-40k)

Apple isn't the only major computer company going ARM. It's good to see Apple finally expand it to the Mac line. Anyone used to seeing performance charts with a gentle curve is going to be amazed at how steep the jumps are once Apple really gets going. I ordered an M1 MBA simply because I've ordered version 1 of several Apple Macs before. Crazy thing is, this entry-level Mac is faster than my current iMac. Talk about a steep performance curve!

    Note: I keep confusing ARM with AMD so previously I might have commented about the Fugaku supercomputer being AMD-based. If I did, my mistake. 
    Cray will be using Macs to design their computers soon ;)
  • Reply 23 of 69
auxio Posts: 2,727 member
    mjtomlin said:
    rob53 said:
Why stop at 32 cores? The current TOP500 #1 supercomputer (Fugaku) is comprised of a ton of Fujitsu A64FX CPUs, each with 48 compute cores, based on ARMv8.2-A. Cray is also working on changing to the ARM architecture. The current implementation is only running at 2.2 GHz using a 7nm CMOS FinFET design. This is on one chip! (ref: https://en.wikipedia.org/wiki/Fujitsu_A64FX, also https://www.fujitsu.com/downloads/SUPER/a64fx/a64fx_datasheet.pdf)

Those are designed for massive parallelism. That's not something that would be necessary for what is essentially a single-user computer. You could get more perceived performance from a lower-core-count, higher-clock-speed design.
    It really depends on the application.  For example, hi-res video rendering would certainly benefit from more cores.  Mac Pros aren't being designed for the average computer user.
  • Reply 24 of 69
zimmie Posts: 651 member
    ph382 said:
    rob53 said:
Why stop at 32 cores?
    I don't see even 32 happening for anything but the Mac Pro. I saw forum comments recently that games don't and can't use more than six cores. How much RAM (and heat) would you need to feed 32 cores?
    Games certainly could use more than six CPU cores if the developers cared enough to build them that way. As an example, each AI agent could run in its own thread, and with enough cores, each thread gets its own core.

    Most games don't use parallel CPU cores effectively because they don't need to. Of course, most games use multiple GPU cores effectively because the engines and the underlying graphics APIs make parallel video processing mandatory.
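To make the per-agent threading idea concrete, here's a minimal Swift sketch using Grand Central Dispatch; the Agent type and its update method are hypothetical stand-ins, not from any real game engine.

```swift
import Dispatch

// Hypothetical AI agent; in a real engine this would hold pathfinding,
// decision-making state, and so on.
struct Agent {
    var position: Double
    mutating func update(dt: Double) {
        position += dt  // stand-in for real per-agent AI work
    }
}

var agents = Array(repeating: Agent(position: 0), count: 10_000)

// GCD fans the iterations out across however many cores exist:
// six at a time on a 6-core chip, 32 at a time on a 32-core chip.
agents.withUnsafeMutableBufferPointer { buffer in
    DispatchQueue.concurrentPerform(iterations: buffer.count) { i in
        buffer[i].update(dt: 1.0 / 60.0)
    }
}
```

DispatchQueue.concurrentPerform sizes its parallelism to the machine, so the same code spreads across six cores or 32 without changes.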

    blastdoor said:
    ph382 said:
    rob53 said:
Why stop at 32 cores?

    I don't see even 32 happening for anything but the Mac Pro. I saw forum comments recently that games don't and can't use more than six cores. How much RAM (and heat) would you need to feed 32 cores?

    https://www.anandtech.com/show/15044/the-amd-ryzen-threadripper-3960x-and-3970x-review-24-and-32-cores-on-7nm

    The 32 core Threadripper 3970x has a TDP of 280 watts on a 7nm process. It has four DDR4 3200 RAM channels.

    Based on comparisons of the M1 to mobile Ryzen, I would expect an ASi 32 core SOC to have a TDP much lower than 280 watts. 

    I bet a 32 core ASi SOC on a 5nm process could fit within the thermal envelope of an iMac Pro. 
    In such comparisons, it is worth noting the Threadripper cores have two-way multithreading, making them effectively two cores each (with some limitations: doing the same thing in two threads isn't faster, but a core can do two different things at once). Thus, it would be closer to, say, an M1 with 56 CPU cores. Probably ~190W CPU TDP (assuming seven Firestorm clusters with eight cores per cluster, at 3.45W per core).

Also of note: AMD's Zen 2 cores have 32 kB of L1i, 32 kB of L1d, and 512 kB of L2 cache each. Apple's high-performance cores (Firestorm) have 192 kB of L1i and 128 kB of L1d each, and 12 MB of L2 cache to share between them. Apple's low-power cores (Icestorm) have 128 kB L1i and 64 kB L1d each with 4 MB of shared L2 cache between them. Taken together, the M1 has more instruction cache (1.25 MB versus 1 MB), slightly less data cache (0.75 MB versus 1 MB), and the same amount of L2 cache (16 MB) as the whole Threadripper 3970x. This lets the M1 punch waaaay above its thermal class for things which can fit in its RAM.
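For anyone who wants to check those totals, the arithmetic works out like this (per-core figures as quoted above; a rough sketch, not official specs):

```swift
// Back-of-envelope totals from the per-core figures quoted above.
// Cache sizes in KB, converted to MB.

// AMD Zen 2 (Threadripper 3970X): 32 cores, each 32 KB L1i + 32 KB L1d + 512 KB L2.
let zenCores = 32.0
let zenL1i = zenCores * 32 / 1024   // 1.0 MB
let zenL1d = zenCores * 32 / 1024   // 1.0 MB
let zenL2  = zenCores * 512 / 1024  // 16.0 MB

// Apple M1: 4 Firestorm (192 KB L1i, 128 KB L1d) + 4 Icestorm (128 KB L1i, 64 KB L1d),
// plus 12 MB + 4 MB of cluster-shared L2.
let m1L1i = (4.0 * 192 + 4 * 128) / 1024  // 1.25 MB
let m1L1d = (4.0 * 128 + 4 * 64) / 1024   // 0.75 MB
let m1L2  = 12.0 + 4.0                    // 16.0 MB

// Hypothetical 56-Firestorm part at the 3.45 W/core figure above.
let tdp = 56.0 * 3.45                     // 193.2 W, i.e. the "~190W"
print(m1L1i, m1L1d, m1L2, zenL1i, zenL1d, zenL2, tdp)
```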
  • Reply 25 of 69
auxio Posts: 2,727 member

    Rayz2016 said:
    I presume this article is talking about 32 cores being on the same die. If true, that's good to hear. But don't see why Apple couldn't put multiple physical packages with N-cores each on the same system board. Moreover, Apple should do this as an expansion card for existing Intel Mac Pros so that Intel Mac Pros can run M1-apps natively.
Why stop at Intel Macs? Heck, if they could put the M1 on a PCI expansion card, let all Windows PCs run Mac/iOS software and the world is your oyster!
    The only way this would work is if Apple charged the same price for the card as they did for the whole machine.

    There is no way Apple is going to burn its own house down like this, and then have to support every single PC in existence.
    Especially when PCs are a drop in the sales bucket compared to mobile devices.
  • Reply 26 of 69
tmay Posts: 6,328 member
    Flytrap said:
    I presume this article is talking about 32 cores being on the same die. If true, that's good to hear. But don't see why Apple couldn't put multiple physical packages with N-cores each on the same system board. Moreover, Apple should do this as an expansion card for existing Intel Mac Pros so that Intel Mac Pros can run M1-apps natively.
    For anyone paying attention to Apple over the last few years... and for anyone paying attention at their most recent product redesigns... Apple has clearly declared war on end-user replaceable or upgradable modularity. Apple Silicon gives them the ability to make a giant leap towards turning the Mac into a personal computing appliance. They will get close, but, in my opinion, they will never really accomplish that goal because it will be impossible to turn what is currently a general computing operating system that people can run almost anything that they want on, into a locked-down firmware-like OS that can only run apps that Apple has deemed worthy to be allowed.

    Fewer and fewer Apple devices offer end-user upgradable storage, memory, CPU, GPU, etc. None of the latest Apple Silicon Macs offer any means for an end-user to upgrade or replace the CPU, Memory, Storage, GPU... or anything that was not originally ordered with the machine, for that matter. If you have an Intel Mac, you can install and run any recent version of macOS. On an Apple Silicon Mac, you can only install and run macOS Big Sur... macOS is likely to be the only end-user upgradable part of a Mac appliance in future.
The market has spoken, again, and again, and again, and it doesn't value user-upgradable hardware, except for a small market segment at the top.
  • Reply 27 of 69
dk49 Posts: 267 member
Meanwhile, I am still waiting for the Apple Car...
  • Reply 28 of 69
    blastdoor said:
    ph382 said:
    rob53 said:
Why stop at 32 cores?

    I don't see even 32 happening for anything but the Mac Pro. I saw forum comments recently that games don't and can't use more than six cores. How much RAM (and heat) would you need to feed 32 cores?

    https://www.anandtech.com/show/15044/the-amd-ryzen-threadripper-3960x-and-3970x-review-24-and-32-cores-on-7nm

    The 32 core Threadripper 3970x has a TDP of 280 watts on a 7nm process. It has four DDR4 3200 RAM channels.

    Based on comparisons of the M1 to mobile Ryzen, I would expect an ASi 32 core SOC to have a TDP much lower than 280 watts. 

    I bet a 32 core ASi SOC on a 5nm process could fit within the thermal envelope of an iMac Pro. 
It has 8, not 4, DDR4-3200 ECC RAM channels, and those are limited by the OEMs. Zen has supported 2TB of DDR4 RAM since Zen 2. And Threadripper is presently limited to 32 cores because they haven't moved to the Zen 4 5nm process with RDNA 3.0/CDNA 2.0 based solutions that are on a unified memory plane.

Those 32 cores would be a 16/16 big/little split; combine that with their GPU and other co-processors and you have either a much larger SoC or very small cores.

TR 3 arrives this January along with EPYC 3 Milan with 64 cores/128 threads. The next releases, as Lisa Su has stated and their ROCm software has shown, will integrate Xilinx co-processors into the Zen 4/RDNA 3.0/CDNA 2.0 based solutions and beyond.

Both AMD and Xilinx have architecture licenses for ARM and have been designing and producing ARM processors for years. Xilinx itself has an arsenal of ARM-based solutions.

32 cores would only be in the Mac Pro. 8/8 cores in the iMac and 12/12 in the iMac Pro is pushing it.

In 2022, Jim Keller's CPU designs from his time at Intel hit the market. The upcoming Zen architecture designs will be announced in January 2021 at the virtual CES conference. AMD has already projected that by 2025 its conservative hardware product sales will be over $22 billion, up from just over $8 billion this year.

Apple has zero interest in supporting anything beyond its own matrix of hardware options; people who believe Apple wants to be all solutions to all people don't understand, and never have understood, Apple's mission statement.

A lot of the R&D in the M1 is going into their future IoT and automobile products.
  • Reply 29 of 69

    tipoo said:
    Just to add something, GPU core counts are all counted differently and meaningless across architectures. An Apple GPU core is 128 ALUs, say an Intel one is 8. 

    Seeing what they did with the 8C M1, the prospect of a 128 core Apple GPU is amazingly tantalizing, that's 16,384 unified shaders.
Tantalizing and absurd, because you will be starving most of your shaders, leaving them sitting there doing nothing; that's not how GPUs work. Apple is investing in its specialty processors to be the heavy lifters: the Neural Engine, ML engine, and audio/video decode/encode blocks already handle most of the system's requests.
  • Reply 30 of 69
    tipoo said:
    Just to add something, GPU core counts are all counted differently and meaningless across architectures. An Apple GPU core is 128 ALUs, say an Intel one is 8. 

    Seeing what they did with the 8C M1, the prospect of a 128 core Apple GPU is amazingly tantalizing, that's 16,384 unified shaders.

I did some trivial calculations: assuming a 15% yearly increase in performance, a 128-core M3 GPU could offer a score around 400,000 in OpenCL and 460,000 in Metal. That's 2x and 4.7x more than the top-performing cards, while keeping a 60 W TDP. Take it as a really rough guess based on current numbers, without accounting for the drop in performance that can occur in high-core-count parts. I have no expertise.
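For what it's worth, the compounding behind that guess looks roughly like this; the M1 baseline scores are assumed approximate Geekbench 5 figures, and perfect scaling from 8 to 128 cores is itself an optimistic assumption.

```swift
import Foundation

// Rough compound-growth sketch: scale an assumed M1 8-core GPU baseline
// to 128 cores, then apply 15% per-year gains for two generations.
let m1OpenCL = 19_000.0   // assumed M1 (8-core GPU) OpenCL score
let m1Metal  = 21_000.0   // assumed M1 (8-core GPU) Metal score
let coreScale = 128.0 / 8.0   // 16x the GPU cores, assuming linear scaling
let yearly = 1.15             // 15% per-year improvement
let gens = 2.0                // two yearly steps to a hypothetical "M3"

let projOpenCL = m1OpenCL * coreScale * pow(yearly, gens)  // ≈ 402,000
let projMetal  = m1Metal  * coreScale * pow(yearly, gens)  // ≈ 444,000
print(projOpenCL, projMetal)
```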
  • Reply 31 of 69
22july2013 Posts: 3,571 member
    Flytrap said:
    I presume this article is talking about 32 cores being on the same die. If true, that's good to hear. But don't see why Apple couldn't put multiple physical packages with N-cores each on the same system board. Moreover, Apple should do this as an expansion card for existing Intel Mac Pros so that Intel Mac Pros can run M1-apps natively.
    For anyone paying attention to Apple over the last few years... and for anyone paying attention at their most recent product redesigns... Apple has clearly declared war on end-user replaceable or upgradable modularity. Apple Silicon gives them the ability to make a giant leap towards turning the Mac into a personal computing appliance. They will get close, but, in my opinion, they will never really accomplish that goal because it will be impossible to turn what is currently a general computing operating system that people can run almost anything that they want on, into a locked-down firmware-like OS that can only run apps that Apple has deemed worthy to be allowed.

    Fewer and fewer Apple devices offer end-user upgradable storage, memory, CPU, GPU, etc. None of the latest Apple Silicon Macs offer any means for an end-user to upgrade or replace the CPU, Memory, Storage, GPU... or anything that was not originally ordered with the machine, for that matter. If you have an Intel Mac, you can install and run any recent version of macOS. On an Apple Silicon Mac, you can only install and run macOS Big Sur... macOS is likely to be the only end-user upgradable part of a Mac appliance in future.
    First off, if the market doesn't want upgradeable Macs, who are you to order Apple to do otherwise?

    Second off, if the market wants a locked-down, firmware-like secure OS, who are you to order Apple to do otherwise? Do you oppose freedom of choice? Do you want everyone to be forced by law to follow the Android or Microsoft model?

    Third off, why do you consider the current crop of Macs to be non-upgradeable since they have Thunderbolt ports that allow plenty of external hardware upgrades?

    Fourth off, why aren't you ranting about the lack of upgradeable TVs? Or why aren't you ranting about the lack of TV repair shops? Nobody cares to upgrade or repair their TVs anymore. Who are you to say that people should care? TVs are commodities now. Nobody cares about what's inside them. Same with computers. Nobody cares to upgrade a CPU, despite your desire to do that.
  • Reply 32 of 69
blastdoor Posts: 3,278 member
    blastdoor said:
    ph382 said:
    rob53 said:
Why stop at 32 cores?

    I don't see even 32 happening for anything but the Mac Pro. I saw forum comments recently that games don't and can't use more than six cores. How much RAM (and heat) would you need to feed 32 cores?

    https://www.anandtech.com/show/15044/the-amd-ryzen-threadripper-3960x-and-3970x-review-24-and-32-cores-on-7nm

    The 32 core Threadripper 3970x has a TDP of 280 watts on a 7nm process. It has four DDR4 3200 RAM channels.

    Based on comparisons of the M1 to mobile Ryzen, I would expect an ASi 32 core SOC to have a TDP much lower than 280 watts. 

    I bet a 32 core ASi SOC on a 5nm process could fit within the thermal envelope of an iMac Pro. 
It has 8, not 4, DDR4-3200 ECC RAM channels, and those are limited by the OEMs. Zen has supported 2TB of DDR4 RAM since Zen 2. And Threadripper is presently limited to 32 cores because they haven't moved to the Zen 4 5nm process with RDNA 3.0/CDNA 2.0 based solutions that are on a unified memory plane.

    Epyc has 8 channels; Threadripper 4. 
    Threadripper is NOT limited to 32 cores; the 3990x has 64. 

  • Reply 33 of 69
blastdoor Posts: 3,278 member
    According to Anandtech, the speedup from running two threads on a Zen 3 core rather than one appears to be about 22%:

    https://www.anandtech.com/show/16261/investigating-performance-of-multithreading-on-zen-3-and-amd-ryzen-5000

Also, Zen cores use a lot more power than Firestorm cores. To run 32 Zen cores simultaneously requires a big drop in clock speed. While a single Zen 3 core running at 5 GHz more or less matches a Firestorm core running at 3.2 GHz, 32 Zen cores running at the same time run at 3.7 GHz (a drop of about 26%). I'll bet Firestorm cores don't drop nearly that much clock speed in a 32-core configuration.

So... I'll hazard a guess that 32 Firestorm cores are ballpark equivalent to, or maybe even faster than, 32 Zen 3 cores. 
    zimmie said:
    In such comparisons, it is worth noting the Threadripper cores have two-way multithreading, making them effectively two cores each (with some limitations: doing the same thing in two threads isn't faster, but a core can do two different things at once). Thus, it would be closer to, say, an M1 with 56 CPU cores. Probably ~190W CPU TDP (assuming seven Firestorm clusters with eight cores per cluster, at 3.45W per core).
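One crude way to sanity-check that guess is to fold the ~22% SMT gain and the all-core clock drop into an aggregate-throughput estimate; the normalization of one 5 GHz Zen 3 core to one 3.2 GHz Firestorm core is taken from the post above, and the steady Firestorm all-core clock is an assumption.

```swift
// Crude aggregate-throughput estimate from the figures quoted above.
// Normalize: 1 Firestorm @ 3.2 GHz ≈ 1 Zen 3 core @ 5 GHz ≈ 1.0 "unit" of work.
let smtGain = 1.22        // ~22% extra from the second thread per Zen core
let zenAllCoreGHz = 3.7   // all-core clock with 32 Zen cores active
let zenPerCore = (zenAllCoreGHz / 5.0) * smtGain  // ≈ 0.90 units per core
let zen32 = 32.0 * zenPerCore                     // ≈ 28.9 units

// If 32 Firestorm cores hold near their 3.2 GHz clock (assumption),
// they land at ~32 units: the "ballpark equivalent or faster" guess.
let firestorm32 = 32.0
print(zen32, firestorm32)
```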
  • Reply 34 of 69
rob53 Posts: 3,251 member
    Flytrap said:
    I presume this article is talking about 32 cores being on the same die. If true, that's good to hear. But don't see why Apple couldn't put multiple physical packages with N-cores each on the same system board. Moreover, Apple should do this as an expansion card for existing Intel Mac Pros so that Intel Mac Pros can run M1-apps natively.
    For anyone paying attention to Apple over the last few years... and for anyone paying attention at their most recent product redesigns... Apple has clearly declared war on end-user replaceable or upgradable modularity. Apple Silicon gives them the ability to make a giant leap towards turning the Mac into a personal computing appliance. They will get close, but, in my opinion, they will never really accomplish that goal because it will be impossible to turn what is currently a general computing operating system that people can run almost anything that they want on, into a locked-down firmware-like OS that can only run apps that Apple has deemed worthy to be allowed.

    Fewer and fewer Apple devices offer end-user upgradable storage, memory, CPU, GPU, etc. None of the latest Apple Silicon Macs offer any means for an end-user to upgrade or replace the CPU, Memory, Storage, GPU... or anything that was not originally ordered with the machine, for that matter. If you have an Intel Mac, you can install and run any recent version of macOS. On an Apple Silicon Mac, you can only install and run macOS Big Sur... macOS is likely to be the only end-user upgradable part of a Mac appliance in future.
    First off, if the market doesn't want upgradeable Macs, who are you to order Apple to do otherwise?

    Second off, if the market wants a locked-down, firmware-like secure OS, who are you to order Apple to do otherwise? Do you oppose freedom of choice? Do you want everyone to be forced by law to follow the Android or Microsoft model?

    Third off, why do you consider the current crop of Macs to be non-upgradeable since they have Thunderbolt ports that allow plenty of external hardware upgrades?

    Fourth off, why aren't you ranting about the lack of upgradeable TVs? Or why aren't you ranting about the lack of TV repair shops? Nobody cares to upgrade or repair their TVs anymore. Who are you to say that people should care? TVs are commodities now. Nobody cares about what's inside them. Same with computers. Nobody cares to upgrade a CPU, despite your desire to do that.
Agree, especially the part about TVs. I bought a cheap TV and it lasted less than a month. Service was called and they replaced the only circuit board they could replace. It still didn't work. To fix it, they needed to replace the entire display because the controller board was attached to it. Again, it didn't work, so I was able to get my money back. It's less expensive for TV manufacturers to NOT have repairable parts. It means a smaller parts inventory, especially when they change things almost every year, and reduced repair labor: just replace the whole thing.

Yes, we are and have been in a consumable, throwaway market where repair costs are higher than replacement costs. I'm talking about replacement costs, not the price we pay. 

As for #1-#3, the vast majority of consumers simply want something that will work for as long as it can, or until they want something new to keep up with the Joneses (or whatever the latest, most used last name is). People on this forum and other tech forums want repairability and upgradeability. Those days are gone. Buy the most you can afford and use it for the longest you can while adding TB peripherals. I bought the 2020 MBA with 8GB RAM and 512GB storage. I'd need more than that if I were going to replace my iMac, but I also bought OWC's TB3 Envoy along with a 1TB NVMe card. The Envoy was $79 (can't remember if I paid less on pre-order) and is the least expensive way to add large, fast external storage to a Mac. The nice thing is this device has a plastic clip that holds the enclosure to the top of a laptop. I plan on using it to hold my iTunes and possibly Photos library, although I'd need a larger SSD for that. I thought techies (40 years working in computerized publishing) would figure these things out. I guess some just want a huge box to put everything in even though most of the time they never move anything.

    disclaimer: I am retired and have never worked for MacSales/OWC but I've purchased plenty of equipment from them.


  • Reply 35 of 69
zimmie Posts: 651 member
    blastdoor said:
    According to Anandtech, the speedup from running two threads on a Zen 3 core rather than one appears to be about 22%:

    https://www.anandtech.com/show/16261/investigating-performance-of-multithreading-on-zen-3-and-amd-ryzen-5000

Also, Zen cores use a lot more power than Firestorm cores. To run 32 Zen cores simultaneously requires a big drop in clock speed. While a single Zen 3 core running at 5 GHz more or less matches a Firestorm core running at 3.2 GHz, 32 Zen cores running at the same time run at 3.7 GHz (a drop of about 26%). I'll bet Firestorm cores don't drop nearly that much clock speed in a 32-core configuration.

So... I'll hazard a guess that 32 Firestorm cores are ballpark equivalent to, or maybe even faster than, 32 Zen 3 cores. 
    zimmie said:
    In such comparisons, it is worth noting the Threadripper cores have two-way multithreading, making them effectively two cores each (with some limitations: doing the same thing in two threads isn't faster, but a core can do two different things at once). Thus, it would be closer to, say, an M1 with 56 CPU cores. Probably ~190W CPU TDP (assuming seven Firestorm clusters with eight cores per cluster, at 3.45W per core).
Optimized workloads should be able to get much more than a 22% speed improvement, but that seems about right for an average, unoptimized load. I'm not terribly familiar with Zen, and was not aware it had to slow down that much to run all the cores (or had to boost a single core that high to match a Firestorm, whichever way you cut it). In that case, yeah, 32 Firestorm cores plus eight Icestorm cores (0.325W each) should be more than a match for 32 Zen cores. 113W CPU TDP versus 280W. I bet 24 Firestorm plus eight Icestorm would give a 32-core Zen a run for its money, too, and that would be just 85.4W CPU TDP.

    I wonder if the Zen limitation is power intake or heat dissipation. With a ~1.4v vCore, that's ~200 amps of current it's passing around. All the Zen chips seem to cap out at 280W, which makes me suspect a limitation of the MCM packaging tech.
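Spelling out the power arithmetic in these two posts (per-core wattages as quoted above; a back-of-envelope sketch):

```swift
// Power arithmetic behind the TDP and current figures above.
let firestormW = 3.45, icestormW = 0.325

let tdp32big = 32 * firestormW + 8 * icestormW   // 113.0 W for 32+8 cores
let tdp24big = 24 * firestormW + 8 * icestormW   // 85.4 W for 24+8 cores

// Threadripper's 280 W package at ~1.4 V vCore:
let amps = 280.0 / 1.4                           // ≈ 200 A of current
print(tdp32big, tdp24big, amps)
```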
  • Reply 36 of 69
Just what I thought would happen, although there's not much guesswork involved considering the time remaining in the transition.

The iMac will of course come first on the low end with the M1, then the high-end MBP and iMacs with 12-16 cores, followed by the Mac Pro. I don't think core counts matter much; they will size them simply to blow away the current line of Intel chips as a show of force. When I first heard about a smaller Mac Pro case, it was easy to assume it was for a less powerful Mac Pro version, but after I saw the thermal performance of the M1 it became clear that the move to Apple Silicon would require less cooling and likely fewer RAM slots, with a lot of unified memory and graphics onboard.

These machines are going to have a dramatic effect on sales and may open additional sales channels for Apple. I'm also certain we'll see the entry prices for the MBA and the entry iMac drop a bit lower as time goes on. It's difficult to say how much lower, since these will, in a way, compete with the iPad's pricing structure. The sweet spot for the entry models would be the $750-$800 range.
  • Reply 37 of 69
tipoo Posts: 1,142 member

    tipoo said:
    Just to add something, GPU core counts are all counted differently and meaningless across architectures. An Apple GPU core is 128 ALUs, say an Intel one is 8. 

    Seeing what they did with the 8C M1, the prospect of a 128 core Apple GPU is amazingly tantalizing, that's 16,384 unified shaders.
Tantalizing and absurd, because you will be starving most of your shaders, leaving them sitting there doing nothing; that's not how GPUs work. Apple is investing in its specialty processors to be the heavy lifters: the Neural Engine, ML engine, and audio/video decode/encode blocks already handle most of the system's requests.

Seeing what they already do with a meager (for a GPU) 70GB/s, I would think they will work out the bottlenecks. But certainly the design has to make sure it's well fed to scale that high. 

What do you mean, that's not how GPUs work? The GPUs rumored are 128C, and so far Apple's allocation is 128 ALUs per core. Also, for a 2022 product that's about double the ALUs in a 3080; I don't know what you think is absurd about it. 

    https://www.techpowerup.com/gpu-specs/geforce-rtx-3080.c3621
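The ALU arithmetic being argued about, for reference; the 128-ALUs-per-core figure is from the thread, and the 3080 shader count is from the linked TechPowerUp page.

```swift
// ALU ("unified shader") counts behind the argument above.
let appleALUsPerCore = 128.0
let rumored128CoreGPU = 128 * appleALUsPerCore   // 16,384 ALUs
let rtx3080Shaders = 8_704.0                     // from the linked spec page
let ratio = rumored128CoreGPU / rtx3080Shaders   // ≈ 1.88, i.e. "about double"
print(rumored128CoreGPU, ratio)
```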
  • Reply 38 of 69
dewme Posts: 5,362 member
    I presume this article is talking about 32 cores being on the same die. If true, that's good to hear. But don't see why Apple couldn't put multiple physical packages with N-cores each on the same system board. Moreover, Apple should do this as an expansion card for existing Intel Mac Pros so that Intel Mac Pros can run M1-apps natively.
    They could, but it would diminish some of the benefits of the M1's unified memory architecture. Yes, you'd still have more cores available for working on highly parallelizable problems, but once the parts of the work are chunked and doled out to different groups of cores in different packages you'd have to designate something, like a "master" package or CPU to reassemble (memory copy) all of the distributed work back to a central point for final processing - and it would have to fit into the memory space of the master.

Keep in mind that the multiple CPUs on traditional multi-CPU motherboards all shared the same main memory. Additionally, each CPU or CPU core had its own memory caches, and all of those caches had to be maintained in a coherent state because they were all referencing shared memory locations. If one CPU/core changed the value of a cached location in shared memory, all CPUs/cores that also hold a reference to that main memory location in their caches have to be updated. With the M1, Apple can optimize its cache coherency logic much more easily within its unified memory architecture than it can with shared off-package memory. If Apple brings in multiple packages, each with multiple cores, to work on shared problems, how many of these traditional memory management problems come back into the fray? I ask this only because a lot of the architectural benefits that Apple achieves with the M1 are due to tight coupling between CPU cores, tight coupling between CPU and GPU cores, and having a unified memory architecture very close to all of the cores. Moving to a multi-package model brings loose coupling back into the picture, which will most definitely limit the overall speedup that is attainable. There are alternate ways to get something in between tight and loose coupling, as AMD has done, but you want to keep everything as close as possible.

Two things to keep in mind with all Von Neumann based architectures, which includes all current Macs and PCs: 1) copying memory around always sucks from a performance standpoint, and 2) even small amounts of non-parallelizable code have a serious negative impact on the attainable speedup that can be seen with any number of processing units/cores (Amdahl's Law). This doesn't mean that speedups aren't possible, it just means that unless your code is well above 90% parallelizable you're going to max out on performance very quickly regardless of the number of cores. For example, in the 90% case, with 8 cores you'd get a speedup of 5 over a single core, but with 32 cores you're only getting a speedup of 8 over a single core even though you've quadrupled the number of cores. To double the speedup in the 90% case you'd have to go from 8 cores to around 512 cores. 
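For reference, here's Amdahl's Law as a quick sketch, reproducing the figures above:

```swift
// Amdahl's Law: speedup = 1 / ((1 - p) + p / n),
// where p is the parallelizable fraction and n is the core count.
func amdahlSpeedup(parallelFraction p: Double, cores n: Double) -> Double {
    1.0 / ((1.0 - p) + p / n)
}

let p = 0.9
print(amdahlSpeedup(parallelFraction: p, cores: 8))    // ≈ 4.7, the "speedup of 5"
print(amdahlSpeedup(parallelFraction: p, cores: 32))   // ≈ 7.8, the "speedup of 8"
print(amdahlSpeedup(parallelFraction: p, cores: 512))  // ≈ 9.8, roughly double the 8-core case
```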

The idea of an M1 coprocessor board for the Mac Pro is not a bad one, and it's a strategy that's been used since the dawn of personal computers. One of DEC's first PC-compatible computers, the Rainbow 100, had an 8088 and a Z80 processor so it could run both MS-DOS and CP/M (and other) operating systems and applications. I've had PCs, Sun workstations, and industrial PLCs that had plug-in coprocessor boards for doing specialized processing semi-independently of the host CPU. The challenge is always how to integrate the coprocessor with the host system so the coprocessor and the host system cooperate in a high-quality and seamless manner. Looking across the tight-coupling to loose-coupling spectrum, these plug-in board solutions tend to be on the looser side of the spectrum unless the host system (and all of its busses) were purposely designed with the coprocessor in mind, like the relationship between the old 8086/8088 general-purpose CPU and the 8087 numerical coprocessor.  

      
edited December 2020
  • Reply 39 of 69

    Rayz2016 said:
    I presume this article is talking about 32 cores being on the same die. If true, that's good to hear. But don't see why Apple couldn't put multiple physical packages with N-cores each on the same system board. Moreover, Apple should do this as an expansion card for existing Intel Mac Pros so that Intel Mac Pros can run M1-apps natively.
Why stop at Intel Macs? Heck, if they could put the M1 on a PCI expansion card, let all Windows PCs run Mac/iOS software and the world is your oyster!
    The only way this would work is if Apple charged the same price for the card as they did for the whole machine.

    There is no way Apple is going to burn its own house down like this, and then have to support every single PC in existence.
    Apple could sell the M1 chip as a separate part for "Windows for ARM" PCs, but continue to prohibit macOS from running on any other vendor's PCs. That requires less "support." But still some support. I think that would be a more palatable and profitable idea.
Never going to happen. What I could see happening is the development of server chips for third-party cloud farms running Linux. Basically the same chips.
  • Reply 40 of 69
    dewme said:
    I presume this article is talking about 32 cores being on the same die. If true, that's good to hear. But don't see why Apple couldn't put multiple physical packages with N-cores each on the same system board. Moreover, Apple should do this as an expansion card for existing Intel Mac Pros so that Intel Mac Pros can run M1-apps natively.
    They could, but it would diminish some of the benefits of the M1's unified memory architecture. Yes, you'd still have more cores available for working on highly parallelizable problems, but once the parts of the work are chunked and doled out to different groups of cores in different packages you'd have to designate something, like a "master" package or CPU to reassemble (memory copy) all of the distributed work back to a central point for final processing - and it would have to fit into the memory space of the master.

Keep in mind that the multiple CPUs on traditional multi-CPU motherboards all shared the same main memory. Additionally, each CPU or CPU core had its own memory caches, and all of those caches had to be maintained in a coherent state because they were all referencing shared memory locations. If one CPU/core changed the value of a cached location in shared memory, all CPUs/cores that also hold a reference to that main memory location in their caches have to be updated. With the M1, Apple can optimize its cache coherency logic much more easily within its unified memory architecture than it can with shared off-package memory. If Apple brings in multiple packages, each with multiple cores, to work on shared problems, how many of these traditional memory management problems come back into the fray? I ask this only because a lot of the architectural benefits that Apple achieves with the M1 are due to tight coupling between CPU cores, tight coupling between CPU and GPU cores, and having a unified memory architecture very close to all of the cores. Moving to a multi-package model brings loose coupling back into the picture, which will most definitely limit the overall speedup that is attainable. There are alternate ways to get something in between tight and loose coupling, as AMD has done, but you want to keep everything as close as possible.

Two things to keep in mind with all Von Neumann based architectures, which includes all current Macs and PCs: 1) copying memory around always sucks from a performance standpoint, and 2) even small amounts of non-parallelizable code have a serious negative impact on the attainable speedup that can be seen with any number of processing units/cores (Amdahl's Law). This doesn't mean that speedups aren't possible, it just means that unless your code is well above 90% parallelizable you're going to max out on performance very quickly regardless of the number of cores. For example, in the 90% case, with 8 cores you'd get a speedup of 5 over a single core, but with 32 cores you're only getting a speedup of 8 over a single core even though you've quadrupled the number of cores. To double the speedup in the 90% case you'd have to go from 8 cores to around 512 cores. 

The idea of an M1 coprocessor board for the Mac Pro is not a bad one, and it's a strategy that's been used since the dawn of personal computers. One of DEC's first PC-compatible computers, the Rainbow 100, had an 8088 and a Z80 processor so it could run both MS-DOS and CP/M (and other) operating systems and applications. I've had PCs, Sun workstations, and industrial PLCs that had plug-in coprocessor boards for doing specialized processing semi-independently of the host CPU. The challenge is always how to integrate the coprocessor with the host system so the coprocessor and the host system cooperate in a high-quality and seamless manner. Looking across the tight-coupling to loose-coupling spectrum, these plug-in board solutions tend to be on the looser side of the spectrum unless the host system (and all of its busses) were purposely designed with the coprocessor in mind, like the relationship between the old 8086/8088 general-purpose CPU and the 8087 numerical coprocessor.  

      
    Thanks, and I agree. There are technical challenges and compromises with putting multiple CPUs on the same board, but it is technically possible. And Apple (and Microsoft) have sold add-on boards in the past for Macs and Apple IIes that allowed users to run other OSs.

In fact, Apple could also build an Intel CPU board that could fit inside an M1 Mac Pro. That way you could run Intel apps natively in an M1 Mac, including Windows.