tenthousandthings
About
- Username
- tenthousandthings
- Joined
- Visits
- 179
- Last Active
- Roles
- member
- Points
- 2,055
- Badges
- 1
- Posts
- 1,068
Reactions
-
Apple executives explain Apple Silicon & Neural Engine in new interview
Here is a transcript of Anand Shimpi's technical contributions to this interview. For those who don't know, he is the founder of AnandTech, and was hired away from his own very successful tech site by Apple in 2014. It seems he really is one of the driving forces behind Apple Silicon, and likely a major influence in (or his hire was a result of) the decision to bring it to the Mac. He says some interesting things here (the MacBook Pro will always get the current generation, and that's also the goal for Macs in general), but overall it's a useful explanation of how Apple is approaching this and what they are trying to do:
NOTE: I did this transcript. It may contain some mistakes. I also cut out some interjections, like "I think" and "kind of." But I tried to capture his talking style, especially the use of "..., right?"
[Intro questions]
The silicon team doesn’t operate in a vacuum, right? … When these products are being envisioned and designed, folks on the architecture team, the design team, they’re there, they’re aware of where we’re going, what’s important, both from a workload perspective, as well as the things that are most important to enable in all of these designs.
I think part of what you’re seeing now is this now decade-plus long, maniacal obsession with power-efficient performance and energy efficiency. If you look at the roots of Apple Silicon, it all started with the iPhone and the iPad. There we’re fitting into a very, very constrained environment, and so we had to build these building blocks, whether it’s our CPU or GPU, media engine, neural engine, to fit in something that’s way, way smaller from a thermal standpoint and a power delivery standpoint than something like a 16-inch MacBook Pro. So the fundamental building blocks are just way more efficient than what you’re typically used to seeing in a product like this. The other thing you’re noticing is, for a lot of tasks that maybe used to be high-powered use cases, on Apple Silicon they actually don’t consume that much power. If you look, compared to what you might find in a competing PC product, depending on the workload, we might be a factor of two or a factor of four times lower power. That allows us to deliver a lot of workloads that might have been high-power use cases on a different product in something that actually is a very quiet and cool and long-lasting sort of use case. The other thing that you’re noticing is that the single-thread performance, the snappiness of your machine, it’s really the same high-performance core regardless of if you’re talking about a MacBook Air or a 14-inch Pro or 16-inch Pro or the new Mac mini, and so all of these machines can accommodate one of those cores running full tilt, again we’ve turned a lot of those usages and usage cases into low-power workloads. You can’t get around physics, though, right? ... So if you light up all the cores, all the GPUs, the 14-inch system just has less thermal capacity than the 16, right? ... So depending on your workload, that might drive you to a bigger machine, but really the chips are across the board incredibly efficient.
[Battery life question]
You can look at how chip design works at Apple. You have to remember we’re not a merchant silicon vendor, at the end of the day we ship product. So the story for the chip team actually starts at the product, right? ... There is a vision that the design team, that the system team has that they want to enable, and the job of the chip is to enable those features and enable that product to deliver the best performance within the constraints, within the thermal envelope of that chassis, that is humanly possible. So if you look at kind of what we did going from the M1 family to the M2 Pro and M2 Max, at any given power point, we’re able to deliver more performance. If you look at, on the CPU we added two more efficiency cores, two more of our e-cores. That allowed us, or was part of what allowed us, to deliver more multi-thread performance, again, at every single power point where the M1 and M2 curves overlap we were able to deliver more performance at any given power point. The dynamic range of operations [is] a little bit longer, a little bit wider, so we do have a slight increase in terms of peak power, but in terms of efficiency, across the range, it is a step forward versus the M1 family, and that directly translates into battery life. The same thing is true for the GPU, it’s kind of counterintuitive, but a big GPU running a modest frequency and voltage, is actually a very efficient way to fill up a box. So that’s been our philosophy dating back to iPhone and iPad, and it continues on the Mac as well.
But really the thing that we see, that the iPhone and the iPad have enjoyed over the years, is this idea that every generation gets the latest of our IPs, the latest CPU IP, the latest GPU, media engine, neural engine, and so on and so forth, and so now the Mac gets to be on that cadence too. If you look at how we’ve evolved things on the phone and iPad, those IPs tend to get more efficient over time. There is this relationship, if the fundamental chassis doesn’t change, any additional performance you deliver has to be done more efficiently, and so this is the first time the MacBook Pro gets to enjoy that and be on that same sort of cycle.
On the silicon side, the team doesn’t pull any punches, right? … The goal across all the IPs is, one, make sure you can enable the vision of the product, that there’s a new feature, a new capability that we have to bring to the table in order for the product to have everything that we envisioned, that’s clearly something that you can’t pull back on. And then secondly, it’s do the best you can, right? ... Get as much done in terms of performance and capability as you can in every single generation. The other thing is, Apple’s not a chip company. At the end of the day, we’re a product company. So we want to deliver, whether it’s features, performance, efficiency. If we’re not able to deliver something compelling, we won’t engage, right? ... We won’t build the chip. So each generation we’re motivated as much as possible to deliver the best that we can.
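The repeated claim that the newer generation delivers "more performance at any given power point where the curves overlap" can be pictured as two performance-vs-power curves. Here is a minimal sketch using made-up numbers rather than Apple's measurements; `m1_curve`, `m2_curve`, and `perf_at_power` are all illustrative:

```python
# Conceptual sketch of "more performance at every power point":
# two performance-vs-power curves, with the newer generation's curve
# sitting above the older one wherever they overlap.
# All numbers here are invented for illustration, not Apple's data.

def perf_at_power(curve, watts):
    """Linearly interpolate relative performance from (watts, perf) points."""
    pts = sorted(curve)
    if watts <= pts[0][0]:
        return pts[0][1]
    for (w0, p0), (w1, p1) in zip(pts, pts[1:]):
        if w0 <= watts <= w1:
            t = (watts - w0) / (w1 - w0)
            return p0 + t * (p1 - p0)
    return pts[-1][1]

# Hypothetical curves: (power in watts, relative performance).
# The newer curve extends slightly further (wider dynamic range,
# slight increase in peak power, as described in the interview).
m1_curve = [(5, 40), (15, 75), (30, 100)]
m2_curve = [(5, 48), (15, 85), (30, 112), (35, 118)]

# Wherever the curves overlap, the newer part wins at that power point.
for w in (5, 10, 20, 30):
    assert perf_at_power(m2_curve, w) > perf_at_power(m1_curve, w)
```

The same picture explains the "big GPU at modest frequency" remark: running wide and slow sits on the flat, efficient part of the curve instead of the steep, high-voltage end.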
[Neural engine question]
… There are really two things you need to think about, right? ... The first is the tradeoff between a general purpose compute engine and something a little more specialized. So, look at our CPU and GPU, these are big general purpose compute engines. They each have their strengths in terms of the types of applications you’d want to send to the CPU versus the GPU, whereas the neural engine is more focused in terms of the types of operations that it is optimized for. But if you have a workload that’s supported by the neural engine, then you get the most efficient, highest density place on the chip to execute that workload. So that’s the first part of it. The second part of it is, well, what kind of workload are we talking about? Our investment in the neural engine dates back years ago, right? The first time we had a neural engine on an Apple Silicon chip was A11 Bionic, right? ... So that was five-ish years ago on the iPhone. Really, it was the result of us realizing that there were these emergent machine learning models that we wanted to start executing on device, and we brought this technology to the iPhone, and over the years we’ve been increasing its capabilities and its performance. Then, when we made the transition of the Mac to Apple Silicon, it got that IP just like it got the other IPs that we brought, things like the media engine, our CPU, GPU, Secure Enclave, and so on and so forth. So when you’re going to execute these machine learning models, performing inference with these models, if the operations that you’re executing are supported by the neural engine, if they fit nicely on that engine, it’s the most efficient way to execute them. The reality is, the entire chip is optimized for machine learning, right? ... So a lot of models you will see executed on the CPU, the GPU, and the neural engine, and we have frameworks in place that kind of make that possible.
The goal is always to execute it in the highest performance, most efficient place possible on the chip.
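The placement idea described above, run a workload on the most specialized engine that supports all of its operations and fall back to more general-purpose ones otherwise, can be sketched in a few lines. This is a toy model, not Apple's actual scheduling logic; the real frameworks are far more sophisticated and can even split a single model across engines:

```python
# Toy sketch of engine placement: try the most specialized/efficient
# engine first, and fall back to more general-purpose engines when the
# model contains operations the specialized one doesn't support.
# The op names and engine capabilities here are invented for illustration.

SUPPORTED_OPS = {
    "neural_engine": {"conv2d", "matmul", "relu", "softmax"},
    "gpu":           {"conv2d", "matmul", "relu", "softmax", "sort", "fft"},
    "cpu":           None,  # general purpose: supports everything
}

def place_model(ops):
    """Return the first (most efficient) engine supporting every op."""
    for engine, supported in SUPPORTED_OPS.items():
        if supported is None or set(ops) <= supported:
            return engine
    return "cpu"

print(place_model(["conv2d", "relu", "softmax"]))  # fits the neural engine
print(place_model(["matmul", "fft"]))              # falls back to the GPU
print(place_model(["matmul", "custom_op"]))        # only the CPU runs this
```

In practice this choice is mostly hidden from developers; Core ML, for instance, only exposes a coarse preference for which compute units a model is allowed to use.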
[Nanometer process question]
… You’re referring to the transistor. These are the building blocks all of our chips are built out of. The simplest way to think of them is like a little switch, and we integrate tons of these things into our designs. So if you’re looking at M2 Pro and M2 Max, you’re talking about tens of billions of these, and if you think about large collections of them, that’s how we build the CPU, the GPU, the neural engine, all the media blocks, every part of the chip is built out of these transistors. Moving to a new transistor technology is one of the ways in which we deliver more features, more performance, more efficiency, better battery life. So you can imagine, if the transistors get smaller, you can cram more of them into a given area, that’s how you might add things like additional cores, which is the thing you get in M2 Pro and M2 Max—you get more CPU cores, more GPU cores, and so on and so forth. If the transistors themselves use less power, or they’re faster, that’s another method by which you might deliver, for instance, better battery life, better efficiency. Now, I mentioned this is one tool in the toolbox. What you choose to build with them, the underlying architecture, microarchitecture and design of the chip also contribute in terms of delivering that performance, those features, and that power efficiency.
If you look at the M2 Pro and M2 Max family, you’re looking at a second-generation 5 nanometer process. As we talked about earlier, the chip got more efficient. At every single operating point, the chip was able to deliver more performance at the same amount of power.
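The "smaller transistors mean you can cram more into a given area" point is simple geometry: an ideal linear shrink by a factor s yields roughly s² the density. A back-of-the-envelope sketch follows; the `density_gain` helper and the node numbers are illustrative, and modern node names like "5 nanometer" are marketing labels rather than literal feature sizes:

```python
# Back-of-the-envelope illustration of transistor density scaling:
# if the linear dimensions (and wiring pitch) shrink by a factor s,
# roughly s**2 times as many transistors fit in the same die area.
# Node numbers below are illustrative, not actual feature sizes.

def density_gain(old_pitch_nm, new_pitch_nm):
    """Ideal area-density improvement from a linear pitch shrink."""
    s = old_pitch_nm / new_pitch_nm
    return s ** 2

# A hypothetical 7 -> 5 "nm-class" shrink:
print(density_gain(7, 5))   # roughly 1.96x the transistors per mm^2

# A full node halving of pitch quadruples ideal density:
print(density_gain(10, 5))  # 4.0
```

Real-world gains are smaller than this ideal because not all structures (SRAM, analog, I/O) shrink equally, which is part of why architecture and design, the "what you choose to build with them" in the answer above, matter so much.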
[Media engine question]
… Going back to the point about transistors, taking that IP and integrating it on the latest kind of highly-integrated SOC and the latest transistor technology, that lets you run it at a very high speed and you get to extract a lot of performance out of it. The other thing is, and this is one of the things that is fairly unique about Apple Silicon, we built these highly-integrated SOCs, right? ... So if you think about the traditional system architecture, in a desktop or a notebook, you have a CPU from one vendor, a GPU from another vendor, each with their own sort of DRAM, you might have accelerators kind of built into each one of those chips, you might have add-in cards as additional accelerators. But with Apple Silicon in the Mac, it’s all a single chip, all backed by a unified memory system, you get a tremendous amount of memory bandwidth as well as DRAM capacity, which is unusual, right? ... In a machine like this a CPU is used to having large capacity, low bandwidth DRAM, and a GPU might have very low capacity, high bandwidth DRAM, but now the CPU gets access to GPU-like memory bandwidth, while the GPU gets access to CPU-like capacity, and that really enables things that you couldn’t have done before. Really, if you are trying to build a notebook, these are the types of chips that you want to build it out of. And the media engine comes along for the ride, right? ... The technology that we’ve refined over the years, building for iPhone and iPad, these are machines where the camera is a key part of that experience, and being able to bring some of that technology to the Mac was honestly pretty exciting. And it really enabled just a revolution in terms of the video editing and video workflows.
The addition of ProRes as a hardware-accelerated encode and decode engine as part of the media engine, that’s one of the things you can almost trace back directly to working with the Pro Workflows team, right? ... This is a codec that it makes sense to accelerate, to integrate into hardware, for the customers we’re expecting to buy these machines. It was something that the team was able to integrate, and for those workflows, there’s nothing like it in the industry, on the market.
-
New Mac Pro may not support PCI-E GPUs
programmer said: I haven't seen it mentioned here that there is a fundamental architectural difference between the Apple Silicon GPU and pretty much every other GPU architecture (except Imagination/PowerVR): it is a tile-based architecture. If you look at the Metal programming manual, there are numerous differences in how the two types of architectures need to be dealt with by the application. Those differences will be even more severe at the OS/driver level. Apple is quite aggressive at dropping old hardware in order to reduce their software burden (given, as mentioned above, software lifecycles are actually longer than hardware delivery cycles), and I would imagine they want to get to a place where (in Metal 4 or 5) they can completely focus on their hardware's tile-based architecture, and not have to accommodate AMD/nVidia designs. Same applies to the rumoured "ray tracing" support [...]
-
New Mac Pro may not support PCI-E GPUs
LOL, this is the opposite of what I said last night in the Intel Mac Pro versus M2 Pro Mac mini thread. All that beautiful thermal engineering, the MPX modules with Infinity Fabric Link, it all goes to waste, discontinued less than four years after launch (December 2019)? That would be a shame. I really don't think Apple is in the business of shooting itself in the foot like that.
It would, however, be very Apple-like to use an UltraFusion-like interconnect (which is similar to AMD's Infinity Fabric), so the MPX options at launch would be limited and expensive.
-
M2 Pro Mac mini vs Mac Pro - compared
keithw said: blastdoor said: Good discussion of GPU performance.
In order for the Mac Pro to compete with 'pro' level Windows/Linux systems using high-end discrete GPUs, I wonder if Apple needs to either (1) continue to include high-end discrete GPUs in the Mac Pro (which kind of runs contrary to Apple's strongly expressed preference for sharing memory between CPU and GPU cores, but perhaps so be it) or (2) reconfigure Apple Silicon so that CPU and GPU cores sit on different pieces of silicon and are linked together via 'UltraFusion', thereby perhaps improving chip yields, since CPU and GPU cores could be on separate dies.
I'm inclined to think that option 2 is more appealing technically, but I'm not sure about the business/economics side. If Apple puts CPU and GPU on different dies linked by UltraFusion (or whatever they want to call their 'glue'), they would likely need to do that across more product lines than just the Mac Pro. Maybe they'd integrate CPU and GPU on a single piece of silicon only for the generic M#, but for Pro, Max, Ultra, etc., put CPU and GPU on separate dies. That would allow independent scaling of CPU and GPU power to better target the needs of users who need more CPU or more GPU (or both).
I can see no technical reason Apple can't have both a powerful on-chip set of GPUs as well as support for discrete PCIe GPU boards for people who need them. I've been running an AMD RX 6900 XT graphics card in a Thunderbolt 3-based eGPU enclosure made by Sonnettech, and getting Geekbench 5 Metal results similar to the current Mac Pro's. This is with a 5-year-old iMac Pro.
-
First M2 Pro benchmarks prove big improvement over M1 Max
mikethemartian said: hodar said: rob53 said: Wish I could simply plug a Mac mini into my iMac display.
With a decent monitor, keyboard, mouse, and a little extra storage, the cost outlay is not that far apart, assuming you start off with an upper-level mini.
Somewhere, I’m pretty sure it was in 2017 when they did the mea culpa about the 2013 Mac Pro graphics and thermals, someone explained the target audience of the iMac 5K — the same as the Mac Studio. They learned a lot from that — the iMac 5K sold to a broader audience than they expected. So now they’ve got this broad range, from $599 to $3999 (base configurations), plus the Studio Display.
The bonus in this is it returns the iMac to the original vision for it (and that of the original Macintosh as well) — I think we’ll see it in March, along with the M2 iPad Air.
Then WWDC features both the announcement of the reality machine, and the powerful Macs designed to create for it, the Mac Pro and the Mac Studio. Also, one more thing, the new Liquid Retina Pro Display, in two sizes, 28" 6K and 32" 8K, all with Thunderbolt 5, all available in Fall 2023, thus ending the Apple Silicon transition.