JustSomeGuy1

About

Banned
Username: JustSomeGuy1
Joined
Visits: 60
Last Active
Roles: member
Points: 1,172
Badges: 1
Posts: 330
  • Johny Srouji says Apple's hardware ambitions are limited only by physics

    602warren said:
    KTR said:
    Will Google be selling the chips, or are they going to try to blend Google os and chip?
    My gut tells me they will behave like Samsung with phone screens and make incredible proprietary products for themselves, and sell off the 'lesser' versions to others for use in their devices. But it's an interesting question and I can't wait to see what they do in the future. As long as someone is keeping up with, or trying to keep up with, A or M series chips, that will push innovation across the board in consumer devices.
    I dislike Samsung but that's not accurate. The OLED screens they sold Apple for use in iPhones have been better than the screens they've used on their own flagship phones. It's just a matter of who was willing to pay more and work more.
  • Compared: 14-inch MacBook Pro vs. 13-inch M1 MacBook Pro vs. Intel 13-inch MacBook Pro

    Marvin said:
    What do you do about that RAM, actually? How do you even get room for enough traces to simply talk to all that RAM? And if you want anything even remotely like the memory capacity the current Intel chips have, how are you going to accomplish that? It may be that only HBM can even get you close to that, and that is *extremely* expensive. Like, prohibitively so except for high-end Pro buyers.

    Fundamentally, the biggest issue is that the integrated close RAM that's such a big part of their performance magic is just not scalable. There's physically not enough room for it or its traces, since you need a lot of that room for inter-CPU links (which would push the RAM to... where?). You can't just fill up a larger diameter as speed-of-light issues will start to affect latency.

    But they have a plan. They *are* going to solve this. And whether that involves 3D stacking of some sort, order-of-magnitude larger caches, HBM... we don't know yet. It might involve radical cooling solutions. And there's always the possibility of them doing something really new, which is of course the most exciting possibility of all.

    Really we don't even know if they're going to maintain the unified memory. It seems at least plausible that they won't given the call from some quarters for multiple GPUs. I wouldn't bet on it though.
    HBM would be expensive for iMac models but the price is ok at the Mac Pro level. This article estimates 16GB of HBM2 at around $320:

    https://www.fudzilla.com/news/graphics/48019-radeon-vii-16gb-hbm-2-memory-cost-around-320

    $1280 for 64GB, $2560 for 128GB. That's not a lot of money for that much memory. An upcoming Radeon GPU is reported to offer up to 128GB HBM2E:

    https://www.tomshardware.com/news/amd-aldebaran-memory-subsystem-detailed
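
The per-gigabyte arithmetic above is easy to sanity-check. A minimal sketch, assuming the reported ~$320-per-16GB figure scales linearly with capacity (an assumption for illustration; real volume pricing is rarely perfectly linear):

```python
# Rough HBM2 cost scaling. The $320-per-16GB figure is from the
# Fudzilla report above; linear scaling is an assumption.
COST_PER_16GB = 320  # USD

def hbm_cost(capacity_gb):
    """Estimated HBM2 cost in USD for a given capacity in GB."""
    return capacity_gb / 16 * COST_PER_16GB

for cap in (16, 64, 128):
    print(f"{cap:>3} GB ~= ${hbm_cost(cap):,.0f}")
```

Which reproduces the $1280 / $2560 figures for 64GB and 128GB.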

    Intel will use HBM too:

    https://www.tweaktown.com/news/80272/intel-confirms-sapphire-rapids-cpus-will-use-hbm-drops-in-late-2022/index.html

    It says here that a successor to HBM2E is coming in late 2022:

    https://www.pcgamer.com/an-ultra-bandwidth-successor-to-hbm2e-memory-is-coming-but-not-until-2022/

    I don't think the links between chips are as important. Supercomputers are made up of separate machines. The separate GPUs in the current Mac Pro are connected by Infinity Fabric at 84GB/s. M1 Max has 400GB/s memory bandwidth.

    A lot of tasks that work in parallel can just be moved to the other chips for example processing separate frames in video software or separate render buckets in 3D software.
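
The "separate frames per chip" idea is the same pattern as farming frames out to worker processes, where no worker ever needs to talk to another mid-task. A minimal sketch (the frame data and the per-frame "work" are made up for illustration):

```python
# Sketch: embarrassingly parallel frame processing -- the same pattern
# that would let separate chips each take whole frames or render buckets.
from multiprocessing import Pool

def process_frame(frame):
    # Stand-in for real per-frame work (encode, filter, render bucket...)
    return sum(frame) % 256

# 16 fake "frames" of 64 samples each
frames = [list(range(i, i + 64)) for i in range(0, 1024, 64)]

if __name__ == "__main__":
    with Pool(processes=4) as pool:                # one worker per "chip"
        results = pool.map(process_frame, frames)  # frames never talk to each other
    print(len(results), "frames processed independently")
```

The point is that the interconnect only carries whole work units in and results out, not fine-grained shared state.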

    But as you say, we can assume they've planned it out. They employ experts in their field so they'll have a solution to scale things up. If it's good enough for Intel to do this in server chips and AMD to do it in their GPUs, it should be fine for Apple to do it too:

    https://www.nextplatform.com/2021/08/19/intel-finally-gets-chiplet-religion-with-server-chips/
    [graphics removed]
    You're missing several key points.

    SK Hynix announced their HBM3 for shipment next year just a couple days ago. Max stack height is 12, total stack capacity is 24GB. That requires *1024* pins for data. *Per stack*. You just can't get enough capacity using HBM (as I pointed out right after the post you quoted). So HBM might be a component of their solution, but it can't be the entire answer, unless they're willing to completely give up on large-memory configurations. That seems unlikely, but... no way to know for now.
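
The announced numbers make the capacity-versus-pins problem easy to quantify. A quick sketch using the 24GB-per-stack and 1024-data-pins-per-stack figures from the announcement (the 128GB target is just an example configuration):

```python
import math

STACK_CAPACITY_GB = 24    # HBM3, 12-high stack (SK Hynix announcement)
DATA_PINS_PER_STACK = 1024

def stacks_and_pins(target_gb):
    """How many HBM3 stacks (and data pins) a target capacity needs."""
    stacks = math.ceil(target_gb / STACK_CAPACITY_GB)
    return stacks, stacks * DATA_PINS_PER_STACK

stacks, pins = stacks_and_pins(128)
print(f"128 GB -> {stacks} stacks, {pins} data pins")
```

Six stacks and over six thousand data pins just to reach 128GB, which is why HBM alone can't carry a large-memory configuration.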

    Your larger error is in thinking interchip links aren't so important. They are fundamental to whatever solution Apple comes up with. The key to Apple's "secret sauce" right now is their unified memory. But if you think that the magic comes just from the CPU and GPU having equal access to the same memory pool, you're missing half the story. The other half is that memory bandwidth is huge. Oh, and there's a third "half" (heh), which is the astounding cache architecture (with very low latency, a big part of the overall picture). That sets them apart from everyone else.

    Interchip links are key not just because of bandwidth issues, but also latency. Your examples mix a bunch of different technologies that are appropriate for different applications, but not for linking cores in a large system. They also have dramatically different requirements than pure memory buses or supercomputer links (Ethernet or InfiniBand most often). I glossed over a bunch of that when I mentioned cache coherency issues.

    If you want to get a bit of a sense of why that all matters, Andrei over at Anandtech built a very cool benchmark that shows a grid with cross-core latency figures for large CPUs, which he uses when reviewing chips like EPYCs and Xeons. (I think he's used it in the Apple Ax reviews too.) You can see the impact the various technologies have.
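
A toy version of that kind of measurement, in the spirit of a core-to-core latency benchmark: two threads bounce a token back and forth and the round-trip is timed. Note that in Python the GIL and scheduler overhead dominate the result, so this only illustrates the shape of such a benchmark, not real cache-line or interconnect latency:

```python
# Toy "ping-pong" handoff timer. In Python the numbers reflect GIL and
# scheduler overhead, not hardware cache-coherency latency -- this only
# shows the shape of a cross-core latency measurement.
import threading
import time

ROUNDS = 1000
ping, pong = threading.Event(), threading.Event()

def responder():
    for _ in range(ROUNDS):
        ping.wait(); ping.clear()   # wait for the "ball"...
        pong.set()                  # ...and hit it straight back

t = threading.Thread(target=responder)
t.start()
start = time.perf_counter()
for _ in range(ROUNDS):
    ping.set()
    pong.wait(); pong.clear()
t.join()
per_handoff_us = (time.perf_counter() - start) / (2 * ROUNDS) * 1e6
print(f"~{per_handoff_us:.1f} us per one-way handoff")
```

Real benchmarks like Andrei's do this with atomic operations on pinned cores, which is how the interconnect topology shows up as a grid of latencies.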

    As for chiplets - yes, that's the obvious path forward generally for everyone. But it's not at all obvious how you combine that idea with unified RAM. This is what I was talking about above - if you build a chiplet setup, you're committed to either some sort of NUMA setup, or to a central memory controller. (That is, the architecture of the first EPYCs, versus the architecture of the most recent generation.) Both have benefits and drawbacks, and both are problematic if you're trying to do what Apple has done with their memory architecture. You run into a variety of issues with physical layout and density. Among other things, this produces huge heat engineering challenges, and grave difficulties physically routing enough traces between CPUs and RAM.

    This is not something you can handwave away by pointing at other implementations. Apple is going to have to do something different. And when they do, it's going to be very exciting. Don't preemptively diminish it by likening it to existing chips. It won't be, unless they give up on close unified memory.
  • Compared: 2021 New 16-inch MacBook Pro vs. 2019 16-inch MacBook Pro

    You've really missed the entire point of what they did with the camera. There is no notch.

    That is, the old laptop has a 16:10 screen. The new one has a 16:10 screen (with no notch). And then, on top of that, they stuck a camera and some extra screen space, which can be used to hold a menu bar. I expect that some enterprising developer will shortly release a little utility that blacks out that extra space, moving the menu bar down, and then you can just use it like the older version, with a perfectly rectangular display area.

    Put another way, this model doesn't come with a notch cut out of the screen. It comes with tabs *added* to the screen, on either side of the camera.

    (Oh, and if you're wondering, full screen apps display with no cutout because they exist entirely below the camera. Like I said, it's not a notch, it's bonus tabs.)
  • Report suggests Apple's A15 Bionic lacks significant CPU upgrades due to chip team brain d...

    (Posted this on another article, but it belongs here.)

    [...] it seems likely that CPU performance is not significantly improved - lacking more facts, my money is on Andrei's analysis (in AnandTech), maybe 5-6%. But there are two wildly divergent ways to look at this.

    It is possible that this is simply the result of a brain drain. That's a popular take in the press, right now. It's not clear how that analysis lines up with other known facts, like the massively improved NPU.

    There is another possibility though. Apple is now designing a pair of cores for use not just in the phone, but also in the Mac. What are the needs of those two devices?
    - For the phone, the biggest need is NOT more CPU performance. It's lower power use, which leads to greater sustained performance or longer battery life.
    - For the Mac, it *is* more performance. But Macs are very different from phones, even the laptops. They can afford to burn more power on increased clock speed, unlike phones... IF the chip has the ability to run at higher clocks. It seems likely that the A14/M1 does NOT have that ability, simply based on the MBPs not clocking past 3.2GHz even when on wall current. (This is normal - every chip design has a maximum beyond which it can't go, no matter how much power you throw at it.)

    The A-series chips have sped up from ~2.3GHz to ~3GHz over the last five years, since the iPhone 7, but most of the performance has come from widening the cores. But this leaves a ton of performance on the table - they should be able to get at least 4GHz, and possibly close to 5GHz, out of the process node they're using now, with a newer design. (Power requirements prevent that in the phone, of course.)

    Now... what would such a redesign look like? Really, you'd want to try to preserve the IPC of your existing design while allowing for higher clocks. And you'd probably also want to increase your caches to compensate for the fact that every cache miss is going to cost more cycles (as each cycle is quicker). Once that design is done, if you don't need the max performance out of that chip in one situation, you'd run it slower and pocket the power savings.

    This looks like it might be what Apple has done. They're claiming better battery life, despite a high-refresh-rate screen, a brighter screen, and a doubled system cache. And oh yeah, that doubled cache seems telling.

    So, I think we can't really know what's going on at Apple until the new Macs ship. And maybe not until a new desktop (27" iMac and/or Mac Pro, not so much the mini) ships. If my guess is right, what we're seeing is Apple being very smart about maximizing the RoI on a single pair of core designs (high-perf & high-efficiency). They get better power efficiency in the A15, which is their primary design goal this time around, while being able to drive the cores much faster (4-4.5GHz, maybe?) in the M2 Macs. That would give the cores +25% to +40% performance PER CORE from clock speed alone. You'd lose some performance due to longer pipelines, cache misses, etc., probably made up for by the larger cache (which might be where the +6% is coming from in the A15).
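
The per-core gain claimed above is just the frequency ratio against the observed 3.2GHz ceiling, treating IPC as constant (which, as noted, is optimistic, since longer pipelines cost some IPC):

```python
BASE_GHZ = 3.2  # observed M1 clock ceiling, per the post above

def clock_speedup_pct(target_ghz, base_ghz=BASE_GHZ):
    """Per-core gain from clock alone, assuming IPC stays constant."""
    return (target_ghz / base_ghz - 1) * 100

for target in (4.0, 4.5):
    print(f"{target} GHz -> +{clock_speedup_pct(target):.0f}% per core")
```

4.0GHz works out to +25% and 4.5GHz to about +41%, matching the range in the paragraph above.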

    Next month will be *fascinating*.
  • RAM in iPhone 13 unchanged from iPhone 12 models

    Despite the previous poster's claim, it seems likely that CPU performance is not significantly improved - lacking more facts, my money is on Andrei's analysis (in AnandTech), maybe 5-6%. But there are two wildly divergent ways to look at this.

    It is possible that this is simply the result of a brain drain. That's a popular take in the press, right now. It's not clear how that analysis lines up with other known facts, like the massively improved NPU.

    There is another possibility though. Apple is now designing a pair of cores for use not just in the phone, but also in the Mac. What are the needs of those two devices?
    - For the phone, the biggest need is NOT more CPU performance. It's lower power use, which leads to greater sustained performance or longer battery life.
    - For the Mac, it *is* more performance. But Macs are very different from phones, even the laptops. They can afford to burn more power on increased clock speed, unlike phones... IF the chip has the ability to run at higher clocks. It seems likely that the A14/M1 does NOT have that ability, simply based on the MBPs not clocking past 3.2GHz even when on wall current. (This is normal - every chip design has a maximum beyond which it can't go, no matter how much power you throw at it.)

    The A-series chips have sped up from ~2.3GHz to ~3GHz over the last five years, since the iPhone 7, but most of the performance has come from widening the cores. But this leaves a ton of performance on the table - they should be able to get at least 4GHz, and possibly close to 5GHz, out of the process node they're using now, with a newer design. (Power requirements prevent that in the phone, of course.)

    Now... what would such a redesign look like? Really, you'd want to try to preserve the IPC of your existing design while allowing for higher clocks. And you'd probably also want to increase your caches to compensate for the fact that every cache miss is going to cost more cycles (as each cycle is quicker). Once that design is done, if you don't need the max performance out of that chip in one situation, you'd run it slower and pocket the power savings.

    This looks like it might be what Apple has done. They're claiming better battery life, despite a high-refresh-rate screen, a brighter screen, and a doubled system cache. And oh yeah, that doubled cache seems telling.

    So, I think we can't really know what's going on at Apple until the new Macs ship. And maybe not until a new desktop (27" iMac and/or Mac Pro, not so much the mini) ships. If my guess is right, what we're seeing is Apple being very smart about maximizing the RoI on a single pair of core designs (high-perf & high-efficiency). They get better power efficiency in the A15, which is their primary design goal this time around, while being able to drive the cores much faster (4-4.5GHz, maybe?) in the M2 Macs. That would give the cores +25% to +40% performance PER CORE from clock speed alone. You'd lose some performance due to longer pipelines, cache misses, etc., probably made up for by the larger cache (which might be where the +6% is coming from in the A15).

    Next month will be *fascinating*.