anonconformist

About

Username
anonconformist
Joined
Visits
111
Last Active
Roles
member
Points
585
Badges
0
Posts
202
  • Compared: M1 vs M1 Pro and M1 Max

    tht said:
    With dedicated media acceleration they don’t need several low-power cores while playing back video, or even encoding it: most common desktop applications rarely make use of more than 2 separate threads at any given time, one for front-end GUI interaction with the user and a background thread waiting on I/O, which, when the hardware is implemented correctly, is mostly waiting without much processor use anyway. Thus, in most use-cases it’s entirely possible to do all you need with just the 2 energy-efficient cores: Mail or Pages will be far more than fast enough with the 2 small cores. Web browsing is similar for I/O, but other workloads will cause greater p-core usage, and a lot more threads.

    The other reason you would use efficiency cores is for regularly scheduled background system tasks that don’t need much CPU power. Again, even 2 threads is more than a lot of those will use. I personally thought it was an odd choice for the M1 to start out with 4 energy-efficient cores, because of these reality-based considerations. Perhaps that was more a question of getting real-world results with lower risk, and of gathering real-world data on core usage in practice, which would explain why they went with those choices.
    If you look at the die shots, the e-cores take up about 1% of the SoC chip area in the M1 Max and about 2% in the M1 Pro. Adding another 2 e-cores isn't a big cost. I can only think of one reason for them to do this: they needed the standby and idle power consumption of the M1 Pro and Max to be as low as they could possibly go, and they felt that 2 e-cores were enough CPU to handle standby and idle tasks, like all the app and system tasks you mention. But the iPhone has 4 e-cores! And you'd think an iPhone would really need to save on power, and yet it has 4 e-cores. So here we are: why the difference?

    The e-cores are so small that having 4 or 6 of them just isn't that big of a cost. If they prevent the p-cores from firing up while web browsing, especially for background web pages, that seems like a nice gain in energy efficiency. It's a browser-centric, JavaScript-centric workflow most of the time for users; even those who do a lot of GPU compute or multi-threaded workloads still spend a lot of their time just web browsing. So, managing background web pages with e-cores seems like a really good idea.

    At some point adding e-cores isn't effective in the energy-efficiency-versus-performance trade, so it's not going to be 8, 12, or 16 e-cores, but I would think 4 to 8 would be the sweet spot for 8-, 16-, or 32-p-core systems.
    Consider this as a possible explanation for why only 2 e-cores rather than more: as tiny a percentage of the die as each core is, every added core increases the distance signals must traverse, and it brings with it memory-controller and cache-coherency logic of its own, which I’m reasonably guessing is just as complicated, just as large, and just as much of a system-performance overhead regardless of how fast the core itself is. In software development a big overhead for many processes with more than one thread is synchronization between threads, and e-cores talking to p-cores won’t be any better, perhaps even worse, since they tend to run at different frequencies, which brings its own fun for buffering between them.

    The iPhone isn’t something I’d ever expect to expand much beyond its current number of main CPU cores, GPU cores, and the other added blocks in the SoC, because… it’s a phone, and it’s not a wise use-case to try to do everything you can do on a laptop or even a full-sized iPad, for human reasons if nothing else.
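
    To put a rough number on the thread-synchronization overhead mentioned above, here’s a minimal Swift sketch (the thread and iteration counts are arbitrary picks, nothing Apple-specific): the same total amount of work done against one shared, lock-protected counter versus per-thread counters merged once at the end.

    ```swift
    import Foundation

    let perThread = 2_000_000
    let threads = 4
    let lock = NSLock()

    var sharedTotal = 0
    let t0 = Date()
    DispatchQueue.concurrentPerform(iterations: threads) { _ in
        for _ in 0..<perThread {
            lock.lock(); sharedTotal += 1; lock.unlock()   // every step bounces one lock/cache line between cores
        }
    }
    print("shared counter:    \(Date().timeIntervalSince(t0)) s, total \(sharedTotal)")

    var mergedTotal = 0
    let t1 = Date()
    DispatchQueue.concurrentPerform(iterations: threads) { _ in
        var local = 0
        for _ in 0..<perThread { local += 1 }              // independent work, no cross-thread traffic
        lock.lock(); mergedTotal += local; lock.unlock()   // one synchronization per thread at the end
    }
    print("per-thread totals: \(Date().timeIntervalSince(t1)) s, total \(mergedTotal)")
    ```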

    Glue logic has its place and its price in any design, and can make or break a system.  It’ll be interesting to see what Apple does for core breakdowns across all their products, from the Watch up to the Mac Pro.  I also consider the original M1, as much as anything, a usable, lower-risk proof-of-concept release for many things, including the scaled-up designs along so many fronts, and a rational iterative development strategy of walking (fast!) before you try to run.  It’s really the Mac MVP for their transition.
  • Compared: M1 vs M1 Pro and M1 Max

    Any thoughts on why the Pro would perform slightly better than the Max on the Geekbench multi-core test? Maybe the Max runs hotter, which would place it at a disadvantage in a test where both chips have ten identical cores?
    There is always a trade-off in speed when you add more things to address in a chip.

    When you double the number of things being addressed, you add another bit of address/select lines, and even if whatever is selected is just as fast once selected and stays selected, that initial access still takes more time: the extra layers of combinatorial decode logic mean more linear distance for signals to travel, and that means more resistance and capacitance in the signal lines, beyond the speed of each transistor used.
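
    As a back-of-the-envelope illustration of the “one more address bit” point (the block counts below are made up, not tied to any real chip): the number of select lines needed grows as ceil(log2(n)), so every doubling of what’s addressable costs one more bit of decode, plus the wider fan-out and longer wiring that bit implies.

    ```swift
    import Foundation

    // Select lines needed to address n blocks; doubling n adds exactly one line.
    for blocks in [8, 16, 32, 64, 128] {
        let selectLines = Int(ceil(log2(Double(blocks))))
        print("\(blocks) blocks -> \(selectLines) select lines")
    }
    ```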

    Look at the sizes of CPU caches: the smaller the caches are, the lower the latency, because of exactly this factor. As you increase the size of a cache, the distance (and thus the time) for an access grows, and it’s generally preferable to have constant-time/constant-cycle cache access (especially for the L1 instruction and data caches) to simplify and speed up the logic. Once a cache becomes too large, not even counting the heat/power issue, your system performance goes down, even if you can afford to make that large a chip.
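
    Here’s a minimal pointer-chasing sketch in Swift that shows the effect (the working-set sizes and step count are my own picks): every load depends on the previous one, so the time per step approximates load latency, and it jumps each time the working set outgrows a cache level.

    ```swift
    import Foundation

    // Walk a single cycle of shuffled indices so each load depends on the last,
    // then report nanoseconds per access for a given working-set size.
    func nsPerAccess(elements: Int, steps: Int = 5_000_000) -> Double {
        let order = Array(0..<elements).shuffled()
        var next = [Int](repeating: 0, count: elements)
        for i in 0..<elements {                          // link the shuffled indices into one cycle
            next[order[i]] = order[(i + 1) % elements]
        }
        var idx = 0
        let start = Date()
        for _ in 0..<steps { idx = next[idx] }           // dependent loads: latency-bound, no overlap
        let elapsed = Date().timeIntervalSince(start)
        precondition(idx >= 0)                           // keep the chase from being optimized away
        return elapsed * 1e9 / Double(steps)
    }

    for kib in [16, 64, 256, 1_024, 8_192, 65_536] {     // from L1-sized up to DRAM-sized working sets
        let n = kib * 1024 / MemoryLayout<Int>.size
        print("\(kib) KiB: \(nsPerAccess(elements: n)) ns per access")
    }
    ```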

    Now, look at the size of the M1 Pro versus the M1 Max: the Max is over twice as large in area. All of the GPU/CPU caches and those cores must be kept coherent, and that has a cost in space, time, energy, and waste heat. Bigger isn’t always better for performance, as alluded to above, and bigger systems tend to be slower due to sheer size: at the speed signals propagate in circuits (not as fast as light, but a good fraction of it) the differences in the sizes of the M1, M1 Pro and M1 Max really add up. For the same reasons, it’s a HUGE design win to have the RAM on the same package, because the circuit traces are kept minimal in length, which greatly affects power usage as well, making the system so darn power-efficient while also being fast.

    A brief explanation of resistance/capacitance for those not familiar with it: resistance, measured in ohms, limits how much current (in amps) can flow through a circuit; a perfect superconductor has zero resistance, so in theory unlimited current could flow through it. Capacitance, measured in farads, is the opposition to a change in voltage. Digital circuits need 2 distinct voltage ranges for determining what’s a 1 (usually the high voltage) and what’s a 0, with a no-man’s land in between that is neither. The larger the swing between the low and high voltages, the longer a signal takes to settle into a recognizable state.

    Resistance and capacitance both depend on physical size. You want to minimize capacitance and its effects for speed (because it resists the change in voltage), while also not letting resistance drop so low that excess current flows, because that’s wasted power in the form of heat. Longer circuit traces increase both, for the same trace width, while wider or thicker traces (more material) reduce resistance.

    This is why processors have come down to less than 2 volts for normal operation in recent years. To make them run faster, you raise the voltage so the RC (resistance/capacitance) time constants matter less, because there’s more voltage driving each transition, but it wastes power at the law of squares: dynamic power scales roughly as C·V²·f, so doubling the voltage alone means about 4 times the power in watts, before even counting the higher clock that usually comes with it.
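
    For those who want the arithmetic, here’s a tiny sketch of that dynamic-power relation, P ≈ C · V² · f (the values are illustrative placeholders, not measured figures for any chip).

    ```swift
    // Dynamic switching power: capacitance times voltage squared times frequency.
    func dynamicPower(c: Double, v: Double, f: Double) -> Double {
        c * v * v * f
    }

    let base = dynamicPower(c: 1.0, v: 1.0, f: 1.0)
    print(dynamicPower(c: 1.0, v: 2.0, f: 1.0) / base)   // 4.0: double the voltage, same clock
    print(dynamicPower(c: 1.0, v: 2.0, f: 2.0) / base)   // 8.0: double the voltage and the clock
    print(dynamicPower(c: 1.0, v: 0.8, f: 1.0) / base)   // 0.64: why shaving voltage pays off so well
    ```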

    Given all these constraints from physics, I’m very curious how they’re going to design and manufacture the SoCs for the Pro desktops/workstations, with more CPU and GPU cores and still unified memory. Designing and building them as they are now has practical limits for cost as well as scaling. I expect the next step is chiplets on the package, with an underlying main chip that is mostly system cache and I/O serving as the total system bus, and the CPUs and GPUs, with their more local caches, on chiplets with a number of cores each.
  • Compared: M1 vs M1 Pro and M1 Max

    tht said:
    Still don't understand the reasons for going from 4 e-cores to 2 e-cores. Then, baselining 8 p-cores for the Jade SoC is also a curious decision. They spent the transistors to double up the media blocks, in addition to the GPU cores. Why not add another 4-p-core CPU cluster? Probably something to do with locality to caches, but it's an interesting set of design choices versus SoC size or transistors spent. They have the thermal budget for another 4 p-cores.
    Everyone benefits from the graphics performance, for race-to-sleep battery savings if nothing else. The media-specific functionality is a logical focus to add to Pro-level chips, especially for portables, as that’s a huge power user.

    With dedicated media acceleration they don’t need several low-power cores while playing back video, or even encoding it: most common desktop applications rarely make use of more than 2 separate threads at any given time, one for front-end GUI interaction with the user and a background thread waiting on I/O, which, when the hardware is implemented correctly, is mostly waiting without much processor use anyway. Thus, in most use-cases it’s entirely possible to do all you need with just the 2 energy-efficient cores: Mail or Pages will be far more than fast enough with the 2 small cores. Web browsing is similar for I/O, but other workloads will cause greater p-core usage, and a lot more threads.
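
    For the curious, here’s a minimal Swift sketch of that two-thread pattern (the URL and the UI-update function are hypothetical stand-ins, not from any real app): the main thread stays free for the GUI, one background task waits on I/O, and the result hops back to the main thread when it arrives.

    ```swift
    import Foundation

    func refreshDocumentList() {
        let url = URL(string: "https://example.com/documents.json")!   // placeholder endpoint
        URLSession.shared.dataTask(with: url) { data, _, error in
            // This completion handler runs on a background thread; most of the
            // elapsed time was spent waiting on the network, not burning CPU.
            guard let data = data, error == nil else { return }
            DispatchQueue.main.async {
                updateUI(with: data)                                   // hop back to the main (GUI) thread
            }
        }.resume()
    }

    func updateUI(with data: Data) {
        print("received \(data.count) bytes")                          // stand-in for real UI work
    }
    ```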

    The other reason you would use efficiency cores is for regularly scheduled background system tasks that don’t need much CPU power. Again, even 2 threads is more than a lot of those will use. I personally thought it was an odd choice for the M1 to start out with 4 energy-efficient cores, because of these reality-based considerations. Perhaps that was more a question of getting real-world results with lower risk, and of gathering real-world data on core usage in practice, which would explain why they went with those choices.
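
    As a sketch of how an app hands that kind of housekeeping to the scheduler (the work inside the closure is a made-up placeholder): submitting it at background quality of service marks it as not urgent, and on Apple silicon the system generally runs such work on the efficiency cores rather than waking a performance core.

    ```swift
    import Foundation

    func scheduleMaintenance() {
        DispatchQueue.global(qos: .background).async {
            // e.g. pruning caches, reindexing, syncing logs; nothing urgent
            let pruned = (0..<1_000).filter { $0 % 7 == 0 }.count
            print("maintenance pass touched \(pruned) items")
        }
    }
    ```

    In a real app this would be kicked off from a long-lived process (a timer, a launch agent, and so on); the point here is just the quality-of-service hint.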

    Me? If I’m typing up a storm in Pages while listening to my music, with spellchecking running as I type at 90+ WPM, and none of it slows down or glitches while using only the 2 efficiency cores, that’s less wasted energy that I can spend later on other things.

    I strongly suspect, based on my experience as a developer and doing developer support, that 4 energy-efficient cores is the most Apple will ever have on a system, because of what I explained above. There are only a few types of applications that make effective use of more than 2 cores, and other than properly-written games and media encoding, most of them are in the server realm. Making full use of a large number of cores isn’t nearly as easy or feasible in practice as most would like to think, because too much sequential dependency is involved: this is why, unless you’re running a server, an M1 Max with its smaller number of faster cores is far more useful than an AMD Threadripper with 64: all those cores are going to be slower and, most of the time, not used. Most users would be shocked to learn how little their M1 of any variety is actually utilized.
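
    A quick Amdahl’s-law sketch makes the sequential-dependency point concrete (the 80% parallel fraction is an arbitrary example, not a measurement of any real app): speedup = 1 / ((1 - p) + p / n), and it flattens out fast as cores are added.

    ```swift
    // Amdahl's law: p is the parallelizable fraction, n the number of cores.
    func speedup(parallelFraction p: Double, cores n: Double) -> Double {
        1.0 / ((1.0 - p) + p / n)
    }

    for cores in [2.0, 4.0, 8.0, 16.0, 64.0] {
        print("\(Int(cores)) cores: \(speedup(parallelFraction: 0.8, cores: cores))x")
    }
    // 2 cores: 1.67x, 8 cores: 3.33x, 64 cores: 4.71x; past a handful of cores,
    // the serial portion dominates and per-core speed matters far more than count.
    ```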
  • Compared: M1 vs M1 Pro and M1 Max

    Not mentioned in the article is the importance of the caches on the SoC: main memory, no matter how fast you can buy it, isn’t sufficient on its own for performance, because the latency to get or set anything in main memory is HUGE! The M1 Pro and M1 Max make a big investment in die space for all that cache, and all that cache has high value for total system performance, not just for the CPU cores.
  • New MacBook Pro chips deliver desktop performance with better power efficiency

    Is M1 Max chip physically larger than M1 Pro? 
    Yes, it has far more transistors.

    But the chip itself is only a partial determinant of the size of the package, which also includes the RAM, in separate chips on the same package.

    In theory, both M1 Pro and M1 Max chips could be put on silicon with the same size and shape, but defects would have a larger impact on yield and drive up costs of the M1 Pro. 

    For manufacturing, it’s likely easier and less expensive to make the exterior package the same size and layout for both chips, so that only one motherboard design is required and cooling is simplified.