programmer

About

Username: programmer
Joined:
Visits: 51
Last Active:
Roles: member
Points: 454
Badges: 1
Posts: 3,503
  • Generation gaps: How much faster Apple Silicon gets with each release

    MacPro said:
    You are aware, I hope, that I was referring to the OP's comment about 40 years hence?  If you don't think computing power will be over 1000 times greater in forty years, I am guessing you are young?  I started working for Apple in the late 70s, so I have a long perspective.
    Not as old as you, but not far off.  And I’m in the industry right now, and have been for decades with a good view of what is really going on.  I’m extremely familiar with how far we’ve come, and yes, it is millions of times more powerful than the earliest computers.  Could we see 1000x improvement in the next 40 years?  Yes, it’s possible.  

    My point is that we can’t take past progress as the metric for future progress.  This idea of continuous steady progress in process improvement is gone, and has been for quite a while.  Much of Moore’s original paper was about the economics of chip production.  Performance was kind of a side effect.  The problem is that each successive improvement costs more and more, and delivers less and less, and comes at higher and higher risk.  In this situation the economic model could break down, and put that 1000x in 40 years in jeopardy.  Nobody knows what that’s going to look like because the industry has never been in this position before.  New territory.  Makes predictions highly suspect.

  • Generation gaps: How much faster Apple Silicon gets with each release

    CPUs and GPUs actually complement each other in AI. CPUs handle tasks with lots of decision-making or data management, while GPUs jump in to power through the raw computation. It’s not just about one or the other; the best results come from using both for what each does best.

    As for energy efficiency, GPUs perform many tasks at a much lower power cost than CPUs, which is huge for AI developers who need high-speed processing without the power drain (or cost) that would come from only using CPUs.

    And on top of all that, new architectures are even starting to blend CPU and GPU functions—like Apple’s M-series chips, which let both CPU and GPU access the same memory to cut down on data transfer times and save power. Plus, with all the popular libraries like PyTorch, CUDA, and TensorFlow, it’s easier than ever to optimize code to leverage GPUs, so more developers can get the speed and efficiency benefits without diving deep into complex GPU programming.
    This is what the NPU is all about as well.  It is, at its core, a matrix multiplication unit.  Getting a GPU to multiply large matrices optimally takes tricky code, so having dedicated hardware purpose-built for the task makes a lot of sense, if you're doing a lot of it.  Prior to the heavy adoption of deep learning it was almost unheard of for consumer machines to do large matrix multiplication; that was usually the purview of high-performance computing clusters.  With the advent of LLMs and generative models, however, things have changed, and it is definitely worth having this hardware sitting on the SoC with the CPUs and GPUs.

    Apple also appears to have added matrix hardware to their CPUs (in addition to conventional SIMD), so there are several options in an Apple Silicon SoC for where to do these matrix operations.  The NPU is very likely the most power-efficient at it (by far), and may also have the highest throughput.  If you're also doing graphics or other compute, you don't have to worry about your GPU and CPUs being tied up with the ML calculations, and the SoC's unified memory architecture lets all these units share their data very efficiently.
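
    To make the "frameworks do the heavy lifting" point concrete, here's a minimal sketch of my own (not from the article; the 4096-square matrix size is an arbitrary choice) of the same matrix multiply running on the CPU and then on the Apple Silicon GPU through PyTorch's MPS backend.  The NPU itself isn't reachable this way; that path normally goes through Core ML.

    ```python
    # Illustrative only: the same large matmul on the CPU and on the Apple GPU (MPS).
    import torch

    N = 4096                   # assumed size, purely for illustration
    a = torch.randn(N, N)
    b = torch.randn(N, N)

    # CPU path: PyTorch hands this to its bundled BLAS, which uses the CPU's
    # SIMD/matrix units.
    c_cpu = a @ b

    # GPU path: only if the Metal (MPS) backend is available on this machine.
    if torch.backends.mps.is_available():
        dev = torch.device("mps")
        # On Apple Silicon the CPU and GPU share physical memory, so there is
        # no PCIe-style transfer the way there would be with a discrete GPU.
        c_gpu = a.to(dev) @ b.to(dev)
        # Copying back forces the GPU work to finish; the two results should
        # agree to within float32 rounding.
        print(torch.allclose(c_cpu, c_gpu.cpu(), rtol=1e-3, atol=1e-1))
    ```

    The point is that nobody had to write a Metal kernel for this; the framework decides how the matmul runs on whichever unit you point it at.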

  • Generation gaps: How much faster Apple Silicon gets with each release


    MacPro said:
    1der said:
    It seems Cook's law is then about 4 years.  It's always fun to make lots of assumptions and project into the future.  In doing so, I imagine what seeming AI miracles could be accomplished in, say, 40 years with the machine in your hand being 1000 times as powerful.
    Same here.  However, I bet your 1000-times increase is way short of the mark in terms of performance gain.
    This sort of extrapolation rests on the fallacy that future progress will follow the same pattern as past progress.  Moore's "Law" broke down because that no longer holds.  Until about the mid-2000s, we were rapidly and steadily taking advantage of the relatively easy scaling offered by the available EM spectrum for exposing masks.  Since then, improvement has become slower, much harder, and much more expensive: we've reached extreme frequencies that are hard to work with, we've hit power-leakage problems at tiny feature sizes, and plenty more besides.  Each process-node improvement is a slow, expensive victory with ever-diminishing returns.  For a lot of kinds of chips it's not worth the cost of moving to a smaller process, which means there is less demand to drive shrinking to the next node.  So it isn't justified to look at the progress from M1 through M4 and extrapolate linearly.  We aren't at the end of the road, but each successive process node is less appealing to reach.
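
    To put numbers on why the assumed cadence matters so much, here's a quick back-of-the-envelope sketch of my own (the 4-year doubling figure comes from the quote above; the slower cadences are hypothetical):

    ```python
    # Compound growth over a 40-year horizon for a few assumed doubling periods.
    horizon = 40                         # years
    for doubling_period in (4, 5, 6):    # 4 years is the quoted "Cook's law" cadence
        factor = 2 ** (horizon / doubling_period)
        print(f"double every {doubling_period} yrs -> {factor:,.0f}x in {horizon} yrs")

    # double every 4 yrs -> 1,024x  (the ~1000x figure)
    # double every 5 yrs ->   256x
    # double every 6 yrs ->   102x
    ```

    Stretch the doubling period even a little and the 1000x shrinks fast, which is exactly the risk the economics above create.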
  • Generation gaps: How much faster Apple Silicon gets with each release

    Galfan said:
    There is an error in this article. It seems that for the M4 Pro GPU graph the editors picked out the OpenCL test report from Geekbench instead of the Metal test.
    In Metal, the M4 Pro GPU scores around 113,865.
    Nice catch, thanks!  Their number made no sense, but I didn't know why.

  • Generation gaps: How much faster Apple Silicon gets with each release


    emoeller said:
    Would be really interesting to know the degree of binning that Apple uses.  Generally, the more complex the chip, the higher the failure rate, but a partly defective die can often still operate with fewer CPU/GPU cores, resulting in a lower-grade chip.  Since the chips are tested and sealed with the chip name (M4, M4 Pro, M4 Max, M4 Ultra) on top, we can't really tell what the configuration is other than that it will be the maximum advertised chipset.  Hence a binned M4 Max could be sold/used as either an M4 or an M4 Pro.
    I suspect the extent of the binning used is obvious from their available products for sale... i.e., you can get the cheaper version with fewer cores, or the more expensive one with "all" of them.  They are unlikely to take extremely low-functioning Maxes and sell them as the baseline chip or the Pro.