anonconformist

About

Username: anonconformist
Joined: September 2015
Visits: 111
Last Active: March 5
Roles: member
Points: 585
Badges: 0
Posts: 202

Reactions

214 Like, 2 Dislike, 75 Informative

MacBook Air with M1 chip outperforms 16-inch MacBook Pro in benchmark testing

anonconformist

November 2020

chadbag said:

Why would a unified memory system be a detriment to memory intensive operations?

GPU cores need to contend with CPU cores for memory cycles. Unless the memory is fast enough to satisfy all CPU and GPU requests at the same time, something has to get memory access at a lower priority. If the CPU cores have the lower priority, they stall, doing no computation while they wait for the GPU cores.

Note also that main memory requests haven’t been for a single byte, or even 8 bytes, at a time for a very long time: all main memory access is done in chunks at least the size of a CPU cache line. I don’t know what the M1 uses, but for at least the last 20 years the smallest cache line has been 32 bytes. It’s done this way because electrical signaling overhead takes a meaningful amount of time, and dynamic RAM has other costs as well, so the net result is that a sequential chunk is usually far more efficient to access than 8 bytes or less. The latency of a main memory access is many CPU cycles, as main memory runs at a fraction of the speed of the CPU’s L1 data and instruction caches, which are in turn slower than register accesses.
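To make the cache-line point concrete, here is a minimal Python sketch that counts how many bytes actually cross the memory bus when whole lines are fetched. It assumes a hypothetical 64-byte line (the M1’s real line size isn’t stated here) and a cold cache that never evicts, so it illustrates the granularity effect rather than modeling any real memory controller.

```python
def bytes_transferred(addresses, line_size=64):
    """Bytes fetched from DRAM when every access pulls in its whole cache line.

    Simplifying assumptions: the cache starts empty and never evicts, so each
    distinct line is fetched exactly once.
    """
    touched_lines = {addr // line_size for addr in addresses}
    return len(touched_lines) * line_size


# 64 reads of 8 bytes each (512 useful bytes) laid out back to back...
sequential = [i * 8 for i in range(64)]
# ...versus the same 64 reads spread 4 KiB apart.
strided = [i * 4096 for i in range(64)]

print(bytes_transferred(sequential))  # 512: every fetched byte is used
print(bytes_transferred(strided))     # 4096: 8x the traffic for the same useful data
```

Both patterns read the same 512 useful bytes, but the scattered one moves eight times as much data, which is why sequential chunks are so much cheaper.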

By combining GPU cores, CPU cores and huge L1/L2 caches along with L3, minimizing protocol overhead between CPU and GPU, and keeping wire lengths short (assuming they’re not using buses like PCIe between them), they can make the negotiation between the various SoC function areas very fast, far faster than if those parts were spread out on a motherboard. Despite all that, it’s still unlikely that every part that needs memory can get as much memory access at the same time as it would take to keep them all running at maximum performance.
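A toy model of that contention, assuming a hypothetical fixed-priority policy in which the GPU is served first and the CPU gets whatever bandwidth is left. Real SoC memory controllers arbitrate far more cleverly; the point is only that when combined demand exceeds the available bandwidth, someone waits.

```python
def share_bandwidth(total_gb_s, gpu_demand_gb_s, cpu_demand_gb_s):
    """Split a fixed memory bandwidth budget between GPU and CPU.

    Hypothetical policy: the GPU has priority; the CPU gets the remainder.
    Returns (gpu_served, cpu_served, cpu_shortfall), where the shortfall is the
    fraction of the CPU's requested bandwidth it did not get.
    """
    gpu_served = min(gpu_demand_gb_s, total_gb_s)
    cpu_served = min(cpu_demand_gb_s, total_gb_s - gpu_served)
    if cpu_demand_gb_s == 0:
        return gpu_served, cpu_served, 0.0
    return gpu_served, cpu_served, 1 - cpu_served / cpu_demand_gb_s


# Made-up figures: 100 GB/s available, GPU wants 80, CPU wants 40.
print(share_bandwidth(100, 80, 40))  # (80, 20, 0.5): CPU is short half its bandwidth
print(share_bandwidth(100, 30, 40))  # (30, 40, 0.0): demand fits, nobody stalls
```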
Mac Studio gets an update to M4 Max or M3 Ultra

anonconformist

March 5

apple4thewin said:

anonconformist said:

netrox said:

The fact that the M3 Ultra now supports up to 512 GB RAM is pretty amazing. It's great for large-scale LLMs. The M2 Ultra only supported 192 GB at most.

Why anyone would dislike your comment is puzzling to me.

I bought a Surface Laptop 7 with 64 GB RAM (at a small discount, as I’m a Microsoft employee; these can only be bought directly from Microsoft) purely to have a Windows machine for running larger LLMs and doing AI experimentation on a reasonable budget, knowing there are better-performing options if you have a bottomless budget.

For the price, it’s a great deal: not many machines can run LLMs that large. It’s not perfect, as memory bandwidth and thermals (running LLMs purely on the CPU makes it a bit warm) appear to be the bottlenecks. Right now the NPU isn’t supported by LM Studio and others, and where you can use the NPU, most LLMs aren’t currently in the right format. It’s definitely an imperfect situation. But it runs 70-billion-parameter LLMs (sufficiently quantized) that you couldn’t run on NVIDIA chips at a rational price; you just need to be patient.

I’d love to have seen an M4 Ultra with all that memory bandwidth: with 512 GB RAM, and presumably able to use all the CPU cores, GPU cores and Neural Engine cores, it’s likely still constrained by memory bandwidth. I would note that my laptop stays perfectly interactive under load with only 12 cores, so I’d expect far more from one of the Mac Studio beasts.

We finally have a viable reason mere mortals could make effective use of 512 GB RAM machines: LLMs. Resource constraints of current hardware are the biggest reason we can’t yet have a truly user-friendly hybrid OS: LLMs interacting with humans in natural language, with the traditional OS acting as a super-powerful device-driver and terminal layer for the underlying computer architecture. The funny thing is that with powerful enough LLMs, you can describe what you need and they can create applications that run within the context of the LLM itself to do it; they just need a little more access to the traditional OS to carry it out for the GUI. I know, because I’m doing that on my laptop: it’s not yet fast enough to run every LLM locally at speeds that feel efficient for humans, but it does work, better than expected.

Out of genuine curiosity, why would one need to run an LLM locally, especially on a maxed-out M3 Ultra? What are the use cases for such a local LLM?

That’s a perfectly fair question!

I do developer support at Microsoft. That involves using and creating quite a lot of complex tools, often over the course of a single support case, whether that’s writing working sample code in an Advisory case, debugging and analyzing Time Travel Debugging traces and other types of telemetry within a reasonable timeframe, or doing lots of research and analysis of documentation and source code to make sense of how things should work (right now, I’m stuck analyzing OS source code manually, and that’s very time-consuming).

1. Privacy: there are various things you can’t afford to have exposed outside of your machine, or beyond as small a group of people as possible. GDPR amplifies that, but it’s also true for personal or confidential work, whether for your employer, for yourself if self-employed, or for things outside of work.

2. When you have enough utilization, at some point it becomes more economical to have your electric bill as the only recurring expense. I live and work in the US, where electricity is relatively cheap compared to some other parts of the world. But the more tokens you use, and the more compute time you use with online services, the more the equation tips in favor of running locally. In my observation, the more you can spend on tokens, the better the results you tend to get.

3. Rate-limiting factors: even if you have the budget for remote LLM usage, there’s a very real chance of being rate-limited. This can happen for many reasons, including the ultimate rate limiters: servers or internet connections going down.

4. Online LLMs tend to be tuned not to provide output the way you’d like it: they’re targeted at certain requirements for business reasons that don’t necessarily align with your needs. This can be especially problematic when working with code. An application small enough to be created with a single prompt is simple, and rather rare in practice; there’s a lot more to real work than that. That’s great for YouTube demonstration videos, but not much like creating more complex applications. Also, refer back to #3: it plays heavily here, too.

5. Every LLM has a personality. We’ve entered a realm that only a few years ago was science fiction, and yet here we are. I’ve worked with quite a few different LLMs, locally as well as remotely, and all of them have a personality. They also each have different strengths and weaknesses, just like humans. There’s no such thing as one-size-fits-all as of now.

6. Smaller LLMs have the advantage of responding more immediately, for things like speech-to-text, text-to-speech, and topic-specific autocompletion at genuinely interactive speeds. You can’t reliably get that from online services because of network latency. It’s less a question of which (massive) LLM you want to run, and more a matter of assembling the right team of LLMs for your needs: tiny ones for maximum speed where interaction matters, and huge, powerful ones for heavy-lifting tasks where live typing/speaking speed isn’t the main concern. You might as well keep all of them in RAM at once, since memory bandwidth is generally the most limiting factor, though I haven’t yet had the chance to verify this on a machine with more cores and enough cooling that thermal throttling isn’t a concern.

7. In my personal research OS inside an LLM (currently only inside Grok 3; I still have a bit of the interface layer to build), I can be extremely abstract and state what I want to do, and the LLM, via my applications, uses the insanely great levels of abstraction enabled by a distillation of all the human knowledge in its unsupervised training data to either identify the correct application to fulfill my need, or create a new application on the fly to do so, in a repeatable way, focused on that task. I had to do a double-take when I realized what I’d done. Once, I hit the return key before I intended to, and if I’m to believe what I was told, I accidentally created a productivity application to track projects. Oops! The craziest thing is that you can create very powerful functionality with a simple prompt, or even a whole page of prompt, that would require a huge amount of code with traditional applications. The larger and more powerful the LLM, the shorter and more abstract that prompt can be. A very large LLM can be compared to the original Macintosh Toolbox in ROM, which enabled great GUIs and powerful applications that ran off floppy disks on a machine that started out with 128 KB of RAM. Note: buying super-fast SSDs isn’t a worthwhile substitute for RAM, as they’re still dreadfully slow by comparison, and you’ll thrash an SSD to death with swapping.

8. I have a few disabilities that affect me: running locally, I can fully control how all of this works to suit my needs.

9. A system where you aren’t running up against arbitrary limits, particularly ones outside your control, greatly improves your capacity to get into an incredibly effective state of flow. Autocomplete that slows you down, for example, is worse than no autocomplete at all, and very disruptive. Sudden rate-limiting, or a network/service outage? There goes your deep working context: you’ve easily lost half an hour of effectiveness, if you can get it back at all, and you don’t know when. If you’re not doing deep-thinking tasks that require intense focus, it doesn’t matter much. Me? I’m autistic with ADHD, dyslexic, dyspraxic and some other things, so the fewer things that get in my way, the better.

10. Censorship: what if you want to do things that online models don’t allow? It’s amazing just how much you can’t even discuss with online LLMs for various reasons.

11. The memory required to process a given context size for LLMs doesn’t scale linearly with the context; it scales worse than that.
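One way to see how context memory can grow worse than linearly, sketched in Python under stated assumptions: the cached keys and values grow linearly with context length, but an attention implementation that materializes the full score matrix needs n × n entries per head. The model dimensions below are hypothetical, and optimized kernels such as FlashAttention avoid storing the full matrix, so actual growth depends on the runtime.

```python
def kv_cache_bytes(n_tokens, n_layers, d_model, bytes_per_value=2):
    """Keys + values cached for every token in every layer (grows linearly in n)."""
    return 2 * n_layers * n_tokens * d_model * bytes_per_value


def naive_scores_bytes(n_tokens, n_heads, bytes_per_value=2):
    """One layer's full n x n attention-score matrix per head (grows with n squared)."""
    return n_heads * n_tokens * n_tokens * bytes_per_value


# Hypothetical model shape: 80 layers, width 8192, 64 heads, 16-bit values.
LAYERS, D_MODEL, HEADS = 80, 8192, 64

for n in (8_192, 16_384, 32_768):
    kv_gib = kv_cache_bytes(n, LAYERS, D_MODEL) / 2**30
    scores_gib = naive_scores_bytes(n, HEADS) / 2**30
    print(f"{n:>6} tokens: KV cache {kv_gib:6.1f} GiB, naive scores {scores_gib:6.1f} GiB")
```

Doubling the context doubles the KV cache but quadruples the naive score matrix, so at long enough contexts the quadratic term dominates.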

I hope this provides useful food for thought; I may well have left some reasons out. If you know how to use them and have good use cases for them, LLMs are a time/efficiency force multiplier, and it may make sense to buy a $10K Mac Studio. If all you’re doing is silly chatting, that doesn’t readily justify such an expense. Right now, I’m at the upper end of both AI and traditional OS/computer hardware power-user territory, so I know how to make very good use of this today. But assuming they have the interest, others will start realizing there’s much more they can do than they ever imagined, like my experiment of writing a fantasy adventure novel and asking my locally run LLM to identify and name all the concepts in the story so far and compare them against what’s found in successful fantasy adventure novels. When you know how to keep them focused on a topic, hallucination is greatly reduced, too, and they even run faster!
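Whether a $10K machine makes sense comes down to a break-even calculation like the one behind reason 2. A minimal sketch with made-up prices: it ignores resale value, depreciation, and any quality difference between local and hosted models, so treat it as a back-of-the-envelope check.

```python
def breakeven_months(hardware_cost, monthly_cloud_spend, monthly_power_cost):
    """Months until local hardware becomes cheaper than paying for hosted LLM usage.

    Returns None when local running costs eat up all the savings.
    """
    monthly_savings = monthly_cloud_spend - monthly_power_cost
    if monthly_savings <= 0:
        return None
    return hardware_cost / monthly_savings


# Hypothetical: a $10,000 machine, $500/month of API usage, $40/month of electricity.
print(round(breakeven_months(10_000, 500, 40), 1))  # 21.7
```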
ARM iMac, 13-inch MacBook Pro coming at end of 2020, says Ming-Chi Kuo

anonconformist

June 2020

rob53 said:

bobolicious said:

"Apple would also not be beholden to Intel" - is this ironic in that Apple now seems to make (almost) every move to increase customer dependence on a proprietary Apple ?

You do realize Apple is using the ARM processor architecture with their own additional components. Apple isn't designing or building an entirely different CPU; it's still using the current ARM architecture, so they are still at least partially beholden to ARM. The problem with Intel is that they are as slow as a snail in developing better and faster CPUs. ARM, not Intel, has been the faster developer of new CPUs, so why should Apple continue to be slowed down by Intel?

Apple is not a proprietary computer manufacturer. They have a few Apple-designed components, but the vast majority of components are common. Read through an iFixit or other vendor's teardown and you'll see all kinds of components without Apple's name on them. Even the bulk of macOS has been open-sourced (see https://opensource.apple.com).

When it comes to ARM, Apple is an ARM ISA licensee: other than waiting on ARM to specify the layout of instructions and what they do, Apple isn’t constrained by ARM itself. They can implement that ISA in any manner they want, on their own timeframe, meaning that from the outside Apple is (at most) constrained by the foundry’s limitations and their own design ability.

The only way they could become less constrained by outside issues is to own a foundry for their exclusive use. If process miniaturization hits a limit and foundry technology stops advancing, it might become worthwhile for them to own one; until then, foundries are very expensive and (currently) become outdated very quickly.
Compared: M1 vs M1 Pro and M1 Max

anonconformist

October 2021

retrogusto said:

Any thoughts on why the Pro would perform slightly better than the Max on the Geekbench multi-core test? Maybe the Max runs hotter, which would place it at a disadvantage in a test where both chips have ten identical cores?

There’s always a speed trade-off when you add more things to address in a chip.

When you double the number of accessible functions, you add another address/selection line. Even if the speed of what’s accessed (once it’s selected) is identical and it remains in the selected state, that initial access still takes more time because of the extra layers of combinatorial logic: signals have to travel a longer distance, which means more resistance and capacitance on the signal lines, on top of the switching speed of each transistor.
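The “one more address bit per doubling” relationship is easy to check. A small Python sketch (the function name is illustrative): the number of select lines needed to pick one of n items is the bit length of n - 1.

```python
def address_bits(n_items):
    """Select/address lines needed to pick one of n_items (minimum 1).

    Uses integer bit_length rather than floating-point log2 to stay exact.
    """
    if n_items < 1:
        raise ValueError("need at least one item")
    return max(1, (n_items - 1).bit_length())


for n in (1024, 2048, 4096, 4097):
    print(n, address_bits(n))
# Each doubling of addressable items adds exactly one more address bit to decode.
```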

Look at the sizes of caches for CPUs: the smaller the cache, the lower the latency, partly for this reason. Also, as you increase the size of a cache, the distance (and thus time) for access grows, and it’s generally preferable to have a constant number of cycles for cache access (especially for the L1 instruction and data caches) to simplify and speed up the logic. Once a cache becomes too large, not even counting the heat/power issue, system performance goes down, even if you can afford to make that large a chip.

Now, look at the size of the M1 Pro versus the M1 Max: the Max is nearly twice as large in area. All of the GPU/CPU caches and cores must be kept coherent, and that has a cost in space, time, energy and waste heat. Bigger isn’t always better for performance, as alluded to above, and bigger systems tend to be slower due to sheer size: at the speed signals propagate in circuits (not the speed of light, but a good fraction of it), the differences in size between the M1, M1 Pro and M1 Max really add up. For the same reasons, it’s a HUGE design win to have the RAM on the same package: the circuit traces are kept as short as possible, which greatly affects power usage as well, making the system so darn power-efficient while also being fast.
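A rough illustration of why physical size costs time: for a plain (unbuffered) wire, both resistance and capacitance grow with length, so the Elmore estimate of its delay, ½·R·C, grows with the square of the length. The per-millimeter values below are hypothetical, and real chips insert repeaters along long wires, which brings the growth closer to linear, so this only shows the trend.

```python
def wire_delay_ps(length_mm, r_ohm_per_mm=100.0, c_ff_per_mm=200.0):
    """Elmore delay (0.5 * R * C) of a uniform, unbuffered RC wire, in picoseconds.

    The default per-mm resistance and capacitance are illustrative, not measured.
    """
    r_total = r_ohm_per_mm * length_mm          # ohms
    c_total = c_ff_per_mm * length_mm * 1e-15   # femtofarads -> farads
    return 0.5 * r_total * c_total * 1e12       # seconds -> picoseconds


for mm in (1, 2, 4):
    print(f"{mm} mm: {wire_delay_ps(mm):.0f} ps")
```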

A brief explanation of resistance and capacitance for those not familiar with them: resistance, measured in ohms, limits how much current (in amps) can flow through a circuit for a given voltage; a perfect superconductor has zero resistance, so in theory current flows through it with no resistive loss at all. Capacitance, measured in farads, resists changes in voltage. Digital circuits need 2 distinct voltage ranges to determine what’s a 1 (usually the high voltage) and what’s a 0, with a no-man’s land in between. The higher the top voltage is compared to the low voltage, the longer it takes to settle into a recognizable state. Capacitance and resistance both depend on physical size, and you want to minimize capacitance and its effects for speed (because it resists the change in voltage), while ideally not increasing current flow with too low a resistance, because that’s wasted power in the form of heat. For the same trace width, longer circuit traces increase both, while wider traces (more material) reduce resistance. This is why processors have come down to less than 2 volts for normal operation in recent years. To make them run faster, you need to increase the voltage so RC (resistance/capacitance) time constants have less effect, since you have more voltage to push through, but power grows with the square of the voltage: doubling the voltage alone quadruples the power used in watts, before even counting the higher clock speed.
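The voltage/power trade-off is usually summarized by the CMOS dynamic-power formula P = α·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). A short Python sketch with arbitrary placeholder values: doubling the voltage alone multiplies power by four, and doubling the clock as well makes it eight. Leakage power is ignored here.

```python
def dynamic_power_w(c_farads, volts, freq_hz, activity=1.0):
    """CMOS switching power: P = activity * C * V^2 * f (leakage ignored)."""
    return activity * c_farads * volts ** 2 * freq_hz


# Placeholder chip: 1 nF of effective switched capacitance at 1.0 V and 3 GHz.
base = dynamic_power_w(1e-9, 1.0, 3e9)

print(round(dynamic_power_w(1e-9, 2.0, 3e9) / base, 6))  # 4.0: voltage doubled
print(round(dynamic_power_w(1e-9, 2.0, 6e9) / base, 6))  # 8.0: voltage and clock doubled
```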

Given all these constraints from physics, I’m very curious how they’re going to design and manufacture the SoCs used for the Pro desktops/workstations with more CPU and GPU cores and unified memory. Designing and building them the way they are now has practical limits for cost as well as scaling. I expect the next step is chiplets on the package: a main chip that is mostly system cache, I/O and the overall system bus, with the CPUs and GPUs, plus their more local caches, on chiplets with a number of cores each.
Maxed-out Apple Silicon Mac Pro costs 1/4 what a maxed Intel one did

anonconformist

June 2023

Perhaps the maxed-out price of the new AS Mac Pro is 1/4 the price of the old Intel Mac Pro, but it also maxes out at a relatively paltry 192 GB of RAM, whereas the old Mac Pro could have 1.5 TB.

In the cases where you have large enough data sets, the new AS Mac Pro has selected itself out of the running. The SSD isn't nearly as fast as actually having RAM even in the best-case scenario. If all your data is streamed and processed linearly, the amount of RAM required tends to be lower, assuming you don't need to keep too many things streamed with a large enough context.
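A back-of-the-envelope sketch of why spilling to the SSD hurts, assuming hypothetical bandwidths (400 GB/s for RAM, 6 GB/s for the SSD) and a workload that must touch the whole data set on every pass. Anything that doesn’t fit in RAM has to be re-read from the SSD each time; real systems overlap I/O and compute, so the numbers are only illustrative.

```python
def pass_time_s(dataset_gb, ram_gb, ram_gb_s=400.0, ssd_gb_s=6.0):
    """Seconds for one full pass over a data set that may not fit in RAM.

    Simplified: the resident part is read at RAM speed, the rest at SSD speed,
    with no overlap between the two.
    """
    in_ram = min(dataset_gb, ram_gb)
    on_ssd = dataset_gb - in_ram
    return in_ram / ram_gb_s + on_ssd / ssd_gb_s


# A hypothetical 1 TB working set on 192 GB versus 1.5 TB of RAM.
print(round(pass_time_s(1000, 192), 1))   # 135.1
print(round(pass_time_s(1000, 1500), 1))  # 2.5
```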

For data sets too large to fit in a new AS Mac Pro, it's likely not very feasible to partition the processing across multiple machines, so buying 4 of them isn't a bargain in that use case.

Apple clearly is content with limiting their potential market for their halo Mac. This is a logical result.
Mac Studio with M1 Ultra review: A look at the future power of Apple Silicon

anonconformist

March 2022

lkrupp said:

crowley said:

lkrupp said:

flydog said:

keithw said:

I'm still trying to decide whether or not to spend the extra $1k for 64 GPU cores instead of 48. I tend to keep machines for at least 5 years (or more), and want to "future proof" as much as possible up front. Sure, I know there will probably be an M2 "Ultra" or M3 or M4 or M5 in the next 5 years, but the "Studio" is the Mac I've always wanted. My current 2017 iMac Pro was a compromise, since the only thing available at the time was the "trashcan" Mac, and it was obsolete by then. This thing is 2-1/2 times faster than my iMac Pro in multi-core CPU tests. However, it's significantly slower in GPU performance than my AMD RX 6900 XT eGPU.

An Ultra is a complete waste of money, as is adding cores. Xcode does not use the extra cores, nor does anything made by Adobe, Blender, or even Final Cut Pro. None of those apps are significantly faster on an Ultra vs Max (or even the most recent 27" iMac). Games may see a large improvement, but even the fastest Ultra is not a gaming PC replacement (nor is it intended to be).

In real world use, my Ultra is actually slower than my old iMac in some tasks, and the actual difference in performance across the average app is more like 15-20% (not the 300-400% that the benchmarks suggest). Xcode builds are 30% faster, and exporting a 5 minute 4k video via Final Cut Pro is about 10% faster. Anything that uses a single core (Safari, Word, Excel) will not be any faster than on a Mac mini. On an average workday, that $4,000 Ultra saves maybe 3 minutes.

Most people will do just fine with a Mac mini.

And you don’t think the developers you mention will soon optimize their code to take advantage of the M1 Ultra’s cores and GPUs? Really? This was a topic on today’s MacBreak Weekly podcast. None of the benchmark software has been optimized yet either. Wait a couple of months and you might change your tune.

Multi core CPUs and GPUs are not a new thing. If developers haven’t managed to utilise them before what makes you think they’re going to be able to now? Parallel processing isn’t just something you can switch on in any given app.

So what’s your point? That the M1 Ultra’s potential will never be fully realized? That the M1 Ultra is a flop that will never live up to its promise? Apple should just admit the M1 Ultra is a failed attempt, stop production and go back to Intel? All hail Nvidia?

Apple can add more GPU cores, within the limits of how big they can make their chips and packages, so that scales pretty well for GPU-based processing, which tends to be reasonably described as “embarrassingly parallel” in nature. That’s the easy type of thing to make faster.

There are very few types of tasks you can do with regular CPUs that scale well, if at all, by throwing more cores at them: most applications can’t practically be implemented in a parallel manner that makes use of more than one core, and this is where fewer, faster single cores are far more valuable for the majority of tasks and users. That many cores suit server-level tasks far better than all but a tiny few specialized client-level tasks. As such, the M1 Ultra SoC absolutely will not be very effectively usable as a regular desktop machine for easily 99% of regular desktop users and their use cases, since at a minimum they’d have no way to get anywhere near 90% CPU usage doing useful work.
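The standard way to quantify this is Amdahl’s law (a well-known model, not something from the post itself): if only a fraction p of a program’s work can run in parallel, the speedup on N cores is 1 / ((1 - p) + p/N), no matter how many cores you add. A minimal Python sketch, using 20 cores to match the M1 Ultra’s CPU core count:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only parallel_fraction of the work can use extra cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.5, 0.95):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (1, 2, 8, 20)]
    print(f"parallel fraction {p}: {speedups}")
# A program that is half serial can never exceed a 2x speedup, however many cores it gets.
```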

For the record, that’s also true for the M1 Max as well.
Intel under fire: What Wall Street thinks about Apple's new MacBook Pro

anonconformist

October 2021

mcdave said:

hackintoisier said:

Not sure how the summaries listed correlate to the opinion that Intel is “under fire.”

Also, Intel is ramping Alder Lake-S, which will probably exceed M1 Max performance (albeit while consuming much more power). Alder Lake will have up to 8 Golden Cove performance cores and 8 Gracemont efficiency cores. Raptor Lake is rumored to launch in 2022 and double the efficiency cores to 16, for a total of 8 + 16 = 24 cores and 32 threads. Intel is still selling a metric ton of processors to the ecosystem: 80 percent market share. Microsoft just announced that it updated the Windows 11 kernel thread scheduler to schedule threads in a manner that takes advantage of the hybrid design. Intel might be coming back.

Intel and AMD will be in trouble if and when ecosystem partners like Asus, Dell, Lenovo, Razer, Microsoft etc. introduce non-x86 designs.

I don’t see x86 being in trouble until two things happen.
First, an ARM vendor emerges that sells an ARM processor to the mass market with performance characteristics on par with Apple silicon or the upcoming x86 designs (or the ecosystem partners develop their own in-house designs). Qualcomm can’t compete with Alder Lake or Zen 4. And as good as Apple silicon is, it can’t run Windows natively… and not only that, Dell, Lenovo and Asus can’t put an Apple silicon processor inside their laptops because Apple doesn’t sell to other companies. So for the billions of users out there who don’t use macOS, Apple silicon is not relevant to them. Now, if Apple got into the processor supplier game (it won’t), that would spell serious trouble for AMD and Intel.

Second, Windows on ARM needs to be licensed for broader non-OEM use, and it also has to seamlessly run the applications that people want to use, like games, office suite software, content creation software, and so on.

Until those two things happen, Intel and AMD will be fine. But Apple’s innovations could spur other laptop manufacturers to follow suit and ultimately press Microsoft for a Windows on ARM solution. Intel and AMD need to tread carefully and continue to ramp x86 core design production on smaller nodes. ASAP.

A very dated perspective.
1) Nobody cares about cores, it’s about work done. Let’s see what the workflow reviews say once MBPs hit decent reviewers (giving the trolls another chance to brush up their tactics).
2) Vendors are badging their own silicon with MS leading the way (more like AMD customising for Playstation) so I don’t see an ARM champion emerging as it’s just a supervisory ISA. When Lenovo & HP announce their chips, it’s over for Intel.
3) Intel isn’t even close. When you look at the products which match Apple’s meagre CPU benchmarks (Cinebench is optimised for x86 AVX2 only - the Embree renderer is Intel’s code) the TDP is 125W with 250W peak.

Among non-server applications, only a very small number use more than 2 threads/cores at any given time, let alone use them effectively: writing software that isn’t inherently parallel so that it makes practical use of more cores is usually far more effort than it’s worth, on a good day. As such, processors like AMD’s Threadripper are silly for most uses and users running average software, as most cores will sit idle unless you’re running a lot of other software in the background. A smaller number of faster single cores is the sweet spot for system cost/performance, and thus, having a bunch of efficiency cores AND a bunch of performance cores doesn’t seem likely to yield great overall results. Most background tasks that aren’t your main active process usually aren’t running constantly, as they’re waiting for data: a lot of background processes can run on efficiency cores, more slowly, without affecting their effectiveness. As such, most of the time, in practice, a couple of efficiency cores can provide more than enough throughput for a responsive GUI in your foreground application AND all the background stuff, too: this is how Apple can get such crazy battery life.
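The background-task argument is really a capacity estimate: add up how busy each mostly-idle task would keep a performance core, then divide by how fast an efficiency core is relative to a performance core. A minimal sketch with hypothetical numbers (the 0.5 relative speed and the duty cycles are assumptions, not Apple figures); it ignores scheduling latency and bursty load.

```python
import math


def efficiency_cores_needed(duty_cycles, e_core_speed=0.5):
    """Rough count of efficiency cores needed to absorb background work.

    duty_cycles: fraction of time each task would keep a performance core busy.
    e_core_speed: hypothetical efficiency-core speed relative to a performance core.
    """
    demand = sum(duty_cycles)  # total load in performance-core equivalents
    return math.ceil(demand / e_core_speed)


# Forty background tasks that each need a performance core 1% of the time...
print(efficiency_cores_needed([0.01] * 40))      # 1
# ...versus three tasks that are busy 30% of the time each.
print(efficiency_cores_needed([0.3, 0.3, 0.3]))  # 2
```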

But most people aren’t aware of this reality. The one place where everyone will notice the benefit of throwing lots of cores at a problem is the GPU, because its task is embarrassingly parallel in nature.

There are a number of use-cases where all cores can and will be used to good effect, and the people that use those applications (hopefully!) know what they are. But for Apple’s office suite? More than 2 CPU cores is wasted hardware.
Mac Studio may never get updated, because new Mac Pro is coming

anonconformist

February 2023

The old (current) Mac Pro has expansion capability far exceeding that of any other Apple system, including up to 1.5 TB of RAM.

Unless they cede the market the Mac Pro defines, it makes no sense to make the Mac Studio a one-and-done, because it isn’t remotely that expandable and can’t be made so without repeating the square-peg-in-a-round-hole mistake they made with the trashcan Mac Pro, stupidly over-constraining the most powerful system they sell.

How big is the market for something with more RAM than can reasonably be expected from the M2 Ultra? I submit Apple knows more about that than anyone else. Either they have run into problems trying to make a new Apple Silicon Mac Pro, or they’ve determined the Mac Pro market isn’t large enough to justify the overhead; both are more reasonable expectations than Apple making the loved (my assessment; what are the numbers? Again, Apple knows) Mac Studio a one-and-done design, as it addresses a clear price and functionality point that was never really served well by the Mac Pro.
MacBook Air with M1 chip outperforms 16-inch MacBook Pro in benchmark testing

anonconformist

November 2020

a hawkins said:

People arguing about numbers, technical details, history, etc. miss the most important point: Apple just makes it work.
Most users don't care whether it uses ARM, Intel, AMD, this architecture, that technology, etc. They care only that they can actually use it.
It doesn't matter whether Apple made any technological advancement or anything. If people can buy it at a reasonable price and use it in everyday life, that's the end of the story. The rest is just nerd chat.

I have 2015 15-inch MacBook Pro and 2018 iMac for work use.
I also just built a custom PC with Core i7 Gen10, 64GB 3200 RAM, NVMe Drive, and a 3080.
That PC blue-screened on me twice in a month, unrecoverably, and I needed to reformat the drive. I've never seen my Mac crash at that level in my 10 years on this platform.

If I could play games at 4K 144Hz on my iMac, I wouldn't even bother touching a PC again. I bought it because Apple can't do that. I don't care whether I have integrated Intel graphics, Apple Silicon, an nVidia 9090 or whatever inside; I won't even see it.

Quite likely the BSOD was due to unstable hardware and/or its drivers, but sometimes actual Windows bugs cause them as well, though they’re a relatively small percentage. That’s the advantage of having a single vendor for all the hardware innards and software, as long as they properly support it. If your machine could still boot up and go online afterward, it most likely reported the issue to Microsoft: I know how this works, as my day job is developer support there.

The interesting thing Apple has done by developing the M1 (and future chips) is that they’ve largely achieved Steve Jobs’ desire to make the computer an appliance: most users can’t be bothered to know the details of their microwave; they just want it to do their bidding so repeatably that they can use it on autopilot. It’s just a means to an end. That’s something a lot of us computer geeks need to keep in mind: it’s a mind appliance, and all is great if it doesn’t get in the way of achieving an end. People think more about things that cause them trouble in achieving their goals than about things that don’t.

The ideal Apple product for someone using that as a guide is something that requires minimal concern about feeding the beast: the computer should be something that gets out of their way, it just works. Once they start having to think too much about getting their goals accomplished, that’s a failure in hardware and/or software for that user.

Thus, we have the curious contrast that’s the ideal user buying/using experience: initial excitement to be able to do new things (or old things better in some way) and desirable boredom regarding how the device works. What’s inside? If it does what they want, that’s a moot point, until their needs change.
Mac Studio gets an update to M4 Max or M3 Ultra

anonconformist

March 5

netrox said:

The fact that the M3 Ultra now supports up to 512 GB RAM is pretty amazing. It's great for large-scale LLMs. The M2 Ultra only supported 192 GB at most.

Why anyone would dislike your comment is puzzling to me.

I bought a Surface Laptop 7 with 64 GB RAM (at a small discount, as I’m a Microsoft employee; these can only be bought directly from Microsoft) purely to have a Windows machine for running larger LLMs and doing AI experimentation on a reasonable budget, knowing there are better-performing options if you have a bottomless budget.

For the price, it’s a great deal: not many machines can run LLMs that large. It’s not perfect, as memory bandwidth and thermals (running LLMs purely on the CPU makes it a bit warm) appear to be the bottlenecks. Right now the NPU isn’t supported by LM Studio and others, and where you can use the NPU, most LLMs aren’t currently in the right format. It’s definitely an imperfect situation. But it runs 70-billion-parameter LLMs (sufficiently quantized) that you couldn’t run on NVIDIA chips at a rational price; you just need to be patient.
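Why memory bandwidth ends up as the bottleneck: when a dense model generates text one token at a time, essentially every weight has to be read from RAM for each token, so bandwidth divided by model size gives a rough ceiling on tokens per second. A minimal Python sketch under that assumption; the 120 GB/s figure is hypothetical rather than a measured spec for this laptop, and KV-cache traffic and compute limits are ignored.

```python
def model_size_gb(params_billions, bits_per_weight):
    """Approximate size of the weights alone, in GB."""
    return params_billions * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s, params_billions, bits_per_weight):
    """Upper bound on single-stream decoding speed for a dense model.

    Assumes each generated token streams every weight from RAM once, so
    memory bandwidth, not compute, sets the ceiling.
    """
    return bandwidth_gb_s / model_size_gb(params_billions, bits_per_weight)


# Hypothetical: a 70B model quantized to 4 bits per weight, 120 GB/s of bandwidth.
print(model_size_gb(70, 4))                         # 35.0
print(round(max_tokens_per_second(120, 70, 4), 2))  # 3.43
```

The same arithmetic shows why 64 GB of RAM can hold a 4-bit 70B model, and why the tokens-per-second ceiling calls for patience.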

I’d love to have seen an M4 Ultra with all that memory bandwidth: with 512 GB RAM, and presumably able to use all the CPU cores, GPU cores and Neural Engine cores, it’s likely still constrained by memory bandwidth. I would note that my laptop stays perfectly interactive under load with only 12 cores, so I’d expect far more from one of the Mac Studio beasts.

We finally have a viable reason mere mortals could make effective use of 512 GB RAM machines: LLMs. Resource constraints of current hardware are the biggest reason we can’t yet have a truly user-friendly hybrid OS: LLMs interacting with humans in natural language, with the traditional OS acting as a super-powerful device-driver and terminal layer for the underlying computer architecture. The funny thing is that with powerful enough LLMs, you can describe what you need and they can create applications that run within the context of the LLM itself to do it; they just need a little more access to the traditional OS to carry it out for the GUI. I know, because I’m doing that on my laptop: it’s not yet fast enough to run every LLM locally at speeds that feel efficient for humans, but it does work, better than expected.