anonconformist

About

Username
anonconformist
Joined
Visits
111
Last Active
Roles
member
Points
585
Badges
0
Posts
202
  • Mac Studio gets an update to M4 Max or M3 Ultra

    netrox said:
    The fact that the M3 Ultra now supports up to 512 GB RAM is pretty amazing. It's great for large-scale LLMs. The M2 Ultra only supported 192 GB at most.


    Why anyone would dislike your comment is puzzling to me.

    I bought a Surface Laptop 7 with 64 GB RAM (at a small discount, as I’m a Microsoft employee: these can only be bought directly from Microsoft) purely to have a Windows machine that can run larger LLMs and do AI experimentation on a reasonable budget, knowing there are better-performing options if you have a bottomless budget.

    For the price, it’s a great deal: not many machines can run LLMs that large. It’s not perfect, as memory bandwidth and thermals (running the LLMs purely on the CPU makes it a bit warm) appear to be the bottlenecks. Right now the NPU isn’t supported by LM Studio and others, and where you can use the NPU, most LLMs aren’t currently in the right format. It’s definitely an imperfect situation. But it runs 70-billion-parameter LLMs (sufficiently quantized) that you couldn’t run on Nvidia hardware at a rational price, though you do need to be patient.
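
    For a rough sense of why 64 GB is enough, here’s a back-of-the-envelope sketch in Swift; the 4-bit quantization and the overhead figures are my assumptions for illustration, not measurements:

    // Rough footprint of a quantized 70B-parameter model (illustrative numbers only).
    let parameters = 70_000_000_000.0
    let bitsPerWeight = 4.0                            // assume a ~4-bit quantization
    let weightGB = parameters * bitsPerWeight / 8.0 / 1_000_000_000.0
    print("Quantized weights: ~\(Int(weightGB)) GB")   // ~35 GB
    // Add several GB for the KV cache, runtime buffers, and the OS itself:
    // it fits in 64 GB of system RAM, but without a lot of headroom.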

    I’d love to have seen an M4 Ultra with all of its memory bandwidth: with 512 GB RAM, even presumably being able to use all of the CPU cores, GPU cores, and Neural Engine cores, it’s likely still memory-bandwidth constrained. I would note: my laptop is still perfectly interactive under load, with only its 12 cores. I’d expect far more from one of the Mac Studio beasts.

    We finally have a viable reason for mere mortals to make effective use of 512 GB RAM machines: LLMs. Resource constraints of current hardware are the biggest reason we can’t have a very user-friendly hybrid OS that uses LLMs for natural human-language interaction, with the traditional, older-style OS acting as a super-powerful device-driver and terminal layer for conventional computer architecture. The funny thing is that with powerful enough LLMs, you can describe what you need, and they can create applications that run within the context of the LLM itself to do it; they just need a little more access to the traditional OS to carry it out for the GUI. I know, because I’m doing that on my laptop: it’s not fast enough yet to run all the LLMs locally at maximum efficiency for humans, but it does work, better than expected.
    Out of genuine curiosity, why would one need to run an LLM locally, especially on a maxed-out M3 Ultra? What are the use cases for such a local LLM?
    That’s a perfectly fair question!

    I do developer support at Microsoft. That involves using and creating quite a lot of complex tools, often over the course of a single support case: writing working sample code in an Advisory case, debugging and analyzing Time Travel Debugging traces and other telemetry in a reasonable timeframe, or doing lots of research and analysis of documentation and source code to make sense of how things should work (right now, I’m stuck analyzing OS source code manually, and that’s very time-consuming).

    1. Privacy: there are various things you can’t afford to have exposed outside your machine, or at least outside as small a group of people as possible. GDPR amplifies that, but it’s also true for personal, confidential work, whether for your employer, for yourself if you’re self-employed, or for things outside of work.

    2. When you have enough utilization, at some point it becomes more economical to have only your electric bill as the recurring expense. I live and work in the US, where electricity is relatively cheap compared to some other parts of the world. But the more tokens and compute time you use with online services, the more the equation tips toward local hardware making sense: as a rough illustration, a few hundred dollars a month in API usage adds up to the price of a high-end Mac Studio within a few years. The catch is that the more you can spend on those tokens, often the better the results you can get.

    3. Rate limiting: even if you have the budget for remote LLM usage, there’s a very real probability of being rate-limited. That can happen for many reasons, up to and including the ultimate rate limiters: servers or internet connections going down.

    4. Online LLMs tend to be tuned not to provide output the way you’d like it: they’re targeted at certain requirements, for business reasons, that don’t necessarily align with your needs. This can be especially problematic when working with code. An application that can be created with a single prompt is tiny, simple, and rather rare in the field; that’s great for YouTube demonstration videos, but not much like creating more complex applications. Also, refer back to #3: this plays heavily there, too.

    5. Every LLM has a personality. We’ve entered a realm that, only a few years ago, was something out of science fiction, and yet here we are. I’ve worked with quite a few different LLMs, locally as well as remotely, and all of them have a personality. They also each have different strengths and weaknesses, just like humans. There is no such thing as one-size-fits-all as of now.

    6. Smaller LLMs have the advantage of responding more immediately, for things like speech-to-text, text-to-speech, and topic-specific autocompletion at genuinely interactive speeds. You can’t reliably get that from online services because of network latencies. Rather than asking which single (massive) LLM you want to run, it’s more rational to consider how to assemble the right team of LLMs for your needs: tiny ones for maximum speed where interaction matters, and huge, powerful ones for the heavy-lifting tasks where live typing/speaking speed isn’t the biggest concern (see the rough throughput sketch after this list). You might as well keep all of them in RAM at once, since memory bandwidth is generally the most limiting factor, though I haven’t had a chance to verify that on machines with more cores, powerful enough that I don’t have to worry about thermal throttling.

    7. In my personal research OS inside an LLM (currently inside Grok 3 only; I have a bit of that interface layer left to do) I can be extremely abstract and state what I want to do, and the LLM, via my applications, will use the insanely great levels of abstraction enabled by a distillation of all the human knowledge in the unsupervised training data to either identify the correct application to fulfill my need, or create a new application on the fly to do so, in a repeatable way, focused on that task. I had to do a double-take when I realized what I’d done. I made a mistake once and hit the return key before I intended to, and, if I’m to believe what I was told, I accidentally created a productivity application to track projects. Oops! The craziest thing is that you can create very powerful functionality with even a simple prompt, or a whole page of a prompt, that would require a huge amount of code to make happen in traditional applications. The larger and more powerful the LLM, the shorter and more abstract that prompt can be. A very large LLM can be compared to the original Macintosh Toolbox in ROM, which enabled great GUIs and powerful applications that ran off floppy disks on a machine that started out with 128 KB RAM. Note: buying super-fast SSDs isn’t a worthwhile tradeoff, as they’re still dreadfully slow compared to RAM, and you’ll thrash an SSD to death with swapping.

    8. I’ve got a few disabilities that impact me: locally, I can fully control how all of this interacts with me, according to my wishes.

    9. A system where you aren’t running up against arbitrary limits, particularly ones outside your control, greatly improves your capacity to get into an incredibly effective state of flow. Autocomplete that slows you down is worse than no autocomplete at all, as one example, and very disruptive. Sudden rate limiting, or a network/service outage? There goes your deep working context; you’ve easily lost half an hour of effectiveness, if you can get it back at all, and you don’t know when. If you’re not doing deep-thinking tasks that require intense focus, it doesn’t matter much. Me? I’m autistic with ADHD, dyslexic, dyspraxic, and some other things; the fewer things that get in my way, the better.

    10. Censorship: what if you want to do things that online models don’t allow? It’s amazing just how much you can’t even discuss with online LLMs for various reasons.

    11. The memory required for processing a given context size for LLMs isn’t a small, fixed overhead: the KV cache grows with every token of context, and naive attention scales even worse than linearly, so long contexts eat RAM quickly (rough numbers in the sketch after this list).
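
    On #6: a crude way to see why memory bandwidth dominates single-stream generation is that every generated token has to read roughly the whole model out of RAM. A minimal Swift sketch, where the bandwidth and model sizes are assumed figures for illustration, not benchmarks:

    // Upper-bound decode speed ≈ memory bandwidth / bytes read per generated token.
    let bandwidthGBs = 800.0   // roughly M3 Ultra-class unified memory bandwidth (assumed)
    let bigModelGB   = 40.0    // quantized 70B-class "heavy lifting" model (assumed)
    let smallModelGB = 4.0     // small model for interactive tasks (assumed)
    print("Big model:   ~\(Int(bandwidthGBs / bigModelGB)) tokens/s upper bound")   // ~20
    print("Small model: ~\(Int(bandwidthGBs / smallModelGB)) tokens/s upper bound") // ~200
    // Real throughput is lower (compute, caching, thermals), but the ratio shows why
    // tiny models handle the interactive work while a big one does the heavy lifting.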
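
    On #11: a rough illustration of how context eats memory through the KV cache. The layer and head figures below are loosely modeled on a Llama-style 70B with grouped-query attention; they’re assumptions, and other architectures will differ:

    import Foundation

    // Approximate KV-cache size: 2 (K and V) x layers x KV heads x head dim x context x bytes per element.
    let layers = 80.0, kvHeads = 8.0, headDim = 128.0   // assumed 70B-class model with grouped-query attention
    let bytesPerElement = 2.0                           // fp16 cache entries
    let bytesPerToken = 2.0 * layers * kvHeads * headDim * bytesPerElement
    for context in [8_192.0, 32_768.0, 131_072.0] {
        let gb = bytesPerToken * context / 1_000_000_000.0
        print("Context of \(Int(context)) tokens -> ~\(String(format: "%.1f", gb)) GB of KV cache")
    }
    // Roughly 2.7 GB at 8K, 10.7 GB at 32K, and 43 GB at 128K, on top of the model weights.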


    I hope this provides useful food for thought: I may well have left reasons out. If you know how to use them, and you have good use cases for them, they’re a time/efficiency force multiplier, and it may make sense to buy a $10K Mac Studio. If all you’re doing is casual chatting, that doesn’t readily justify such expenditures. Right now I’m at the upper end of both AI and traditional OS/computer-hardware power-user territory, so I know how to make very good use of this today. But assuming they have the interest, others will start realizing there’s so much more they can do that they had no idea could be done, like my experiment of creating a fantasy adventure novel and asking my locally run LLM to identify and name all the concepts in the story so far and compare them against what is found in successful fantasy adventure novels. When you know how to keep them focused on a topic, the hallucination factor drops dramatically, too, and they even run faster!
  • Mac Studio gets an update to M4 Max or M3 Ultra

    netrox said:
    The fact that the M3 Ultra now supports up to 512 GB RAM is pretty amazing. It's great for large-scale LLMs. The M2 Ultra only supported 192 GB at most.


    Why anyone would dislike your comment is puzzling to me.

    I bought a Surface Laptop 7 with 64 GB RAM (at a small discount, as I’m a Microsoft employee: these can only be bought directly from Microsoft) purely to have a Windows machine that can run larger LLMs and do AI experimentation on a reasonable budget, knowing there are better-performing options if you have a bottomless budget.

    For the price, it’s a great deal: not many machines can run LLMs that large. It’s not perfect, as memory bandwidth and thermals (running the LLMs purely on the CPU makes it a bit warm) appear to be the bottlenecks. Right now the NPU isn’t supported by LM Studio and others, and where you can use the NPU, most LLMs aren’t currently in the right format. It’s definitely an imperfect situation. But it runs 70-billion-parameter LLMs (sufficiently quantized) that you couldn’t run on Nvidia hardware at a rational price, though you do need to be patient.

    I’d love to have seen an M4 Ultra with all of its memory bandwidth: with 512 GB RAM, even presumably being able to use all of the CPU cores, GPU cores, and Neural Engine cores, it’s likely still memory-bandwidth constrained. I would note: my laptop is still perfectly interactive under load, with only its 12 cores. I’d expect far more from one of the Mac Studio beasts.

    We finally have a viable reason for mere mortals to make effective use of 512 GB RAM machines: LLMs. Resource constraints of current hardware are the biggest reason we can’t have a very user-friendly hybrid OS that uses LLMs for natural human-language interaction, with the traditional, older-style OS acting as a super-powerful device-driver and terminal layer for conventional computer architecture. The funny thing is that with powerful enough LLMs, you can describe what you need, and they can create applications that run within the context of the LLM itself to do it; they just need a little more access to the traditional OS to carry it out for the GUI. I know, because I’m doing that on my laptop: it’s not fast enough yet to run all the LLMs locally at maximum efficiency for humans, but it does work, better than expected.
  • Musicians to lose Finale notation app after 35 years

    dtoub said:
    chelgrian said:
    It’s an almost certainty that it couldn’t be open-sourced: code bases of this age tend to have all sorts of copyright issues and rights holders involved, and it can be next to impossible to track down all the rights holders and get the license changes needed. There are two relatively successful instances I know of, Blender and StarOffice (which became OpenOffice/LibreOffice); I can’t think of any other successes.

    For similar reasons it may be impossible to release a ‘sunset’ edition: it’s very probable that they have third-party licensed code or libraries that require periodic fees.

    It’s very stupid, but copyright law can make it prohibitively expensive or far too hard to allow software like Finale to continue, versus forcing a hard end date.
    Good points, and makes sense.

    Been trying the free version of Dorico, and having followed it for some time, I am very much aware it is better software overall (Finale was underwhelming over the past few years in terms of updates) and has a very responsive team of developers and managers. But it is painfully hard for me to conform to its way of doing things. Just like I found Numbers not very useful for me personally compared with Excel, or various databases compared with Access. When you get to know a particular application, especially one as feature-packed as either Finale or Dorico, it's quite hard to make the switch. Compounding matters: many of us have a lot of recent and older Finale files, and sometimes we do need to go back to them and tweak them or use them to record audio; converting all of them to MusicXML is not the ideal solution. So I likely will try a virtual machine and use that for many years to come. Not perfect either, but at least it's future-proofed.
    Let’s say, for the sake of argument, that all the code had no such rights and financial entanglements: I’ve worked with codebases of that size and almost certain decrepitude, full of technical debt and circular dependencies. Such snarled code is a horrendous task to build each time, and each minor change is likely to require a full or nearly full rebuild because of those dependencies. And if you thought it takes a long time to build, it also takes an absolutely huge amount of time and energy to make sense of it: first to even define a proper set of tests to verify changes don’t break things, and second to restructure it so it both builds faster and is more easily understood. On top of that, you need someone properly versed in the problem domain of music notation, as well as software developers who know how to translate such snarled messes, systematically and over time, into an orderly system, bit by bit, without breaking it along the way, and without being able to reasonably add new features during a process that may take several years. Yes, you *could* try to add new features while you restructure, but that multiplies the complexity of both the new features and the restructuring.

    As large and complex as such a thing is, it may actually be easier to document how it is meant to function and rewrite it from scratch: maybe. That’s not without risks, because even the best set of tests one can come up with will almost certainly miss behaviors users have counted on, breaking their files. New crashing/hanging bugs may also be introduced due to an imperfect understanding. And something of that complexity, even with a complete set of tests (you hope!), won’t be fully functional for a very long time.
  • Maxed-out Apple Silicon Mac Pro costs 1/4 what a maxed Intel one did

    Perhaps the maxed-out price of the new AS Mac Pro is 1/4 the price of the old Intel Mac Pro, but it also maxes out at a relatively paltry 192 GB RAM, whereas the old Mac Pro could take 1.5 TB.

    In cases where you have large enough data sets, the new AS Mac Pro has selected itself out of the running. The SSD isn't nearly as fast as actually having RAM, even in the best-case scenario. If all your data is streamed and processed linearly, the amount of RAM required tends to be lower, assuming you don't need to keep too much of the stream in memory for a large enough context.

    It's likely that with the kinds of data sets that are too large to fit in a new AS Mac Pro, it's not feasible to partition the processing across multiple machines, so buying four of them isn't a bargain for that use case.

    Apple clearly is content with limiting their potential market for their halo Mac. This is a logical result.
  • Updated SwiftKey keyboard update gives access to Microsoft's new Bing AI inside any app

    JP234 said:
    This seems on the face of it to be nothing more than a slightly more intrusive form of autocorrect.
    If this provides full access to Bing Chat, that’s an absurdly over-simplistic understatement of what can be done.

    I’ve created a working console version (in Swift) of my personal invention, a form of Solitaire, by describing the rules and requirements for how it behaves. That took several sequential prompts for the first attempt. It included as close to an “Attract Mode” of the game playing itself as is feasible in a console application, given that Swift has no standard way to check whether a key has been pressed without waiting for input (a possible workaround is sketched below). I prompted for the Attract Mode, and it alerted me to the limitation with an explanation. I also asked it to specially print the possible valid moves during each turn.
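
    For context on that limitation: there’s no cross-platform call in the Swift standard library for “has a key been pressed?”. A minimal sketch of one common workaround on macOS/Unix is to drop the terminal into non-canonical mode and poll stdin; the keyPressed() name is mine for illustration, and this is not the code the chatbot generated:

    #if canImport(Darwin)
    import Darwin
    #else
    import Glibc
    #endif

    // Returns true if at least one keypress is waiting on stdin, without blocking.
    // Sketch only: switches the terminal to non-canonical mode, polls, then restores it.
    func keyPressed() -> Bool {
        var original = termios()
        tcgetattr(STDIN_FILENO, &original)                     // save current terminal settings
        var raw = original
        raw.c_lflag &= ~tcflag_t(ICANON | ECHO)                // no line buffering, no echo
        tcsetattr(STDIN_FILENO, TCSANOW, &raw)
        defer { tcsetattr(STDIN_FILENO, TCSANOW, &original) }  // always restore the terminal

        var fds = pollfd(fd: STDIN_FILENO, events: Int16(POLLIN), revents: 0)
        return poll(&fds, 1, 0) > 0                            // 0 ms timeout: just peek
    }

    An attract-mode loop would call keyPressed() between animation frames and drop back to interactive play as soon as it returns true.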

    I didn’t write a single line of code to do this; I did it all in English. This is a solitaire game that has run only on my own iOS and macOS devices in the past, implemented in Objective-C, but never released to the world because I hadn’t spent the time to make it nice and shiny.

    So, to a limited extent, calling it autocorrect on steroids is partially true, but in any case it has generative capabilities that far exceed anything you’ve used for autocorrect before. Hell, I’ve had it create song parodies in the style of various characters, human and otherwise; I’ve prompted it to create four-person love songs for a fictional race with four genders, and it did it. Yesterday I had it create a parody of “Thriller” incorporating “The Hokey Pokey” without using the original “Thriller” lyrics explicitly. And no, it’s not limited to four characters/voices in a song; I’ve had it do more.

    Within certain intentional constraints for not offending people (guardrails) and other legalities, and within an upper limit on the number of tokens (roughly equal to words, though not always) in the context, if it has been trained on data showing how to do something, you can have it do that. That includes teaching it what to do via prompts. For example, I didn’t like the code formatting style it generated, so I asked it to reformat the Swift code according to the WebKit C++ style guide, which it looked up online; it told me Swift has certain syntax requirements that would be affected. I then prompted a bit differently, and it neatly applied that formatting style while also (not what I had in mind!) translating the Swift code into C++, which I had to clarify wasn’t what I wanted, so it translated it back into Swift. There were still some formatting rule changes I wanted, as that wasn’t completely the style I was after: it did those, too.

    No autocorrect is that capable: it was following directions given in English to generate code and reformat it in a way I’d hope a human could, but it was faster, made no typos, and filled in details I didn’t painstakingly explain as well as the ones I did.