Even though the Mac Studio with M3 Ultra seems like a great option for LLM usage and development, there is a big drawback in terms of cost.
But compared to what? How much would you have to spend to do the same job on a PC?
This was actually covered in the video. You can rent a ton of server time for the cost of a maxed-out Mac Studio, so if you are just using it for development or general LLM usage, it really doesn't make sense financially. That said, the maker of the video also gave examples of why it would be worth paying to run an LLM locally, especially when privacy is involved.
My question isn’t the cost to rent but the cost to buy.
If that was the intent of your questions, then they were really poorly worded, as buying didn't come up at all.
Anyway, to answer your question, you could build a PC that could do this cheaper than you could buy a Mac Studio. The big deal about DeepSeek was that it ran on consumer hardware.
There are many versions and the 671 billion parameter version is not going to run on anything resembling a standard PC. So I think you just don't know.
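A quick back-of-envelope (my own arithmetic, not from the video or the thread) on why the full 671B model is out of reach for a standard PC: even aggressively quantized, the weights alone run to hundreds of gigabytes.

```python
# Back-of-envelope: memory needed just to hold 671 billion weights at
# different precisions (ignores KV cache and other runtime overhead).
params = 671e9

for bits in (16, 8, 4):
    gigabytes = params * bits / 8 / 1e9
    print(f"{bits:>2}-bit weights: ~{gigabytes:,.0f} GB")
```

That works out to roughly 1,342 GB at 16-bit, 671 GB at 8-bit and 336 GB at 4-bit, which is why the discussion keeps coming back to machines with hundreds of gigabytes of (unified) memory.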
So… when I said "this is covered in the video" I literally meant that it was covered in the video. All we have established here is that you haven't watched the video.
At about 5 minutes and 30 seconds he says that building this with consumer PC hardware would be "quite expensive." I was looking for a fair bit more precision than that.
Why on earth would I want to run an AI model? Locally or otherwise?
I’m sure this was meant to be snarky, but for me it’s a genuine question: what are the envisioned real world use cases? What might a business (even a home one) use a local LLM for?
The article mentions a hospital in the context of patient privacy, but what would that model actually be *doing*?
In hospitals, AI models are reviewing patient scans to detect cancer:
https://www.youtube.com/watch?v=Mur70YjInmI
That's image analysis rather than text, but text models can be used for medicine too. There's a free online AI here:
https://duckduckgo.com/chat
It can be asked about medical issues, like if there's pain somewhere, what it could be and what treatments are available, e.g. "What medicine is typically used to treat acid reflux?"
In a clinical setting, a doctor would review the recommendations.
In business, they'd be better off using a custom AI model trained on high-quality data. A legal firm might train a model on past cases so it can quickly find similar cases to use as references.
Local models are usually more responsive (if the hardware is fast enough), don't hit timeouts, and make it easier to save past prompts. They would likely still be served centrally so that all employees can access them from lightweight clients, just from the company's own server rather than a third-party cloud.
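For what it's worth, here's a minimal sketch of what a "lightweight client" talking to a company-hosted model can look like. It assumes the server exposes an OpenAI-compatible chat endpoint (llama.cpp's server, Ollama and vLLM all offer one); the hostname, port and model name are made up for illustration.

```python
import requests

# Hypothetical in-house inference server with an OpenAI-compatible API.
ENDPOINT = "http://llm.internal.example:8080/v1/chat/completions"

payload = {
    "model": "deepseek-r1",  # whichever model the server has loaded
    "messages": [
        {"role": "user",
         "content": "What medicine is typically used to treat acid reflux?"},
    ],
}

resp = requests.post(ENDPOINT, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The point is that the heavy hardware sits in one place and every employee only needs a trivial HTTP client, while the prompts never leave the company's network.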
At about 5 minutes and 30 seconds he says that building this with consumer PC hardware would be "quite expensive." I was looking for a fair bit more precision than that.
Specs are listed here:
https://geekbacon.com/2025/02/20/running-deepseek-r1-671b-locally-a-comprehensive-look/
It needs multiple 3090-class or higher GPUs plus 512GB of RAM. There's a video here showing a $2000 setup, but it only runs at 3 tokens/s:
https://www.youtube.com/watch?v=Tq_cmN4j2yY&t=2822s
Another uses an Nvidia RTX 6000 Ada that costs around $7k for the GPU alone:
https://www.youtube.com/watch?v=e-EG3B5Uj78&t=560s
https://www.newegg.com/pny-vcnrtx6000ada-pb/p/N82E16814133886
Performance there is 4 tokens/s. The video in the article mentioned the M3 Ultra was around 17 tokens/s.
This is one area where Nvidia and AMD are deliberately worse value: they make a lot of their revenue by limiting the memory on consumer GPUs and charging a lot for enterprise GPUs with the extra memory that AI needs.
This video tests 8x Nvidia H100 GPUs ($28k each - https://www.newegg.com/p/N82E16888892002 ), which get 25 tokens/s:
https://www.youtube.com/watch?v=bOp9ggH4ztE&t=433s
If Nvidia sold a model of the H100 with 512GB of memory, it could probably compete with the M3 Ultra, but it would cost more than $30k just for the GPU.
Applications that need lots of unified memory are where Apple's hardware design is very competitive, and they knew this when designing it.
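To put those figures side by side, here's a rough dollars-per-(token/s) sketch using only the numbers quoted above; the M3 Ultra price is a placeholder I've added, since the thread doesn't give one.

```python
# Rough cost per (token/s) from the figures quoted above.
# The M3 Ultra price below is a placeholder, not a quoted figure.
setups = {
    "$2000 build from the video above": (2_000, 3),
    "Nvidia RTX 6000 Ada (GPU only)": (7_000, 4),
    "8x Nvidia H100 ($28k each)": (8 * 28_000, 25),
    "M3 Ultra Mac Studio (placeholder price)": (10_000, 17),
}

for name, (price_usd, tok_per_s) in setups.items():
    print(f"{name:40s} ${price_usd / tok_per_s:>8,.0f} per token/s")
```

It's a crude metric (it ignores power draw, batching and resale value), but it shows why the unified-memory machines look attractive for single-user inference on a huge model.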
As an AI developer myself, I'd call an M3 Ultra an incredibly stupid purchase. The machine would only be good for a very limited set of AI models.
You'd be better off purchasing Digits for $3K, yes with 25% of the memory (128GB), and offloading work to the cloud when needed, or chaining two of these machines for $6K. https://www.wired.com/story/nvidia-personal-supercomputer-ces/ It would perform much better. Memory isn't the only thing to take into account: there's also the entire ecosystem around AI development, plus internal storage, the type of chip, and how it performs across models other than LLMs.
The M3 Ultra is best for video, 3D and post-production.
It sounds like Nvidia will have more powerful versions of that general concept, too.
It strikes me that the big weakness with the Apple Silicon machines is that the GPU just isn't beefy enough. Apple might be better able to address that if they have the GPU on a separate die, which sounds like the plan with the M5 Pro and higher.
In my case, I’m not an “AI developer” but I want to do local inference for privacy/security reasons. But that’s not my only use for a computer. I benefit a lot from Apple’s powerful CPU cores so I wouldn’t want to give that up. So I like the idea of using a Studio to meet both needs, even if the Apple GPU is a little weak compared to Nvidia.
Sadly for me, I can't afford it now. DOGE has cut my income in half.
Why on earth would I want to run an AI model? Locally or otherwise?
I’m sure this was meant to be snarky, but for me it’s a genuine question: what are the envisioned real world use cases? What might a business (even a home one) use a local LLM for?
The article mentions a hospital in the context of patient privacy, but what would that model actually be *doing*?
I think the big thing is creative tasks. There are a lot of uses for AI in creative industries, but we don't want our input material being used for someone else's output. So it's helpful to do all the image-generation stuff locally: as an artist, your content isn't being fed into an online system for others to use, but you still get the value of the AI tools. Another amazing use case is text-to-video generation. It is prohibitively expensive to use text-to-video generation online, with very little output for the price, so the ability to do that locally would be game-changing from a business perspective and well worth the money. Leading-edge text-to-video models cost $1k or more to get enough output to make them valuable for a business. And there are tons of other use cases.
Why on earth would I want to run an AI model? Locally or otherwise?
If you don't already know, perhaps you should pause on commenting in public until you've spent 15 seconds figuring it out.
Suffice it to say that many people have very good reasons to do this.
Such as?
I am obviously not one of those people, so am asking why. Your answer is not illuminating.
Are you retired? If so then don’t worry about it.
I am not. I work in enterprise software development. I have seen nothing significantly useful from this AI revolution so far, just a lot of fakery, deception and disruption of trust. These are unequivocally bad things to my mind.
So I ask again, why would I want to run an AI model? Locally or otherwise?
It's probably best that you don't use them -- leave it to others. Just stick to your comfort zone.
If you can’t think of any answers it’s perfectly fine to just say so.
I’ve a theory that Apple could sell this for much less and end up making more due to volume. Right now it’s very niche.
Would be great to see Apple get the accolades it deserves as not only a performance-per-watt leader, but as an all-out, straight-up performance king.
It's too bad they went with the M3 Ultra instead of an M4. Would have been perfect timing to hurt Nvidia's feelings.
Take a look at this link: https://opendata.blender.org/benchmarks/query/?compute_type=OPTIX&compute_type=CUDA&compute_type=HIP&compute_type=METAL&compute_type=ONEAPI&group_by=device_name&blender_version=4.3.0
The M3 Ultra Mac Studio debuted at number 12 on the Blender benchmark, 40 positions higher than the M2 Ultra Mac Studio released one generation ago. Nvidia's feelings will be hurt soon enough; because they are a tech company, they can see that convergence is probably only one generation away, and even the geeks on the tech sites are riled up because they can also see Apple's trajectory (probably within a year).
What does that mean? It means that Apple's hardware/software future is very bright. (The M5 Ultra or M6 Ultra could be at the top of the chart within one or two generations.)
What this chart can't show is that the energy efficiency of the Apple Silicon chips is second to none right now. Most of the (Nvidia) chips featured on this list require 1000 watts or more, while the Apple Silicon chips need less than 140 watts for everything: CPU/SoC and GPU. Users and investors should look ahead; Apple is executing behind the scenes. Probably too much for the EU, though, which thinks Apple should share everything with its competition for the sake of fairness.