Run ChatGPT-style AI on your Mac with OpenAI's new offline tools

Posted in Current Mac Hardware

One of the two new open-weight models from OpenAI can bring ChatGPT-like reasoning to your Mac with no subscription needed.

[Screenshot: OpenAI website showing the gpt-oss-120b and gpt-oss-20b models. Caption: New models from OpenAI]

On August 5, OpenAI launched two new large language models with publicly available weights: gpt-oss-20b and gpt-oss-120b. These are the first open-weight models from the company since GPT-2 in 2019.

Both are released under the Apache 2.0 license, which allows for free commercial use and modification. Sam Altman, CEO of OpenAI, described the smaller model as the best and most usable open model currently available.

Altman also said the new models deliver reasoning performance comparable to o4-mini and o3-mini, both of which are part of OpenAI's proprietary lineup.

The move follows growing pressure from the open-source AI community, particularly as models like Meta's Llama 3 and China's DeepSeek continue to gain attention. OpenAI's decision to release these models now is likely a response to that shift in momentum.

System requirements and Mac compatibility

OpenAI says the smaller 20 billion parameter model works well on devices with at least 16 gigabytes of unified memory or VRAM. That makes it viable on higher-end Apple Silicon Macs, such as those with an M2 Pro or M3 Max chip, or better.

The company even highlights Apple Silicon support as a key use case for the 20b model. The larger 120 billion parameter model is a different story.

OpenAI recommends 60 GB to 80 GB of memory for the 120 billion parameter model, which puts it well outside the range of most consumer laptops and desktops. Realistically, only GPU workstations, cloud setups, or machines with very large unified memory pools can handle it.

The 20b model can run well on many Apple and PC setups. The 120b model is better suited for researchers and engineers with access to specialized hardware.

Performance and developer options

The gpt-oss models support modern features like chain-of-thought reasoning, function calling, and code execution. Developers can fine-tune them, build tools on top of them, and run them without needing an internet connection.
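
To make the function-calling support concrete, here is a minimal Python sketch that exercises it against a locally served copy of the 20b model. It assumes Ollama is running with its OpenAI-compatible endpoint at the default http://localhost:11434/v1, and the get_disk_free tool is invented purely for illustration.

```python
# Minimal sketch: function calling against a locally served gpt-oss-20b.
# Assumes Ollama is running and exposing its OpenAI-compatible endpoint
# at the default http://localhost:11434/v1; the tool below is hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is unused locally

tools = [{
    "type": "function",
    "function": {
        "name": "get_disk_free",  # hypothetical tool, for illustration only
        "description": "Return free disk space in GB for a volume",
        "parameters": {
            "type": "object",
            "properties": {"volume": {"type": "string"}},
            "required": ["volume"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-oss:20b",  # Ollama's tag for the model; confirm with `ollama list`
    messages=[{"role": "user", "content": "How much space is left on my main drive?"}],
    tools=tools,
)

# If the model elects to call the tool, the structured call appears in
# tool_calls; otherwise an ordinary text answer comes back in content.
message = response.choices[0].message
print(message.tool_calls or message.content)
```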

[Screenshot: gpt-oss-120b model page with download statistics and a text-generation demo. Caption: OpenAI model on Hugging Face]

That customization opens new possibilities for privacy-focused apps, offline assistants, and custom AI workflows. OpenAI has provided reference implementations across several toolkits.

Developers can run the models using PyTorch, Transformers, Triton, vLLM, and Apple's Metal Performance Shaders. Support is also available in third-party tools like Ollama and LM Studio, which simplify model download, quantization, and interface setup.
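
As a rough example of the Transformers route, the sketch below loads the smaller model through a standard text-generation pipeline. The Hugging Face model id openai/gpt-oss-20b, and the need for recent transformers and accelerate releases, are assumptions; treat this as a starting point rather than the official quickstart.

```python
# Minimal sketch: running gpt-oss-20b via the Hugging Face Transformers pipeline.
# Assumes `pip install transformers accelerate` and the model id below.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    device_map="auto",  # picks MPS on Apple Silicon, CUDA on NVIDIA GPUs
)

prompt = "Explain unified memory on Apple Silicon in one paragraph."
result = generator(prompt, max_new_tokens=200)
print(result[0]["generated_text"])
```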

Mac users can run the 20b model locally through Apple's Metal Performance Shaders and the unified memory built into Apple Silicon. The model ships pre-quantized in a 4-bit format (MXFP4), which trims memory use and speeds up inference without a meaningful hit to output quality.

It still takes a little technical work to set up, but tools like LM Studio and Ollama make the process easier. OpenAI has also released detailed model cards and sample prompts to help developers get started.
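
Once the model is installed, the day-to-day glue code can be very short. Below is a minimal sketch using the ollama Python package; the gpt-oss:20b tag is an assumption about how Ollama names the model, so check `ollama list` on your machine.

```python
# Minimal sketch: a fully offline chat turn through the ollama Python client.
# Assumes `pip install ollama`, the Ollama app running in the background,
# and the model already pulled, e.g. `ollama pull gpt-oss:20b` (tag assumed).
import ollama

reply = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Summarize the pros and cons of local LLMs."}],
)
print(reply["message"]["content"])
```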

What it means for AI developers and Apple users

OpenAI's return to open-weight models is a significant shift. The 20b model offers strong performance for its size and can be used on a wide range of local hardware, including MacBooks and desktops with Apple Silicon.

The 20b model gives developers more freedom to build local AI tools without paying for API access or depending on cloud servers. Meanwhile, the 120b model shows what's possible at the high end but won't be practical for most users.

It may serve more as a research benchmark than a day-to-day tool. Even so, its availability under a permissive license is a major step for transparency and AI accessibility.

For Apple users, this release provides a glimpse of what powerful local AI can look like. With Apple pushing toward on-device intelligence in macOS and iOS, OpenAI's move fits a broader trend of local-first machine learning.



Read on AppleInsider

Comments

  • Reply 1 of 10
blastdoor · Posts: 3,876 · member
An M3 Ultra Mac Studio could be configured with enough RAM to run the 120 billion parameter model. I wonder how it would perform.
  • Reply 2 of 10
racerhomie3 · Posts: 1,267 · member
    This is amazing news!

I really hope my next Mac is a 16GB one with 512GB of storage.
  • Reply 3 of 10
badmonk · Posts: 1,365 · member
Thanks for the article. Hoping AppleInsider provides a "how to" for those of us who are interested in setting up the smaller model.
  • Reply 4 of 10
Marvin · Posts: 15,586 · moderator
    blastdoor said:
An M3 Ultra Mac Studio could be configured with enough RAM to run the 120 billion parameter model. I wonder how it would perform.
It looks pretty fast in a video of it running on an M3 Ultra with 512GB RAM: 65 tokens/s, using 95GB of memory.
  • Reply 5 of 10
tskwara · Posts: 12 · member
    blastdoor said:
An M3 Ultra Mac Studio could be configured with enough RAM to run the 120 billion parameter model. I wonder how it would perform.
It's running really well on my M1 Ultra with a 20-core CPU, 64-core GPU, and 128GB of unified memory. A little speechless ATM.
  • Reply 6 of 10
blastdoor · Posts: 3,876 · member
    tskwara said:
    blastdoor said:
An M3 Ultra Mac Studio could be configured with enough RAM to run the 120 billion parameter model. I wonder how it would perform.
It's running really well on my M1 Ultra with a 20-core CPU, 64-core GPU, and 128GB of unified memory. A little speechless ATM.
    That's really awesome! 
  • Reply 7 of 10
    tskwara said:
    blastdoor said:
An M3 Ultra Mac Studio could be configured with enough RAM to run the 120 billion parameter model. I wonder how it would perform.
It's running really well on my M1 Ultra with a 20-core CPU, 64-core GPU, and 128GB of unified memory. A little speechless ATM.
    Can I ask what you do with that LLM (that cannot be done online)?
  • Reply 8 of 10
tskwara · Posts: 12 · member
    tskwara said:
    blastdoor said:
An M3 Ultra Mac Studio could be configured with enough RAM to run the 120 billion parameter model. I wonder how it would perform.
It's running really well on my M1 Ultra with a 20-core CPU, 64-core GPU, and 128GB of unified memory. A little speechless ATM.
    Can I ask what you do with that LLM (that cannot be done online)?
Saving some expense by running less demanding AI tasks (e.g., synthesizing datasets for SME model tuning) locally is one thing. It can be done online, but it does cost more.
  • Reply 9 of 10
I'm running the 120b version on a MacBook Pro with an M3 Max processor (128GB) and it runs fine. The setup in Ollama takes less than two minutes and is really simple (though downloading the LLM can take several minutes, depending on your Internet speed).

It runs very smoothly and takes up about 60GB for the app, but unlike running Llama 3.3 (70b), the fan never kicks in. As much as I'd love to have a Mac Studio with an M3 Ultra, it is not necessary, as the model works fine on a suitably equipped MBP.

  • Reply 10 of 10
programmer · Posts: 3,504 · member
    tskwara said:
    blastdoor said:
An M3 Ultra Mac Studio could be configured with enough RAM to run the 120 billion parameter model. I wonder how it would perform.
It's running really well on my M1 Ultra with a 20-core CPU, 64-core GPU, and 128GB of unified memory. A little speechless ATM.
    Can I ask what you do with that LLM (that cannot be done online)?
    Avoiding sharing all your good ideas?