A PowerBook G4 is barely fast enough to run a large language model


A software developer has proven it is possible to run a modern LLM on old hardware like a 2005 PowerBook G4, albeit nowhere near the speeds expected by consumers.

A PowerBook G4 running a TinyStories 110M Llama2 LLM inference -- Image credit: Andrew Rossignol/TheResistorNetwork



Most artificial intelligence projects, such as Apple's ongoing push for Apple Intelligence, lean on having a device powerful enough to handle queries locally. This has meant that newer computers and processors, such as the latest A-series chips in the iPhone 16 generation, tend to be used for AI applications, simply because they have enough performance for the task.

In a blog post published on Monday by The Resistor Network, Andrew Rossignol -- the brother of Joe Rossignol at MacRumors -- writes about his work getting a modern large language model (LLM) to run on older hardware. The hardware available to him was a 2005 PowerBook G4, equipped with a 1.5GHz processor and a gigabyte of memory, and constrained by limitations such as its 32-bit address space.

After checking out the llama2.c project, which implements Llama2 LLM inference in a single vanilla C file with no accelerators, Rossignol forked the core of the project to make some improvements.
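
For context, the heart of such a single-file inference engine is an autoregressive loop: feed the last token back into the model, score every possible next token, pick one, and repeat. The toy program below is not llama2.c's actual code -- the "model" is a hard-coded stand-in -- but it sketches the shape of that loop:

    #include <stdio.h>

    #define VOCAB 4
    static const char *vocab[VOCAB] = {"<s>", "once", "upon", "a"};

    /* Stand-in for the real transformer forward pass: fills in a score for
     * every vocabulary entry given the current token. */
    static void fake_forward(int token, float logits[VOCAB])
    {
        for (int i = 0; i < VOCAB; i++)
            logits[i] = (i == (token + 1) % VOCAB) ? 1.0f : 0.0f;
    }

    /* Pick the highest-scoring token (greedy sampling). */
    static int argmax(const float *logits, int n)
    {
        int best = 0;
        for (int i = 1; i < n; i++)
            if (logits[i] > logits[best]) best = i;
        return best;
    }

    int main(void)
    {
        int token = 0;                        /* start from the <s> token */
        for (int pos = 0; pos < 6; pos++) {
            float logits[VOCAB];
            fake_forward(token, logits);      /* real code: run the full model */
            token = argmax(logits, VOCAB);    /* real code: sample with temperature */
            printf("%s ", vocab[token]);
        }
        printf("\n");
        return 0;
    }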

Those improvements included wrappers for system functions, organizing the code into a library with a public API, and eventually porting the project to run on a PowerPC Mac. This last step meant working around the G4's "big-endian" byte ordering, since the model checkpoint and tokenizer files are written for "little-endian" processors, which store multi-byte values with the bytes in the opposite order.
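
As an illustration of the endianness problem: the checkpoint stores 32-bit floats in little-endian order, so a big-endian G4 has to reassemble each value byte by byte before using it. A minimal, hypothetical sketch of that conversion (not Rossignol's actual code):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Reassemble a 32-bit float from four little-endian bytes, regardless of
     * the host's own byte order. */
    static float f32_from_le(const unsigned char b[4])
    {
        uint32_t bits = (uint32_t)b[0]
                      | ((uint32_t)b[1] << 8)
                      | ((uint32_t)b[2] << 16)
                      | ((uint32_t)b[3] << 24);
        float f;
        memcpy(&f, &bits, sizeof f);   /* reinterpret the bits as a float */
        return f;
    }

    int main(void)
    {
        /* 1.0f stored little-endian on disk: 00 00 80 3f */
        const unsigned char le[4] = {0x00, 0x00, 0x80, 0x3f};
        printf("%f\n", f32_from_le(le));   /* prints 1.000000 on any host */
        return 0;
    }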

Not exactly fast



The llama2.c project recommends the TinyStories models, which are small enough to produce output without specialized hardware acceleration, such as a modern GPU. Testing was mostly done with the 15 million-parameter (15M) variant of the model before switching to the 110M version, as anything larger would not fit in the machine's address space.

More parameters generally make for a more capable model, so the usual aim is to use as many as possible to generate accurate responses without sacrificing response speed. Given the severe constraints of this project, it was instead a case of choosing models small enough to be usable at all.
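
Some rough arithmetic shows why the 110M model was the ceiling: assuming 4-byte float32 weights, the weights alone scale as in the sketch below, and a 32-bit process cannot address more than 4GB in total. The ~1B figure is included only for contrast and is not from the article.

    #include <stdio.h>

    int main(void)
    {
        const double bytes_per_param = 4.0;               /* float32 weights */
        const double params[] = {15e6, 110e6, 1.1e9};     /* 15M, 110M, ~1B */
        const char  *names[]  = {"TinyStories 15M", "TinyStories 110M",
                                 "~1B-parameter model"};

        for (int i = 0; i < 3; i++) {
            double mb = params[i] * bytes_per_param / (1024.0 * 1024.0);
            printf("%-20s ~%7.0f MB of weights\n", names[i], mb);
        }
        /* A 32-bit process can address at most 4096 MB (less in practice once
         * the OS, stack, and buffers are accounted for), so a ~1B float32
         * model already blows past the limit before any runtime state. */
        return 0;
    }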

To gauge the PowerBook G4's performance, the same code was benchmarked against a single Intel Xeon Silver 4216 core clocked at 3.2GHz, which completed a test query in 26.5 seconds at a rate of 6.91 tokens per second.

Running the same code on the PowerBook G4 worked, but it took around four minutes, about nine times slower than the single Xeon core. Further optimizations, including use of the G4's AltiVec vector extensions, shaved another half a minute off the inference run, leaving the PowerBook G4 just eight times slower.
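
The article does not reproduce Rossignol's code, but the kind of AltiVec optimization it describes typically targets the matrix-multiply inner loop. A hypothetical sketch, processing four floats per iteration (PowerPC-only; built with GCC's -maltivec flag, and note that vec_ld expects 16-byte-aligned data):

    #include <altivec.h>
    #include <stdio.h>

    /* Dot product of two 16-byte-aligned float arrays; n must be a multiple of 4. */
    static float dot_altivec(const float *x, const float *w, int n)
    {
        vector float acc = {0.0f, 0.0f, 0.0f, 0.0f};
        for (int i = 0; i < n; i += 4) {
            vector float vx = vec_ld(0, x + i);   /* load four floats from x */
            vector float vw = vec_ld(0, w + i);   /* load four floats from w */
            acc = vec_madd(vx, vw, acc);          /* acc += vx * vw, element-wise */
        }
        union { vector float v; float f[4]; } u = { acc };
        return u.f[0] + u.f[1] + u.f[2] + u.f[3]; /* horizontal sum of the lanes */
    }

    int main(void)
    {
        __attribute__((aligned(16))) float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        __attribute__((aligned(16))) float w[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        printf("%f\n", dot_altivec(x, w, 8));     /* prints 36.000000 */
        return 0;
    }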

It was found that the selected models were capable of producing "whimsical children's stories." This helped lighten the mood during debugging.

Beyond speed



It seems unlikely that much more performance can be squeezed out of the test hardware, due to limitations such as its 32-bit architecture and its 4GB maximum of addressable memory. While quantization could help reduce memory use, there is too little address space for larger models to be usable.
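
For illustration only (this is not from the article), quantization in this context usually means storing each weight as an 8-bit integer plus a shared scale factor, roughly quartering the memory footprint of float32 weights in exchange for extra work at inference time. A minimal sketch:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Quantize n float weights to int8 with one shared scale; returns the
     * scale needed to approximately recover them (dequantized = q * scale). */
    static float quantize_q8(const float *w, int8_t *q, int n)
    {
        float max_abs = 0.0f;
        for (int i = 0; i < n; i++)
            if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);

        float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
        for (int i = 0; i < n; i++)
            q[i] = (int8_t)lrintf(w[i] / scale);
        return scale;
    }

    int main(void)
    {
        float  w[4] = {0.12f, -0.90f, 0.33f, 0.71f};
        int8_t q[4];
        float  scale = quantize_q8(w, q, 4);

        for (int i = 0; i < 4; i++)
            printf("%.3f -> %4d -> %.3f\n", w[i], q[i], q[i] * scale);
        return 0;
    }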

Admitting that the project probably stops at this point for now, Rossignol offers that it "has been a great way to get my toes wet with LLMs and how they operate."

He adds that "it is fairly impressive that a computer which is 15 years junior [to the Xeon] can do this at all."

This demonstration of older hardware running a modern LLM gives hope to users that their older hardware could be brought out of retirement and still be used with AI. However, it is worth keeping in mind that cutting-edge software will run with limitations, and at considerably slower speeds than on modern hardware.

Short of the discovery of extreme optimizations that minimize processing requirements, those working on LLMs and AI in general will still have to keep buying more modern hardware for the task.

The latest M3 Ultra Mac Studio is a great, if extremely expensive, way to run massive LLMs locally. But for the hobbyist dabbling in the subject, tinkering with projects like the PowerBook G4 can still be rewarding.




Comments

  • Reply 1 of 4
    mknelson Posts: 1,163 member
    "The latest M3 Ultra Mac Studio is a great, if extremely expensive, way to run massive LLMs locally. But for the hobbyist dabbling in the subject, tinkering with projects like the PowerBook G4 can still be rewarding."

    Meanwhile, over at The Cult of Mac

    "
    The new Mac Studio with an M3 Ultra chip, which supports up to 512 GB of unified memory, is the easiest and cheapest way to run powerful, cutting-edge LLMs on your own hardware."
    https://www.cultofmac.com/news/mac-studio-ai-performance
  • Reply 2 of 4
    loopless Posts: 353 member
    mknelson said:
    "The latest M3 Ultra Mac Studio is a great, if extremely expensive, way to run massive LLMs locally. But for the hobbyist dabbling in the subject, tinkering with projects like the PowerBook G4 can still be rewarding."

    Meanwhile, over at The Cult of Mac

    "The new Mac Studio with an M3 Ultra chip, which supports up to 512 GB of unified memory, is the easiest and cheapest way to run powerful, cutting-edge LLMs on your own hardware."
    https://www.cultofmac.com/news/mac-studio-ai-performance
    Both statements are true. To run LLMs locally you either need to invest in expensive and power-hungry GPU cards to plug into your PC OR get a 'loaded' Mac Studio.

    Neither option is for the "hobbyist". The big advantage of the Mac Studio is low power usage and unified memory. It's a massive software PITA using GPUs with separate memory spaces.
  • Reply 3 of 4
    saarek Posts: 1,612 member
    I wonder how much better my PowerMac G5 would handle it. It’s a late 2005 Dual 2.3GHz model with 16GB of RAM.
  • Reply 4 of 4
    programmer Posts: 3,499 member
    "Barely fast enough to run ..." -- this is a nonsense statement for any computation that doesn't have some sort of real-time constraint.  If you're willing to wait long enough, even a 680x0-based Mac could run an LLM... if it had enough storage.  The main limitation that will prevent the computation from running at all is storage capacity.  LLMs require a huge amount of state (gigabytes), and if that state doesn't fit in memory it has to be kept in offline storage (disks, flash memory, etc).  Moving it offline would make it many orders of magnitude slower, thus increasingly impractical.  Computers which have no means of accessing large storage couldn't run LLMs at all, but otherwise its just a matter of how long it'll take.  In practice a 680x0-based Mac would probably suffer a hardware failure before it delivered a useful result... but barring that, it would eventually produce an answer.