Apple-Nvidia collaboration triples speed of AI model production

Posted in General Discussion

Apple's latest machine learning research could make creating models for Apple Intelligence faster, thanks to a technique that nearly triples the rate of token generation when using Nvidia GPUs.

Training models for machine learning is a processor-intensive task



One of the problems in creating large language models (LLMs) for tools and apps that offer AI-based functionality, such as Apple Intelligence, is the inefficiency of producing the LLMs in the first place. Training models for machine learning is a resource-intensive and slow process, which is often countered by buying more hardware and taking on increased energy costs.

Earlier in 2024, Apple published and open-sourced Recurrent Drafter, known as ReDrafter, a speculative decoding method that speeds up token generation at inference time. It uses a recurrent neural network (RNN) draft model and combines beam search with dynamic tree attention to predict and verify draft tokens from multiple candidate paths.

In Apple's benchmarks, this sped up LLM token generation to as many as 3.5 tokens per generation step, compared with the single token per step produced by typical auto-regressive decoding.
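To make the idea concrete, here is a minimal sketch of the draft-and-verify loop at the heart of speculative decoding, written in plain Python. It is a toy illustration under stated assumptions: target_next_token and draft_next_token are hypothetical stand-ins for a large target model and a cheap draft model, and the sketch omits the RNN draft head, beam search, and dynamic tree attention that distinguish ReDrafter from simpler speculative decoding schemes.

# Toy sketch of greedy speculative decoding; not Apple's ReDrafter implementation.
import random

random.seed(0)
VOCAB = list(range(50))  # hypothetical tiny vocabulary

def target_next_token(context):
    # Stand-in for the large target model's greedy next-token choice.
    return hash(tuple(context)) % len(VOCAB)

def draft_next_token(context):
    # Stand-in for the cheap draft model; agrees with the target most of the time.
    if random.random() < 0.8:
        return target_next_token(context)
    return random.choice(VOCAB)

def speculative_decode(prompt, num_tokens, draft_len=4):
    # Generate tokens by drafting several at a time and verifying them greedily.
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < num_tokens:
        steps += 1
        # 1. Draft a short continuation with the cheap model.
        draft, ctx = [], list(out)
        for _ in range(draft_len):
            token = draft_next_token(ctx)
            draft.append(token)
            ctx.append(token)
        # 2. Verify the draft against the target model and keep the longest
        #    agreeing prefix; on the first mismatch, use the target's token instead.
        accepted, ctx = [], list(out)
        for token in draft:
            expected = target_next_token(ctx)
            if token == expected:
                accepted.append(token)
                ctx.append(token)
            else:
                accepted.append(expected)
                break
        else:
            # Every draft token was accepted, so the verification pass also
            # yields one additional token essentially for free.
            accepted.append(target_next_token(ctx))
        out.extend(accepted)
    return out[len(prompt):], steps

tokens, steps = speculative_decode([1, 2, 3], num_tokens=20)
print(f"generated {len(tokens)} tokens in {steps} verification steps "
      f"(plain auto-regressive decoding would need {len(tokens)} steps)")

The saving comes from the fact that each verification step can accept several draft tokens at once, so far fewer passes through the expensive target model are needed to produce the same output.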

In a post on Apple's Machine Learning Research site, the company explained that the work didn't stop with Apple Silicon. The new report, published on Wednesday, details how the team made ReDrafter production-ready for use with Nvidia GPUs.

Nvidia GPUs are often employed in servers used for LLM generation, but the high-performance hardware often comes at a hefty cost. It's not uncommon for multi-GPU servers to cost in excess of $250,000 apiece for the hardware alone, let alone any required infrastructure or other connected costs.

Apple worked with Nvidia to integrate ReDrafter into TensorRT-LLM, Nvidia's inference acceleration framework. Because ReDrafter relies on operators that other speculative decoding methods don't use, Nvidia had to add support for those operators before the technique could work within the framework.

With the integration in place, ML developers who run TensorRT-LLM in production on Nvidia GPUs can now take advantage of ReDrafter's accelerated token generation, not just those working with Apple Silicon.

The result, after benchmarking a production model with tens of billions of parameters on Nvidia GPUs, was a 2.7-times increase in generated tokens per second for greedy decoding.

The upshot is that the technique could be used to minimize latency for users and reduce the amount of hardware required. In short, users could expect faster results from cloud-based queries, and companies could offer more while spending less.
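As a rough back-of-the-envelope illustration of that claim, the snippet below works through what a 2.7-times throughput gain means for response latency and fleet size. Only the 2.7 factor comes from the report; the baseline throughput, reply length, and GPU count are assumed numbers chosen purely for the example.

# Hypothetical figures: only the 2.7x factor is from the benchmark report.
baseline_tps = 100   # assumed tokens per second per GPU before ReDrafter
speedup = 2.7        # reported speed-up for greedy decoding
reply_tokens = 500   # assumed length of a typical cloud response

new_tps = baseline_tps * speedup
print(f"latency per reply: {reply_tokens / baseline_tps:.1f}s -> {reply_tokens / new_tps:.1f}s")
# A fleet sized for a fixed request rate shrinks by roughly the same factor.
print(f"GPUs needed for the same throughput: 27 -> {27 / speedup:.0f}")

In this hypothetical setup, a 500-token reply drops from about 5 seconds to under 2, and a 27-GPU fleet could serve the same load with roughly 10 GPUs.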

In Nvidia's Technical Blog on the topic, the graphics card producer said the collaboration made TensorRT-LLM "more powerful and more flexible, enabling the LLM community to innovate more sophisticated models and easily deploy them."

The report's release follows Apple's public confirmation that it was investigating the potential use of Amazon's Trainium2 chip to train models for Apple Intelligence features. At the time, Apple said it expected a 50% improvement in pretraining efficiency using the chips compared with existing hardware.



Read on AppleInsider

Comments

  • Reply 1 of 12
    elijahg Posts: 2,887 member
    So Apple is finally friendly with Nvidia again?
    Penzi, watto_cobra
     2 Likes 0 Dislikes 0 Informatives
  • Reply 2 of 12
    netrox Posts: 1,546 member
    elijahg said:
    So Apple is finally friendly with Nvidia again?
    What makes you think Apple wasn't friendly with Nvidia? 
    michelb76, watto_cobra
     2 Likes 0 Dislikes 0 Informatives
  • Reply 3 of 12
    auxio Posts: 2,783 member
    netrox said:
    elijahg said:
    So Apple is finally friendly with Nvidia again?
    What makes you think Apple wasn't friendly with Nvidia? 
    https://www.cultofmac.com/news/nvidia-settles-class-action-lawsuit-over-macbook-pro-gpus
    blastdoor, muthuk_vanalingam, netrox, watto_cobra
     3 Likes 0 Dislikes 1 Informative
  • Reply 4 of 12
    melgross Posts: 33,687 member
    netrox said:
    elijahg said:
    So Apple is finally friendly with Nvidia again?
    What makes you think Apple wasn't friendly with Nvidia? 
    Good one! Apple has had a feud with Nvidia for a long time. It started with CUDA: Apple came up with its own software for that, which was better. ATI embraced it, but Nvidia didn't allow it on their GPUs. That was the first problem. The second, as linked above, was when Nvidia changed their production to different solder balls and didn't tell the manufacturers of the boards the chips were then soldered to. When the incompatible solder connections began to fail for all manufacturers - we had two iMacs that failed because of this - Nvidia refused to take responsibility. Eventually they had to put $500,000,000 into an escrow account for manufacturers to dip into, but it wasn't adequate.

    Apple hasn’t used Nvidia since. So this is interesting and somewhat surprising.
    muthuk_vanalingam, entropys, Penzi, ForumPost, watto_cobra, byronl
     3 Likes 0 Dislikes 3 Informatives
  • Reply 5 of 12
    elijahg Posts: 2,887 member
    The reason Apple switched from ATI to Nvidia in the first place was that they fell out with ATI over it leaking a PowerBook with an ATI GPU in it. Unfortunately, that hatred of Nvidia really screwed over Mac owners from about 2012 onwards. We were stuck with crappy, hot, slow ATI GPUs when Nvidia was much, much, much faster.

     https://www.zdnet.com/article/ati-on-apple-leak-our-fault/
    MacPro, byronl
     2 Likes 0 Dislikes 0 Informatives
  • Reply 6 of 12
    danox Posts: 3,659 member
    Friendly? Apple will use them for as long as they have to in the back of the house. Apple is not far from convergence with many of the Nvidia graphics cards. When will that convergence take place: the M4 Studio Ultra? Or will it be the M5 or M6? It isn't that far away. Look at the power signature required for one of the graphics cards made by Nvidia. Apple and Nvidia ain't on the same path; the power signature wattage is so far off, it's ridiculous.

    https://www.pcguide.com/gpu/power-supply-rtx-4080/ 750 watts is the recommended system power supply, and the back-of-house Nvidia stuff is even more out there. The current M2 Ultra takes 107 watts for everything, and the M4 is even more powerful and efficient than that, let alone what's coming up with the M5 and M6.

    Currently the 4080 is about 3.3 times faster than the M2 Studio Ultra (Blender). It will be interesting to see how close the M4 Studio Ultra gets next year; we know it'll use a hell of a lot less power to achieve its performance.

    Apple Silicon is definitely powerful enough now, but for market inertia; by the time of the M5 it will be indisputable.





    edited December 2024
    ForumPost, watto_cobra
     1 Like 0 Dislikes 1 Informative
  • Reply 7 of 12
    danox said:
    Friendly? Apple will use them for as long as they have to in the back of the house. Apple is not far from convergence with many of the Nvidia graphics cards. When will that convergence take place: the M4 Studio Ultra? Or will it be the M5 or M6? It isn't that far away. Look at the power signature required for one of the graphics cards made by Nvidia. Apple and Nvidia ain't on the same path; the power signature wattage is so far off, it's ridiculous.

    https://www.pcguide.com/gpu/power-supply-rtx-4080/ 750 watts is the recommended system power supply, and the back-of-house Nvidia stuff is even more out there. The current M2 Ultra takes 107 watts for everything, and the M4 is even more powerful and efficient than that, let alone what's coming up with the M5 and M6.

    Currently the 4080 is about 3.3 times faster than the M2 Studio Ultra (Blender). It will be interesting to see how close the M4 Studio Ultra gets next year; we know it'll use a hell of a lot less power to achieve its performance.

    Apple Silicon is definitely powerful enough now, but for market inertia; by the time of the M5 it will be indisputable.





    That's quite a stretch, as the Nvidia cards are way faster at most things, and yes, of course they use much more power. They're also so much faster on LLM work that it's not even a comparison. The 4090 cards are purposely made slower; they have a cut circuit that unlocks even more performance. All Nvidia has to do is make a new board with the circuit connected and the 4090 will be almost twice as fast on a lot of important LLM calculations. Right now, they obviously don't want to cut into their datacenter market with a consumer card. Apple's M-series performance per watt, coupled with the shared memory, is absolutely fantastic, but it's still quite behind on CUDA-powered stuff, and once Nvidia increases RAM a bit (RAM is getting much less important for inference by the month), they'll stay on top very comfortably.
    byronl
     1 Like 0 Dislikes 0 Informatives
  • Reply 8 of 12
    elijahg Posts: 2,887 member
    danox said:
    Friendly? Apple will use them for as long as they have to in the back of the house. Apple is not far from convergence with many of the Nvidia graphics cards. When will that convergence take place: the M4 Studio Ultra? Or will it be the M5 or M6? It isn't that far away. Look at the power signature required for one of the graphics cards made by Nvidia. Apple and Nvidia ain't on the same path; the power signature wattage is so far off, it's ridiculous.

    https://www.pcguide.com/gpu/power-supply-rtx-4080/ 750 watts is the recommended system power supply, and the back-of-house Nvidia stuff is even more out there. The current M2 Ultra takes 107 watts for everything, and the M4 is even more powerful and efficient than that, let alone what's coming up with the M5 and M6.

    Currently the 4080 is about 3.3 times faster than the M2 Studio Ultra (Blender). It will be interesting to see how close the M4 Studio Ultra gets next year; we know it'll use a hell of a lot less power to achieve its performance.

    Apple Silicon is definitely powerful enough now, but for market inertia; by the time of the M5 it will be indisputable.





    The Apple Silicon GPU benchmarks, whilst incredible for an iGPU, are somewhat cherry-picked. It will be interesting to see how much faster the Mx GPUs get, but I very much doubt - even with Apple's incredibly skilled engineers - that they will be anywhere near dedicated GPU performance. It's just physics: massively fewer transistors means less performance, no matter how many clever optimisations there are. Performance per watt, yes, Apple rules the roost by far and likely always will.

    My 2019 iMac gets a Geekbench Metal score about a third of the top-end M2 Ultra's, but it's four years older than the M2 Ultra, and it didn't cost £5200 plus a display. The Pro Vega 48 in my iMac was pretty sluggish compared to the equivalent Nvidia card at the time, getting 11,000 on 3DMark; the Nvidia RTX 2080 at the time was getting nearly double that. That shows how Apple screwed over Mac users by refusing to use Nvidia. Nvidia also wrote their own Mac drivers, which were updated all the time and were much better than the Apple-written ATI drivers. It also shows that in the real world, right now, Apple Silicon GPUs are still a long way from reaching dedicated GPUs.
    byronl
     1 Like 0 Dislikes 0 Informatives
  • Reply 9 of 12
    dewme Posts: 5,966 member
    The Nvidia GPU solder issue is what led to the premature death of my 2008 iMac. I think it would still be running if the video subsystem issue didn’t crop up after the AppleCare ended. Some owners came up with a scheme to reflow the solder connections by baking the video card in the oven for a certain amount of time at a certain temperature. It was successful for some folks but I don’t think it was ever a permanent fix. 
    watto_cobra
     1 Like 0 Dislikes 0 Informatives
  • Reply 10 of 12
    elijahg Posts: 2,887 member
    dewme said:
    The Nvidia GPU solder issue is what led to the premature death of my 2008 iMac. I think it would still be running if the video subsystem issue didn’t crop up after the AppleCare ended. Some owners came up with a scheme to reflow the solder connections by baking the video card in the oven for a certain amount of time at a certain temperature. It was successful for some folks but I don’t think it was ever a permanent fix. 
    I had a Mac Pro 1,1 at the time with an Nvidia GeForce 8800. That also suffered, but reflowing it in the oven fixed it until the power supply died; then the motherboard died sometime in 2011.
    dewme, watto_cobra
     0 Likes 0 Dislikes 2 Informatives
  • Reply 11 of 12

    At this current rate, Apple seems to be about two years behind dedicated graphics cards (since the M4 series is mostly compared to the RTX 4000 series), and the A-series chips are about four years behind the M series (since the A18 Pro is comparable to the M1). So if the math is correct, a modern high-end gaming PC will be comparable to an iPhone in less than 10 years' time. Although technology seems to be developing faster, and we are nearing the end of the silicon era of development, so who knows how quickly tech will develop during the switch?

    ForumPost, watto_cobra
     2 Likes 0 Dislikes 0 Informatives
  • Reply 12 of 12
    danox Posts: 3,659 member
    If you examine the comment section, the gamer/Microsoft tech enthusiasts appear to be sensing/anticipating convergence. They are extremely upset about the performance of the Apple Silicon M2 Studio Ultra. They seem to recognize that in the future, the significance of wattage, energy efficiency, and performance will outweigh the current approach of burning down the barn between Intel, AMD, and Nvidia. (Nvidia will be the last holdout, of course.)

     www.youtube.com/watch?v=5dhuxRF2c_w (13:05) - Wattage used by Apple is on a different path compared to the PC world. 

    edited December 2024
    dewme, ForumPost, watto_cobra
     1 Like 0 Dislikes 2 Informatives