Apple-Nvidia collaboration triples speed of AI model production
Apple's latest machine learning research could make creating models for Apple Intelligence faster, by coming up with a technique to almost triple the rate of generating tokens when using Nvidia GPUs.

Training models for machine learning is a processor-intensive task
One of the problems in creating large language models (LLMs) for tools and apps that offer AI-based functionality, such as Apple Intelligence, is inefficiencies in producing the LLMs in the first place. Training models for machine learning is a resource-intensive and slow process, which is often countered by buying more hardware and taking on increased energy costs.
Earlier in 2024, Apple published and open-sourced Recurrent Drafter, known as ReDrafter, a method of speculative decoding to improve performance in training. It used an RNN (Recurrent Neural Network) draft model combining beam search with dynamic tree attention for predicting and verifying draft tokens from multiple paths.
This sped up LLM token generation by up to 3.5 times per generation step versus typical auto-regressive token generation techniques.
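ReDrafter's draft-then-verify loop follows the general speculative decoding pattern: a cheap draft model proposes several tokens ahead, and the full model verifies them in a single pass, accepting the longest correct prefix. A minimal toy sketch of that pattern (not Apple's implementation — the models, vocabulary, and acceptance rule here are illustrative stand-ins) might look like:

```python
import random

random.seed(0)

# Toy vocabulary and "models". In ReDrafter the draft model is a small
# RNN and the target is the full LLM; here both are simple functions.
VOCAB = ["the", "cat", "sat", "on", "mat"]

def target_next(context):
    # Deterministic toy "LLM": the next token cycles through the vocab.
    return VOCAB[len(context) % len(VOCAB)]

def draft_next(context):
    # Cheap draft model that agrees with the target most of the time.
    if random.random() < 0.8:
        return target_next(context)
    return random.choice(VOCAB)

def speculative_step(context, k=4):
    """Draft k tokens cheaply, then verify them with the target model.

    Always returns at least one token, because the target supplies a
    corrected token at the first mismatch."""
    # 1. Draft phase: propose k tokens with the cheap model.
    drafted = []
    ctx = list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. Verify phase: the target model checks each drafted token.
    #    (A real implementation scores all k positions in one batched
    #    forward pass, which is where the speedup comes from.)
    accepted = []
    ctx = list(context)
    for tok in drafted:
        expected = target_next(ctx)
        if tok == expected:
            accepted.append(tok)
            ctx.append(tok)
        else:
            # First mismatch: take the target's token instead and stop.
            accepted.append(expected)
            break
    return accepted

context = ["the"]
while len(context) < 12:
    context.extend(speculative_step(context))
print(" ".join(context))
```

Because every mismatch is corrected with the target model's own token, the output is identical to what plain greedy decoding would produce — the draft model only changes how many target-model passes are needed, not the result.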
In a post on Apple's Machine Learning Research site, the company explained that the work didn't stop with Apple Silicon. The new report, published on Wednesday, detailed how the team took the research behind ReDrafter and made it production-ready for use with Nvidia GPUs.
Nvidia GPUs are often employed in servers used for LLM generation, but the high-performance hardware often comes at a hefty cost. It's not uncommon for multi-GPU servers to cost in excess of $250,000 apiece for the hardware alone, let alone any required infrastructure or other connected costs.
Apple worked with Nvidia to integrate ReDrafter into TensorRT-LLM, Nvidia's inference acceleration framework. Because ReDrafter relies on operators that other speculative decoding methods don't, Nvidia had to add new operators to the framework for it to work.
With its integration, ML developers using Nvidia GPUs in their work can now use ReDrafter's accelerated token generation when using TensorRT-LLM for production, not just those using Apple Silicon.
The result, after benchmarking a production model with tens of billions of parameters on Nvidia GPUs, was a 2.7-times increase in generated tokens per second for greedy decoding.
The upshot is that the process could be used to minimize latency to users and reduce the amount of hardware required. In short, users could expect faster results from cloud-based queries, and companies could offer more while spending less.
In Nvidia's Technical Blog on the topic, the graphics card producer said the collaboration made TensorRT-LLM "more powerful and more flexible, enabling the LLM community to innovate more sophisticated models and easily deploy them."
The report's release follows Apple's public confirmation that it was investigating the potential use of Amazon's Trainium2 chip to train models for Apple Intelligence features. At the time, Apple said it expected a 50% improvement in pretraining efficiency with the chips over existing hardware.
Read on AppleInsider
Comments
Apple hasn’t used Nvidia since. So this is interesting and somewhat surprising.
https://www.zdnet.com/article/ati-on-apple-leak-our-fault/
https://www.pcguide.com/gpu/power-supply-rtx-4080/ 750-watt system recommendation, and the datacenter-class Nvidia stuff is even more out there. The current M2 Ultra takes 107 watts for everything, and the M4 is even more powerful and efficient than that, let alone what's coming up with the M5 and M6.
Currently the 4080 is about 3.3 times faster than the M2 Studio Ultra (Blender). It will be interesting to see how close the M4 Studio Ultra gets next year; we know it'll use a hell of a lot less power to achieve its performance.
Apple Silicon is definitely powerful enough now; were it not for market inertia that would already be clear, and by the time of the M5 it will be indisputable.
My 2019 iMac gets a Geekbench Metal score about a third of the top-end M2 Ultra's, but it's 4 years older than the M2 Ultra, and it didn't cost £5,200 plus a display. The Pro Vega 48 in my iMac was pretty sluggish compared to the equivalent Nvidia card at the time, getting 11,000 on 3DMark while the Nvidia RTX 2080 was getting nearly double that. That shows how Apple screwed over Mac users by refusing to use Nvidia. Nvidia also wrote their own Mac drivers, which were updated all the time and were much better than the Apple-written ATI drivers. And it shows that in the real world, right now, Apple Silicon GPUs are still a long way from matching dedicated GPUs.
At this current rate, Apple seems to be about 2 years behind dedicated graphics cards (since the M4 series is mostly compared to the RTX 4000 series), and the A-series chips are about 4 years behind the M series (since the A18 Pro is comparable to the M1). So if the math holds, an iPhone will be comparable to a modern high-end gaming PC in less than 10 years' time. Although technology seems to be developing faster, and we are nearing the end of the silicon era of development, so who knows how quickly tech will develop during the switch?