Nvidia begins work on first GPGPUs for Apple Macs


Comments

  • Reply 21 of 51
    MacRonin Posts: 1,174 member
    Quote:
    Originally Posted by Booga View Post


    …it can accelerate phenomenally…



    Funny you say that, as the first thing I thought of was Apple using this specific card to boost performance in the successor to Shake…



    Quote:
    Originally Posted by Mirlin11 View Post


    … Sorry but I won't be spending $1,500 for a graphics card ANYTIME soon...



    And the consumer market is NOT where nVidia is aiming this product… (duh)



    Quote:
    Originally Posted by gastroboy View Post


    Sounds useful, but will Apple buy it?



    By the time it is available for the Mac, the Mac hardware and OSX should have completely transitioned to a dumbed down system for making photo albums of the kids' little league and sending prettified emails about how you just managed to make photo albums of the kids' little league.



    Extending its usefulness will be video iChat, so you can hold up the photo album to show nana what you just did.



    To get Steve's attention though, Nvidia will have to work a lot harder to make this the world's thinnest graphics card.



    Wow! Bitter much? Apple is a business, and the consumer sales are what drive that business. But I seriously don't see them forgetting about the professional DCC market anytime soon.



    I would expect this card to work in conjunction with the reworked Shake, announced at Siggraph this year… Pixar might get into the game and show a GPGPU-ized RenderMan also…



    Sweet…!



    ;^p
  • Reply 22 of 51
    melgross Posts: 33,334 member
    Quote:
    Originally Posted by hmurchison View Post


    GPU doing more processing? There goes my "Performance per Watt" metric.



    And out goes the increasingly useless SPEC for desktop and workstation testing.
  • Reply 23 of 51
    amorph Posts: 7,112 member
    Quote:
    Originally Posted by Splinemodel View Post


    It will take a marketing miracle for this to be a success: it was hard enough getting compilers to support altivec, which is a hell of a lot more seamless than this proposal.



    When AltiVec came out, Apple didn't have the various Core libraries (including the abstracted SIMD library), which automatically distribute their load across available resources. If you use those you get GPGPU for free when your machine has it (after Apple supports it in their frameworks, obviously). If you need it to do custom work, just write a plugin. You would only have to drop down to writing specialized GPGPU code for boundary cases where Apple's libraries weren't efficient enough. Even their Son of Batch Processing task-based API can be updated to transparently support GPGPU.
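    A toy sketch of the dispatch model described above: application code calls one library entry point, and the library chooses the fastest available backend. All names here are invented for illustration (Apple's actual frameworks are Objective-C APIs; plain Python is used purely to show the idea).

```python
# Hypothetical sketch of a library that dispatches work to the best
# available backend (CPU vs. GPU), as described above for Apple's
# Core libraries. All names are invented for illustration.

def gpu_available():
    # In a real framework this would probe the hardware/driver.
    return False

def scale_cpu(data, factor):
    # Portable CPU fallback path.
    return [x * factor for x in data]

def scale_gpu(data, factor):
    # Stand-in for a GPGPU kernel launch; same contract as the CPU path.
    return [x * factor for x in data]

def scale(data, factor):
    """Application code calls this one entry point; the library picks
    the backend, so apps get GPGPU 'for free' when the hardware is there."""
    backend = scale_gpu if gpu_available() else scale_cpu
    return backend(data, factor)

print(scale([1.0, 2.0, 3.0], 2.0))  # same result on either backend
```

    The point is that code written against the abstract entry point never changes when a new backend appears underneath it.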



    This process might be a bit slower than some pros would like if the tech first appears on an optional upgrade card for the Mac Pro, but if it's as promising as it sounds it will propagate down the lineup and Apple will have a real incentive to build support into their various libraries.
  • Reply 24 of 51
    The real benefit of CUDA will be speed, accuracy, and usefulness. Currently Apple uses only OpenGL to access the GPU, and it is using OpenGL for tasks it was not designed to do directly, like all the flavors of Quartz Composer such as Core Image. Back in the day, the painfully slow progress of the PPC chip forced Apple to start hacking OpenGL to speed up image processing. It's given them a competitive edge that is really coming full circle. Now that the GPU makers are seeing people use their hardware this way, they are finally designing GPUs for it.



    With the new GPU hardware paradigm and the access CUDA will give, we will start to see real-time raytracing, faster-than-realtime H.264 encoding (maybe), and yes, massive audio processing. With the more approachable development environment, you will start to see applications really take advantage of the speed. It would provide a type of SSE on steroids. But to really make it shine, I guess the real work will be in Apple's hands, because they will have to extend Quartz Composer to accommodate both Nvidia's CUDA and ATI's Close To Metal (CTM). Maybe it will be a whole new Apple tech that will show up in 10.6 or sooner. If Apple unifies both techs and makes it even more accessible to developers, it would be pure genius!



    Long live Apple!
  • Reply 25 of 51
    Quote:
    Originally Posted by MacRonin View Post


    Apple is a business, and the consumer sales are what drive that business. But I seriously don't see them forgetting about the professional DCC market anytime soon.



    It's fine that Apple sells more computers, especially to consumers, which is the mass market, but NOT if it cripples them for its loyal professional base.



    Every release seems to be a couple steps forward (if we are lucky) accompanied by entirely unnecessary steps backwards.
  • Reply 26 of 51
    philipm Posts: 240 member
    The idea of doing general purpose computation on a weird and wonderful processor designed for some special purpose like graphics is not new and has always in the past fizzled on the same issues:
    • Amdahl's Law -- the fancy stuff speeds up part of the computation hugely but not all of it; you have to add up everything to get the true speedup

    • iffy to program -- most of these things are very hard to code for, needing a lot of machine dependencies

    • poor compatibility with critical standards, e.g., some floating point results aren't exactly as you'd expect in a general-purpose processor and you have to correct for the resulting errors

    The Amdahl's Law thing especially bites in a case where you have to go off the CPU chip to do the special work, because you lose on the inter-chip communication.



    I am not so convinced it's all that different this time around, despite the CUDA stuff, which is designed to make programming easier.



    If you could make normal computing 400x faster easily, you wouldn't adapt a GPU to this purpose, you'd just make a general-purpose CPU 400x faster. None of the iffy problems outlined above.



    2 general rules of computing progress arise from this:
    1. people don't get smarter -- if something wasn't a good idea because the promise of the hardware didn't match the ability of programmers a few decades ago, it isn't now

    2. special cases aren't "general-purpose" -- they are special cases. See the design principles which lead to RISC for why this territory is well understood and not worth exploring again

  • Reply 27 of 51
    I think you have the same problem as you had with AltiVec: Apple didn't build the technology into every new model.



    When the technology starts there are relatively few computers that support it.



    If, and only IF, Apple commits to make all computers include the technology, and there is a clear advantage to using it, then you get a growing base of computers that will benefit from the additional programming to implement it. But it still takes quite some time before the new technology forms even a sizable minority of computers in the market.



    Apple is so unreliable with its support of even its own technology (eg ports, media, CPUs, SCSI, FW, GPUs, etc.) that the chances of the new technology having a viable long-term pool of compatible computers to utilise it are slim.
  • Reply 28 of 51
    Programmer Posts: 3,425 member
    It's worth noting that the GPGPU hardware is just a standard nVidia 8800-series GPU without the video display output. The port of the CUDA software to the Mac means that you should be able to run CUDA programs on any Mac with an 8800 (or later) based nVidia graphics card. The GPGPU boards are additional hardware to boost the computational capabilities of the machine even further.



    Unfortunately CUDA is nVidia-specific, so until nVidia and ATI/AMD agree, or Apple introduces an OS-provided standard that covers both, this will never run on all Macs even if they have the latest GPUs. As observed above, Apple may transparently leverage this kind of technology via its various Core technologies and SIMD libraries, which will benefit applications using those. This isn't as flexible as coding directly to CUDA, but until a better alternative comes along it's better than nothing.





    BTW, the main difference between a CPU and a GPU is that the CPU does one operation at a time to one piece of data at a time and is designed for these operations and their ordering to be as flexible as possible and for the data to be as freely organized as possible. GPUs, on the other hand, constrain what your data looks like and how it is organized, and they do the same sequence of operations to as many data elements as possible at the same time. It is these tighter constraints that allow the GPU hardware guys to make more design assumptions, which allows them to optimize to faster computational speeds. The price of generality/flexibility is usually performance (and vice versa).
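    A plain-Python illustration of that constraint (the data and the arithmetic are invented for illustration): a CPU-style loop can branch freely per element, while a GPU-style data-parallel pass applies one uniform sequence of operations to every element, typically expressing branches as arithmetic instead.

```python
# CPU-style vs. GPU-style processing of the same data, sketched in
# plain Python. The example computes "square the positives, zero the
# negatives" both ways.

data = [1.0, -2.0, 3.0, -4.0]

# CPU-style: flexible control flow, decided separately per element.
cpu_result = []
for x in data:
    if x < 0:                      # arbitrary per-element branching is cheap here
        cpu_result.append(0.0)
    else:
        cpu_result.append(x * x)

# GPU-style: one uniform operation over the whole array. The branch is
# folded into arithmetic (max with 0), so every element runs the exact
# same instruction sequence -- the constraint that lets GPU hardware
# run thousands of elements in lockstep.
gpu_result = [max(x, 0.0) ** 2 for x in data]

print(cpu_result, gpu_result)  # both: [1.0, 0.0, 9.0, 0.0]
```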



    Over the past decade the lines have become increasingly blurry. GPUs are becoming vastly more flexible in what operations they can do, and in what order, and are improving in the flexibility for how they handle data (although it is all still strongly oriented toward large volumes of data and massively data parallel computations). CPUs have also become more capable of doing operations on more data at once (SIMD, in the form of AltiVec and SSE, and multiple cores). The future will be even more blurred as GPUs and CPUs are built into the same physical chips, and as the processing elements of GPUs become more and more flexible in how they operate, and as CPUs get more cores, better SIMD, and more memory bandwidth.



    How to explain this to consumers is a really big problem for the marketeers, who usually don't even understand it themselves. The days of being able to say that "this machine is faster than that one" are gone or going fast -- in the future you'll have to say "this implementation of that particular algorithm with these tools, on that hardware, under these conditions, running that OS is faster than..." (well, that's what you'll say if you want to be truthful... otherwise everybody will just claim they are the fastest... and at something they will be). For the software developers this is an ever-increasing nightmare of having too many poorly conceived and inadequate tools to efficiently develop software that is competitive, not to mention being stuck with a legacy of outdated non-concurrent code. The software world is going to have to mature significantly to keep up with the rapidly changing hardware.
  • Reply 29 of 51
    melgross Posts: 33,334 member
    Quote:
    Originally Posted by philipm View Post


    The idea of doing general purpose computation on a weird and wonderful processor designed for some special purpose like graphics is not new and has always in the past fizzled on the same issues:
    • Amdahl's Law -- the fancy stuff speeds up part of the computation hugely but not all of it; you have to add up everything to get the true speedup

    • iffy to program -- most of these things are very hard to code for, needing a lot of machine dependencies

    • poor compatibility with critical standards, e.g., some floating point results aren't exactly as you'd expect in a general-purpose processor and you have to correct for the resulting errors

    The Amdahl's Law thing especially bites in a case where you have to go off the CPU chip to do the special work, because you lose on the inter-chip communication.



    I am not so convinced this time around it's all that different despite the CUDA stuff which is designed to make programming easier.



    If you could make normal computing 400x faster easily, you wouldn't adapt a GPU to this purpose, you'd just make a general-purpose CPU 400x faster. None of the iffy problems outlined above.



    2 general rules of computing progress arise from this:
    1. people don't get smarter -- if something wasn't a good idea because the promise of the hardware didn't match the ability of programmers a few decades ago, it isn't now

    2. special cases aren't "general-purpose" -- they are special cases. See the design principles which lead to RISC for why this territory is well understood and not worth exploring again




    Amdahl wasn't strictly talking about this. His "law" doesn't apply in a direct way. It's more closely related to using 8 cores over 2 cores.
  • Reply 30 of 51
    Quote:
    Originally Posted by hmurchison View Post


    GPU doing more processing? There goes my "Performance per Watt" metric.



    If the TDP of a 45nm Core 2 is 130W and the TDP of a GeForce 8800 GTX is 185W, then the GeForce only needs to perform around 42% faster than the CPU for its perf/Watt to be superior. Anything close to the "45 times faster" claimed by this article would imply a far superior perf/Watt.
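    Making that arithmetic explicit (TDP figures are the ones given in the post above):

```python
# The perf/Watt break-even arithmetic, made explicit.
cpu_tdp = 130.0   # W, 45nm Core 2 (figure from the post)
gpu_tdp = 185.0   # W, GeForce 8800 GTX (figure from the post)

# For equal performance-per-watt, the GPU must be faster by exactly
# the ratio of the TDPs:
breakeven = gpu_tdp / cpu_tdp
print(f"break-even speedup: {breakeven:.2f}x")  # ~1.42x, i.e. ~42% faster

# At the 45x speedup claimed in the article, perf/Watt is far superior:
speedup = 45.0
perf_per_watt_ratio = speedup * cpu_tdp / gpu_tdp
print(f"perf/Watt advantage at 45x: {perf_per_watt_ratio:.1f}x")  # ~31.6x
```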
  • Reply 31 of 51
    hmurchison Posts: 12,358 member
    Quote:
    Originally Posted by jibbo View Post


    If the TDP of a 45nm Core 2 is 130W and the TDP of a GeForce 8800 GTX is 185W, then the GeForce only needs to perform around 42% faster than the CPU for its perf/Watt to be superior. Anything close to the "45 times faster" claimed by this article would imply a far superior perf/Watt.



    True that. I'd love to see some turbocharged GPGPU applications. Informative first post. Welcome to the boards!
  • Reply 32 of 51
    Quote:
    Originally Posted by philipm View Post


    2 general rules of computing progress arise from this:
    1. people don't get smarter -- if something wasn't a good idea because the promise of the hardware didn't match the ability of programmers a few decades ago, it isn't now

    2. special cases aren't "general-purpose" -- they are special cases. See the design principles which lead to RISC for why this territory is well understood and not worth exploring again




    Joseph Schumpeter and the recent trend in IQ scores disagree with you.
  • Reply 33 of 51
    Programmer Posts: 3,425 member
    Quote:
    Originally Posted by melgross View Post


    Amdahl wasn't strictly talking about this. His "law" doesn't apply in a direct way. It's more closely related to using 8 cores over 2 cores.



    Sure it does. Amdahl's Law is extremely important in this case. If you have a process where 50% of the time is in a data-parallel problem, and the other 50% is in a serial portion of the problem, then by throwing the data-parallel problem at a GPU that can do those calcs a billion times faster... your program will run (at most) twice as fast.
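    Amdahl's Law stated as code (a sketch; the 95% figure in the second call is an invented additional scenario, not from the thread):

```python
# Amdahl's Law: only the parallel fraction of a program benefits from
# the GPU, so the serial fraction caps the overall speedup.

def amdahl_speedup(parallel_fraction, parallel_speedup):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / parallel_speedup)

# 50% of the runtime is data-parallel; even an (effectively) infinite
# GPU speedup on that half only doubles the whole program:
print(amdahl_speedup(0.5, 1e9))    # ~2.0

# If 95% of the work is parallel, a 45x GPU gets you much further:
print(amdahl_speedup(0.95, 45.0))  # ~14.1
```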
  • Reply 34 of 51
    Programmer Posts: 3,425 member
    Quote:
    Originally Posted by superkaratemonkeydeathcar View Post


    Joseph Schumpeter and the recent trend in IQ scores disagree with you.



    Either that or people are getting better at IQ tests.
  • Reply 35 of 51
    tubgirl Posts: 177 member
    Quote:
    Originally Posted by Programmer View Post


    It's worth noting that the GPGPU hardware is just a standard nVidia 8800-series GPU without the video display output. The port of the CUDA software to the Mac means that you should be able to run CUDA programs on any Mac with an 8800 (or later) based nVidia graphics card. The GPGPU boards are additional hardware to boost the computational capabilities of the machine even further.



    actually, all geforce 8 series gpus are supported (at least on win/linux), so MBP users should also be able to use it.

    http://www.nvidia.com/object/cuda_learn_products.html
  • Reply 36 of 51
    Programmer Posts: 3,425 member
    Quote:
    Originally Posted by tubgirl View Post


    actually, all geforce 8 series gpus are supported (at least on win/linux), so MBP users should also be able to use it.

    http://www.nvidia.com/object/cuda_learn_products.html



    Thanks. That's what I intended to mean by "8800 series"... I gave up following nVidia's product terminology years ago.
  • Reply 37 of 51
    Hi,

    I've done both MIMD and SIMD programming, as well as a little bit of GPGPU programming. GPGPU was pretty much a pain in the ass. I wonder if it's a lot better or easier to program for now; it seems it probably is. Also, C code is a huge improvement over when I did it in assembly.



    So it is possible to get a 45-times speedup, not just 45%, on an application over a standard one-core CPU using a SIMD machine, especially one as powerful as an nvidia GPU processor.



    Processing MRI scans seems like an image algorithm which can be very efficient on SIMD machines, similar to GPGPUs. I don't know the details, of course, so I'll just be safe and say they probably meant 45%-415% for now; but 45-415 times faster wouldn't surprise me if proven true. I've written SIMD code which ends up only around 5-10 times faster than on a modern one-core CPU, but the performance per watt is astoundingly high, as each SIMD CPU runs at only 20 MHz, compared with an Intel processor in the GHz range.



    Amdahl's law is CRUCIAL for GPGPUs. It says you can only speed up the part which can be parallelized, while part of the algorithm cannot be sped up by going parallel because it has to be done in order. The biggest example of a serial part of a program is the I/O.



    So for a while Intel and AMD have been looking into SIMD co-processors to do certain algorithms very fast: not just the SIMD pieces of the CPU already there, but think 512 processing elements dedicated to running a thread which is designated as SIMD. It looked like they were going to implement it, but maybe the existence of a GPGPU in every computer in a few years will trump this: why pay for an extra SIMD processor when you can just use a GPGPU? Interesting.
  • Reply 38 of 51
    Programmer Posts: 3,425 member
    Quote:
    Originally Posted by MCPtz View Post


    Hi,

    I've done both MIMD and SIMD programming as well as a little bit of GPGPU programming. GPGPU was pretty much a pain in the ass. I wonder if now it's a lot better or easier to program for, seems it probably is. Also C code is a huge improvement over when I did it in assembly.



    "Easier to program" is a relative term. Yes, it is easier. No, it's not easy.



    Quote:

    So it is possible to get 45 times speed up, not just 45%, on an application over a standard one core cpu using a SIMD machine, especially one as powerful as an nvidia gpu processor.



    No, they mean 45 times faster. On data parallel problems this is entirely feasible.



    Quote:

    each SIMD cpu is only 20 MHz, as compared with an intel processor in the GHz.



    I don't know what you were coding, but modern SIMD is epitomized by the IBM Cell processor. A main processor plus up to 8 on-chip 3+ GHz dual issue SIMD cores. 200+ GFLOPS performance, compared to quad core Intel Core 2 Duos in the ~50 GFLOPS ballpark... with much higher cost and power consumption. On scalar code the Intel CPUs win hands down. On SIMD code, the Cell crushes them.



    Quote:

    Biggest example of a serial part of a program in the I/O.



    I/O isn't inherently serial (although the encoding for output might be), but it usually is a bandwidth restriction. Serial portions of a program are the algorithms that you just can't figure out how to parallelize (or replace with parallel equivalents).



    Quote:

    but maybe the existence of a GPGPU in every computer in a few years may trump this and why pay for an extra simd processor when you can just use a GPGPU? interesting.



    Increasingly tight integration is what we're going to see. AMD's Fusion project is going to bring the GPU on-chip with the CPU, and Intel's integrated GPUs are another example of tighter integration. The biggest performance bottleneck in GPGPU is usually moving data across the expansion bus (PCIe currently), so bringing it closer to the CPU has some advantages at the cost of losing modularity. System-on-chip is the logical move as we approach a billion transistors per die.
  • Reply 39 of 51
    First, thanks for clarifying a few things for others. Definitely 'easier' not from an algorithm-design standpoint, but for implementation certainly simpler than assembly. You're right about I/O; I was just thinking of having a file on a HDD, but yeah, you're right.



    I didn't know gpus were moving to the cpu, but then again as you say, it's obvious they're going there.





    The SIMD I programmed for is a single-board 512-processing-element linear array (with wrap-around). A 2-d torus, I think, is the shape.

    http://www.soe.ucsc.edu/projects/kestrel/

    "system accelerates computational biology, computational chemistry, and other algorithms by factors of 20 to 40" than a 433 mhz, probably, sun workstation. Each cpu runs at 20 MHz.



    I don't really think the Cell is the epitome of SIMD. It can be programmed to be MIMD and so many various things; it's very flexible, which is good, but at the same time difficult to master because of its complexity. But of course it's still super fast, powerful, and very interesting.



    Anyway, I just saw that their latest paper addresses why have a SIMD over a GPGPU; it hasn't been published yet, but you can read the PDF on the webpage.

    "We propose a model in which the CPU and the GPU are complemented by 'the third big chip', a massively-parallel SIMD processor."
  • Reply 40 of 51
    There still need to be improvements and more initiative before this type of technology starts appearing in consumer-level products.





    The question Apple has to answer: is there actually a market for this card, considering the expense they would have to go to? It's a bit more difficult than just sticking a Tesla C870 into the Mac Pro. They may justify the expense through a long-term strategy with this only being the starting point.