#1 |
|
Kasper's Automated Slave
Join Date: Nov 1997
Posts: 6,151
|
Nvidia working on first GPGPUs for Apple Macs
Graphics chipmaker Nvidia Corp. is in the early developmental stages of its first Mac-bound GPGPUs, AppleInsider has learned.
Short for general-purpose computing on graphics processing units, GPGPUs are a new wave of graphics processors that can be instructed to perform computations previously reserved for a system's primary CPU, allowing them to speed up non-graphics applications. The technology -- in Nvidia's case -- leverages a proprietary architecture called CUDA, short for Compute Unified Device Architecture. It's currently compatible with the company's new GeForce 8 Series of graphics cards, allowing developers to use the C programming language to write algorithms for execution on the GPU.
GPGPUs have proven most beneficial in applications requiring intense number crunching, examples of which include high-performance computer clusters, raytracing, scientific computing applications, database operations, cryptography, physics-based simulation engines, and video, audio and digital image processing.
It's likely that the first Mac-compatible GPGPUs would turn up as build-to-order options for Apple's Mac Pro workstations due to their ability to aid digital video and audio professionals in sound effects processing, video decoding and post processing. Precisely when those cards will crop up is unclear, though Nvidia, through its Santa Clara, Calif.-based offices, this week put out an urgent call for a full-time staffer to help design and implement kernel-level Mac OS X drivers for the cards.
Nvidia's $1500 Tesla graphics and computing hybrid card released in June is the chipmaker's first chipset explicitly built for both graphics and high intensity, general-purpose computing. Programs based on the CUDA architecture can not only tap its 3D performance but also repurpose the shader processors for advanced math. The massively parallel nature leads to tremendous gains in performance compared to regular CPUs, Nvidia claims. In science applications, calculations have seen speed boosts from 45 times to as much as 415 times in processing MRI scans for hospitals.
Increases such as this can mean the difference between using a single system and a whole computer cluster to do the same work, the company says. |
|
|
|
|
|
#2 |
|
Registered User
Join Date: Sep 2003
Posts: 334
|
It's only a matter of time before Apple slaps a processor in our keyboards and mice as well. :P
- Xidius |
|
|
|
|
|
#3 |
|
Registered User
Join Date: Oct 2006
Posts: 37
|
Sweet
I'll take one as soon as they are ready
|
|
|
|
|
|
#4 |
|
Registered User
Join Date: Apr 2005
Posts: 24
|
Hmmm GPGPUs, catchy name! But seriously, sounds confusing!
Will software need to be modified to leverage the power of these processors? I'm very confused about the roles of the GPU and CPU; they seem to be crossing paths more and more! Someone tell me: what does the GPU do, what does the CPU do, and why is each better for its separate tasks? |
|
|
|
|
|
#5 |
|
Registered User
Join Date: Jun 2006
Location: Washington, DC
Posts: 302
|
As GUIs get more processor heavy, e.g. RI, and people realize that computers are useful for much more than gaming, I think these will be a great asset, especially for companies with software-hardware integration like Apple.
MacBook Pro C2D 2.4GHz and a battle-scarred PowerBook G4 1.33GHz
"When you gaze long into a dead pixel, the dead pixel gazes also into you" |
|
|
|
|
|
#6 | |
|
Registered User
Join Date: Jul 2005
Posts: 157
|
Quote:
I think an 8-core Mac Pro IS a cluster. |
|
|
|
|
|
|
#7 |
|
Registered User
Join Date: Mar 2005
Posts: 366
|
Well, you think wrong then.
"A computer cluster is a group of loosely coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks." From The Fount of all Knowledge. |
|
|
|
|
|
#8 | |
|
Registered User
Join Date: Jun 2003
Location: Tinton Falls, NJ
Posts: 702
|
Quote:
The newer GPGPUs attempt to compensate for some of these issues, but the areas where this computation is applicable tend to be pretty specific. |
|
|
|
|
|
|
#9 |
|
Registered User
Join Date: Jun 2003
Location: Tinton Falls, NJ
Posts: 702
|
A great way to prototype this stuff is to boot up Quartz Composer in the Leopard devtools and try some GLSL shaders. It abstracts away the CPU/GPU issues more nicely than any other tool I've seen.
|
|
|
|
|
|
#10 | |
|
Registered User
Join Date: Jan 2008
Location: On stage
Posts: 3
|
Quote:
??? (KIDDING) This is a very promising development. Time will tell when it migrates down to the point where we mere mortals can afford it.... Sorry, but I won't be spending $1,500 for a graphics card ANYTIME soon... |
|
|
|
|
|
|
#11 | |
|
Registered User
Join Date: Feb 2007
Location: Leverkusen, Germany
Posts: 3
|
Ummm....
Quote:
|
|
|
|
|
|
|
#12 |
|
Registered User
Join Date: Apr 2002
Location: No GPS signal.
Posts: 1,169
|
I like this idea better than PhysX hardware. Let PhysX run on GPGPU instead.
nagromme
Would you like a treatment? |
|
|
|
|
|
#13 |
|
Registered User
Join Date: Jun 2005
Location: Philadelphia
Posts: 472
|
"calculations have seen speed boosts from a 45 times to as much as 415 times"
Do they really mean 45-415%? Or, are we really talking about a 4500 to 41500% increase? If so, that is absolutely unbelievable! |
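The difference between "45 times" and "45%" is easy to pin down with a little arithmetic. A minimal sketch, assuming "45 times" in the article means the work runs 45 times as fast; the function names are hypothetical:

```python
def speedup_to_percent_increase(times_as_fast):
    """Convert 'N times as fast' into a percent increase in speed."""
    return (times_as_fast - 1.0) * 100.0


def percent_increase_to_speedup(percent):
    """Convert 'P percent faster' into an 'N times as fast' factor."""
    return 1.0 + percent / 100.0


for factor in (45, 415):
    print(f"{factor}x as fast = {speedup_to_percent_increase(factor):.0f}% increase")

for percent in (45, 415):
    print(f"{percent}% faster = {percent_increase_to_speedup(percent):.2f}x as fast")
```

Read literally, 45x and 415x correspond to increases of 4,400% and 41,400%, whereas 45% and 415% faster would be only 1.45x and 5.15x.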
|
|
|
|
|
#14 |
|
Registered User
Join Date: Jan 2006
Posts: 1,395
|
I'm really curious how these things might work for audio - audio companies have been making DSP cards for years, but they are usually extremely expensive and CPUs have become a better bang for the buck.
Anyone know if any audio apps are taking advantage of things like this yet? |
|
|
|
|
|
#15 |
|
Global Moderator
Join Date: Sep 2004
Location: NYC
Posts: 19,612
|
It's an interesting development, but it looks to be pretty expensive. I imagine that's the only reason why they'd do it for the few Macs that can use it.
Since this would be useless without the OS recognizing it for what it is, I wonder if the reason they're doing it is that Apple is somehow involved. If Apple didn't support it, it wouldn't be useful. |
|
|
|
|
|
#16 | |
|
Global Moderator
Join Date: Sep 2004
Location: NYC
Posts: 19,612
|
Quote:
|
|
|
|
|
|
|
#17 | |||
|
Registered User
Join Date: Jan 2007
Location: Vienna, VA
Posts: 214
|
This is really good news. Apple has been offloading processing into GPUs for some time, but so far, I think it's only been for certain graphic-oriented frameworks (like CoreImage) and some special-purpose apps (like Motion).
With this announcement, we may see more generalized frameworks (possibly integrated into Accelerate, CoreAudio, and other frameworks) to take advantage of the immense amount of GPU power that goes unused most of the time. Quote:
Of course, you probably won't be able to take advantage of some features without using new APIs, which will obviously require updates to applications. Quote:
A modern GPU is really a high-end math coprocessor. Things like matrix arithmetic across large data sets, Fourier transforms, and other features useful for image processing are built-in. Many features are highly specialized, like various 3D texture mapping abilities, anti-aliasing, compositing objects, etc. For quite a while now, it has been possible to upload code into the GPU for execution. Typically, this is to get better video performance, but it can be done for other purposes as well. The source and target memory of an operation don't have to be on-screen - they can be anywhere in the video card's memory. Of course, this is only useful if your application needs the kinds of operations the GPU can provide. Audio processing is one such operation. I can think of a few other possibilities. The big deal about Nvidia's CUDA architecture (if I understand the article right) is that it will now be possible to use the GPU for a much wider range of processing tasks than just those that can benefit from image-processing algorithms. Quote:
I know Motion (part of the Final Cut Studio suite) requires a powerful GPU, because it offloads tons of rendering work into it. I think the other parts of FCS (including Soundtrack Pro) will use GPU power, if it's available, but I'm not certain of that. |
|||
|
|
|
|
|
#18 |
|
Banned
Join Date: Aug 2003
Posts: 526
|
Sounds useful, but will Apple buy it?
By the time it is available for the Mac, the Mac hardware and OSX should have completely transitioned to a dumbed down system for making photo albums of the kids' little league and sending prettified emails about how you just managed to make photo albums of the kids' little league. To extend its usefulness will be video iChat so you can hold up the photo album to show nana what you just did. To get Steve's attention though, Nvidia will have to work a lot harder to make this the world's thinnest graphics card. Last edited by gastroboy; 01-24-2008 at 08:38 PM.. |
|
|
|
|
|
#19 |
|
Global Moderator
Join Date: Nov 2001
Location: Seattle, WA
Posts: 10,457
|
GPU doing more processing? There goes my "Performance per Watt" metric.
|
|
|
|
|
|
#20 |
|
Registered User
Join Date: Jan 2002
Location: Silicon Valley
Posts: 7,033
|
It will take a marketing miracle for this to be a success: it was hard enough getting compilers to support AltiVec, which is a hell of a lot more seamless than this proposal.
Even so, I wholeheartedly support the effort. Superscalar CPUs are often inefficient for many kinds of modern computing. However, I see IBM's Cell as a much better paradigm than this is, and certainly one that's more likely to have an impact on the future of computing. Board-to-board data flow is supposed to be going away, not coming back.
Cat: the other white meat
|
|
|
|
|
|
#21 |
|
Registered User
Join Date: Nov 2004
Location: Northwest
Posts: 2,695
|
I mentioned this a few months ago with the release of the AMD/ATi equivalent and got a resounding yawn from all but a few.
Now the AI speaks and you are in awe? These cards won't be released until UEFI is standard for nVidia and AMD/ATi. |
|
|
|
|
|
#22 | ||
|
Registered User
Join Date: Jun 2002
Location: Wherever your mama is…
Posts: 919
|
Funny you say that, as the first thing I thought of was Apple using this specific card to boost performance in the successor to Shake…
Quote:
Quote:
I would expect this card to work in conjunction with the rework of Shake and to be announced at SIGGRAPH this year… Pixar might get into the game and show a GPGPU-ized RenderMan also… Sweet…! ;^p
The most beautiful thing we can experience is the mysterious.
It is the source of all true art and science. Albert Einstein |
||
|
|
|
|
|
#23 |
|
Global Moderator
Join Date: Sep 2004
Location: NYC
Posts: 19,612
|
|
|
|
|
|
|
#24 | |
|
Moderator Emeritus
Join Date: Nov 2001
Location: Iowa City
Posts: 6,811
|
Quote:
This process might be a bit slower than some pros would like if the tech first appears on an optional upgrade card for the Mac Pro, but if it's as promising as it sounds it will propagate down the lineup and Apple will have a real incentive to build support into their various libraries.
"...within intervention's distance of the embassy." - CvB
Original music: The Mayflies - Black earth Americana. Now on iTMS! Becca Sutlive - Iowa Fried Rock 'n Roll - now on iTMS! |
|
|
|
|
|
|
#25 |
|
Registered User
Join Date: Jan 2008
Posts: 6
|
CUDA = No more OpenGL hacks.
The real benefit of CUDA will be speed, accuracy, and usefulness. Currently Apple uses only OpenGL to access the GPU. Apple is using OpenGL for tasks it was not designed to do directly, like all the flavors of Quartz Composer such as Core Image. Now back in the day, the painfully slow progress of the PPC chip forced Apple to start hacking OpenGL to speed up image processing. It's given them a competitive edge that is really coming full circle. Now that the GPU makers are seeing people use their hardware this way, they are finally designing GPUs for this.
With the new GPU hardware paradigm and the access CUDA will give, we will start to see real-time raytracing, faster-than-real-time H.264 encoding (maybe), and yes, massive audio processing. So with the more approachable development environment you will start to see applications really take advantage of the speed. It would provide a type of SSE on steroids. But to really make it shine I guess the real work will be in Apple's hands, because they will have to extend Quartz Composer to accommodate both Nvidia's CUDA and ATI's Close To Metal (CTM). Maybe it will be a whole new Apple tech that will show up in 10.6 or sooner. If Apple unifies both techs and makes it even more accessible to developers it would be pure genius! Long live Apple! Last edited by connector; 01-25-2008 at 01:02 AM.. Reason: Forgot the title... |
|
|
|
|
|
#26 | |
|
Banned
Join Date: Aug 2003
Posts: 526
|
Quote:
Every release seems to be a couple steps forward (if we are lucky) accompanied by entirely unnecessary steps backwards. |
|
|
|
|
|
|
#27 |
|
Registered User
Join Date: Apr 2006
Location: Brisbane, Australia
Posts: 72
|
GPGPUs not so novel
The idea of doing general purpose computation on a weird and wonderful processor designed for some special purpose like graphics is not new and has always in the past fizzled on the same issues:
Amdahl's Law especially bites in a case where you have to go off the CPU chip to do the special work, because you lose on the inter-chip communication. I am not so convinced that this time around it's all that different, despite the CUDA stuff, which is designed to make programming easier. If you could make normal computing 400x faster easily, you wouldn't adapt a GPU to this purpose; you'd just make a general-purpose CPU 400x faster. No iffy problems as outlined above. Two general rules of computing progress arise from this:
Philip Machanick creator of Opinionations
Institute for Molecular Bioscience, University of Queensland, Australia |
|
|
|
|
|
#28 |
|
Banned
Join Date: Aug 2003
Posts: 526
|
I think you have the same problem as you had for AltiVec, that Apple didn't build the technology into every new model.
When the technology starts there are relatively few computers that support it. If, and only IF, Apple commits to make all computers include the technology, and there is a clear advantage to using it, then you get a growing base of computers that will benefit from the additional programming to implement it. But it still takes quite some time before the new technology forms even a sizable minority of computers in the market. Apple is so unreliable with its support of even its own technology (e.g. ports, media, CPUs, SCSI, FW, GPUs, etc.) that the chances of the new technology having a viable long-term pool of compatible computers to utilise are slim. |
|
|
|
|
|
#29 |
|
Registered User
Join Date: Dec 2001
Location: The International Center for Rumour Control
Posts: 3,133
|
It's worth noting that the GPGPU hardware is just a standard Nvidia 8800 series GPU, without the video display output. The port of the CUDA software to the Mac means that you should be able to run CUDA programs on any Mac with an 8800 (or later) based Nvidia graphics card. The GPGPU boards are additional hardware to boost the computational capabilities of the machine even further.
Unfortunately CUDA is Nvidia-specific, so until Nvidia and ATI/AMD agree, or Apple introduces an OS-provided standard that covers both, this will never run on all Macs even if they have the latest GPUs. As observed above, Apple may transparently leverage this kind of technology via its various Core technologies and SIMD libraries, which will benefit applications using those. This isn't as flexible as coding directly to CUDA, but until a better alternative to CUDA comes along it's better than nothing.
BTW, the main difference between a CPU and a GPU is that the CPU does one operation at a time to one piece of data at a time, and is designed for these operations and their ordering to be as flexible as possible and for the data to be as freely organized as possible. GPUs, on the other hand, constrain what your data looks like and how it is organized, and they do the same sequence of operations to as many data elements as possible at the same time. It is these tighter constraints that allow the GPU hardware guys to make more design assumptions, which allows them to optimize for faster computational speeds. The price of generality/flexibility is usually performance (and vice versa).
Over the past decade the lines have become increasingly blurry. GPUs are becoming vastly more flexible in what operations they can do, and in what order, and are improving in the flexibility of how they handle data (although it is all still strongly oriented toward large volumes of data and massively data-parallel computations). CPUs have also become more capable of doing operations on more data at once (SIMD, in the form of AltiVec and SSE, and multiple cores). The future will be even more blurred as GPUs and CPUs are built into the same physical chips, as the processing elements of GPUs become more and more flexible in how they operate, and as CPUs get more cores, better SIMD, and more memory bandwidth.
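The CPU/GPU contrast described above can be sketched in a few lines of plain Python. This is only an illustration, not CUDA code: `saxpy_serial`, `saxpy_kernel`, and `launch` are hypothetical names invented for this sketch, and the `launch` helper merely loops where real GPU hardware would run the per-element work concurrently.

```python
def saxpy_serial(a, x, y):
    """CPU-style: one flexible loop, one element handled per step."""
    out = []
    for xi, yi in zip(x, y):
        out.append(a * xi + yi)
    return out


def saxpy_kernel(i, a, x, y, out):
    """GPU-style: the same tiny operation, written for a single index i.

    Every index is independent of every other, which is the constraint
    that lets hardware apply it to many elements at the same time.
    """
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a GPU launch (assumption: illustrative only).

    Here the n instances run one after another; on a GPU they would be
    spread across many processing elements.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out == saxpy_serial(2.0, x, y))
```

Both versions compute the same result; the difference is that the kernel form gives up control over ordering, which is the trade of flexibility for throughput the post describes.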
How to explain this to consumers is a really big problem for the marketeers, who usually don't even understand it themselves. The days of being able to say that "this machine is faster than that one" are gone or going fast -- in the future you'll have to say "this implementation of that particular algorithm with these tools, on that hardware, under these conditions, running that OS is faster than..." (well, that's what you'll say if you want to be truthful... otherwise everybody will just claim they are the fastest... and at something they will be). For the software developers this is an ever-increasing nightmare of having too many poorly conceived and inadequate tools to efficiently develop software that is competitive, not to mention being stuck with a legacy of outdated non-concurrent code. The software world is going to have to mature significantly to keep up with the rapidly changing hardware.
Providing grist for the rumour mill.
|
|
|
|
|
|
#30 | |
|
Global Moderator
Join Date: Sep 2004
Location: NYC
Posts: 19,612
|
Quote:
|
|
|
|
|
|
|
#31 |
|
Registered User
Join Date: Jan 2008
Posts: 2
|
If the TDP of a 45nm Core 2 is 130W and the TDP of a GeForce 8800 GTX is 185W, then the GeForce only needs to perform around 42% faster than the CPU (185/130 ≈ 1.42) for its perf/Watt to be superior. Anything close to the "45 times faster" claimed by this article would imply a far superior perf/Watt.
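The arithmetic behind that estimate can be checked with a short sketch. The function names are hypothetical, and the 130 W and 185 W figures are the TDP values quoted in the post; TDP is a thermal design limit, not measured power draw, so treat the result as a rough bound rather than a benchmark.

```python
def break_even_speedup(cpu_watts, gpu_watts):
    """Speedup the GPU needs just to match the CPU's performance per watt."""
    return gpu_watts / cpu_watts


def perf_per_watt_ratio(speedup, cpu_watts, gpu_watts):
    """GPU perf/Watt divided by CPU perf/Watt for a given speedup."""
    return speedup * cpu_watts / gpu_watts


CPU_TDP = 130.0  # watts, figure quoted in the post
GPU_TDP = 185.0  # watts, figure quoted in the post

print(f"break-even speedup: {break_even_speedup(CPU_TDP, GPU_TDP):.2f}x")
print(f"perf/Watt advantage at 45x: {perf_per_watt_ratio(45, CPU_TDP, GPU_TDP):.1f}x")
```

Under these assumptions the GPU breaks even at a bit over 1.4x, and a 45x speedup would mean roughly a 30-fold perf/Watt advantage.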
|
|
|
|
|
|
#32 | |
|
Global Moderator
Join Date: Nov 2001
Location: Seattle, WA
Posts: 10,457
|
Quote:
|
|
|
|
|
|
|
#33 | |
|
Registered User
Join Date: Jun 2006
Location: NYC Area
Posts: 52
|
Quote:
-but Jimmy has fear? A thousand times no. I never doubted myself for a minute for I knew that my monkey strong bowels were girded with strength like the loins of a dragon ribboned with fat and the opulence of buffalo... dung.
|
|
|
|
|
|
|
#34 |
|
Registered User
Join Date: Dec 2001
Location: The International Center for Rumour Control
Posts: 3,133
|
Sure it does. Amdahl's Law is extremely important in this case. If you have a process where 50% of the time is in a data parallel problem, and the other 50% is in a serial portion of the problem, then by throwing the data parallel problem at a GPU that can do those calcs a billion times faster... your program will run (at most) twice as fast.
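The 50/50 example above follows directly from the usual statement of Amdahl's Law, overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time that can be parallelized and s is how much faster that fraction runs. A minimal sketch, with `amdahl_speedup` as a hypothetical helper name:

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only part of the run time is accelerated.

    parallel_fraction: share of the original run time that can be
        offloaded (0.0 to 1.0).
    parallel_speedup: how many times faster that share runs.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# Half the time is data-parallel and the GPU is a billion times faster:
print(amdahl_speedup(0.5, 1e9))

# Even with 95% of the time offloaded at 400x, the serial 5% dominates:
print(amdahl_speedup(0.95, 400.0))
```

The first call stays just under 2, matching the post. The second stays below 20, because the untouched 5% caps the gain at 1 / 0.05 no matter how fast the GPU is.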
Providing grist for the rumour mill.
|
|
|
|
|
|
#35 |
|
Registered User
Join Date: Dec 2001
Location: The International Center for Rumour Control
Posts: 3,133
|
Either that or people are getting better at IQ tests.
Providing grist for the rumour mill.
|
|
|
|
|
|
#36 | |
|
Registered User
Join Date: Jun 2005
Posts: 170
|
Quote:
http://www.nvidia.com/object/cuda_learn_products.html
born to lose, live to win
|
|
|
|
|
|
|
#37 | |
|
Registered User
Join Date: Dec 2001
Location: The International Center for Rumour Control
Posts: 3,133
|
Quote:
Providing grist for the rumour mill.
|
|
|
|
|
|
|
#38 |
|
Registered User
Join Date: Jan 2008
Posts: 2
|
Hi,
I've done both MIMD and SIMD programming as well as a little bit of GPGPU programming. GPGPU was pretty much a pain in the ass. I wonder if it's now a lot better or easier to program for; it seems it probably is. Also, C code is a huge improvement over when I did it in assembly.
So it is possible to get a 45 times speedup, not just 45%, on an application over a standard one-core CPU using a SIMD machine, especially one as powerful as an Nvidia GPU. Processing MRI scans seems like an image algorithm, which can be very efficient on SIMD machines, similar to GPGPUs. However, I don't know the details of course, so I'll just be safe and say they probably meant 45%-415% for now. But 45-415 times faster wouldn't surprise me if proven true. I've written SIMD code which ends up around 5-10 times faster than on a modern one-core CPU; however, the performance per watt is astoundingly high, as each SIMD CPU runs at only 20 MHz, as compared with an Intel processor in the GHz range.
Amdahl's law is CRUCIAL for GPGPUs. It says you can only speed up the part which can be parallelized, while part of the algorithm cannot be sped up by going parallel because it has to be done in order. The biggest example of a serial part of a program is the I/O.
So for a while Intel and AMD have been looking into SIMD co-processors to do certain algorithms very fast, not just pieces of the CPU already there, but think like 512 processing elements dedicated to running a thread which is designated as SIMD. It looked like they were going to implement it, but maybe the existence of a GPGPU in every computer in a few years may trump this: why pay for an extra SIMD processor when you can just use a GPGPU? Interesting. Last edited by MCPtz; 01-26-2008 at 05:55 PM.. |
|
|
|
|
|
#39 | |||||
|
Registered User
Join Date: Dec 2001
Location: The International Center for Rumour Control
Posts: 3,133
|
Quote:
Quote:
Quote:
Quote:
Quote:
Providing grist for the rumour mill.
|
|||||
|
|
|
|
|
#40 |
|
Registered User
Join Date: Jan 2008
Posts: 2
|
First, thanks for clarifying a few things for others. Definitely 'easier' not from an algorithm design standpoint, but implementation is certainly simpler than in assembly. You're right about I/O; I was just thinking of having a file on an HDD, but yeah, you're right.
I didn't know GPUs were moving to the CPU, but then again, as you say, it's obvious they're going there. The SIMD I programmed for is a single-board, 512-processing-element linear array (with wrap-around). 2-d torus I think is the shape. http://www.soe.ucsc.edu/projects/kestrel/ "system accelerates computational biology, computational chemistry, and other algorithms by factors of 20 to 40" compared with a (probably 433 MHz) Sun workstation. Each CPU runs at 20 MHz. I don't really think the Cell is the epitome of SIMD. It can be programmed to be MIMD and so many various things; it's very flexible, which is good, but at the same time difficult to master because of its complexity. But of course it's still super fast, powerful, and very interesting. Anyway, I just saw that their latest paper addresses why to have a SIMD over a GPGPU; it hasn't been published yet, but you can read the PDF on the webpage. "We propose a model in which the CPU and the GPU are complemented by “the third big chip”, a massively-parallel SIMD processor." |
|
|
|