Intel Chipsets all the Way with OpenCL

programmer · August 16, 2008 11:21PM

Quote:

Originally Posted by Marvin

Apple didn't buy PA Semi that long ago...

This is irrelevant, really. First of all, if PA Semi was doing the work they might have been doing it on contract and the acquisition was motivated by them having difficulties or just Apple wanting exclusive access to their magic. If it is even related to the Mac product lineup at all. And the rumours about Apple vector processing have been around for ages -- pre-dating the transition to Intel. Such things could also live alongside Larrabee or any other GPU just fine... the great thing about OpenCL is it makes software agnostic to whatever hardware happens to be present. So really there is no reason that Apple couldn't be at the stage where they are able to deliver such a thing.

Quote:

On the subject of GPU computation affecting graphics, typically you will do computation associated with what you are rendering and this doesn't impact performance significantly at all:

http://www.youtube.com/watch?v=yIT4lMqz4Sk

You do realize that they were demoing the PhysX add-in card in this particular video, right? That card, by the way, was created by a fairly small company and demonstrates some of what is achievable with specialized processors. If they could do it, Apple certainly could too.

marvin · August 17, 2008 4:30AM

Quote:

Originally Posted by Programmer

First of all, if PA Semi was doing the work they might have been doing it on contract and the acquisition was motivated by them having difficulties or just Apple wanting exclusive access to their magic. If it is even related to the Mac product lineup at all.

Exactly, Steve Jobs said the purchase was for ipods/iphones.

http://www.macworld.co.uk/mac/news/i...S&NewsID=21624

"Apple CEO Steve Jobs has confirmed that the company's recently-acquired PA Semi chipmaker's teams will be tasked with building chips for the iPhone and the iPod.

?PA Semi is going to do system-on-chips for iPhones and iPods,? Jobs told the New York Times."

http://www.macrumors.com/2008/04/23/...-in-the-chips/

"According to comments made directly to P.A. Semi's customers, Apple is "not interested in the startup's products or road map, but is buying the company for its intellectual property and engineering talent.""

Quote:

Originally Posted by Programmer

Such things could also live alongside Larrabee or any other GPU just fine

I could understand it living alongside a GPU but not Larrabee. Again not that it's not possible, just that there's no point to it. If you're married to a supermodel, you're not going to rent a $50 hooker.

Quote:

Originally Posted by Programmer

You do realize that they were demoing the PhysX add-in card in this particular video, right? That card, by the way, was created by a fairly small company and demonstrates some of what is achievable with specialized processors. If they could do it, Apple certainly could too.

I didn't notice that but it seems that using PhysX on the GPU is still faster than Ageia:

http://forums.guru3d.com/showthread.php?p=2807534

http://www.fudzilla.com/index.php?op...=8859&Itemid=1

In the end, we're talking about simply using more processing power. An add-on chip vs a better GPU.

Here is a benchmark between the Ageia and the GPU directly performing PhysX calculations:

http://techgage.com/article/nvidias_...tatus_report/2

The GPU is much faster. As the page after that shows though, doing the performance on the GPU does impact on the performance but 10 fps loss at reasonable resolution isn't significant vs 20 fps gain over the CPU. Presumably the Ageia card was in the same machine meaning that benchmark was using CPU + GPU + Ageia, which you'd naturally expect to be faster anyway.

But simply removing the Ageia saves you the money and you lose just 10 fps. Plus, we're talking about GPGPU solutions here, not about running intensive calculations while running graphically intensive games. You simply wouldn't start encoding H264 on your GPU and then load up Crysis.

Even with an extra chip, you wouldn't do that either because it means losing the use of it during the game.

GPGPU can be done now with today's hardware. Nothing has to be done by Apple except to use better GPUs and allow a switching mechanism to integrated chips when they aren't needed to save power. Even when you use them, if they use twice the power but run 5 times faster then overall, you still save power and you get your job done quicker. There is no downside to it.

Apple doing their own processor would mean they'd have to update it too. This is the problem with the PhysX chips. By contrast, GPU manufacturers consistently deliver faster and faster chips beyond Moore's Law - at one point they said it was Moore's Law cubed as they deliver doubled performance every 6 months vs 18 months for CPUs.

April 2008 - Geforce 9800 GTX

June 2008 - GTX 280

3 months between and 50-100% performance increase across the board (the GTX is even overclocked here):

http://www.theinquirer.net/gb/inquir...80-scores-here

I'd love to see Apple deliver that kind of performance boost but they barely manage to update their computers every 12 months these days let alone double performance every 3-6 months.

programmer · August 17, 2008 9:21AM

Quote:

Originally Posted by Marvin

I could understand it living alongside a GPU but not Larrabee. Again not that it's not possible, just that there's no point to it. If you're married to a supermodel, you're not going to rent a $50 hooker.

Well I can think of a couple of famous examples that give lie to your analogy! I'll leave your analogy there, however, as it would be in bad taste to continue applying it here.

Larrabee (and PhysX, for that matter) still lie at the far end of the PCIe bus, and are discrete parts. You'll recall that I'm suggesting that integration into an Apple chipset might be some of the secret sauce here. And I'm not holding up PhysX as the ultimate example of what is possible.

Quote:

Plus, we're talking about GPGPU solutions here, not about running intensive calculations while running graphically intensive games. You simply wouldn't start encoding H264 on your GPU and then load up Crysis.

Unless you are encoding a video of yourself playing Crysis.

It seems shortsighted to assume that there aren't other non-game uses for computational power so our computers shouldn't be made more powerful.

Quote:

By contrast, GPU manufacturers consistently deliver faster and faster chips beyond Moore's Law - at one point they said it was Moore's Law cubed as they deliver doubled performance every 6 months vs 18 months for CPUs.

The GPU guys have been "catching up" as it were. Moore's Law is a statement (not a "law") about the economics of integrated circuits, isn't specific to CPUs and isn't about processor performance. It applies to GPUs as well. I'd be careful about how you apply predictions based on past performance to future GPUs. And you also don't know how this might apply to other forms of processing power.

mdriftmeyer · August 17, 2008 9:28PM

Quote:

Originally Posted by Marvin

Real-time raytracing isn't necessary though - real-time would only be needed for games but games won't use ray-tracing for a while yet due to the extra requirements for anti-aliasing. It would likely be used more for film/CGI rendering, which can take as long as it needs - current CPUs take minutes per frame. GPUs are typically measured in frames per second. Even a few seconds per frame for post-pro quality is a big step forward.

The key is that it does it faster than just using the CPU. If a CPU is rendering raytracing on 8-cores only and the GPU can perform some of the calculations then it's wasted processing power. For some tasks, the GPU will rival the 8-core CPU and if you can use it, you've just cut your render time in half. That's a major selling point.

Check out the upcomng Nvidia GT200 series GPUs vs the itunes and premiere encoders:

http://www.youtube.com/watch?v=8C_Pj1Ep4nw

These are CPU vs GPU demos. Even the Geforce 8 series GPUs can help out using CUDA so it's CPU + GPU. The only concern is heat when it comes to the laptop hardware.

On the subject of ray-tracing there are algorithms being developed to improve it. Carmack talks about that here:

http://www.pcper.com/article.php?aid=532

Sparse voxel octree raytracing.

"The idea would be that you have to have a general purpose solution that can approach all sorts of things and is at least capable of doing the algorithms necessary for this type of ray tracing operation at a decent speed. I think it’s pretty clear that that’s going to be there in the next generation. In fact, years and years ago I did an implementation of this with complete software based stuff and it was interesting; it was not competitive with what you could do with hardware, but it’s likely that I’ll be able to put something together this year probably using CUDA."

This is an important point about ray-tracing in games:

"I think that we can have huge benefits completely ignoring the traditional ray tracing demos of “look at these shiny reflective curved surfaces that make three bounces around and you can look up at yourself”. That’s neat but that’s an artifact shader, something that you look at one 10th of 1% of the time in a game. And you can do a pretty damn good job of hacking that up just with a bunch environment map effects. It won’t be right, but it will look cool, and that’s all that really matters when you’re looking at something like that. We are not doing light transport simulation here, we are doing something that is supposed to look good."

Just look at Gran Tourismo - almost photographic at times - no ray-tracing:

http://www.youtube.com/watch?v=2ko_1lfgvA0

Same deal with Crysis. They have a whole paper about the techniques they used. Occlusion is probably the most commonly used technique for realistic lighting and you don't even need ray-tracing for that. Crysis uses screen space AO:

http://www.youtube.com/watch?v=zoRmoDjhpiY

The difference between ray-tracing and say environment mapping is basically that you get realistic effects and you can handle self-reflection. It takes away your control though. If you want to make an object reflect a particular set of imagery for artistic effect, you don't need the expense of raytracing. This is why you don't use ray-tracing for compositing reflecting CGI with film - there is no 3D environment to reflect so you just have to map it. You still use ray-tracing for AO etc though.

Yeah, I agree that current GPUs aren't the ideal solution in the long term. If they can get some sort of processor together that doesn't increase costs significantly, doesn't generate too much heat and delivers a directed performance boost it could be far better than trying to tax a GPU that will just turn the fans up full blast all the time. The question in my mind is not about technical capability so much as feasibility given the time frames.

Apple didn't buy PA Semi that long ago. To design and outsource manufacturing of a custom built processor might be possible but with Larrabee coming in Summer 2009 or thereabouts, going this route would serve two things. Keeping Apple ahead of the competition and allowing developers to ready software for bigger things to come. As I say though developers can get the OpenCL compatible ATI firestream or probably any of the GT200 series chips and rest assured that their developments will work ok - in fact the Geforce 8 series and up would do. Here is an Nvidia rep talking about OpenCL and CUDA:

"CUDA is a highly tuned programming environment and architecture that includes compilers, software tools, and an API. For us, OpenCL is another API that is an entry point into the CUDA parallel-programming architecture. Programmers will use the same mindset and parallel programming strategy for both OpenCL and CUDA. They are very similar in their syntax, but OpenCL will be more aligned with OS X while CUDA is a based on standard C for a variety of platforms. We have designed our software stack so that to our hardware, OpenCL and CUDA code look the same. They are simply two different paths toward GPU-accelerated code.

"The CUDA and OpenCL APIs differ so code is not 100 percent compatible. However, they share similar constructs for defining data parallelism so the codes will be very similar and the porting efforts will be minor. The fact that CUDA is available today and is supported across all major operating systems, including OS X, means that developers have a stable, pervasive environment for developing gigaflop GPU applications now, that can be easily integrated with OS X via OpenCL when it is released.

"OpenCL actually utilizes the CUDA driver stack in order to deliver great performance on Nvidia GPUs. In the processor market, there are many different types of tools and languages. The parallel computing market will evolve similarly with many different types of tools. The developer gets to choose the development tools that best fills their needs."

http://www.bit-tech.net/hardware/200...cture-review/6

On the subject of GPU computation affecting graphics, typically you will do computation associated with what you are rendering and this doesn't impact performance significantly at all:

http://www.youtube.com/watch?v=yIT4lMqz4Sk

Resolution Independence would most certainly benefit the aide of RayTracing in real-time for independent views, at varying bit depths, with different threads for Text objects, Window Objects, vector->bitmap objects, etc.

mdriftmeyer · August 17, 2008 9:31PM

Quote:

Originally Posted by Marvin

Exactly, Steve Jobs said the purchase was for ipods/iphones.

http://www.macworld.co.uk/mac/news/i...S&NewsID=21624

"Apple CEO Steve Jobs has confirmed that the company's recently-acquired PA Semi chipmaker's teams will be tasked with building chips for the iPhone and the iPod.

“PA Semi is going to do system-on-chips for iPhones and iPods,” Jobs told the New York Times."

http://www.macrumors.com/2008/04/23/...-in-the-chips/

"According to comments made directly to P.A. Semi's customers, Apple is "not interested in the startup's products or road map, but is buying the company for its intellectual property and engineering talent.""

I could understand it living alongside a GPU but not Larrabee. Again not that it's not possible, just that there's no point to it. If you're married to a supermodel, you're not going to rent a $50 hooker.

I didn't notice that but it seems that using PhysX on the GPU is still faster than Ageia:

http://forums.guru3d.com/showthread.php?p=2807534

http://www.fudzilla.com/index.php?op...=8859&Itemid=1

In the end, we're talking about simply using more processing power. An add-on chip vs a better GPU.

Here is a benchmark between the Ageia and the GPU directly performing PhysX calculations:

http://techgage.com/article/nvidias_...tatus_report/2

The GPU is much faster. As the page after that shows though, doing the performance on the GPU does impact on the performance but 10 fps loss at reasonable resolution isn't significant vs 20 fps gain over the CPU. Presumably the Ageia card was in the same machine meaning that benchmark was using CPU + GPU + Ageia, which you'd naturally expect to be faster anyway.

But simply removing the Ageia saves you the money and you lose just 10 fps. Plus, we're talking about GPGPU solutions here, not about running intensive calculations while running graphically intensive games. You simply wouldn't start encoding H264 on your GPU and then load up Crysis.

Even with an extra chip, you wouldn't do that either because it means losing the use of it during the game.

GPGPU can be done now with today's hardware. Nothing has to be done by Apple except to use better GPUs and allow a switching mechanism to integrated chips when they aren't needed to save power. Even when you use them, if they use twice the power but run 5 times faster then overall, you still save power and you get your job done quicker. There is no downside to it.

Apple doing their own processor would mean they'd have to update it too. This is the problem with the PhysX chips. By contrast, GPU manufacturers consistently deliver faster and faster chips beyond Moore's Law - at one point they said it was Moore's Law cubed as they deliver doubled performance every 6 months vs 18 months for CPUs.

April 2008 - Geforce 9800 GTX

June 2008 - GTX 280

3 months between and 50-100% performance increase across the board (the GTX is even overclocked here):

http://www.theinquirer.net/gb/inquir...80-scores-here

I'd love to see Apple deliver that kind of performance boost but they barely manage to update their computers every 12 months these days let alone double performance every 3-6 months.

I can see you've never worked for Steve Jobs. He says a lot of reasoned statements for whatever acquisition Apple buys and leaves it at that. Meanwhile, the acquisition will do a helluva a lot more than just iPhone and iPod.

The unified Cocoa model for OS X across all devices, along-side hardware advancements to have these devices work in a manner where peformance and ease-of-use are always a must guarantees that the Quartz Team is working with the PA Semi team working with the QuickTime team, etc.

wizard69 · August 18, 2008 1:27PM

Quote:

Originally Posted by mdriftmeyer

I can see you've never worked for Steve Jobs. He says a lot of reasoned statements for whatever acquisition Apple buys and leaves it at that. Meanwhile, the acquisition will do a helluva a lot more than just iPhone and iPod.

The unified Cocoa model for OS X across all devices, along-side hardware advancements to have these devices work in a manner where peformance and ease-of-use are always a must guarantees that the Quartz Team is working with the PA Semi team working with the QuickTime team, etc.

Guarantees is a strong word to be using here unless you have inside info. Obviously we don't know exactly what Apple is up to but I have to say there is a very strong possibility that you are right. Still one needs to look at this from a business standpoint and what Apple can reasonably implement that would give them a cost effective advantage in the market place.

In essence I kinda agree that Quicktime is involved as I see the issue of video support and BluRay being huge. A key element here is figuring out how Apple expects to meet the contractural requirements in supporting BluRay.

Even if in the end all we get from PA is iPod chips I still see those chips as being designed to meet some of the goals above. Call me crazy but if Apple could find a secure way to allow users to play BluRay movies on their iPods I can see big opportunities here. I do mean BluTay movies in all their glory with some sort of DRM tacked on. As long as play back is somewhat restricted the studios should go for it.

In case you are wondering yeah I'm talking about a video optmized iPod. The idea is to do for video what the last generation of iPods did for music. By the way NO the Touch is not a video optimized iPod. At least not in the sense that I'm thinking, we need to be able you plug into a video system as easily as we can plug I to a music delivery system. In a nut shell this means signalling up to 1080P.

If PA's actvities have a wider application on Apple hardware. Then I'm really hoping for more general hardware that all developers can leverage. As much as I'd like to see a parallel vector facility somewhat like that implemented in Cell I don't think it is going to happen. Frankly I would expect some noise indicating that such was coming.

Dave

marvin · August 18, 2008 3:55PM

Quote:

Originally Posted by Programmer

Larrabee (and PhysX, for that matter) still lie at the far end of the PCIe bus, and are discrete parts.

Tacking a chip close to the CPU doesn't automatically make it quicker though. If the processor is an order of magnitude slower, it's still going to be that much slower at processing, it'll just receive and send the instructions quicker. You don't just send a single instruction down to the GPU and wait for it to come back again, you send a whole bunch down. GPUs are optimized for high throughput and parallelism vs the CPU being optimized for low latency. Parallelism is better for heavy computation because you only need to do a lot of very specific instructions so you don't require the complexity of a full CPU, just a whole heap of cut down processors.

A balance would be good but Apple will still need to deliver a good chip that is competitive with what we see from GPU manufacturers to be worth making.

Quote:

Originally Posted by Programmer

It seems shortsighted to assume that there aren't other non-game uses for computational power so our computers shouldn't be made more powerful.

Video encoding, ray-tracing, physics simulation, folding at home.

"The raw computational horsepower of GPUs is staggering: A single GeForce 8800 chip achieves a sustained 330 billion floating-point operations per second (Gflops) on simple benchmarks "

"A new GPU-accelerated Folding@home client contributed 28,000 Gflops in the month after its October 2006 release?more than 18 percent of the total Gflops that CPU clients contributed running on Microsoft Windows since October 2000."

http://www.computer.org/portal/site/...7r7!1195711095

Quote:

Originally Posted by Programmer

The GPU guys have been "catching up" as it were. Moore's Law is a statement (not a "law") about the economics of integrated circuits, isn't specific to CPUs and isn't about processor performance. It applies to GPUs as well. I'd be careful about how you apply predictions based on past performance to future GPUs. And you also don't know how this might apply to other forms of processing power.

The 'Law' has more importance to what it implies than the specifics about transistor count. The important implication is about increasing performance vs time and GPU manufacturers are winning in this regard. In terms of transistor count (not density mind you) they are about even. The GT280 has 1.4 billion transistors, the Core i7 supports 731 million (probably per core).

Numbers IMO aren't anywhere near as important as results. If a GPU encodes a video twice as fast as the CPU, that's what I'm using irrespective of GPU vs CPU advantages and disadvantages.

When I see Steve Jobs on stage demoing a video encode using a separate processor running 5-10 times faster than a mainstream high-end CPU then I'll be right there. So far I've seen this demoed by Nvidia. Apple are perfectly welcome to step up to the plate but I'm not going to pin my hopes on what Apple might do when other people are showing what they are actually doing.

Over the past few years, all I've seen Apple do is cripple their hardware so they can jam it into razor thin boxes and mess around with consumer oriented gadgets. OpenCL is the first thing they've done that's even remotely interesting to me and they didn't even start it. They've seen Nvidia doing it and thought, hey what if everyone could use that instead of being tied to proprietary Nvidia hardware. Quite funny when you think about it as that's really what Microsoft did to them. Not quite as Nvidia seem to be backing OpenCL.

Apple simply don't make major hardware advances at great cost without passing those costs onto consumers. A typical GPU that ships in millions of computers is about $100-200. Apple would be targeting their own machines, which ship in small numbers so they can't lower the margins. Even if they could make something at low cost, it's still more cost than using the GPU that they already have. Larrabee on the contrary will be a GPU replacement.

Quote:

Originally Posted by mdriftmeyner

Resolution Independence would most certainly benefit the aide of RayTracing in real-time for independent views, at varying bit depths, with different threads for Text objects, Window Objects, vector->bitmap objects, etc.

Resolution independence will benefit from the technology that can produce real-time raytracing yes. Raytracing will not be used though (no point in 2.5D space as you know where everything is).

Quote:

Originally Posted by wizard69

if Apple could find a secure way to allow users to play BluRay movies on their iPods I can see big opportunities here.

In a nut shell this means signalling up to 1080P.

How exactly do you get a 25GB BluRay movie onto an ipod? Presumably you mean ripping it to 1080 H264 (this still takes up over 8GB) and having the ipod stream this to a high def TV? I don't think there's any urgency for that.

Apple aren't concerned about quality but distribution because that's where they make their cut. Backing BluRay means losses from their online distribution model. They'll say that they back quality proclaiming HD itunes downloads etc just like they claim quality assurance of iphone apps on the app store. What really matters is that every developer pays them a fee to put the app there and a 30% cut for each sale.

With BluRay, Apple sell the ipod, but then what?

programmer · August 18, 2008 11:06PM

Quote:

Originally Posted by Marvin

Tacking a chip close to the CPU doesn't automatically make it quicker though. If the processor is an order of magnitude slower, it's still going to be that much slower at processing, it'll just receive and send the instructions quicker. You don't just send a single instruction down to the GPU and wait for it to come back again, you send a whole bunch down. GPUs are optimized for high throughput and parallelism vs the CPU being optimized for low latency. Parallelism is better for heavy computation because you only need to do a lot of very specific instructions so you don't require the complexity of a full CPU, just a whole heap of cut down processors.

A balance would be good but Apple will still need to deliver a good chip that is competitive with what we see from GPU manufacturers to be worth making.

Believe me, you're preaching to the converted here. Indeed, I've probably been preaching this for a lot longer than you have. The main differences in our positions are that:

* I believe that Apple is capable of (and motivated to) designing a power efficient multi-core vector processing array that is integrated into the north/south bridge. Note that what I describe is also optimized for throughput and parallelism -- it is there to complement the CPU, and do it in a way that is closely bound to the CPU. You'll note that Apple's CFO said "expect to see a hit to our margins".

* In my opinion this would be a worthwhile addition to the Mac, iPod, iPhone hardware designs. They would obviously vary in terms of array size and clock rate, but they would all provide OpenCL acceleration. Most of the media intensive core services of the OS could then leverage this hardware. Also having a powerful GPU in the system does not take away from the value of such hardware, particularly since it would be tightly couple to the main CPU's RAM and would demonstrate different design choices than current GPUs. Whether it is more or less similar to Larrabee is an interesting question.

mdriftmeyer · August 19, 2008 12:23AM

Quote:

Originally Posted by Programmer

Believe me, you're preaching to the converted here. Indeed, I've probably been preaching this for a lot longer than you have. The main differences in our positions are that:

* I believe that Apple is capable of (and motivated to) designing a power efficient multi-core vector processing array that is integrated into the north/south bridge. Note that what I describe is also optimized for throughput and parallelism -- it is there to complement the CPU, and do it in a way that is closely bound to the CPU. You'll note that Apple's CFO said "expect to see a hit to our margins".

* In my opinion this would be a worthwhile addition to the Mac, iPod, iPhone hardware designs. They would obviously vary in terms of array size and clock rate, but they would all provide OpenCL acceleration. Most of the media intensive core services of the OS could then leverage this hardware. Also having a powerful GPU in the system does not take away from the value of such hardware, particularly since it would be tightly couple to the main CPU's RAM and would demonstrate different design choices than current GPUs. Whether it is more or less similar to Larrabee is an interesting question.

This goes to my earlier point of updates to Quartz, QuickTime X, et.al APIs coming in the 10.6 timeframe.

hiro · August 19, 2008 2:19AM

Sorry, I'm not going to go over the whole thread so apologies.

I saw both the Larabee and Cuda briefs and demos last week at SIGGRAPH.

Today, this second, Cuda wins because it is here right now. It has a significant long term weakness though. Cuda programming is severely memory access constrained and with Nvidia keeping chunks of fixed function logic in the pipeline and texture-centric loading it runs into memory bottlenecks. High quality interactive ray-tracing is memory constrained, not processing constrained on Cuda. THAT's why the live demo was three bounces, not five. Also you really have to think about how to manage that memory to minimize the bottlenecks. Over the long term that means programming pain and with what Nvidia has put together to date the chances to change that aren't particularly wonderful.

Ati is in even worse straits with memory access bottlenecks, but they do have a somewhat more accessible programming model, but not by much.

Larabee is LOTS of memory bandwidth, full cache coherency and access handled just like a CPU when doing general purpose work, can you say programmer friendly? Or you can choose to use available fixed function transistors for doing graphics texture work. In other words, choice to select which set of performance issues you want to use with no automatic fixed-function lock-ins. The memory architecture looks like a huge win for programming. Larabee also has ridiculous amounts of vector processing. A 16 element wide VPU per core, yes 16 ints/floats at a time! Can you say single issue matrix operations? It can also handle mixed conditional processing in a single vector instruction. The Nvidia Cuda briefer said process both branches and then pick later-ouch! Wasted work!

I won't call Larabee a hands down win because Intel has to deliver hardware, but that was a big shot across the GPU manufacturer's bows. Big enough I think Intel accepted their crappy GPU division wasn't going to get anywhere, so they did this in secret believing in the end Larabee had huge upside in comparison to dedicated GPUs.

Overall the biggest GPGPU problem today is a totally non-unified programming model. You have to pick one hardware product and target it with it's own nearly identical but different API set. That's going to make the masses shy away for the most part. Larabee tries to avoid that by having a standard CPU programming model, and enough power to do a 95% in-software graphics pipeline. Nvidia is trying to avoid it by joining & pushing OpenCL to try and hide the HW dependencies, I haven't heard anything about Ati, but if they don't play OpenCL they will be dead in 5 years.

OBTW: Nvidia owns PhysX now and it runs on the GPU. The stand alone card is dead with the wreckage of the company Nvidia bought the PhysX product and engineering team from. Intel bought Havoc, what do you think they want to run it on hardware-wise? And don't forget Sony, they have Bullet, an excellent open source physics engine project done by one of their Playstation guys, and it has Cell acceleration.

mdriftmeyer · August 19, 2008 3:28AM

Quote:

Originally Posted by Hiro

Sorry, I'm not going to go over the whole thread so apologies.

I saw both the Larabee and Cuda briefs and demos last week at SIGGRAPH.

Today, this second, Cuda wins because it is here right now. It has a significant long term weakness though. Cuda programming is severely memory access constrained and with Nvidia keeping chunks of fixed function logic in the pipeline and texture-centric loading it runs into memory bottlenecks. High quality interactive ray-tracing is memory constrained, not processing constrained on Cuda. THAT's why the live demo was three bounces, not five. Also you really have to think about how to manage that memory to minimize the bottlenecks. Over the long term that means programming pain and with what Nvidia has put together to date the chances to change that aren't particularly wonderful.

Ati is in even worse straits with memory access bottlenecks, but they do have a somewhat more accessible programming model, but not by much.

Larabee is LOTS of memory bandwidth, full cache coherency and access handled just like a CPU when doing general purpose work, can you say programmer friendly? Or you can choose to use available fixed function transistors for doing graphics texture work. In other words, choice to select which set of performance issues you want to use with no automatic fixed-function lock-ins. The memory architecture looks like a huge win for programming. Larabee also has ridiculous amounts of vector processing. A 16 element wide VPU per core, yes 16 ints/floats at a time! Can you say single issue matrix operations? It can also handle mixed conditional processing in a single vector instruction. The Nvidia Cuda briefer said process both branches and then pick later-ouch! Wasted work!

I won't call Larabee a hands down win because Intel has to deliver hardware, but that was a big shot across the GPU manufacturer's bows. Big enough I think Intel accepted their crappy GPU division wasn't going to get anywhere, so they did this in secret believing in the end Larabee had huge upside in comparison to dedicated GPUs.

Overall the biggest GPGPU problem today is a totally non-unified programming model. You have to pick one hardware product and target it with it's own nearly identical but different API set. That's going to make the masses shy away for the most part. Larabee tries to avoid that by having a standard CPU programming model, and enough power to do a 95% in-software graphics pipeline. Nvidia is trying to avoid it by joining & pushing OpenCL to try and hide the HW dependencies, I haven't heard anything about Ati, but if they don't play OpenCL they will be dead in 5 years.

OBTW: Nvidia owns PhysX now and it runs on the GPU. The stand alone card is dead with the wreckage of the company Nvidia bought the PhysX product and engineering team from. Intel bought Havoc, what do you think they want to run it on hardware-wise? And don't forget Sony, they have Bullet, an excellent open source physics engine project done by one of their Playstation guys, and it has Cell acceleration.

We know PhysX is owned by Nvidia.. That was announced several months ago.

How did you manage to go to SIGGrAPH 2008 and not get the AMD announcement of evolving their streams solution to be OpenCL compliant? I love the use of "evolve" as not to piss off current Streams developers. Let's face it. That will be deprecated and OpenCL will be the solution.

http://www.amd.com/us-en/Corporate/A...127451,00.html

Quote:

AMD Drives Adoption of Industry Standards in GPGPU Software Development

− Plans full support of Microsoft DirectX® 11 and OpenCL to enable increased C/C++ cross-platform programming efficiency—

Sunnyvale, Calif. -- August 6, 2008 --AMD (NYSE: AMD) today announced efforts to help further increase the ease and efficiency of software development using AMD Stream™ processing with an extensive set of upgrades planned for future versions of the Stream Software Development Kit (SDK).

The improvements are designed to reduce the time and effort needed to produce GPU accelerated applications that run on multiple platforms, by expanding support for industry standard application programming interfaces (APIs) and providing enhanced support for C/C++.

Through a series of updates to the SDK scheduled over the course of the next 18 months, AMD plans to add full support for DirectX 11, the next-generation suite of advanced APIs from Microsoft.

DirectX 11 is expected to build upon the already outstanding performance of DirectX® 10.1 for 3-D graphics rendering and gaming control. It is also being designed to introduce a host of new technologies aimed at making it easier for programmers to create general purpose graphics processing (GPGPU) accelerated applications that can run on any Windows Vista® powered platform.

“Just as it ushered in the era of advanced 3-D gaming for the masses, DirectX is poised to be at the vanguard of the GPGPU revolution,” said Anantha Kancherla, manager of Windows desktop and graphics technologies, Microsoft. “DirectX 11 gives developers the power to more easily harness the astonishing capabilities of AMD GPUs for general purpose computation, and gives consumers an effortless way to experience all that AMD Stream has to offer, on the hundreds of millions of Microsoft Windows powered systems worldwide.”

As previously announced AMD is also supporting efforts to develop OpenCL as an open standard and plans to evolve the Stream SDK to be OpenCL compliant. Through equal support for both DirectX 11 and OpenCL, and by continuing to give developers the option of creating and using their own programming languages and high level tools, AMD is executing on a strategy designed to give programmers maximum choice and flexibility.

“Industry standards are essential to unlocking the compute potential of GPUs and driving broad adoption of this capability in mainstream applications,” said Rick Bergman, senior vice president and general manager, Graphics Product Group, AMD. “GPGPU is now moving past the era of closed and fully proprietary development chains. With the advent of DirectX 11 and OpenCL, C/C++ programmers worldwide will have standardized and easier ways of leveraging the GPU’s computational capabilities.”

AMD will also continue to enhance and support the Brook+ programming language, providing programmers a stable, high-performance platform for accelerating their applications.

About AMD Stream™

AMD Stream is a set of open AMD technologies that allow the hundreds of parallel Stream cores inside AMD GPUs to accelerate general purpose applications, resulting in platforms capable of delivering dramatically high performance-per-watt. The freely distributed, fully open Stream SDK allows programmers to make advanced use of AMD hardware, and helps them to create fast, energy efficient applications on a growing variety of platforms and operating systems.

About AMD

Advanced Micro Devices (NYSE: AMD) is a leading global provider of innovative processing solutions in the computing and graphics markets. AMD is dedicated to driving open innovation, choice and industry growth by delivering superior customer-centric solutions that empower consumers and businesses worldwide. For more information, visit www.amd.com.

©2008 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, Stream and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.

hiro · August 19, 2008 10:47AM

Quote:

Originally Posted by mdriftmeyer

We know PhysX is owned by Nvidia.. That was announced several months ago.

You may have known that but the conversation a few posts up didn't look like they did.

Quote:

Originally Posted by mdriftmeyer

How did you manage to go to SIGGrAPH 2008 and not get the AMD announcement of evolving their streams solution to be OpenCL compliant? I love the use of "evolve" as not to piss off current Streams developers. Let's face it. That will be deprecated and OpenCL will be the solution.

http://www.amd.com/us-en/Corporate/A...127451,00.html

Because Ati didn't have any dedicated sessions. Even the press release you cited was released the week prior to SIGGRAPH. I wasn't the only one wondering why they let the biggest graphics event of the year go by so quietly. Their booth was basic milquetoast with marketing fluff talking heads. They should at least have done a Tech Talk or BoF, but they didn't so they get to look like they didn't have anything worth saying.

mdriftmeyer · August 19, 2008 2:10PM

Quote:

Originally Posted by Hiro

You may have known that but the conversation a few posts up didn't look like they did.

Because Ati didn't have any dedicated sessions. Even the press release you cited was released the week prior to SIGGRAPH. I wasn't the only one wondering why they let the biggest graphics event of the year go by so quietly. Their booth was basic milquetoast with marketing fluff talking heads. They should at least have done a Tech Talk or BoF, but they didn't so they get to look like they didn't have anything worth saying.

Wow, that's classic of AMD to pull such a weak public appearance. However, the BOF for OpenGL they gave more information yet still that is truly weak that they didnt' bother to give people, at the conference, an updated roadmap within a package of free stuff.

hiro · August 19, 2008 8:24PM

Quote:

Originally Posted by mdriftmeyer

Wow, that's classic of AMD to pull such a weak public appearance. However, the BOF for OpenGL they gave more information yet still that is truly weak that they didnt' bother to give people, at the conference, an updated roadmap within a package of free stuff.

I couldn't make the OpenGL BoF, it started before the Cuda raytracing talk finished (which backed up the Cuda programming intro talk) and wasn't even at the convention center. Not to mention my focus is mostly off the pretty part of the visuals now and more into the underlying compute issues which will allow more physically-based animations.

programmer · August 19, 2008 11:03PM

Quote:

Originally Posted by Hiro

You may have known that but the conversation a few posts up didn't look like they did.

The discussion wasn't about who owns what, it was about what was in the particular video that was linked to. I am well aware of the current status of PhysX, their software, etc.

Sadly precious few real details on OpenCL are still available.

wizard69 · August 20, 2008 12:00AM

Quote:

Originally Posted by Programmer

The discussion wasn't about who owns what, it was about what was in the particular video that was linked to. I am well aware of the current status of PhysX, their software, etc.

Sadly precious few real details on OpenCL are still available.

Which makes one think if Apple has some sort of hardware tie in. Let's face it OpenCL is more or less out of the bag now, why would Apple keep it all bottled up? Of course I ask that still wondering when the NDA is going to come off the iPhone SDK.

The thing that bothers me about Apples statements is that the last time something similar was said all we got was a lower price on iPhone. So I sit here wondering if there is enough evidence in Apples comments for us to believe that anything more than lower cost iPods are coming? Apple has said that PA was aquired for projects related to iPods, so I could see Apple trying to lower hardware prices to the point that no one could easily compete. In other words the stuff PA ships is not so much hardware to improve performance as it is to lower costs to allow Apple to offer products at a much lower cost.

As you can see I have reservations about what the meaning of Apples comments are. There are all sorts of rumors and such but no hard evidence as to what those engineers are up to at PA. One thing that comes to mind is this: did Apple retain the entire staff at PA? If the whole organization is intact that may give us some idea of the scope of the project(s) at PA. In other words if the staffing hasn't changed I would have a hard time believing they are all working on one little SoC for an iPod.

Frankly I'm perplexed and just don't know what is up.

Dave

mjteix · August 20, 2008 8:17AM

Quote:

Originally Posted by wizard69

In other words if the staffing hasn't changed I would have a hard time believing they are all working on one little SoC for an iPod.

Frankly I'm perplexed and just don't know what is up.

Dave

As I understand it, the PA staff will continue servicing their previous customers.

wizard69 · August 22, 2008 11:31PM

What would everybodies reaction be if the coming tech that is suppose to impact margins is the installation of SSD in some members of the Mac line up. It might be a total coincidence but the new Micron/Intel tech is coming on line very soon. Using such would give Apple a chance to lead with new tech. Further they have the demand that would command agressive price breaks and or higher capacity devices.

Just a thought as I ponder the possibilities that might impact margins.

Dave

mdriftmeyer · August 23, 2008 3:13AM

Quote:

Originally Posted by wizard69

What would everybodies reaction be if the coming tech that is suppose to impact margins is the installation of SSD in some members of the Mac line up. It might be a total coincidence but the new Micron/Intel tech is coming on line very soon. Using such would give Apple a chance to lead with new tech. Further they have the demand that would command agressive price breaks and or higher capacity devices.

Just a thought as I ponder the possibilities that might impact margins.

Dave

Problematic faster writes and reads aren't game changers. These will be new products, in new markets.

wizard69 · August 23, 2008 11:09AM

Quote:

Originally Posted by mdriftmeyer

Problematic faster writes and reads aren't game changers. These will be new products, in new markets.

You seem to be awfully certain of that. Mind you it is not that I wouldn't want to see such, it is just that I would expect more leaks if something game changing was coming.

I guess we will have to wait a few more weeks to see if Apple lives up to it's own hype. I'm still trying to imagine what they where alluding to.

Dave

Intel Chipsets all the Way with OpenCL

Comments