Intel Chipsets all the Way with OpenCL


Comments

  • Reply 21 of 123
    programmer Posts: 3,458 member
    Quote:
    Originally Posted by Marvin View Post


    I want Nvidia's chips to replace all graphics chips in Apple products for now and the foreseeable future.



    I really don't understand the love-in people have for nVidia. I've had to work with them and their hardware for over 10 years, and never has it been the most pleasant of experiences. I can't say working with the Intel integrated GPU group has been particularly rewarding either, but the Intel CPU group is definitely better.



    Quote:

    Then I see one of two things happening. Either Intel buys Nvidia, or Nvidia merges with AMD and ATI. They seem to be siding with them in the USB 3 dispute.



    The market doesn't have 2 sides, but a specific dispute generally does... so a 3rd party will always have to side with either 1st or 2nd party. Means nothing in the grand scheme of things. ATI and nVidia still don't get along very well.



    Quote:

    Intel don't really need NVidia, but NVidia can't survive on its own. For now, yes, but not in, say, two years' time, so they're going to have to go somewhere.



    Right, so in 2-5 years they are going to shrivel up and die. Then the vultures move in and pick the corpse clean of its valuable patents. Just as well -- I can't stand nVidia's CEO.



    Quote:

    "To render F.E.A.R. at 60 frames per second ... and that's about the performance of a 2006-vintage Nvidia or Advanced Micro Devices/ATI graphics chip. This year's chips are three to four times as fast. In other words, unless Intel is prepared to make big, hot Larrabee chips, I don't think it's going to be competitive with today's best graphics chips on games."



    Yes, I've noticed that a lot of people aren't reading the SIGGRAPH paper very well. Reading comprehension is clearly a lost art. The paper does not describe a particular implementation in silicon, it describes things in terms of "Larrabee units" which are an idealized metric. They spend a lot of time showing linear scalability.



    Quote:

    "But ray tracing merits just one paragraph and one figure in this paper, which establish merely that Larrabee is more efficient at ray tracing than an ordinary Xeon server processor. It falls well short of establishing that ray tracing is a viable option on Larrabee, however."



    And yet you point to ATI's efforts in ray tracing? Hardly a viable option there either.



    Quote:

    "even with the impressive 16 single precision operations per clock (per core), a 1.0 GHz Larrabee chip would need 62 cores to equal the performance of the latest teraflop GPUs from NVIDIA and AMD that will ship this year."



    And this completely ignores the fact that the general-purpose cores in Larrabee aren't constrained to do things the way the more fixed-function architectures have to do them. There is a lengthy discussion in the paper about how the graphics pipeline can be tuned and adapted to the particular application/game. nVidia has been beating ATI over the head with how its model is more adaptable than the Radeon architecture. Now Intel is coming along with something more adaptable still. And this is good because there is more and more variety in how games are doing their graphics, and these techniques don't all map ideally to how the current GPUs and APIs would like things to be done. Intel is giving us back software rendering without taking away the performance.



    Quote:

    So on the assumption that Intel doesn't deliver on Larrabee, which let's face it wouldn't be a huge surprise given their previous efforts in graphics processing, Apple will have to find GPUs that let OpenCL shine and they won't be from Intel.



    Very poor assumption. And don't let their previous efforts mislead you. Even if Larrabee falls short of the latest ATI/nVidia efforts in terms of graphics, its generalized nature means it will kill them when using OpenCL. And when doing pretty much anything that doesn't fit nicely into the limitations of the graphics pipeline (see above).



    As for Intel's integration of a GPU into their CPUs: that will be part of the "lake" series of parts. Late 2009 will be their first attempt, and it'll be a mobile part. I'd speculate that it will likely be the post-Nehalem architecture (Sandy Bridge) that integrates Larrabee technology for the first time... at the earliest. If you read the SIGGRAPH paper, it quite clearly discusses Larrabee as a platform, not just a one-off product. Intel will inevitably throw their industry-leading process technology at it, which will give them a substantial advantage over nVidia and ATI.



    nVidia has another problem... QuickPath. Intel isn't licensing it to them. That means they'll be out of the chipset business next year, and it also means that even if Intel doesn't pull Larrabee onto the main CPU's die they can potentially use QuickPath to get a higher performance CPU <-> GPU connection than nVidia can through PCIe. ATI is AMD so they can achieve this with the AMD CPUs and HyperTransport (and the Fusion product, if it ever arrives).



    Intel doesn't have to win with their first delivery. They've defined an impressively scalable architecture, and they can iterate on it. They have more resources, they have better process technology. nVidia is freaking out because they hear the train coming, they are tied to the tracks, and there is nothing they can do about it.



    Quote:

    given that ATI/Nvidia chips already have over 100 cores running at more than 1GHz... The 8600M GT has 32 stream processors at 950MHz.



    Be careful about comparing apples and oranges... those stream processors aren't directly comparable to the Larrabee cores (not even close), and it's not public yet what the shipping clock rate or core count of the initial Intel part will be. The paper quite carefully doesn't give that away. Also, not much is said about the supporting infrastructure for the cores... but nVidia's infrastructure isn't much to crow about.
  • Reply 22 of 123
    Quote:
    Originally Posted by Marvin View Post


    "To render F.E.A.R. at 60 frames per second--a common definition of good-enough gaming performance--required from 7 to 25 cores, assuming each was running at 1GHz. Although there's a range here depending on the complexity of each frame, good gameplay requires maintaining a high frame rate--so it's possible that F.E.A.R. would, in practice, require at least a 16-core Larrabee processor.



    And that's about the performance of a 2006-vintage Nvidia or Advanced Micro Devices/ATI graphics chip. This year's chips are three to four times as fast.



    In other words, unless Intel is prepared to make big, hot Larrabee chips, I don't think it's going to be competitive with today's best graphics chips on games."



    "But ray tracing merits just one paragraph and one figure in this paper, which establish merely that Larrabee is more efficient at ray tracing than an ordinary Xeon server processor. It falls well short of establishing that ray tracing is a viable option on Larrabee, however."



    http://www.hpcwire.com/features/Inte..._Larrabee.html



    "even with the impressive 16 single precision operations per clock (per core), a 1.0 GHz Larrabee chip would need 62 cores to equal the performance of the latest teraflop GPUs from NVIDIA and AMD that will ship this year."



    So on the assumption that Intel doesn't deliver on Larrabee, which let's face it wouldn't be a huge surprise given their previous efforts in graphics processing, Apple will have to find GPUs that let OpenCL shine and they won't be from Intel.



    That's a very dangerous assumption. Intel means business, or they wouldn't be taking this so far. By the way, the "1GHz" figure was just used by Intel to make the math simpler. Larrabee will run much faster. They're talking ~2.5GHz.
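
    Just to make the arithmetic behind that quote explicit (a rough sanity check using only the figures already given: 16 single-precision ops per clock per core, against a ~1 TFLOP GPU):

        16 ops/clock x 1.0 GHz = 16 GFLOPS per core  ->  1000 / 16 ~ 62 cores
        16 ops/clock x 2.5 GHz = 40 GFLOPS per core  ->  1000 / 40 = 25 cores

    So the 62-core figure is an artifact of the 1GHz assumption; at the clocks actually being talked about, the core count needed to match a teraflop GPU drops to around 25.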



    Quote:

    As I said, ATI have raytracing chips now. Intel are only now saying they will have chips that might be powerful enough, more than a year away. The only way Larrabee would have been interesting is if at the worst they were shipping it in very early 2009. Now it seems they will have to make a 62-core chip at the end of 2009 just to compete with what ATI and Nvidia have right now. It makes a lot of sense that this would be the case given that ATI/Nvidia chips already have over 100 cores running at more than 1GHz.



    "Cores" is a completely meaningless term for comparison. When Nvidia and ATI talk about cores, they're referring to stream processors, which individually are useless- they are grouped into clusters in the GPU. One core, as Intel defines it, would be the equivalent of dozens of ATI or NV "cores".



    Quote:

    The 8600M GT has 32 stream processors at 950MHz. It would be good to see that go into the Macbook and the 8700M GT with 32 at 1.25GHz into the larger Macbook - I'm going to omit the 'pro' moniker because I think they are merging and the larger one's price will drop. I reckon the Geforce 9 series will draw too much power and not allow apple to drop the price. The MB is fine, the MBP just needs to come down so put good GPUs in the MB and drop the MBP with a minor GPU upgrade. Using OpenCL, both these GPUs will be more than enough to be competitive with laptops with higher end GPUs without OpenCL.



    And just FYI, the 8600M GT reference clock speed is 475MHz, not 950MHz, and Apple is underclocking them in the MBP for thermal/noise reasons. Also, the 8700M GT is clocked at 625MHz, not 1.25GHz as you state. Not that it matters, since comparing the clock speeds of such totally different architectures is as worthless as comparing the core counts.
  • Reply 23 of 123
    Marvin Posts: 15,322 moderator
    Quote:
    Originally Posted by Programmer View Post


    I really don't understand the love-in people have for nVidia.



    It's consistently the fastest hardware and ATI drivers cause people a lot of problems - more on Windows than on the Mac though.



    Quote:
    Originally Posted by Programmer View Post


    Yes, I've noticed that a lot of people aren't reading the SIGGRAPH paper very well. Reading comprehension is clearly a lost art. The paper does not describe a particular implementation in silicon, it describes things in terms of "Larrabee units" which are an idealized metric. They spend a lot of time showing linear scalability.



    I just found the paper:



    http://softwarecommunity.intel.com/U...e_manycore.pdf



    It seems like it was reported wrongly. The 1GHz clock they used is not what they will use in production, and the paper says that 16 cores can handle real-time ray-tracing.



    They didn't run the games either, just sampled frames and ran them through a test model. It seems that the production Larrabee will have 16-25 cores running at 1.7GHz-2.5GHz. 150W minimum though - that's not going in a laptop.



    http://arstechnica.com/news.ars/post...e-part-ii.html



    That's an old article so it's possible they will improve power consumption by the time it ships.



    Quote:
    Originally Posted by Programmer View Post


    Very poor assumption. And don't let their previous efforts mislead you. Even if Larrabee falls short of the latest ATI/nVidia efforts in terms of graphics, its generalized nature means it will kill them when using OpenCL. And when doing pretty much anything that doesn't fit nicely into the limitations of the graphics pipeline (see above).



    Nvidia's/ATI's and Intel's models are converging from different directions. Intel are trying to make the CPU architecture more specialized; the others are trying to make specialized hardware perform more generic tasks. Success will depend on what the implementation actually allows people to do. If Larrabee lets people accelerate a whole load of tasks but the GPU models, while limited to graphics, still perform faster at it, then buyers who only need graphics acceleration will want the best and will still go for Nvidia/ATI.



    As you rightly point out though, graphics processing employs a wide range of techniques these days, so their implementation will likely win in the end, even if not at first, and that shows in the paper too.



    So far though, Intel has delivered nothing but a paper. ATI hardware has already rendered the trailers for the Transformers film with real-time ray-tracing. You say this isn't a viable option but I'm not sure why it wouldn't be.



    What is interesting is:



    "The Nvidia GeForce 8 operates in a similar fashion, organizing its scalar SIMD processors in groups of 32 that execute the same instruction [Nickolls et al. 2008]. The main difference is that in Larrabee the loop control, cache management, and other such operations are code that runs in parallel with the VPU, instead of being implemented as fixed function logic."



    This again points to a move to Geforce 8 chips as a precursor to Larrabee support. The iMac has the 8800GS, the Mac Pro the 8800GT, and the MBP the 8600M GT.



    Quote:
    Originally Posted by Programmer View Post


    Intel doesn't have to win with their first delivery. They've defined an impressively scalable architecture, and they can iterate on it. They have more resources, they have better process technology. nVidia is freaking out because they hear the train coming, they are tied to the tracks, and there is nothing they can do about it.



    That's what I was saying though: they will get desperate enough to sell out to someone. They won't just go out of business; someone will buy them up.



    Here's what I see happening: Apple go all Nvidia before January. Then they release Snow Leopard at MacWorld. Then they make the transition to Larrabee. This is the big transition Apple will have been talking about - state of the art, and the competition won't even compare. Touch products are OK but still a bit gimmicky; an order-of-magnitude improvement in performance is much better.
  • Reply 24 of 123
    mjteix Posts: 563 member
    Quote:
    Originally Posted by Marvin View Post


    This again points to a move to Geforce 8 chips as a precursor to Larrabee support. The iMac has the 8800GS, the Mac Pro the 8800GT, and the MBP the 8600M GT.



    Here's what I see happening: Apple go all Nvidia before January. Then they release Snow Leopard at MacWorld. Then they make the transition to Larrabee...



    OMG! You're daydreaming.



    It's not because the name is Snow Leopard that it will be released during the winter. It's not even in beta yet!

    SJ said it will be released around next summer, probably at the 2009 WWDC or just after.



    The 8800GS on the iMac is BTO only, the 8800GT on the Mac Pro is BTO only...

    FWIW, I don't find nvidia offerings very attractive these days.

    Until there are changes, I'd rather Apple use the best/appropriate chips from all three companies (Intel included).



    --------



    Anyway, to have something that "the competition can't match", it has to be exclusive technology, chips, whatever, made by/for Apple.

    Whatever nvidia/amd-ati/intel releases in the coming months/years will be available to all manufacturers anyway.

    It will just be a matter of time for others to match Apple's offerings.
  • Reply 25 of 123
    programmer Posts: 3,458 member
    Quote:
    Originally Posted by mjteix View Post


    SJ said it will be released around next summer, probably at the 2009 WWDC or just after.



    Did he even say that much? I was figuring on a fall release like Leopard... about the same time Intel claims the first Larrabee will reach the consumer.



    Quote:

    Until there are changes, I'd rather Apple use the best/appropriate chips from all three companies (Intel included).



    Amen to that... and that is exactly why things like OpenCL and GLSL exist: so that code can be written independently of the hardware that runs it.



    Quote:
    Originally Posted by Marvin


    ATI hardware has already rendered the trailers for the Transformers film with real-time ray-tracing. You say this isn't a viable option but I'm not sure why it wouldn't be.



    nVidia hardware can be used to do the same techniques. ATI hasn't delivered "ray tracing hardware" any more than Intel CPUs are "ray tracing hardware". I haven't looked at what the Transformers film guys did, but "real-time" in the film business is considerably different than what it means to most people, especially games. It's also possible that they applied a whole network of machines, or something like that.





    Quote:

    It seems that the production Larrabee will have 16-25 cores running at 1.7GHz-2.5GHz. 150W minimum though - that's not going in a laptop.



    As I said, they still haven't told us what the hardware will really be. And no, it won't be mobile initially, but then neither are the comparable nVidia/ATI parts. The mobile GPUs have a whole different target performance level.
  • Reply 26 of 123
    mjteix Posts: 563 member
    Quote:
    Originally Posted by Programmer View Post


    Did he even say that much? I was figuring on a fall release like Leopard... about the same time Intel claims the first Larrabee will reach the consumer.



    Now, you got me thinking...

    It may not be SJ himself, but I have a memory of a slide stating summer 2009...

    As long as it is released when it's ready...
  • Reply 27 of 123
    Marvin Posts: 15,322 moderator
    Quote:
    Originally Posted by mjteix View Post


    OMG! You're daydreaming.



    It's not because the name is Snow Leopard that it will be released during the winter. It's not even in beta yet!

    SJ said it will be released around next summer, probably at the 2009 WWDC or just after.



    An AppleInsider news article said that. There is no projected time frame for Snow Leopard. No beta means nothing right now, as WWDC was just 8 weeks ago. That leaves 21 weeks - about five months - to get it ready. It doesn't even have to ship until a bit later, but IMO Apple need to get this into developers' hands as quickly as they can.



    Apps need to be recompiled, and developers need to see how to write code that best takes advantage of the new developments. WWDC does seem a much more likely release time and a more appropriate venue, but it's not beyond possibility that we'll see something solid at MacWorld.



    Quote:
    Originally Posted by mjteix View Post


    The 8800GS on the iMac is BTO only, the 8800GT on the Mac Pro is BTO only...

    FWIW, I don't find nvidia offerings very attractive these days.



    They are BTO just now, but notice they are all higher-end options. The ATI chips only handle the low end of Apple's lineup. What's not attractive about Nvidia? The 8800 GT is excellent price/performance, whereas the X1900 XT was nothing but problems. The ATI chips underperform, and the Geforce 7300 BTO in the old iMac is faster than the ATI chips in the new one.



    Quote:
    Originally Posted by mjteix View Post


    Until there are changes, I'd rather Apple to use the best/appropriate chips from all three companies (Intel included).



    Naturally and currently that would be Nvidia. Then in over a year, they can glance towards Intel.



    Quote:
    Originally Posted by mjteix View Post


    Anyway, to have something that "the competition can't match", it has to be exclusive technology, chips, whatever, made by/for Apple.

    Whatever nvidia/amd-ati/intel releases in the coming months/years will be available to all manufacturers anyway.

    It will just be a matter of time for others to match Apple's offerings.



    They don't have OpenCL though. Apple are developing this, and competitors won't have the implementation Apple do. Microsoft doesn't have the ability to do these kinds of things. Apple change stuff all the time - if Microsoft made changes the way Apple does, to the architecture, the OS core and so on, the business world would be very unhappy.



    Microsoft couldn't for example drop PPC support (if they had it) like it seems Apple will. That platform isn't very old at all but it will be obsolete after Snow Leopard. This ability to change things puts them ahead.



    Custom chips could be the big thing, or it could be a custom chipset designed to make the CPU/GPU link as fast as possible. It could be both. You'd have to ask what kind of chip Apple could design that would put them ahead - I'd say not that many, really. The CPU and GPU pretty much do everything. Custom chips are really what the whole move to GPGPU is getting away from - the point is to move away from that model to allow for better advances in software.



    Quote:
    Originally Posted by Programmer


    ATI hasn't delivered "ray tracing hardware" any more than Intel CPUs are "ray tracing hardware". I haven't looked at what the Transformers film guys did, but "real-time" in the film business is considerably different than what it means to most people, especially games. It's also possible that they applied a whole network of machines, or something like that.



    http://www.tomshardware.com/news/Lar...cing,5769.html



    Real-time in film is the same as in games, i.e. frames are generated fast enough that they appear as continuous motion rather than stills.



    "In terms of performance, the Radeon 2900XT 1GB rendered Transformers scenes in 20-30 frames per second, in 720p resolution and no Anti-Aliasing. With the Radeon 3870, the test scene jumped to 60 fps, with a drop to 20 fps when the proprietary Anti-Aliasing algorithm was applied. Urbach mentioned that the Radeon 4870 hits the same 60 fps - and stays at that level with Anti-Aliasing (a ray-tracer is not expecting more than 60 fps.) JulesWorld’s technology also works on Nvidia GeForce 8800 cards and above, but the lack of a tessellation unit causes a bit more work on the ray-tracer side.



    In future, Urbach expects to see 1080p, 4K and higher resolutions being rendered in real time.



    So, what about photo-realism of these scenes? Unlike the general "3-5 years" answer you are hearing in the industry right now, he believes that this goal could be achieved by the end of the year."



    http://www.rage3d.com/print.php?arti...icles/cinema20
  • Reply 28 of 123
    mdriftmeyer Posts: 7,503 member
    Quote:
    Originally Posted by Marvin View Post


    [...]

    "In terms of performance, the Radeon 2900XT 1GB rendered Transformers scenes in 20-30 frames per second, in 720p resolution and no Anti-Aliasing. With the Radeon 3870, the test scene jumped to 60 fps, with a drop to 20 fps when the proprietary Anti-Aliasing algorithm was applied. Urbach mentioned that the Radeon 4870 hits the same 60 fps - and stays at that level with Anti-Aliasing (a ray-tracer is not expecting more than 60 fps.) [...]"

    http://www.rage3d.com/print.php?arti...icles/cinema20



    Such power would go a long way towards Resolution Independence, compound UIs with differing bit depths, and threaded view rendering without slowing the system to a crawl.



    From this quick Google search there is quite a bit of interesting work going on:



    http://www.google.com/search?client=...utf-8&oe=utf-8
  • Reply 29 of 123
    programmer Posts: 3,458 member
    Quote:
    Originally Posted by Marvin View Post


    They don't have OpenCL though. Apple are developing this, and competitors won't have the implementation Apple do. Microsoft doesn't have the ability to do these kinds of things. Apple change stuff all the time - if Microsoft made changes the way Apple does, to the architecture, the OS core and so on, the business world would be very unhappy.



    Except that Apple is trying to make OpenCL an open standard. And Microsoft could certainly add it as it doesn't require changing the existing system. Microsoft added HLSL to D3D just fine.



    Apple's real advantage is its integration of hardware and software.



    Quote:

    JulesWorld’s technology also works on Nvidia GeForce 8800 cards and above, but the lack of a tessellation unit causes a bit more work on the ray-tracer side.



    So this is the important point... you have to compare this specific tech on all the competing pieces of hardware in order to do an apples-to-apples comparison. Just because two things are painted with the "ray tracing" label doesn't mean they are equivalent, hence research work by Intel and this product from JulesWorld cannot be used to compare how well ray tracing works on different hardware. And a Larrabee implementation of JulesWorld could do EVERYTHING on the Larrabee (unlike on ATI and nVidia GPUs), and thus accelerate it all. And no contortions to make it work on a GPU would be required. Which is an advantage for Larrabee rather than a detraction.
  • Reply 30 of 123
    wizard69 Posts: 13,377 member
    Just a couple of notes to interject as I've not followed this thread closely.



    First Snow Leopard:



    Apple has indicated that there won't be a lot of userland changes, which I believe is very likely. However, from that one cannot come to the conclusion that it will immediately be leveraged by every app out there. I suspect that it will be a long time before many apps fully leverage all the improvements in Snow Leopard. Just because Apple improves an underlying feature, such as threading, doesn't mean an app can leverage that improvement without a rewrite.



    OpenCL:



    This is / will be a nice technology, but I believe it is a mistake to think it is the route to ultimate performance. I just have a very hard time believing that any tech that adjusts code dynamically for the hardware available will always lead to the fastest code. It might at times, but ultimate performance needs custom-crafted code. In a way OpenCL reminds me of Java, with all the good and bad.



    Ray Tracing:



    Ultimately, video hardware that can do such rendering in real time will be very compelling. My question is this: is an array of somewhat general-purpose GPU processors the best approach here? Especially in Apple's case, where they have everything from handheld units to desktops that could make use of ray tracing. If Apple is interested in this, I see them taking a different approach, which would be custom hardware. A couple of months ago I read about a college student accelerating ray tracing for a low-end processor via custom hardware. I could see Apple doing the same and being able to offer ray tracing on everything from the iPod Touch on up. That certainly would put them ahead of the pack.



    It is a potential answer, anyway, to what Apple might have up its sleeve - that is, if they have anything hardware-related at all. But given some of Apple's IP purchases of the past (Raycer) and the state of interest in the academic world, I could see Apple moving into the world of hardware-accelerated ray tracing. At least it sounds like a good idea when looked at from the context of an improvement that can't be touched by the competition. At least that is the idea that is spinning around in my head today. Note too that ray tracing acceleration could come as an instruction or feature in a general-purpose vector unit, the idea being simply low-cost, low-power ray tracing.





    CPUs in general:



    To be honest, I don't see Apple focusing on just one vendor anytime soon. They would simply lose too much leverage and flexibility. At least up to the point where you can't get a GPU-free processor any more. Apple has done well so far being able to switch GPUs as required.



    Dave
  • Reply 31 of 123
    Marvin Posts: 15,322 moderator
    Quote:
    Originally Posted by mdriftmeyer View Post


    Such power would go a long way towards Resolution Independence, compound UIs with differing bit depths, and threaded view rendering without slowing the system to a crawl.



    It certainly would - I forgot about RI. Unfortunately, Apple will probably aim for that in the Snow Leopard release, so it's not likely to be anywhere near a January release. They did say to get RI ready for 2009.



    Quote:
    Originally Posted by Programmer


    Except that Apple is trying to make OpenCL an open standard. And Microsoft could certainly add it as it doesn't require changing the existing system. Microsoft added HLSL to D3D just fine.



    Apple's real advantage is its integration of hardware and software.



    Exactly, but this is where the PC industry falls down. It can't deliver this integration because it's so fragmented, and adopting and promoting a standard like OpenCL would take far longer there than in the Mac world. Microsoft likely won't adopt it either, because they are developing their own proprietary equivalent, which will no doubt appear in Windows 7.



    After all, we can't expect them to let go of their DirectX monopoly so easily. Perhaps OpenCL will replace OpenGL, so it'll be something like OpenCL vs DirectX 11. The downside is Microsoft have their Xbox, so the gaming community will still adopt DirectX. The upside is Sony probably won't - they are a promoting member of the group behind OpenGL and will likely look toward OpenCL, especially if it can help development on their multi-core Cell.



    Quote:
    Originally Posted by Programmer


    So this is the important point... you have to compare this specific tech on all the competing pieces of hardware in order to do an apples-to-apples comparison. Just because two things are painted with the "ray tracing" label doesn't mean they are equivalent, hence research work by Intel and this product from JulesWorld cannot be used to compare how well ray tracing works on different hardware. And a Larrabee implementation of JulesWorld could do EVERYTHING on the Larrabee (unlike on ATI and nVidia GPUs), and thus accelerate it all. And no contortions to make it work on a GPU would be required. Which is an advantage for Larrabee rather than a detraction.



    They all do raytracing though; it just says that because the Nvidia chip doesn't have a hardware tessellation unit, it performs more slowly. Similarly, graphics chips that don't have hardware T&L can still do it in software, it's just slower.



    I agree that in the long run Larrabee seems like the better option, but for now the dedicated chips are usable. Also, if Larrabee comes out with 25 cores at 2.5GHz, Nvidia and ATI might still have faster solutions that support a generic enough programming model to accelerate tasks the way Larrabee would. Sure, they only have stream processors, but they don't necessarily need anything more than that. The majority of heavy computation is basic number crunching: ray-tracing, physics processing, it's all number crunching. You can do real-time raytracing using Playstation 3 machines:



    http://www.youtube.com/watch?v=oLte5f34ya8

    http://www.youtube.com/watch?v=zKqZKXwop5E



    The second video is with Cell-based servers, but they all have SPUs designed, like GPUs, more for media and graphics processing. As for the Geforce GTX 280:



    http://www.tomshardware.com/reviews/...0,1953-24.html



    Hardware media encoding is 8 times faster than iTunes. iTunes isn't all that fast, but apparently neither is that software. This could be where QuickTime X fits into the picture. If Apple can use the capabilities of Nvidia chips now, it's possible they could deliver major speed-ups for consumer-level apps, where people really notice it.



    If Apple reveal OpenCL while the majority of the lineup uses low-end graphics chips - and the majority of buyers will own those - then Snow Leopard won't make much of an impact. If it doesn't make an impact, people won't see a need to upgrade, and the OpenCL standard will take much longer to gain any ground because developers won't see a need to develop for it. This has already happened with Leopard: I see far more apps that don't work on and aren't supported in Leopard than apps that don't work in Tiger.



    Unless Snow Leopard offers great things and Apple start shipping consumer machines capable of taking advantage of them, it will take much longer for us to actually see the benefits of these developments.



    Quote:
    Originally Posted by wizard69


    I just have a very hard time believing that any tech that adjusts code dynamically for the hardware available will always lead to the fastest code. It might at times, but ultimate performance needs custom-crafted code. In a way OpenCL reminds me of Java, with all the good and bad.



    I think it will be OK. JIT compilers can do runtime optimizations that otherwise wouldn't be possible. If the compiler sees SSE4 support, it can dynamically optimize for that when needed and in the best way possible. It's just down to implementation, language structure etc. OpenCL is based on C, and Apple will make sure the compilers work as expected, as it will be at the core of the OS.



    Packetized bytecode in OpenCL has two great implications: portable code (the end of the platform war - though Java didn't quite deliver there, so maybe not) and efficient handling of multiple processors. The compilers do much more of the work of making your code efficient for you. Auto-vectorization for AltiVec was supposed to do this too, of course, and failed because the compiler couldn't handle most kinds of code, but again Apple are in control of this now and will benefit directly, so they will put in far more effort to make sure it works.
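
    To make that concrete, here's a minimal sketch in plain C of the kind of dispatch a JIT effectively does for you: pick the best code path for the hardware you actually find at run time. The feature check and both kernels here are placeholders of my own (a real check would query CPUID, and a real fast path would use SSE intrinsics), so treat it as an illustration of the idea, not of how OpenCL itself will work:

        #include <stddef.h>

        /* Placeholder feature check - a real implementation would query CPUID. */
        static int cpu_has_sse41(void)
        {
            return 0;
        }

        /* Portable baseline: scale an array by a constant. */
        static void scale_scalar(float *dst, const float *src, float k, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                dst[i] = src[i] * k;
        }

        /* Stand-in for a hand-tuned SSE4 path (intrinsics omitted for brevity). */
        static void scale_sse4(float *dst, const float *src, float k, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                dst[i] = src[i] * k;
        }

        typedef void (*scale_fn)(float *, const float *, float, size_t);

        /* Chosen once at run time, based on what the CPU actually supports. */
        scale_fn choose_scale(void)
        {
            return cpu_has_sse41() ? scale_sse4 : scale_scalar;
        }

    A JIT takes the same idea further by generating the specialized path on the spot instead of shipping every variant in the binary.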



    Quote:
    Originally Posted by wizard69


    ray tracing on the iPod Touch



    I think you just topped my fantasy level. The chips that do real-time raytracing are very power-hungry. If Apple or anyone else could make chips do raytracing within the power budget of the iPod, something would have been said before now. I'm sure some day this will happen, but not in the short term.



    This tech is really about being able to use processing units we already have but that may be going unused. For example, some people I work with have MBPs with 8600M GT chips. They do absolutely nothing that requires more than integrated graphics, so all those 32 GPU cores are being wasted even when they are doing video encoding. The mobile devices don't have quite the same spare cores lying around; they are pretty much built to use everything as efficiently as possible already. There could be room for improvement, of course, but nowhere near as drastic.



    Quote:
    Originally Posted by wizard69


    To be honest, I don't see Apple focusing on just one vendor anytime soon. They would simply lose too much leverage and flexibility.



    Switching to Nvidia doesn't imply any limitations, present or future; they are free to switch again. It seems to me that Nvidia's chips would show OpenCL in its best light until Larrabee arrives, that's all. They can pick whichever chips do the best job, but I don't see Intel's offerings on the table for at least a year, and I have doubts that the first release will be suitable if it's too power-hungry, at least not for the majority of Apple's products, which all use mobile chipsets.



    What we know is that Snow Leopard will introduce OpenCL - likely at WWDC 2009, I'd agree. Larrabee won't reach its first incarnation until after this. The success of Snow Leopard depends on the benefits to consumers and developers. If we don't have the hardware (i.e. good GPUs) for Snow Leopard to work well, it will take a long time before it's adopted, leaving the door open for the competition to come in with rival standards.



    If Apple start adopting better GPUs across the lineup ASAP, it's more likely consumers will see a big difference in Snow Leopard, unlike with Leopard. Clearly it's a false assumption that consumers only care about features. In my experience, performance almost always tops everything; features are important but second at best. This move means better value for money, a better end-user experience, more competitiveness with PCs, and more room for Apple to push their software forward. If QuickTime X delivers hardware encoding using standard GPUs, guess which formats are going to be more widely adopted.



    A media-encoding guy goes to encode WMV and it takes 2 hours to process; QuickTime X encodes to H.264 in a tenth of the time. It's not that simple, but it will change some people's workflows.
  • Reply 32 of 123
    programmer Posts: 3,458 member
    Quote:
    Originally Posted by wizard69 View Post


    OpenCL:

    This is / will be a nice technology, but I believe it is a mistake to think it is the route to ultimate performance. I just have a very hard time believing that any tech that adjusts code dynamically for the hardware available will always lead to the fastest code. It might at times, but ultimate performance needs custom-crafted code. In a way OpenCL reminds me of Java, with all the good and bad.



    Little is known about OpenCL, but from what can be surmised there are two important aspects to it. The first is that it is a language for doing data parallelism. Second, it is JIT-compiled. It will not be like Java; it will be more like GLSL/HLSL. Those languages revolutionized graphics. OpenCL is a language for getting higher performance than you can get from C++, and for less effort. C++ is a terrible language for concurrency; it is time for something better.
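
    To show what "more like GLSL/HLSL than Java" means in practice, here's a purely illustrative sketch of a data-parallel kernel in that mold. Nothing official has been published, so the exact keywords are my guess; the point is that you write the body for a single data element, and the runtime compiles it for whatever device is present and fans it out across thousands of work items:

        /* Illustrative only - not confirmed OpenCL syntax. */
        __kernel void saxpy(__global const float *x,
                            __global const float *y,
                            __global float *out,
                            const float a)
        {
            int i = get_global_id(0);     /* which element this work item owns */
            out[i] = a * x[i] + y[i];     /* same body runs across the whole array in parallel */
        }

    Compare that with spinning up and coordinating threads by hand in C++ to do the same thing - that's the "less effort" part.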
  • Reply 33 of 123
    mdriftmeyer Posts: 7,503 member
    OpenCL, being hosted by Khronos.org [the group behind OpenGL and the rest], has the necessary industry endorsements and committee backing to soon sit side by side with OpenGL and the rest.
  • Reply 34 of 123
    Marvin Posts: 15,322 moderator
    Quote:
    Originally Posted by mdriftmeyer View Post


    OpenCL, being hosted by Khronos.org [the group behind OpenGL and the rest], has the necessary industry endorsements and committee backing to soon sit side by side with OpenGL and the rest.



    I still wonder if it will work along with OpenGL or replace it. After all, OpenGL commands would benefit from the same parallelism and some may execute faster using the CPU - multi-threaded OpenGL is an example.



    I guess we'll find out some more after Wednesday:



    http://www.khronos.org/news/events/d...es_california/



    They will talk about OpenGL 3 and OpenCL.



    AMD announced support for both OpenCL and DirectX 11:



    http://www.eweek.com/c/a/Desktops-an...ft-DirectX-11/



    "AMD, which announced official support for both OpenCL and DirectX 11 on Aug. 6, will begin upgrading its Stream SDK (software development kit) during the next 18 months to help developers take advantage of GPU acceleration when building new types of applications."



    That's a long time. AMD and NVidia both have GPGPU solutions. AMD's started as CTM (Close To Metal), an assembly-level toolkit. Then, after CUDA, they released their Stream SDK, similarly a C-based toolkit. It seems to be more open than CUDA too.



    The more articles I read, the more it seems ATI would be a better bet for Apple than Nvidia.



    http://techreport.com/articles.x/14968



    Then again, it seems to have more limitations currently:



    http://forums.amd.com/forum/messagev...&enterthread=y



    I guess it will depend on which gives the best performance for OpenCL. Some reports of real-time raytracing with CUDA:



    http://forums.nvidia.com/index.php?s...=&#entry422937



    The PA Semi purchase is for iPods/iPhones, according to Jobs:



    http://bits.blogs.nytimes.com/2008/0...d-upside-down/



    Bear in mind, the real-time raytracing thing is not a deal-breaker if a GPU can't do it. Raytracing just happens to be one of the most computationally expensive tasks a computer can do. Even Pixar didn't start using it in production until Cars, two years ago. Real-time is only needed for games; for film, it can easily be much less than real-time.



    Edit:

    Quad core CPUs are coming this month:



    http://www.pcadvisor.co.uk/news/index.cfm?newsid=13717



    I reckon we'll see a QX9300 in the MBP so this would still warrant the higher price even if they do put a dedicated GPU in the Macbook.



    Also, the ship date for Snow Leopard is on Apple's site:



    http://www.apple.com/macosx/snowleopard/



    "Snow Leopard — scheduled to ship in about a year"



    So it should appear at WWDC 2009. At least that'll give them a chance to establish more capable hardware before the OS is available. They can fit another two generations of hardware in before it arrives.
  • Reply 35 of 123
    programmer Posts: 3,458 member
    Quote:
    Originally Posted by Marvin View Post


    I still wonder if it will work along with OpenGL or replace it. After all, OpenGL commands would benefit from the same parallelism and some may execute faster using the CPU - multi-threaded OpenGL is an example.



    The two will co-exist, as they have different purposes. OpenGL can be threaded, and Apple's implementation on the Mac Pro already is. OGL3 may include improvements to make that more efficient, but it won't be replaced by OCL.
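
    For anyone curious, opting into that threaded GL engine on the Mac is a single CGL call (per Apple's developer documentation); the wrapper function here is just my own illustration, and whether it actually helps depends on how command-bound the app is:

        #include <OpenGL/OpenGL.h>

        /* Ask Mac OS X to run the OpenGL command stream on a worker thread. */
        static void enable_threaded_gl(CGLContextObj ctx)
        {
            CGLError err = CGLEnable(ctx, kCGLCEMPEngine);
            if (err != kCGLNoError) {
                /* Not supported by this renderer/OS version - carry on single-threaded. */
            }
        }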
  • Reply 36 of 123
    wizard69 Posts: 13,377 member
    Quote:
    Originally Posted by Programmer View Post


    Little is known about OpenCL, but from what can be surmised there are two important aspects to it. The first is that it is a language for doing data parallelism. Second, it is JIT-compiled. It will not be like Java; it will be more like GLSL/HLSL. Those languages revolutionized graphics. OpenCL is a language for getting higher performance than you can get from C++, and for less effort. C++ is a terrible language for concurrency; it is time for something better.



    I agree that C++ leaves a lot to be desired as we move into a heavily multithreaded world. You are also correct in the thought that we really don't know much about OpenCL - at least I don't. In any event, I imagine an environment that allows one code base to be dynamically optimized for the hardware available at run time.



    Now there are good points and bad points there. The number one good point is that your code can leverage hardware you might not even know about. The number one bad point for me is that this sort of reminds me of Java and the highly variable performance there. Certainly things are improving with respect to dynamic execution, but I don't see it as the avenue to raw performance. It is certainly the avenue to much better performance at little cost or effort - let's face it, 32 specialized processors available for a task can be compelling. Who knows, though - maybe my impression of what OpenCL is is a bit off.



    Frankly, I'm still hoping that Apple adds its own special sauce to the mix in the form of a specialized coprocessor modeled somewhat on AltiVec. Of course that would be a vector processor for modern times and, as such, highly optimized for parallel execution on a number of cores. If the hardware is made as open and accessible as the AltiVec of old, I see the potential for a lot of interest. If nothing else it offers up a regular and open platform in a way that GPUs don't. In other words, I see GPU hardware as being too variable and limited for a lot of usages.



    As a side note, somebody commented above that hardware ray tracing acceleration on an iPod-like device is too power-hungry. That I'm not too sure about. It might take a bit of a mindset change in the reordering of graphics functions, but I think it is possible at a reasonable expenditure of power. In any event, it will be interesting to see what Apple's grand vision is in a few months' time when the details become available.



    Dave
  • Reply 37 of 123
    FuturePastNow
    Quote:
    Originally Posted by wizard69 View Post


    Frankly, I'm still hoping that Apple adds its own special sauce to the mix in the form of a specialized coprocessor modeled somewhat on AltiVec. Of course that would be a vector processor for modern times and, as such, highly optimized for parallel execution on a number of cores. If the hardware is made as open and accessible as the AltiVec of old, I see the potential for a lot of interest. If nothing else it offers up a regular and open platform in a way that GPUs don't. In other words, I see GPU hardware as being too variable and limited for a lot of usages.



    But why spend the power, cooling, and material cost on something like that when there's a GPU sitting next to it, with GPGPU instructions available?
  • Reply 38 of 123
    wizard69 Posts: 13,377 member
    Quote:
    Originally Posted by FuturePastNow View Post


    But why spend the power, cooling, and material cost on something like that when there's a GPU sitting next to it, with GPGPU instructions available?



    Simple really: as stated, GPUs are way too limited in what they are capable of doing, and I suspect they will remain that way for some time. I just don't see GPUs morphing into what would be a more general-purpose specialized coprocessor, if that makes sense at all. A coprocessor needs to be able to live in harmony with the main CPU. That means address space, I/O and such. A GPU is only useful for a limited class of problems.



    Dave
  • Reply 39 of 123
    mdriftmeyer
    Quote:
    Originally Posted by Marvin View Post


    [...]

    I guess we'll find out some more after Wednesday:

    http://www.khronos.org/news/events/d...es_california/

    They will talk about OpenGL 3 and OpenCL. [...]



    Note the Speaker for the OpenCL Presentation: Aaftab Munshi



    http://www.amazon.com/OpenGL-ES-2-0-...429034&sr=11-1



    Quote:

    Aaftab Munshi is the spec editor for the OpenGL ES 1.1 and 2.0 specifications. Now at Apple, he was formerly senior architect in ATI’s handheld group.



  • Reply 40 of 123
    Marvin Posts: 15,322 moderator
    Quote:
    Originally Posted by wizard69 View Post


    Simple really: as stated, GPUs are way too limited in what they are capable of doing, and I suspect they will remain that way for some time. I just don't see GPUs morphing into what would be a more general-purpose specialized coprocessor, if that makes sense at all. A coprocessor needs to be able to live in harmony with the main CPU. That means address space, I/O and such. A GPU is only useful for a limited class of problems.



    But you're saying that to get round the issue of inflexible GPUs, the solution is to use even more limited, Apple-exclusive coprocessors. In the short term I can see this - for example, if they need to accelerate vector graphics and the way they do it will never change, then a coprocessor could certainly be the best solution - but in the long term, using generalized processors is the way forward. GPUs won't always be specialized; they have been moving away from that for a while now.



    Specialized chips take up space. Say you make one that accelerates video encoding/decoding and one that does vector graphics; on top of that, you still have the CPU and GPU, and when you aren't doing anything that uses the specialized chips, that's wasted space. It's a far better approach to have a minimal set of chips handle everything so they are always in use. The ideal is one chip that makes any piece of software as fast as possible, and that's where Larrabee is eventually aiming.



    Quote:
    Originally Posted by mdriftmeyer


    Note the Speaker for the OpenCL Presentation: Aaftab Munshi



    Neil Trevett is also going to talk about OpenCL and the implications it has for OpenGL. He works on embedded content at NVidia.



    It's good to see that Apple are hiring people like Munshi who are so directly involved in setting open industry standards. They hired the guy behind CUPS printing along with buying the source code. The more they do this and partner with industry heavyweights, the stronger they will get over time.