Intel Chipsets all the Way with OpenCL


Comments

  • Reply 41 of 123
    wizard69 Posts: 13,377 member
    Quote:
    Originally Posted by Marvin View Post


    But you're saying to get round the issue of inflexible GPUs, the solution is to use even more limited Apple-exclusive co-processors.



    Oh, but it's just the opposite. If Apple makes the facility as open as AltiVec it would be far from inaccessible, and really not Apple-exclusive anymore. Contrast this with the average GPU, which is not all that accessible even if you have documentation. Combine that accessibility with a high-level interface like OpenCL and there would be very little reason for Apple developers not to use the facility.

    Quote:

    In the short term I can see this - for example, if they need to accelerate vector graphics and the way they do it will never change, then certainly a co-processor could be the best solution - but in the long term, using generalized processors is the way forward.



    The current crop of GPUs are far from general-purpose processors. In reality they are highly specialized. I mean, really, how many GPUs can sit as an equal on a processor's I/O bus and work transparently in the processor's native address space? I use the term co-processor so that it is explicitly understood that the unit has equal access with respect to the main processor.

    Quote:

    GPUs won't always be specialized and have been moving away from this for a while now.



    In the long run does that make sense though?

    Quote:



    Specialized chips take up space and so say you make one that accelerates video encoding/decoding, one that does vector graphics. On top of that, you still have the CPU and GPU and when you aren't doing anything that uses the specialized chips, that's wasted space.



    Does it really make sense to load down your GPU card with a raw data stream to be encoded into a particular video format, especially when that video card may be heavily involved in other processing that is more clearly its domain? Plus you are making the assumption that future GPUs will go the way of individual processors as capable as the vector units in Cell. I don't see the rationale for this. As graphics demands continue to increase, the GPU will still have trouble keeping up. Meanwhile the world has a whole host of applications that can make use of a set of strong vector processors.



    Not to bring it up again, but look at Cell and Sony's PlayStation. Sony didn't say, hey, we have one of the most powerful GPUs going, we don't need a vector facility or any other additional capability beyond the core processor. Nope, instead they realized that the GPU has a certain duty to perform that it is optimized for. Then they looked at what might be considered a good solution to the types of problems their developers would be faced with. What they implemented was a reasonably general-purpose vector processor - a good general-purpose attack on a certain class of code.



    Now, do I want to see Apple emulate Cell's vector units exactly? Not at all. What I want is a modern implementation that is flexible in the sense of programmer access and is optimized for a class of applications that are at the core of Apple's users' interests.

    Quote:

    It's a far better approach to simply have a minimal set of chips handle everything so they are always in use. The ideal situation being that you have one chip that makes any piece of software as fast as possible and that's where Larrabee is aiming eventually.



    There never will be an all-knowing universal processor. There are just too many things that can benefit from specialization. The problem with specialization, of course, is being able to develop and deliver it. In this regard Apple is just about the only company that can do so. If the vector processor contains optimizations for ray tracing, I would say they are the only company right now that can deliver.



    In any event, if Apple were to implement a scalable, flexible vector processor they would most certainly be moving toward minimization. That would come from Apple having a universal facility to implement against. It would also mean more flexibility for Apple in what they can offer up for the raster processor. The simple reality is that Larrabee sucks from the standpoint that it is not flexible enough to deliver across all of Apple's hardware platforms.

    Quote:







    Neil Trevett is also going to talk about OpenCL and the implications it has for OpenGL. He works on embedded content at NVidia.



    It's good to see that Apple are hiring people like Munshi who are so directly linked to setting open industry standards. They hired the guy behind CUPS printing along with buying the source code. The more they do this and partner with industry heavyweights, the stronger they will get over time.



    I'm somewhat impressed with Apple's approach to open systems and open source. They certainly have sponsored a lot more than they get credit for. Frankly, a lot of people and organizations have benefited from their efforts with GCC and a host of other projects.



    Still, it isn't really clear where they are going with OpenCL. That, to some extent, is to be expected, as they don't want to give away their hardware development direction. Fortunately Apple's secrecy provides lots of fuel for forum discussions.



    Dave
  • Reply 42 of 123
    Quote:
    Originally Posted by wizard69 View Post





    Don't forget that such vector units would give Apple compelling selling points for companies like ANSYS, ALGOR, PTC's PRO/Engineer and more.



    Hell, if Apple has something that scales from the handheld to the workstation, even NASA would find it compelling at SIGGRAPH.
  • Reply 43 of 123
    Marvin Posts: 15,434 moderator
    Quote:
    Originally Posted by wizard69 View Post


    Does it really make sense to load down your GPU card with a raw data stream to be encoded into a particular video format, especially when that video card may be heavily involved in other processing that is more clearly its domain?



    When you do video encoding, the GPU would just be drawing your main display, so it should handle the extra work OK. It makes sense to use it when you consider how many people already have one and aren't using it to its full advantage - Apple do prefer people to buy new hardware though.



    Quote:
    Originally Posted by wizard69 View Post


    Now, do I want to see Apple emulate Cell's vector units exactly? Not at all. What I want is a modern implementation that is flexible in the sense of programmer access and is optimized for a class of applications that are at the core of Apple's users' interests.



    I see where you're going now. I guess if they had something similar to what Sony have in the Cell and those processors could be used easily with OpenCL, it would probably be better than trying to use a GPU.



    The Cell processor is very fast as demonstrated on the PS3s doing raytracing and it's very power efficient. Documentation said it draws 30W, which is ridiculously low for that kind of processing.



    The question is could Apple develop a chip like this with low power draw and yet fast enough to deliver a beneficial performance improvement and still be cost effective? I don't think they can. Toshiba have made one though, so perhaps they will outsource it:



    http://news.softpedia.com/news/Toshi...PU-66416.shtml



    The QX9300 quad mobile CPU release date is posted as August 17th so I imagine we'll get new laptops after that.



    The OpenGL 3 spec is out:



    http://www.khronos.org/news/press/re...enerations_of/



    No real info on OpenCL other than the groups are working closely to ensure the best interoperability. There should be more info tomorrow though.



    "OpenGL 3.0 sets the stage for a revolution to come ? we now have the roadmap machinery and momentum in place to rapidly and reliably develop OpenGL - and are working closely with OpenCL to ensure that OpenGL plays a pivotal role in the ongoing revolution in programmable visual computing."



    That wording makes it sound like they are concerned OpenGL will be made obsolete.
  • Reply 44 of 123
    mdriftmeyer Posts: 7,503 member
    Quote:
    Originally Posted by Marvin View Post





    Considering PA Semi's multi-core PowerPCs drew considerably less power, it becomes clear Apple has plans for that IP.
  • Reply 45 of 123
    wizard69 Posts: 13,377 member
    Quote:
    Originally Posted by Marvin View Post




    The Cell processor is very fast as demonstrated on the PS3s doing raytracing and it's very power efficient. Documentation said it draws 30W, which is ridiculously low for that kind of processing.



    The question is could Apple develop a chip like this with low power draw and yet fast enough to deliver a beneficial performance improvement and still be cost effective? I don't think they can. Toshiba have made one though, so perhaps they will outsource



    But this is exactly what I'm saying: Apple does have this capability. It is a combination of in-house AltiVec experience and the not-so-modest skills that reside at PA Semi. Remember, the employees of PA have extremely broad backgrounds in chip design; it is not just PPC but just about every recent processor ISA.



    As to power usage, that can be managed well, and if implemented as part of an SoC it might even save power. In the case of the iPod you only need the raw power required to implement a reasonably fast decoder. There is nothing that would keep Apple from deploying a 400MHz unit in an iPod and a 4GHz unit in a Mac Pro. From the programmer's standpoint his code simply runs faster on the Pro. This is pretty much to be expected.



    Well, if we are to believe Apple, PA was purchased to finish off a project for Apple. Further, it has been stated that the chips are iPod-related. Now that leads to the sort of speculation we see in this thread. The question becomes just what added value could PA put into an ARM-based device like the iPod? Frankly, everybody and their brother is in the ARM SoC market trying to supply the traditional I/O. For Apple to even bother with PA they would need to be implementing unique tech - thus my blathering about a vector unit. It is not too difficult to see such tech being applied product-wide, especially if designed to be scalable from the beginning.



    Also, to touch upon another concern about cost: it could be an issue if targeted at one device. Apple wouldn't do that, and if the aggregate of possible units is taken into account it is hardly a concern. Apple already ships iPods with a dedicated decoder, so this sort of upgrade only opens the platform up more.





    Dave
  • Reply 46 of 123
    Marvin Posts: 15,434 moderator
    Quote:
    Originally Posted by wizard69 View Post


    But this is exactly what I'm saying: Apple does have this capability. It is a combination of in-house AltiVec experience and the not-so-modest skills that reside at PA Semi. Remember, the employees of PA have extremely broad backgrounds in chip design; it is not just PPC but just about every recent processor ISA.



    It would have to be x86 architecture, I imagine. Unless Apple can manage to use different architectures on the fly - after all, OpenCL generates bytecode.



    http://www.theregister.co.uk/2006/05/17/pasemi_core_ti/



    Dual-core 2GHz PPC at 7 watts in 2006. Given that the 8600M GT, not underclocked, has 32 stream processors at 950MHz and runs at 22W max, Apple would aim to get below that. As they say:



    "It will make sense to have more lower performance cores"



    http://gizmodo.com/382929/apple-buys...ent-processors



    "ARM, designer of the current iPhone chip, is boasting that they can do a 0.25 watt A9 chip with multicores at 1GHz"



    Assuming PA Semi just equalled that and it scaled linearly, they would be able to do 32 cores at 1GHz running at 8 watts (32 × 0.25W). More generic cores than stream processors, but using less than half the power.



    Without OpenCL though, I doubt they'd be able to use them properly, so I doubt we'll be seeing them in Mac updates coming soon. Graphics cards can currently be used with and without OpenCL, so that would be the best route to go at this point in time, without expecting people to buy new hardware next year simply to take better advantage of the new software developments.
  • Reply 47 of 123
    mdriftmeyer Posts: 7,503 member
    Quote:
    Originally Posted by Marvin View Post





    I wouldn't be surprised if the upcoming laptops have these capabilities, just not leveraged to their full extent until 10.6.
  • Reply 48 of 123
    Marvin Posts: 15,434 moderator
    It looks like ATI are going all the way with OpenCL:



    http://www.tomshardware.com/news/AMD...PGPU,6072.html



    A product should be out in the first quarter of next year.



    Ditching their proprietary controls is a big step forward and shows that OpenCL will be able to use graphics cards without proprietary means. That article says they will release an APU too in the middle of next year.



    There are reports of the same technology used in the real-time ray-tracing thing from Jules World, which I think was based on AMD/ATI products to give Second Life a bit of an overhaul:



    http://www.techcrunch.com/2008/07/09...ng-technology/



    Pretty impressive. That's all server-side though so no client rendering. Maybe this is the way games will eventually go. If they can stream the video fast enough, you can have people playing on a mobile phone or a workstation and they'll get the same quality of graphics.



    No more distribution either: you simply buy a license to play the game online, and they can issue immediate updates when a bug is encountered.
  • Reply 49 of 123
    programmer Posts: 3,467 member
    Quote:
    Originally Posted by Marvin View Post


    But you're saying to get round the issue of inflexible GPUs, the solution is to use even more limited Apple-exclusive co-processors. ... Specialized chips take up space and so say you make one that accelerates video encoding/decoding, one that does vector graphics. ... The ideal situation being that you have one chip that makes any piece of software as fast as possible and that's where Larrabee is aiming eventually.



    I don't think you understand what we are talking about. We are not advocating fixed function chips that handle specific operations (e.g. video codec). Instead the idea is to have processors that are much closer to general purpose but oriented toward data parallel operations, instead of scalar operations like the main CPU is. Current GPUs have some programmability which makes them more flexible than fixed function parts, but the nature of their programmability is very limited because it is encased in fixed-function graphics hardware. Every generation of GPU has been moving toward less fixed-function hardware, and Larrabee is the biggest step yet.



    OpenCL would allow Apple to get into this end of the hardware design game as well. It doesn't create "bytecode" in the same way that Java does; instead it represents programs in a way that keeps more knowledge about them. This gives the JIT more opportunity to optimize for non-scalar hardware like GPUs and vector processors. Apple could thus incorporate data parallel processors into their chipsets, and use them to accelerate OpenCL. These processors would not need to be x86 compatible, nor would they be fixed function. And since they would run only OpenCL code they could be optimized for that, instead of needing to run normal scalar C++ code like the x86 does.
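
    To make this concrete, here is a rough sketch of the sort of kernel you would hand to OpenCL. The spec isn't public yet, so the keyword spellings and the get_global_id() call are my guesses at the syntax; the point is that this C-level source (rather than x86 machine code) is what the runtime sees, so it can be compiled on the fly for a GPU, a Cell-style SPU, or a hypothetical Apple vector unit:

        /* Hypothetical OpenCL-style kernel: one work-item per array element.
           Keyword spellings and get_global_id() are assumptions, since the
           spec hasn't been published yet. */
        __kernel void saxpy(__global const float *x,
                            __global float *y,
                            float a)
        {
            int i = get_global_id(0);   /* which element this work-item owns */
            y[i] = a * x[i] + y[i];     /* plain scalar arithmetic, run across the whole data set */
        }

    Because the runtime keeps the whole data-parallel structure of the program, the same source could be vectorized, multithreaded, or streamed through a co-processor without the developer rewriting anything.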
  • Reply 50 of 123
    wizard69 Posts: 13,377 member
    As programmer pointed out, by vector processor we mean something that handles data-parallel operations. There may be other optimizations for DSP operations, which at times are closely related. What I could see Apple doing is offering up more hardwired features to accelerate things like video encode and decode at a low power point.



    By doing so Apple can control the quality of the output via the design approaches they take. Same thing for ray tracing. Also not to be dismissed are all the demands from the Blu-ray organization if they ever want to deploy Blu-ray drives. If Apple wants to avoid mucking up their OS the way MS did, then they need a hardware solution. All these design issues are related and could conceivably be wrapped into one chip or function block.



    Apple has all the IP and technical know-how to go this route. The question, of course, is just what they are talking about. Looks like we will find out soon.



    Dave
  • Reply 51 of 123
    Marvin Posts: 15,434 moderator
    Quote:
    Originally Posted by Programmer View Post


    I don't think you understand what we are talking about. We are not advocating fixed function chips that handle specific operations (e.g. video codec). Instead the idea is to have processors that are much closer to general purpose but oriented toward data parallel operations, instead of scalar operations like the main CPU is.



    The others mentioned Apple processors; you were talking about Larrabee. Larrabee is clearly a general-purpose processor, but for Apple to develop a processor implies a number of things: whether they have the resources to develop a chip that is advanced enough to be used as a general-purpose processor, and whether it can be cost-effective enough.



    If they can't make it general purpose enough then there's likely no real advantage over simply using decent graphics cards.



    Quote:
    Originally Posted by wizard69


    What I could see Apple doing is offering up more hardwired features to accelerate things like video encode and decode at a low power point.



    Video algorithms are quite varied, so I still think using a generalized processor is better for this - Blu-ray will be a standard for a while though, so it might not matter. The GTX 280 NVidia chip can currently encode as much as 10 times faster than the fastest Intel CPU:



    http://www.anandtech.com/video/showdoc.aspx?i=3339&p=2



    "Elemental's software, if it truly performs the way we've seen here, has the potential to be a disruptive force in both the GPU and CPU industries. On the GPU side it would give NVIDIA hardware a significant advantage over AMD's GPUs, and on the CPU side it would upset the balance between NVIDIA and Intel. Video encoding has historically been an area where Intel's CPUs have done very well, but if the fastest video encoder ends up being a NVIDIA GPU - it could mean that video encoding performance would be microprocessor agnostic, you'd just need a good NVIDIA GPU.



    If you're wondering why Intel is trying to launch Larrabee next year, this is as good of a consumer example as you're going to get."



    Larrabee-type GPGPUs will be the most flexible when they arrive, but until then I reckon the best move is to stick with current GPUs from NVidia or ATI. If Apple can make a good, generalized co-processor and still manage to narrow down their profit margins, that would be great. I just doubt they'll be able to do it. Although they could bundle hardware that sits redundant until 10.6, I think a better move would be to use good GPUs now, which adds a selling point for current hardware; then after 10.6 an Apple processor could sell the hardware, or if Larrabee is good enough, just add that in.
  • Reply 52 of 123
    programmer Posts: 3,467 member
    Quote:
    Originally Posted by Marvin View Post


    The others mentioned Apple processors; you were talking about Larrabee. Larrabee is clearly a general-purpose processor, but for Apple to develop a processor implies a number of things: whether they have the resources to develop a chip that is advanced enough to be used as a general-purpose processor, and whether it can be cost-effective enough.



    If they can't make it general purpose enough then there's likely no real advantage over simply using decent graphics cards.



    Actually I think I floated the idea of an Apple designed vector processor first around here in another thread. And I haven't been talking about Larrabee specifically except where I mention it by name. Apple doesn't have to design a general purpose processor, they only have to design one that can accelerate OpenCL effectively. They also have the option of licensing and enhancing a core from somewhere else... e.g. take an ARM9 core and add SIMD, or license the Cell SPU from IBM. Other examples designed by much much smaller companies are our there if you look. I believe they have more than enough expertise internally (and in PA Semi) to design a vector core. The advantages of doing this are that they don't have to work around the limitations & variations of ATI/nVidia GPUs, they can integrated it into an Apple-design chipset (which has high speed access to main memory), and it would be theirs and theirs alone... but everybody can code for it using OpenCL.
  • Reply 53 of 123
    wizard69 Posts: 13,377 member
    The other issue that seems not to be considered when looking at Apple's in-house capabilities is their involvement in what became AltiVec. Apple has a lot more in-house skill than many in this thread seem willing to give them credit for. Combine that with PA and there is literally nothing they couldn't do.



    Of course, in this thread we are more or less dreaming about what we would like to see Apple do. The problem is we could be very much blowing Apple's public statements way out of proportion. Still I wait for the next iPods to hit as I suspect this is where Apple really can leverage a custom-designed SoC. Across-the-board custom chips are also possible for a number of reasons, but I'm slightly less positive here. Of all the reasons expressed here for Apple to support custom chips, I still think the issue of supporting Blu-ray in a way that doesn't muck up the system is huge.



    Since demand for Blu-ray support will only increase, Apple will have to address that. Knowing Apple, they aren't going to throw a big design effort into that Blu-ray support and not get something more general out of it. In part this is what fuels my imagination about Apple's interest in general and open solutions.





    Dave
  • Reply 54 of 123
    Marvin Posts: 15,434 moderator
    Quote:
    Originally Posted by Programmer View Post


    Apple doesn't have to design a general purpose processor, they only have to design one that can accelerate OpenCL effectively. They also have the option of licensing and enhancing a core from somewhere else... e.g. take an ARM9 core and add SIMD, or license the Cell SPU from IBM.



    This means extra cost though. It could be that Apple's strategy of reducing profit margins is to allow them to add in more components and keep the price the same but I don't see why they'd do that when they can't use the hardware until 10.6, which is still just under a year away. By that time Larrabee could be ready or about to be released, so unless an Apple processor is better than what Intel are offering, it's just extra cost.



    Just being able to 'accelerate OpenCL' doesn't mean anything as you don't know what form OpenCL code will take. ATI have had to build new chips to support it and have released one:



    http://www.amd.com/us-en/Corporate/V...126593,00.html



    This can go in the new Mac Pro in October alongside the Nehalem Core i7 processors. I doubt that an Apple processor could rival a high end GPU like that, which is already OpenCL compatible.



    This covers developers pretty well and since OpenCL is yet to be released, the mobile machines can get by with CUDA until then:



    http://www.nvidia.com/object/cuda_get.html



    All the Nvidia chips Apple use are supported. CUDA and OpenCL have a lot in common:



    "How is CUDA different from GPGPU?



    CUDA is designed from the ground-up for efficient general purpose computation on GPUs. It uses a C-like programming language and does not require remapping algorithms to graphics concepts. CUDA is an extension to C for parallel computing. It allows the programmer to program in C, without the need to translate problems into graphics concepts. Anyone who can program C can swiftly learn to program in CUDA."



    http://benchmarkreviews.com/index.ph...1&limitstart=1



    It's not too far-fetched to consider that CUDA and Brook+ code could be simply adapted to work on OpenCL devices. ATI and NVidia are both members of the Khronos group backing OpenCL, and representatives from both companies, as well as Apple, spoke on it at SIGGRAPH. They are not rival standards but a progression toward the same goal.
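
    To give a rough sense of how small the gap is, here is a minimal sketch of a CUDA C kernel of the kind described above (the kernel and variable names are just made up for the example). The structure - a scalar-looking body plus a thread-index lookup - is exactly what should carry over to OpenCL, with only the keywords and launch syntax needing a mechanical port; compare it with the kernel sketch in Programmer's post above:

        /* Minimal CUDA C sketch: the same scalar-looking body is executed by
           thousands of threads, one per array element. Only the index lookup
           and the __global__/launch syntax are CUDA-specific. */
        __global__ void saxpy(int n, float a, const float *x, float *y)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;  /* this thread's element */
            if (i < n)
                y[i] = a * x[i] + y[i];
        }

        /* Host side: launch enough 256-thread blocks to cover n elements. */
        /* saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y); */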



    Quote:
    Originally Posted by wizard69


    Still I wait for the next iPods to hit as I suspect this is where Apple really can leverage a custom-designed SoC.



    Maybe one made by Nvidia?



    http://www.reghardware.co.uk/2008/06...aunches_tegra/



    "Nvidia said devices equipped with the Tegra series of SoCs to debut "late 2008", but comments made by executives suggest it's hoping manufacturers will have product out in time for Christmas."
  • Reply 55 of 123
    mdriftmeyer Posts: 7,503 member
    http://www.hothardware.com/News/NVID...acing-on-GPUs/



    NVIDIA GPU ray tracing at 2560x1600 (image)



    Quote:

    At three bounces, performance is demonstrated at up to 30 frames per second (fps) at HD resolutions of 1920x1080 for an image-based lighting paint shader, ray traced shadows, reflections and refractions running on four next-generation Quadro GPUs in an NVIDIA Quadro Plex 2100 D4 Visual Computing System (VCS).



    Is there room for APPLE to bring a game changer to the scene? Hell Yes, and it's not just OpenCL.



    If you need 4 of those badboy Quadro GPUs, not yet released, to pull 30fps at 16:9 HD ray tracing there is something too brute force in the approach that staggers the mind.



    Something new needs to be tried, from a different angle on both the hardware and the software algorithm design - something revolutionary, not just a brute-force evolutionary step.
  • Reply 56 of 123
    programmer Posts: 3,467 member
    Quote:
    Originally Posted by Marvin View Post


    This means extra cost though. It could be that Apple's strategy of reducing profit margins is to allow them to add in more components and keep the price the same but I don't see why they'd do that when they can't use the hardware until 10.6, which is still just under a year away. By that time Larrabee could be ready or about to be released, so unless an Apple processor is better than what Intel are offering, it's just extra cost.



    Does it? Not necessarily. Through the miracle of integrated circuits, additional functionality can be included in the north/south bridge chip(s) for extremely small incremental costs (in dollars and watts). This could take the form of computational units that process data as the DMA engine already built into the chipset is moving the data.



    Apple doesn't have to wait for OpenCL, either. They could deploy versions of QuickTime, OpenGL, Quartz, etc. that leverage the new hardware either directly or via not-yet-ready-for-3rd-party-developer versions of OpenCL.



    Quote:

    Just being able to 'accelerate OpenCL' doesn't mean anything as you don't know what form OpenCL code will take.



    Apple has tons of experience writing AltiVec & SSE code. They (and other industry notables) have designed OpenCL to address a class of problems that is fairly well understood, which is how they can define this standard in the first place. This same knowledge can propel their hardware design.



    Quote:

    ATI have had to build new chips to support it and have released one:



    That's because ATI (and nVidia) have been building GPUs.



    Quote:

    I doubt that an Apple processor could rival a high end GPU like that, which is already OpenCL compatible..... All the Nvidia chips Apple use are supported. CUDA and OpenCL have a lot in common:



    Yes, OpenCL probably owes a lot to CUDA. The idea of CUDA was valid; the realization was problematic. It was a language for nVidia GPUs, but the industry needs something more widely applicable. That will be OpenCL.



    I think you might be surprised by what Apple could achieve, particularly if they built it into the motherboard chipset. It is a long round trip out to the GPU and back, so shipping an OpenCL program and its data there & back introduces significant inefficiencies. And then the GPU nature of the ATI/nVidia devices still puts constraints and limitations on them that non-GPU processors may not have. And lastly, the GPU might be busy doing graphics.
  • Reply 57 of 123
    wizard69 Posts: 13,377 member
    Programmer covered most of the issues around the use of the GPU as a computational device but I will highlight a couple of other concerns.



    First, the computational units of a GPU really aren't optimized for general-purpose computing. At best they do a lot of 32-bit floating point really fast. Now, I know there are attempts in place to deliver 64-bit floats and better control, but one has to ask at what cost to the GPU in speed and power? Does it really make sense to saddle the GPU with features it doesn't need, especially in the general case?



    Second, as programmer alluded to, the GPU is in a sense far from the main CPU. That leads to all sorts of issues, largely based on the need to ship computations off to a foreign land. Now if Apple were to deliver a vector unit that is a full co-processor without all these restrictions, then the usability of such a facility increases dramatically. Admittedly this would be easier to implement on AMD hardware with HyperTransport. But that doesn't really matter as support in the chip set can accomplish much the same. The idea is that the co-processor has full access to program address space and can be scheduled by the OS in a fairly general way.



    There still seems to be reluctance to believe Apple can pull off such a chip. I have no doubt they can; rather, the issue is whether they will and what the chip's scope would be. Even a chipset modification that focuses on video acceleration and the legal issues around Blu-ray could offer Apple huge advantages. As stated before, it can mean a stable way to deliver the IP while avoiding messing up the OS. Further, it would mean just one driver across all products.



    Lastly, one has to realize that a vector processor is not a GPU, so trying to drape the functions of one unit on top of another is senseless. These sorts of facilities really need to evolve in their own directions. Vector units especially can benefit from evolution - for example, support for 64-bit and greater floats, and operations on very long vectors. Plus it never really hurts to hardwire more advanced functions. This is certainly the anti-RISC approach to computation, but realities are different these days, as there is lots of silicon available for just about anything.
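
    To put some code to that idea, here is an illustrative example of the same kind of data-parallel loop on the SSE unit already in every Intel Mac: four 32-bit floats per instruction, working directly in the program's own address space with no round trip to a graphics card. A wider or more capable vector unit of the sort I'm describing would simply process more elements per step and could add the 64-bit support mentioned above. (The function names are made up; the _mm_* intrinsics are the standard ones from xmmintrin.h.)

        #include <xmmintrin.h>

        /* Plain scalar version of the loop. */
        void saxpy_scalar(int n, float a, const float *x, float *y)
        {
            for (int i = 0; i < n; ++i)
                y[i] = a * x[i] + y[i];
        }

        /* SSE version: four elements per iteration. */
        void saxpy_sse(int n, float a, const float *x, float *y)
        {
            __m128 va = _mm_set1_ps(a);          /* broadcast a into all four lanes */
            int i = 0;
            for (; i + 4 <= n; i += 4) {
                __m128 vx = _mm_loadu_ps(x + i);
                __m128 vy = _mm_loadu_ps(y + i);
                vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);
                _mm_storeu_ps(y + i, vy);
            }
            for (; i < n; ++i)                   /* leftover elements */
                y[i] = a * x[i] + y[i];
        }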





    Dave
  • Reply 58 of 123
    programmer Posts: 3,467 member
    Quote:
    Originally Posted by wizard69 View Post


    Admittedly this would be easier to implement on AMD hardware with HyperTransport. But that doesn't really matter as support in the chip set can accomplish much the same. The idea is that the co-processor has full access to program address space and can be scheduled by the OS in a fairly general way.



    This is the biggest hole in this theory... Apple's Intel-based machines are just about to start transitioning to QuickPath. That means if they were to introduce a chipset of their own design it would have a short lifespan before they'd have to replace the CPU interface. Plus they would need to get a license from Intel for QP (although I suspect they'd have an easier time of it than nVidia has). The timing just doesn't fit well. Doesn't mean they won't do it though... I've been waiting for Apple to do something like this for years.
  • Reply 59 of 123
    Marvin Posts: 15,434 moderator
    Quote:
    Originally Posted by mdriftmeyer View Post


    Is there room for APPLE to bring a game changer to the scene? Hell Yes, and it's not just OpenCL.



    If you need 4 of those badboy Quadro GPUs, not yet released, to pull 30fps at 16:9 HD ray tracing there is something too brute force in the approach that staggers the mind.



    Something new needs to be tried, from a different angle on both the hardware and the software algorithm design - something revolutionary, not just a brute-force evolutionary step.



    Real-time raytracing isn't necessary though - real-time would only be needed for games but games won't use ray-tracing for a while yet due to the extra requirements for anti-aliasing. It would likely be used more for film/CGI rendering, which can take as long as it needs - current CPUs take minutes per frame. GPUs are typically measured in frames per second. Even a few seconds per frame for post-pro quality is a big step forward.



    The key is that it does the work faster than just using the CPU. If ray tracing is being rendered on the 8-core CPU alone while the GPU could be performing some of the calculations, that's wasted processing power. For some tasks the GPU will rival the 8-core CPU, and if you can use it, you've just cut your render time in half. That's a major selling point.



    Check out the upcoming Nvidia GT200 series GPUs vs the iTunes and Premiere encoders:



    http://www.youtube.com/watch?v=8C_Pj1Ep4nw



    These are CPU vs GPU demos. Even the Geforce 8 series GPUs can help out using CUDA so it's CPU + GPU. The only concern is heat when it comes to the laptop hardware.



    On the subject of ray-tracing there are algorithms being developed to improve it. Carmack talks about that here:



    http://www.pcper.com/article.php?aid=532



    Sparse voxel octree raytracing.



    "The idea would be that you have to have a general purpose solution that can approach all sorts of things and is at least capable of doing the algorithms necessary for this type of ray tracing operation at a decent speed. I think it’s pretty clear that that’s going to be there in the next generation. In fact, years and years ago I did an implementation of this with complete software based stuff and it was interesting; it was not competitive with what you could do with hardware, but it’s likely that I’ll be able to put something together this year probably using CUDA."



    This is an important point about ray-tracing in games:



    "I think that we can have huge benefits completely ignoring the traditional ray tracing demos of “look at these shiny reflective curved surfaces that make three bounces around and you can look up at yourself”. That’s neat but that’s an artifact shader, something that you look at one 10th of 1% of the time in a game. And you can do a pretty damn good job of hacking that up just with a bunch environment map effects. It won’t be right, but it will look cool, and that’s all that really matters when you’re looking at something like that. We are not doing light transport simulation here, we are doing something that is supposed to look good."



    Just look at Gran Turismo - almost photographic at times - no ray-tracing:



    http://www.youtube.com/watch?v=2ko_1lfgvA0



    Same deal with Crysis. They have a whole paper about the techniques they used. Occlusion is probably the most commonly used technique for realistic lighting and you don't even need ray-tracing for that. Crysis uses screen space AO:



    http://www.youtube.com/watch?v=zoRmoDjhpiY



    The difference between ray-tracing and, say, environment mapping is basically that you get realistic effects and you can handle self-reflection. It takes away your control though. If you want to make an object reflect a particular set of imagery for artistic effect, you don't need the expense of ray-tracing. This is why you don't use ray-tracing for compositing reflective CGI with film - there is no 3D environment to reflect, so you just have to map it. You still use ray-tracing for AO etc. though.



    Quote:
    Originally Posted by Programmer


    I think you might be surprised by what Apple could achieve, particularly if they built it into the motherboard chipset. It is a long round trip out to the GPU and back, so shipping an OpenCL program and its data there & back introduces significant inefficiencies. And then the GPU nature of the ATI/nVidia devices still puts constraints and limitations on them that non-GPU processors may not have. And lastly, the GPU might be busy doing graphics.



    Yeah, I agree that current GPUs aren't the ideal solution in the long term. If they can get some sort of processor together that doesn't increase costs significantly, doesn't generate too much heat and delivers a directed performance boost it could be far better than trying to tax a GPU that will just turn the fans up full blast all the time. The question in my mind is not about technical capability so much as feasibility given the time frames.



    Apple didn't buy PA Semi that long ago. Designing and outsourcing the manufacturing of a custom-built processor might be possible, but with Larrabee coming in Summer 2009 or thereabouts, going this route would serve two things: keeping Apple ahead of the competition, and allowing developers to ready software for bigger things to come. As I say though, developers can get the OpenCL-compatible ATI FireStream, or probably any of the GT200 series chips, and rest assured that their developments will work OK - in fact the GeForce 8 series and up would do. Here is an Nvidia rep talking about OpenCL and CUDA:



    "CUDA is a highly tuned programming environment and architecture that includes compilers, software tools, and an API. For us, OpenCL is another API that is an entry point into the CUDA parallel-programming architecture. Programmers will use the same mindset and parallel programming strategy for both OpenCL and CUDA. They are very similar in their syntax, but OpenCL will be more aligned with OS X while CUDA is a based on standard C for a variety of platforms. We have designed our software stack so that to our hardware, OpenCL and CUDA code look the same. They are simply two different paths toward GPU-accelerated code.



    "The CUDA and OpenCL APIs differ so code is not 100 percent compatible. However, they share similar constructs for defining data parallelism so the codes will be very similar and the porting efforts will be minor. The fact that CUDA is available today and is supported across all major operating systems, including OS X, means that developers have a stable, pervasive environment for developing gigaflop GPU applications now, that can be easily integrated with OS X via OpenCL when it is released.



    "OpenCL actually utilizes the CUDA driver stack in order to deliver great performance on Nvidia GPUs
    . In the processor market, there are many different types of tools and languages. The parallel computing market will evolve similarly with many different types of tools. The developer gets to choose the development tools that best fills their needs."



    http://www.bit-tech.net/hardware/200...cture-review/6



    On the subject of GPU computation affecting graphics, typically you will do computation associated with what you are rendering and this doesn't impact performance significantly at all:



    http://www.youtube.com/watch?v=yIT4lMqz4Sk
  • Reply 60 of 123
    wizard69 Posts: 13,377 member
    Quote:
    Originally Posted by Programmer View Post


    This is the biggest hole in this theory... Apple's Intel-based machines are just about to start transitioning to QuickPath. That means if they were to introduce a chipset of their own design it would have a short lifespan before they'd have to replace the CPU interface. Plus they would need to get a license from Intel for QP (although I suspect they'd have an easier time of it than nVidia has). The timing just doesn't fit well. Doesn't mean they won't do it though... I've been waiting for Apple to do something like this for years.



    I have to agree the timing isn't good. On the other hand, a key issue here is that Apple needs to start supporting some sort of hardware-accelerated video playback and related needs in order to deliver a Blu-ray solution. Since Apple uses laptop chips in more than half its machines, they have a tiny bit more breathing room.



    As to what we would like to see and what we might get: well, let's just say that while I can hope for a high-performance solution, I'm expecting a little less. It would be nice to see Apple deliver a facility that people in the sciences and advanced engineering would be compelled to own, but I'm simply not convinced that Apple has what it takes. They could literally sink the last of the RISC machines if they wanted to. All they need to do is offer such a thing across the board at a reasonable cost.



    Dave