Coprocessor future?

Posted in Future Apple Hardware, edited January 2014
http://www.theregister.co.uk/2011/11...5_performance/



This looks like a very interesting option for the future. Any thoughts if Apple will adopt such a strategy?

Comments

  • Reply 1 of 10
    Your question is really, really vague. It is a good bet that with GPUs pushing towards ever more flexible processors, and CPUs pushing more and more cores, the two will meet (if only briefly) at some point. The Larrabee/Knights Ferry/Knights Corner products are a bet by Intel on where this meeting point will be. But the (lab) product they have right now is not really usable for many things, since it is too expensive, too hot, and too specialized at the moment. But in the future it is a pretty safe bet that the inheritors of these ideas will be the dominant species, and thus in use by Apple.
  • Reply 2 of 10
    wizard69 Posts: 13,377 member
    Quote:
    Originally Posted by Mike Fix


    http://www.theregister.co.uk/2011/11...5_performance/



    This looks like a very interesting option for the future. Any thoughts if Apple will adopt such a strategy?



    Honestly I doubt it, but I've been wrong before. 😉



    There are several reasons here.



    First: think about OpenCL, which is likely how this processor would be accessed. People don't understand where and how OpenCL is useful now, so they certainly would not understand where and how they would benefit from this processor. In the end the processor would only be used in the most demanding situations, and because of that it would be forever expensive.



    Think about the way OpenCL is used today. It is extremely successful if you understand what it is, yet people seem to think it is a failure. The reason people see it as a failure is that they don't understand the capabilities of the hardware. This new hardware may have more general capabilities, but the problem is it won't be on every machine. GPU hardware has to be on every machine, so which processor do you think will find acceptance? Especially as GPU hardware is further optimized for compute workloads across entire families of devices. In other words, with OpenCL targeting the GPU you effectively gain performance, to one degree or another, across a wide range of hardware. In Apple's case we are talking about everything from the Mini on up benefiting from GPU acceleration. Go the co-processor route and you are stuck with the high end. Plus you need the GPU anyway; it is not as though it can be dropped from current Apple hardware.
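
    To put it concretely, a data-parallel OpenCL kernel is a tiny piece of C-like code; the sketch below is purely illustrative (names made up), but the same source gets compiled for whatever OpenCL device a machine has, integrated GPU, discrete GPU, or even the CPU, which is why targeting OpenCL reaches the whole lineup rather than one expensive add-in board.

        // Illustrative OpenCL C kernel: one work-item handles one element.
        // The identical source runs on any OpenCL device the runtime exposes.
        __kernel void saxpy(__global const float *x,
                            __global float *y,
                            const float a)
        {
            size_t i = get_global_id(0);   // this work-item's element index
            y[i] = a * x[i] + y[i];        // one independent multiply-add
        }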



    The second issue is this: Apple is only superficially focused on high end computing, or high performance computing if you like that term. Sure, the Mac Pro is a workstation-class machine, and a good solid machine, but it isn't what one would implement a cluster with. I can't see Intel selling this chip for anything less than a couple of thousand dollars unless they want to bleed cash. I just can't see the volume equation that would allow for low prices, and low prices would be needed to allow a range of implementations and to attract the likes of Apple. For Apple, what would be the draw?



    Third: I really see this approach of Intel's as fundamentally flawed, as the chip should not be implemented as a co-processor. Really, I think the whole idea is stupid. Current Intel hardware carries a whole bunch of baggage from days gone by; what Intel needs is a clean-break architecture as a replacement for x86_64. That is, Intel should be implementing this many-core CPU as a host processor. As such it would have the volume to push into the mainstream. It is nice that Intel has the cash to do this design, and it may have limited success, but I just don't see it being placed in desktop machines the way GPUs are.
  • Reply 3 of 10
    dobby Posts: 797 member
    Was the AltiVec Velocity Engine on the early G4s (Motorola/IBM) a co-processor?

    The Intel one described is not for PC usage, but there have been many co-processors over the years on various platforms.

    I think IBM's use of GPU processing to offload specific instructions is a great idea.



    Dobby.
  • Reply 4 of 10
    hiro Posts: 2,663 member
    Quote:
    Originally Posted by Mike Fix


    http://www.theregister.co.uk/2011/11...5_performance/



    This looks like a very interesting option for the future. Any thoughts if Apple will adopt such a strategy?



    That's just the server-focused revision of the failed Larrabee GPU. I've been following it since ~2007. Larrabee had some significant performance problems as a GPU, despite the great and grand ideas. Its best attribute compared to general purpose GPU use is that the programming model is x86 based so there isn't a need to learn an additional special purpose GPU programming language.



    Whether this architecture or a general purpose GPU based architecture will perform better will depend on the combination of the problem and the programmer's technical ability to do the radical parallelization needed to make the GPU worthwhile. The GPU can be faster, but the problem has to be broken down correctly or it is mostly wasted capability. For example, if you cannot find 30,000+ operations that can be done essentially simultaneously, in parallel, without any dependencies, you will severely underutilize the GPU solution. So if you can only break the problem down into a couple hundred simultaneous, parallel, non-dependent chunks, the x86 version will probably be more efficient.
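
    Roughly speaking, that break-down test looks like this in plain C (purely illustrative code): the first loop is a million independent operations and maps straight onto tens of thousands of GPU work-items, while the second carries a dependency from one iteration to the next and, as written, cannot be split up at all.

        #include <stddef.h>

        /* Every iteration is independent: each element could be its own
           GPU work-item, so a million of them keeps a GPU busy. */
        void scale_all(float *out, const float *in, float a, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                out[i] = a * in[i];
        }

        /* Each iteration needs the previous result, so as written this loop
           cannot be handed out as independent work-items; a fast serial x86
           core handles it better than a GPU would. */
        void running_sum(float *out, const float *in, size_t n)
        {
            out[0] = in[0];
            for (size_t i = 1; i < n; i++)
                out[i] = out[i - 1] + in[i];
        }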
  • Reply 5 of 10
    hiro Posts: 2,663 member
    Quote:
    Originally Posted by wizard69


    Honestly I doubt it, but I've been wrong before. 😉



    There are several reasons here.



    First: think about OpenCL, which is likely how this processor would be accessed. People don't understand where and how OpenCL is useful now, so they certainly would not understand where and how they would benefit from this processor. In the end the processor would only be used in the most demanding situations, and because of that it would be forever expensive.



    This is the repackaged/tweaked Larrabee. It will do OpenCL, but nowhere near as effectively as an OpenCL-pipelined GPU. This MIC is designed for heavy lifting, but with far easier programming for the scientific community than having to learn OpenCL or CUDA variants.
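
    By "far easier programming" I mean something like the sketch below, ordinary C with an OpenMP pragma (function and names invented for illustration): existing x86 code gets recompiled for the many cores instead of being rewritten as kernels plus host-side setup in OpenCL or CUDA.

        #include <math.h>

        /* Ordinary C with an OpenMP pragma: the parallelism is expressed in
           the language the scientific code is already written in, with no
           separate kernel language or device toolchain. */
        void rescale(double *data, double scale, long n)
        {
            #pragma omp parallel for
            for (long i = 0; i < n; i++)
                data[i] = scale * sqrt(data[i]);
        }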



    Quote:
    Originally Posted by wizard69


    Think about the way OpenCL is used today. It is extremely successful if you understand what it is, yet people seem to think it is a failure. The reason people see it as a failure is that they don't understand the capabilities of the hardware. This new hardware may have more general capabilities, but the problem is it won't be on every machine. GPU hardware has to be on every machine, so which processor do you think will find acceptance? Especially as GPU hardware is further optimized for compute workloads across entire families of devices. In other words, with OpenCL targeting the GPU you effectively gain performance, to one degree or another, across a wide range of hardware. In Apple's case we are talking about everything from the Mini on up benefiting from GPU acceleration. Go the co-processor route and you are stuck with the high end. Plus you need the GPU anyway; it is not as though it can be dropped from current Apple hardware.



    I agree. This part is really targeted at the cluster user, as a motivation to put multiple of these in each box instead of multiple GPUs.



    Quote:
    Originally Posted by wizard69


    The second issue is this: Apple is only superficially focused on high end computing, or high performance computing if you like that term. Sure, the Mac Pro is a workstation-class machine, and a good solid machine, but it isn't what one would implement a cluster with. I can't see Intel selling this chip for anything less than a couple of thousand dollars unless they want to bleed cash. I just can't see the volume equation that would allow for low prices, and low prices would be needed to allow a range of implementations and to attract the likes of Apple. For Apple, what would be the draw?



    Third: I really see this approach of Intel's as fundamentally flawed, as the chip should not be implemented as a co-processor. Really, I think the whole idea is stupid. Current Intel hardware carries a whole bunch of baggage from days gone by; what Intel needs is a clean-break architecture as a replacement for x86_64. That is, Intel should be implementing this many-core CPU as a host processor. As such it would have the volume to push into the mainstream. It is nice that Intel has the cash to do this design, and it may have limited success, but I just don't see it being placed in desktop machines the way GPUs are.



    Couldn't agree more. I was more excited about the server-based follow-on announcement when it looked like it was going to become a hardcore heavy CPU with max cache support for multi-core installs. I think Intel is screwing the pooch big time and giving Nvidia/AMD (ATi) a free pass. I guess it's true that Intel will never "get" the GPU.
  • Reply 6 of 10
    wizard69 Posts: 13,377 member
    Quote:
    Originally Posted by Hiro


    This is the repackaged/tweaked Larrabee. It will do OpenCL, but nowhere near as effectively as an OpenCL-pipelined GPU. This MIC is designed for heavy lifting, but with far easier programming for the scientific community than having to learn OpenCL or CUDA variants.



    Is the use of OpenCL really that difficult? In this context you still have to have parallel code to execute and map to the hardware. From what I can see the Intel hardware does not appear to be general purpose enough to be scheduled like the cores in the main CPU, so some sort of API would still be used.



    On the flip side you have AMD and NVidia trying to refactor their GPU architectures to make them more flexible when it comes to code that doesn't fit SIMD. From Apple's standpoint I can't see a big draw for the Intel solution, as they will soon benefit from these new GPUs across many of their machines.
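
    By "doesn't fit SIMD" I mean things like this little C sketch (illustrative only): when each element needs a different amount of work, a wide SIMD unit or GPU wavefront ends up running both paths under a mask or waiting on its slowest lane, which is exactly what the refactored architectures are trying to cope with better.

        /* Data-dependent control flow: some elements need many refinement
           steps, others need none.  Packed into a SIMD lane group or GPU
           wavefront, every lane waits for the slowest one, so this kind of
           loop maps poorly onto today's wide vector hardware. */
        void refine(float *v, long n)
        {
            for (long i = 0; i < n; i++) {
                while (v[i] > 1.0f)       /* variable trip count per element */
                    v[i] *= 0.5f;
            }
        }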

    Quote:

    I agree. This part is really targeted at the cluster user, as a motivation to put multiple of these in each box instead of multiple GPUs.



    Yep, that appears to be Intel's goal. The problem is this: where is the money? That is, how many of these chips could they realistically sell into clusters capable of using them? At least NVidia and AMD have volume potential.

    Quote:

    Couldn't agree more. I was more excited about the server-based follow-on announcement when it looked like it was going to become a hardcore heavy CPU with max cache support for multi-core installs. I think Intel is screwing the pooch big time and giving Nvidia/AMD (ATi) a free pass. I guess it's true that Intel will never "get" the GPU.



    Yeah, Intel is pretty dumb at times. Such a chip could be the basis of a new generation of desktop and better hardware; instead they make hardware that will be of limited benefit to them and the industry. AMD on the other hand will eventually have sound GPU architectures on everything from Fusion on up that can support the CPU as a computational unit. By the way, I realize that they basically have this already, but honestly in a rudimentary form. If AMD can realize their recently laid out goals they will have covered a broad range of performance needs with extremely capable GPUs.



    Intel on the other hand has Ivy Bridge coming. Now supposedly IB is a much better GPU with OpenCL support. Maybe so, but we all know how Intel has screwed that up in the past. Even if OpenCL support is viable, where is the upside? AMD or NVidia literally gives you a range of options, including installing and dedicating a GPU to just compute loads. Intel isn't even in the game. So yeah, they don't get it. Then again, maybe they don't want to get it and would rather try to out-market the GPU makers.
  • Reply 7 of 10
    mr. me Posts: 3,221 member
    Three things:
    1. The G4 was exclusively Motorola/Freescale.

    2. AltiVec is an exclusive trademark of Freescale.

    3. The AltiVec vector processing unit was part of each G4 microprocessor and was not a separate co-processor.

  • Reply 8 of 10
    hiro Posts: 2,663 member
    Quote:
    Originally Posted by wizard69


    Is the use of OpenCL really that difficult? In this context you still have to have parallel code to execute and map to the hardware. From what I can see the Intel hardware does not appear to be general purpose enough to be scheduled like the cores in the main CPU, so some sort of API would still be used.



    On the flip side you have AMD and NVidia trying to refactor their GPU architectures to make them more flexible when it comes to code that doesn't fit SIMD. From Apple's standpoint I can't see a big draw for the Intel solution, as they will soon benefit from these new GPUs across many of their machines.



    OpenCL is not intuitive (CUDA is worse) unless you are a card-carrying graphics-engine coder good at hardware optimizations. Larrabee was explicitly designed to run out-of-the-box x86 code with no modifications, so I would be very surprised if that has changed. Larrabee was also designed to do GPU-style OpenGL/DirectX/OpenCL, so those are options that I expect still exist, but relative to Nvidia/ATi GPUs the poor performance of those APIs was the reason Larrabee was cancelled.
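
    To give a flavour of the non-intuitive part: even a trivial kernel launch needs roughly the host-side ceremony below (an OpenCL 1.x sketch, error checking omitted, kernel and buffer names invented), and that is before any of the work-group sizing and memory-layout tuning that actually buys GPU performance.

        #include <CL/cl.h>

        /* Illustrative OpenCL 1.x host setup to run one kernel over n floats;
           every call below is mandatory boilerplate before the device does
           any work at all. */
        void run_saxpy(const char *src, float *x, float *y, float a, size_t n)
        {
            cl_platform_id plat;  cl_device_id dev;
            clGetPlatformIDs(1, &plat, NULL);
            clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

            cl_context ctx     = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
            cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);
            cl_program prog    = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
            clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
            cl_kernel k        = clCreateKernel(prog, "saxpy", NULL);

            cl_mem bx = clCreateBuffer(ctx, CL_MEM_READ_ONLY  | CL_MEM_COPY_HOST_PTR,
                                       n * sizeof(float), x, NULL);
            cl_mem by = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                       n * sizeof(float), y, NULL);

            clSetKernelArg(k, 0, sizeof(cl_mem), &bx);
            clSetKernelArg(k, 1, sizeof(cl_mem), &by);
            clSetKernelArg(k, 2, sizeof(float),  &a);

            clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
            clEnqueueReadBuffer(q, by, CL_TRUE, 0, n * sizeof(float), y,
                                0, NULL, NULL);
        }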



    The bottom line is that this architecture is taking WAY too long to get to market, long enough that it won't do so well if any Grand Central-style APIs show up for Windows/Linux in the next couple of years. Those try to hide the worst of the OpenCL GPGPU scheduling issues. If you can think in terms of independent tasks you can go a long way without getting too specialized.
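
    Apple's Grand Central Dispatch already gives that independent-task style in plain C with blocks, something like the sketch below (the GCD calls are real, the work itself is made up); a Windows/Linux equivalent sitting on top of GPU scheduling would hide the OpenCL plumbing in much the same way.

        #include <dispatch/dispatch.h>
        #include <math.h>

        /* Grand Central Dispatch: describe n independent chunks of work and
           let the runtime decide how to spread them over available cores. */
        void rescale_gcd(double *data, double scale, size_t n)
        {
            dispatch_queue_t q =
                dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

            dispatch_apply(n, q, ^(size_t i) {
                data[i] = scale * sqrt(data[i]);   /* each index is independent */
            });
        }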
  • Reply 9 of 10
    wizard69 Posts: 13,377 member
    Quote:
    Originally Posted by Hiro


    OpenCL is not intuitive (CUDA is worse) unless you are a card-carrying graphics-engine coder good at hardware optimizations. Larrabee was explicitly designed to run out-of-the-box x86 code with no modifications, so I would be very surprised if that has changed.



    I was left with the impression that Intel was getting its performance from the vector processing capability in this chip. Thus you would have to program it much the way a GPU is programmed to get anywhere near the speculated floating point performance.
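
    Roughly what I mean, sketched in C with plain 4-wide SSE as a stand-in for whatever vector width Intel's chip actually has (illustrative code, nothing from Intel): only the explicitly vectorized loop gets near the vector unit's peak, and that restructuring is essentially GPU-style thinking.

        #include <stddef.h>
        #include <xmmintrin.h>   /* SSE intrinsics as a stand-in for a wider unit */

        /* Scalar version: one multiply-add per iteration, vector unit mostly idle. */
        void saxpy_scalar(float *y, const float *x, float a, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                y[i] = a * x[i] + y[i];
        }

        /* Explicitly vectorized version: four elements per iteration.  Feeding
           the hardware means restructuring the loop around the vector width,
           much as a GPU kernel is structured around its work-items. */
        void saxpy_sse(float *y, const float *x, float a, size_t n)
        {
            __m128 va = _mm_set1_ps(a);
            size_t i = 0;
            for (; i + 4 <= n; i += 4) {
                __m128 vx = _mm_loadu_ps(x + i);
                __m128 vy = _mm_loadu_ps(y + i);
                _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
            }
            for (; i < n; i++)               /* leftover tail elements */
                y[i] = a * x[i] + y[i];
        }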



    If the chip could run x86 code then I wonder what is missing to make it a host processor? For me it seems like Intel is missing a significant opportunity here to move beyond x86.

    Quote:

    Larrabee was also designed to do GPU-style OpenGL/DirectX/OpenCL, so those are options that I expect still exist, but relative to Nvidia/ATi GPUs the poor performance of those APIs was the reason Larrabee was cancelled.



    Which leads me back to thinking that something like OpenCL would be required to harness what is actually powerful on the chip. I'm still left with the impression that most of the value in this guy is due to vector processing.

    Quote:

    The bottom line is that this architecture is taking WAY too long to get to market, long enough that it won't do so well if any Grand Central-style APIs show up for Windows/Linux in the next couple of years. Those try to hide the worst of the OpenCL GPGPU scheduling issues. If you can think in terms of independent tasks you can go a long way without getting too specialized.



    Maybe a FUD campaign from Intel? By the time this thing hits the street at a reasonable price AMD should have next generation Fusion GPUs working in real hardware. Commodity hardware can go a long way in the real world. I forget what AMD's long term plan was, but maybe it was 2013 when they expected the GPU to have equality with the main CPU on the memory bus. At that point you pretty much have a true heterogeneous system. The GPU might not be an exceptional performer in a SoC, but it will be cheap and easy to implement, and cheap and easy can be very compelling.
  • Reply 10 of 10
    wizard69 Posts: 13,377 member
    It is a beast too: almost 1 TFLOP double precision and far, far better single precision on compute loads. That is for a product that ships next year (9th of January). This is a product that uses AMD's so-called Graphics Core Next technology, a technology that puts a bit more focus on compute workloads.



    There is no sense in my repeating AMD's well-done propaganda, so I suggest going over to www.AMD.com for details. Needless to say, though, this is a groundbreaking GPU. For "most" users this highlights the uphill battle Intel will have. People doing serious number crunching usually want serious graphics, so if your acceleration can be had in one card, why bother with Intel's choice?



    Note a couple of things. One: this is AMD's first generation of these chips and apparently it is relatively low powered (28nm). Two: this technology will migrate down into lower powered chips for use in more mainstream machines. Thus it wouldn't be impossible to see a Mini with one of these running at 1 TFLOP single precision.



    All of this combined is pretty impressive, especially looking back on what a Cray used to cost 20 years ago. AMD has a lot of incentive to push hard to enhance these chips too, as NVidia would love to be able to corner the market.