Interesting thoughts, Amorph. I have not read up on exactly how Cell processors work, but if you're saying cells can be specialized cores, such as an Altivec core, then I guess a multi-core system could dedicate an Altivec or SIMD cell core to floating point (because floating point #'s are all it handles), with double or more the current Altivec bandwidth. Each processor could then have a group of maybe 2 to 4 other cell cores to equal the mathematical output of the Altivec or SIMD FPUs, so all the math instructions could just fly through.
That is how it should be done IMO, if possible. Altivec is very impressive for what it does (FP #'s), but the rest of the math was handled no differently; it just wasn't handling the floats. You would almost think it could be possible to have cores for Long, Short, Float, and so on in a single chip.
Anyway, just a rant.. Nothing to see here.. move along..
Quote:
Originally posted by onlooker
Because Altivec is very impressive for what it does (FP#'s), but the rest of the math was handled no differently other than it wasn't handling the Floats. You would almost think it could be possible to have cores for Long, Short, Float, and so on in a single chip.
Anyway, just a rant.. Nothing to see here.. move along..
Altivec is a 128-bit VPU. It can operate on 16 bytes, 8 short ints, 4 32-bit integers, or 4 single-precision floats at a time (it has no double-precision support). It is useful for many things, as long as the programmer isn't a chump.
But I would imagine that the effect of Altivec could be emulated through the Cell system without a huge pile of transistors. Really, it just seems like a matter of adding a couple more general purpose ALU/FPU units, tying them to the main network, and putting them in close proximity to each other with a little optional control logic, so that when Altivec instructions are called, they are handled by this lobe of sorts. If the system already computes a great deal of its arithmetic in parallel, Altivec isn't going to deliver a real speed bonus. Instead, it's just backwards compatibility, and when Cell variants get into Macs a bit down the road, I bet we'll see a shift the way we did with the Mac->Power Mac upgrade. So I don't really see why Altivec is necessary.
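To make the "one 128-bit register, several lane widths" idea concrete, here is a small sketch in plain Python. It is not AltiVec code: the helper names `lanes` and `vector_add_u32` are made up for illustration, and the big-endian byte order is only chosen to mirror PowerPC. It just shows that the same 16 bytes can be sliced as 16, 8 or 4 elements, and what one lane-wise add looks like.

```python
import struct

# One 128-bit register is just 16 bytes; the element type decides how it is sliced.
REGISTER_BYTES = 16


def lanes(raw: bytes, fmt: str) -> tuple:
    """Reinterpret a 16-byte register image as fixed-width lanes.

    fmt is a struct format character: 'B' (8-bit), 'H' (16-bit),
    'I' (32-bit integer) or 'f' (32-bit single-precision float).
    """
    if len(raw) != REGISTER_BYTES:
        raise ValueError("a 128-bit register holds exactly 16 bytes")
    count = REGISTER_BYTES // struct.calcsize(">" + fmt)
    # '>' selects big-endian byte order and standard sizes.
    return struct.unpack(f">{count}{fmt}", raw)


def vector_add_u32(a: bytes, b: bytes) -> bytes:
    """Add four 32-bit lanes in one call, wrapping on overflow (modulo 2**32)."""
    sums = [(x + y) & 0xFFFFFFFF for x, y in zip(lanes(a, "I"), lanes(b, "I"))]
    return struct.pack(">4I", *sums)


if __name__ == "__main__":
    raw = bytes(range(16))
    for fmt in "BHIf":
        print(fmt, len(lanes(raw, fmt)), "lanes")
```

A real vector unit does the lane-wise add in a single instruction; the list comprehension here only models the result, not the speed.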
Quote:
Originally posted by AirSluf
...if you can get enough programmers to care to spend the time to learn how to best use it...
Hard to see how getting top notch programmers will be a problem. After all, it's going to power the most popular console in the world. There's a lot of $$ waiting to entice those gaming houses.
Since the PS2, middleware has become a lot more common in the console world and I expect it to become even more so in the future. If the middleware developers get it right, then by default, the game programmers will too.
Not really. If you sift through all of the information available on Cell it appears to come down to something as simple as asymmetric cores on a single chip with an unconventional memory addressing/sharing scheme. The main core is probably fairly traditional, and quite possibly PowerPC. The additional cores are more along the lines of PS2 vector units. There is nothing about sharing execution units or reconfiguring circuitry -- that stuff is all on the distant horizon but considerably farther out than the Cell delivery date. If the main core of a Cell chip is PowerPC then it could include an AltiVec unit, or it might not. The additional vector cores would execute a different instruction set entirely that includes its own set of vector operation primitives.
Could Apple use such a thing? If its main core is a true PPC32 or PPC64, then possibly -- they've had machines with DSPs in them in the past, plus there have been rumours of add-on media processors and vector units in the main chipset. Since this kind of thing is asymmetric you need to code for it specifically, or have some kind of higher level system service that converts your operations to run on the available hardware (a la CoreImage and OpenGL 2). Whether this is better than a multi-core version of the 9x0 series is an open question. High clock rates have been postulated for the Cell (and XBox2, for that matter) but all of that information appeared before the 970FX tripped over its feet and then finally arrived at "only" 2.5 GHz wearing a water tank on its back.
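As a rough illustration of the "higher level system service" idea, here is a sketch of a service that hides which hardware actually runs an operation. Everything in it is hypothetical: `ComputeService`, the `"vector"` unit name and both backends are invented for the example and do not correspond to any Apple or Cell API. The vector backend is only a stand-in that walks the data four elements at a time.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], List[float]]


def add_scalar(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Fallback path: one element at a time on the main core."""
    return [x + y for x, y in zip(a, b)]


def add_vector4(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Stand-in for a vector core: consumes the inputs in chunks of four."""
    out: List[float] = []
    for i in range(0, len(a), 4):
        out.extend(x + y for x, y in zip(a[i:i + 4], b[i:i + 4]))
    return out


class ComputeService:
    """Hypothetical system service that picks a backend for the caller."""

    def __init__(self, available_units: Iterable[str]) -> None:
        self._available = set(available_units)
        self._backends: Dict[str, Kernel] = {
            "vector": add_vector4,
            "scalar": add_scalar,
        }

    def backend_name(self) -> str:
        # Prefer the specialized unit when the machine reports one.
        return "vector" if "vector" in self._available else "scalar"

    def add(self, a: Sequence[float], b: Sequence[float]) -> List[float]:
        if len(a) != len(b):
            raise ValueError("inputs must have the same length")
        return self._backends[self.backend_name()](a, b)
```

The application only ever calls `ComputeService.add`, so the same program gives the same answer whether or not the extra hardware is present; only the service has to know about the asymmetry.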
Kickaha and Amorph couldn't moderate themselves out of a paper bag. Abdicate responsibility and succumb to idiocy. Two years of letting a member make personal attacks against others, then stepping aside when someone won't put up with it. Not only that but go ahead and shut down my posting privileges but not the one making the attacks. Not even the common decency to abide by their warning (after three days of absorbing personal attacks with no mods in sight), just shut my posting down and then say it might happen later if a certain line is crossed. Bullshit flag is flying, I won't abide by lying and coddling of liars who go off-site, create accounts differing in a single letter from my handle with the express purpose to deceive and then claim here that I did it. Everyone be warned, kim kap sol is a lying, deceitful poster.
Now I guess they should have banned me rather than just shut off posting privileges, because kickaha and Amorph definitely aren't going to like being called to task when they thought they had it all ignored *cough* *cough* I mean under control. Just a couple o' tools.
Don't worry, as soon as my work resetting my posts is done I'll disappear forever.
As a game designer I would say that the Cell approach will be fun. Right now I am not writing my game to work with today's tech. I intend for it to be released when there are dual-core, dual-processor SMT systems with more than one GFX chipset. It may run like crap now, but eventually it will be nice having my (as of right now) 12 threads each running on their own logical unit. When it comes to the programming, it's not that hard to switch design paradigms; it's harder to switch your data paradigms. How you look at, organize and operate on the data becomes weird. Kind of like writing things for Altivec, which really strains my brain. But yeah, what I meant to say is that I don't think Cell will do badly, but for now I do not believe the implementation in the PS3 or the workstation will be impressive enough to me compared to, say, the SMP route that all the chip makers are leaning toward.
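One common example of what "switching your data paradigm" can mean is going from an array of structures to a structure of arrays, which is the layout vector hardware tends to prefer. The sketch below is only an illustration of that idea, not the poster's actual code; `Particle`, `ParticleArrays` and the step functions are invented names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    """Array-of-structures element: everything about one particle together."""
    x: float
    y: float
    vx: float
    vy: float


@dataclass
class ParticleArrays:
    """Structure-of-arrays layout: one contiguous list per field."""
    x: List[float]
    y: List[float]
    vx: List[float]
    vy: List[float]


def to_soa(particles: List[Particle]) -> ParticleArrays:
    """Regroup the same data by field instead of by object."""
    return ParticleArrays(
        x=[p.x for p in particles],
        y=[p.y for p in particles],
        vx=[p.vx for p in particles],
        vy=[p.vy for p in particles],
    )


def step_aos(particles: List[Particle], dt: float) -> None:
    """Object-at-a-time update: natural to write, awkward to vectorize."""
    for p in particles:
        p.x += p.vx * dt
        p.y += p.vy * dt


def step_soa(ps: ParticleArrays, dt: float) -> None:
    """Field-at-a-time update: each line is one long, uniform stream of math."""
    ps.x = [x + vx * dt for x, vx in zip(ps.x, ps.vx)]
    ps.y = [y + vy * dt for y, vy in zip(ps.y, ps.vy)]
```

Both versions compute the same positions. The difference is that `step_soa` applies one operation to whole runs of like-typed values, which is the shape that maps onto 4-wide vector lanes or onto a separate core per field.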
Quote:
Originally posted by Programmer
Not really. If you sift through all of the information available on Cell it appears to come down to something as simple as asymmetric cores on a single chip with an unconventional memory addressing/sharing scheme. The main core is probably fairly traditional, and quite possibly PowerPC. The additional cores are more along the lines of PS2 vector units. There is nothing about sharing execution units or reconfiguring circuitry -- that stuff is all on the distant horizon but considerably farther out than the Cell delivery date. If the main core of a Cell chip is PowerPC then it could include an AltiVec unit, or it might not. The additional vector cores would execute a different instruction set entirely that includes its own set of vector operation primitives.
Interesting, because I'd read that from various (public) sources, including IBM slides posted at Ars, that IBM was going significantly multicore, first of all, versus dual-core now, and second of all that the Cell cores were incomplete and special purpose (partly so that they could clock really high, another thing I've seen on an IBM slide).
If all it is is a nifty fabric for traditional AMP or SMP on die — and I have no reason to doubt you on this — then... meh. I honestly had the impression from reading around that IBM were trying for a paradigm shift rather than an incremental improvement.
Quote:
Could Apple use such a thing?
It's much easier to imagine them using this far more conservative design strategy than the more far-out one that I'd imagined, certainly. It'll make a nice competitor to the 8641D coming out from Freescale next year. But that's all.
Quote:
Interesting, because I'd read that from various (public) sources, including IBM slides posted at Ars, that IBM was going significantly multicore, first of all, versus dual-core now, and second of all that the Cell cores were incomplete and special purpose (partly so that they could clock really high, another thing I've seen on an IBM slide).
Well the Cell is significantly multi-core (8+ cores is pretty significant), and the Cell cores are specialized vector execution units, and they probably will clock higher than a general purpose core could. But what I'm saying is that they are still only several specialized high clock cores on a single die. That's a far cry from what was being discussed a few posts back.
This is a major paradigm shift for most software to deal with... right now everyone is pretty much coding for one or two SMP threads. That is a lot different.
If the Cell cores are PowerPC, cheap and high frequency, is there a possibility that we might see Cell-based accelerator cards? If ATI thinks they can use the main RAM as VRAM over the high-bandwidth PCIe bus, wouldn't it be possible to strap a dozen Cells onto a PCIe card and make a fairly cheap accelerator card for general number crunching, 3D, video rendering and such?
Quote:
Not really. If you sift through all of the information available on Cell it appears to come down to something as simple as asymmetric cores on a single chip with an unconventional memory addressing/sharing scheme.
In my limited surfing this is what I've come up with. Those cores are apparently PPC or PPC derived with the obvious new cores to support additional functionality.
Quote:
The main core is probably fairly traditional, and quite possibly PowerPC. The additional cores are more along the lines of PS2 vector units. There is nothing about sharing execution units or reconfiguring circuitry -- that stuff is all on the distant horizon but considerably farther out than the Cell delivery date. If the main core of a Cell chip is PowerPC then it could include an AltiVec unit, or it might not. The additional vector cores would execute a different instruction set entirely that includes its own set of vector operation primitives.
What I wonder is how much room is there to improve Alt-Vec with respect to doing vector operations. Will 4 cores, each working on a 16 or 32 bit vector be better than the Alt-Vec approach?
The bigger problem as I see it is that Cell has to implement a 64 bit processor to remain viable for more than a year or two. I really believe that the addressing range 64 bits offers will be significant in a short time in this market.
Quote:
Could Apple use such a thing? If its main core is a true PPC32 or PPC64, then possibly -- they've had machines with DSPs in them in the past, plus there have been rumours of add-on media processors and vector units in the main chipset. Since this kind of thing is asymmetric you need to code for it specifically, or have some kind of higher level system service that converts your operations to run on the available hardware (a la CoreImage and OpenGL 2).
The reality here is that Apple has had prior experience with DSPs and frankly it did not go over well. Since Apple has a well received, but partial, DSP facility in Alt-Vec it would seem to make more sense to simply extend Alt-Vec to make use of the additional transistors available to them.
Apple needs to offer a uniform programming environment; it is pretty obvious that they understand this, as they have switched over to all Alt-Vec enabled processors. Now that doesn't mean that Apple couldn't build the software required to offer the Alt-Vec programming environment on Cell if it doesn't already exist there. I just have a very hard time seeing the payoff for the types of applications that Apple hardware runs. In other words it wouldn't make much sense to tie up the entire Cell chip just to emulate Alt-Vec, especially when hardware performance is still moving forward with conventional implementations.
Quote:
Whether this is better than a multi-core version of the 9x0 series is an open question. High clock rates have been postulated for the Cell (and XBox2, for that matter) but all of that information appeared before the 970FX tripped over its feet and then finally arrived at "only" 2.5 GHz wearing a water tank on its back.
I had to laugh here a bit. The water tank on its back reminded me of the old-fashioned steam engines with their saddle tank water reservoirs.
I'm still of the opinion that the existence of the water cooling system in Apple's hardware is a pretty clear indication of IBM's failure to meet customer expectations. We do not see a whole lot of water cooling going on in the AMD 64 bit world, and very little if any in the rest of the 90nm world. More than anything I see this as a driver at Apple to search out new technologies and vendors. I just don't see Cell being this alternative.
The main obstacle will be that, for historical reasons, game designers are used to writing monolithic, single-threaded apps for single-core CPUs.
Where did you get that idea from? Us console and arcade engineers have been writing games for multi-chip systems for as long as I've been in the business, over a decade. Sound, physics, AI/game logic are all broken up into sub-programs running in parallel on different chips in the machine to various degrees depending on hardware.
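For anyone who hasn't seen that style of design, here is a toy sketch of a frame update where sound, physics and AI run as separate sub-programs and the frame only finishes once all of them report back. The subsystem functions and `run_frame` are made up for illustration. Python threads also don't give true hardware parallelism for CPU-bound work, so this shows the structure only, not the speedup a console with dedicated chips would get.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def update_sound(frame: int) -> str:
    """Placeholder for the audio sub-program."""
    return f"sound mixed for frame {frame}"


def update_physics(frame: int) -> str:
    """Placeholder for the physics sub-program."""
    return f"physics stepped for frame {frame}"


def update_ai(frame: int) -> str:
    """Placeholder for the AI/game-logic sub-program."""
    return f"ai decided for frame {frame}"


SUBSYSTEMS: Dict[str, Callable[[int], str]] = {
    "sound": update_sound,
    "physics": update_physics,
    "ai": update_ai,
}


def run_frame(frame: int) -> Dict[str, str]:
    """Start every subsystem for this frame, then wait for all of them."""
    with ThreadPoolExecutor(max_workers=len(SUBSYSTEMS)) as pool:
        futures = {name: pool.submit(fn, frame) for name, fn in SUBSYSTEMS.items()}
        # result() blocks until that subsystem is done: this is the frame barrier.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    print(run_frame(1))
```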
Quote:
Where did you get that idea from? Us console and arcade engineers have been writing games for multi-chip systems for as long as I've been in the business, over a decade. Sound, physics, AI/game logic are all broken up into sub-programs running in parallel on different chips in the machine to various degrees depending on hardware.
My bad, especially since I made exactly the same point elsewhere in the thread! I meant Windows game programmers.
Quote:
What I wonder is how much room is there to improve Alt-Vec with respect to doing vector operations. Will 4 cores, each working on a 16 or 32 bit vector be better than the Alt-Vec approach?
This doesn't jibe with my understanding. If a Cell chip is a main PowerPC core with 8 vector cores this means it will be executing 9 different instruction streams: 1 stream of PowerPC instructions and 8 streams of whatever instructions the vector cores use (probably not PowerPC). Each vector core will be able to do operations similar in nature to what AltiVec currently does on the G4 or G5. In other words the Cell will be (theoretically) capable of 9 times the computation if running at the same clock rate (and the Cell's clock rate will likely be substantially higher). To support this level of computation they have developed a memory scheme that is different than the traditional PowerPC model. It also goes without saying that existing code won't just work on the Cell's vector cores... but this is just as well, since these vector cores aren't likely to be out-of-order superscalar processors like the 970 is, so careful coding and algorithmic redesign will be required to run at all, never mind get peak performance.
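The "9 times the computation" figure is just a peak-rate multiplication, which can be written out as below. This is a back-of-envelope sketch under loud assumptions: every stream issues one 4-lane vector operation per cycle, memory never stalls anything, and the clock figures are placeholders rather than known Cell numbers. The function names are invented for the example.

```python
def peak_ops_per_second(streams: int, lanes_per_op: int, clock_hz: float) -> float:
    """Theoretical peak: every stream retires one full vector op every cycle."""
    return streams * lanes_per_op * clock_hz


def relative_peak(streams: int, baseline_streams: int = 1, clock_ratio: float = 1.0) -> float:
    """Peak of a multi-stream chip relative to a baseline with the same lane width."""
    return streams / baseline_streams * clock_ratio


if __name__ == "__main__":
    # 1 main core + 8 vector cores against a single AltiVec-style unit, same clock.
    print(relative_peak(9))                   # 9.0
    # Same comparison if the clock were (hypothetically) 1.5x higher.
    print(relative_peak(9, clock_ratio=1.5))  # 13.5
```

Real code will land well below these peaks, which is the point about careful coding and algorithmic redesign: the multiplier only applies to work that can actually be spread over all nine streams.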
Quote:
The bigger problem as I see it is that Cell has to implement a 64 bit processor to remain viable for more than a year or two. I really believe that the addressing range 64 bits offers will be significant in a short time in this market.
Given its target market in the embedded and console space, I don't think this is a requirement. Also, with a substantially different memory addressing scheme it might not be necessary to go to 64-bit pointers to achieve larger memory sizes. The scalar integer units may well be 64-bit, however -- at least on the main core. The cost in terms of processor complexity is not that high.
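One way a chip could reach more than 4 GiB without 64-bit pointers is to pair a small bank (or segment) number with a 32-bit offset. The sketch below shows only that arithmetic. It is purely hypothetical and is not a description of Cell's actual memory scheme, which this thread doesn't know; `flat_address` and the bank counts are invented for illustration.

```python
OFFSET_BITS = 32  # each pointer stays 32 bits wide


def addressable_bytes(pointer_bits: int) -> int:
    """Bytes reachable by a flat pointer of the given width."""
    return 2 ** pointer_bits


def total_reachable(banks: int) -> int:
    """Bytes reachable when each of `banks` banks has its own 32-bit offset space."""
    return banks * addressable_bytes(OFFSET_BITS)


def flat_address(bank: int, offset: int) -> int:
    """Combine (bank, offset) into the physical address the hardware would use."""
    if bank < 0:
        raise ValueError("bank must be non-negative")
    if not 0 <= offset < addressable_bytes(OFFSET_BITS):
        raise ValueError("offset must fit in 32 bits")
    return (bank << OFFSET_BITS) | offset


if __name__ == "__main__":
    gib = 1024 ** 3
    print(addressable_bytes(32) // gib, "GiB with flat 32-bit pointers")
    print(total_reachable(8) // gib, "GiB with 8 banks of 32-bit offsets")
```

The cost of a scheme like this is that software has to track which bank a pointer belongs to, which is part of why flat 64-bit addressing is usually seen as the simpler option.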
Quote:
The reality here is that Apple has had prior experience with DSPs and frankly it did not go over well. Since Apple has a well received, but partial, DSP facility in Alt-Vec it would seem to make more sense to simply extend Alt-Vec to make use of the additional transistors available to them.
On the other hand they still provide a considerable amount of system services which use hardware acceleration internally. The OpenGL shaders, CoreAudio, CoreImage, QuickTime, vector library, network stack, etc could all be re-optimized over time to take advantage of specialized hardware. Nonetheless I tend to think that Apple would rather stick with the traditional PowerPC w/ AltiVec programming model and start adding cores. I don't actually expect to see Cell in Apple's future.
Quote:
In other words it wouldn't make much sense to tie up the entire Cell chip just to emulate Alt-Vec, especially when hardware performance is still moving forward with conventional implementations.
Cell cannot "emulate" AltiVec. They are two different beasts.
Quote:
I had to laugh here a bit. The water tank on its back reminded me of the old-fashioned steam engines with their saddle tank water reservoirs.
Well I'm glad somebody got the joke.
Quote:
I'm still of the opinion that the existence of the water cooling system in Apple's hardware is a pretty clear indication of IBM's failure to meet customer expectations. We do not see a whole lot of water cooling going on in the AMD 64 bit world, and very little if any in the rest of the 90nm world. More than anything I see this as a driver at Apple to search out new technologies and vendors.
AMD isn't at 90 nm yet and they recently pushed back their scheduled move to that process. I think Apple's use of water cooling on the 2.5 GHz machines was a direct result of a deep desire to keep the machines quiet and deal with the very significant heat density issues that result from being able to suddenly expend such a huge amount of power from such a small area. These G5s can go from very low power consumption to very high power consumption very quickly and water has the specific heat capacity to absorb that initial spike without having to continuously keep fans blowing at full speed. If the darn unit didn't look so... so... automotive then it might actually be a compelling piece of technology.
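The specific heat argument can be put into numbers with the usual relation (temperature rise = energy / (mass x specific heat)). The sketch below uses textbook values for water (about 4186 J per kg per kelvin) and aluminum (roughly 900 J per kg per kelvin). The spike size, its duration and the 0.2 kg of material are made-up placeholders, not measurements of the G5 cooling unit.

```python
WATER_J_PER_KG_K = 4186.0    # specific heat of liquid water
ALUMINUM_J_PER_KG_K = 900.0  # approximate specific heat of aluminum


def spike_energy(extra_watts: float, seconds: float) -> float:
    """Extra heat (joules) dumped into the cooler during a power spike."""
    return extra_watts * seconds


def temperature_rise(energy_joules: float, mass_kg: float, specific_heat: float) -> float:
    """Temperature rise (kelvin) if the material absorbs all of the energy."""
    return energy_joules / (mass_kg * specific_heat)


if __name__ == "__main__":
    # Hypothetical spike: 50 W above idle for 10 s, absorbed by 0.2 kg of material.
    energy = spike_energy(50.0, 10.0)
    print(round(temperature_rise(energy, 0.2, WATER_J_PER_KG_K), 2), "K in water")
    print(round(temperature_rise(energy, 0.2, ALUMINUM_J_PER_KG_K), 2), "K in aluminum")
```

For the same mass, the water warms up several times less than the aluminum, which is the buffering effect described above: the fans can ramp up gradually instead of reacting to every burst.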
Quote:
Where did you get that idea from? Us console and arcade engineers have been writing games for multi-chip systems for as long as I've been in the business, over a decade. Sound, physics, AI/game logic are all broken up into sub-programs running in parallel on different chips in the machine to various degrees depending on hardware.
While that is true, the coming machines are somewhat different in nature. There are going to be more cores than ever before and they are going to be more general purpose than ever before. This presents some new challenges to get peak performance from this hardware.
I want you to know that I had a great response to this post yesterday but the server apparently started having problems. So here is an attempt at a condensed version.
Quote:
Originally posted by Programmer
This doesn't jibe with my understanding. If a Cell chip is a main PowerPC core with 8 vector cores this means it will be executing 9 different instruction streams: 1 stream of PowerPC instructions and 8 streams of whatever instructions the vector cores use (probably not PowerPC). Each vector core will be able to do operations similar in nature to what AltiVec currently does on the G4 or G5.
What is clear is that I really don't have much information here, the issue is that I'd be surprised if those 8 vector cores are as wide as the Alt-Vec unit in PPC. What I see is 8 cores of very modest width (maybe one word) that are used together in various combinations.
The problem as I see it is that having that many cores of the Alt-Vec type will take up a huge amount of room. Not just for the cores but for the supporting caches and communications logic. I just don't see current processes supporting that many Alt-Vec type units well on one chip.
The other problem is the efficiency of that sort of implementation. Making full use of that many wide vector units would be a problem. Thus the thought that the vector units would be narrow devices, very much like many of the DSPs available today.
Quote:
In other words the Cell will be (theoretically) capable of 9 times the computation if running at the same clock rate (and the Cell's clock rate will likely be substantially higher). To support this level of computation they have developed a memory scheme that is different than the traditional PowerPC model. It also goes without saying that existing code won't just work on the Cell's vector cores... but this is just as well, since these vector cores aren't likely to be out-of-order superscalar processors like the 970 is, so careful coding and algorithmic redesign will be required to run at all, never mind get peak performance.
I read this and think that maybe you know more about Cell than you are willing to let on. Even so, the implied simple execution units lead me back to thinking narrow vector units.
Quote:
Given its target market in the embedded and console space, I don't think this is a requirement. Also, with a substantially different memory addressing scheme it might not be necessary to go to 64-bit pointers to achieve larger memory sizes. The scalar integer units may well be 64-bit, however -- at least on the main core. The cost in terms of processor complexity is not that high.
64 bits will be huge in the future. In the case of Cell I think it would simplify things more than anything. As you say the cost isn't that high.
On the other hand the embedded and console market is heading towards 64 bit machinery. It is simply a matter of getting costs under control, with memory being the big issue. So we are talking a year or two before large memory systems are cost effective. I can't see this team designing a chip that is only going to be competitive for a year or two.
Quote:
On the other hand they still provide a considerable amount of system services which use hardware acceleration internally. The OpenGL shaders, CoreAudio, CoreImage, QuickTime, vector library, network stack, etc could all be re-optimized over time to take advantage of specialized hardware. Nonetheless I tend to think that Apple would rather stick with the traditional PowerPC w/ AltiVec programming model and start adding cores. I don't actually expect to see Cell in Apple's future.
I see Cell in Apple's future but maybe not in the way you see it. I see Cell as an opportunity on IBM's part to optimize the execution units within the PPC line. What would be really neat is if Cell led to hand crafted execution units to replace the dense sea of logic that is the 970 we all know and love. The idea here being to replace some of the hot stuff within the 970s.
So hopefully Cell becomes a proving or development framework for things that can be extended to the 970 series.
Quote:
Cell cannot "emulate" AltiVec. They are two different beasts.
I'm not sure you meant to say that. Obviously Cell can emulate anything it wants to emulate, that is simply a matter of writing the right code.
What I was getting at, and this is only a consideration if Apple is interested in Cell, is whether there has been an attempt to design the Cell vector units so they could emulate, or work in place of, the Alt-Vec units in an efficient manner.
Quote:
Well I'm glad somebody got the joke.
AMD isn't at 90 nm yet and they recently pushed back their scheduled move to that process. I think Apple's use of water cooling on the 2.5 GHz machines was a direct result of a deep desire to keep the machines quiet and deal with the very significant heat density issues that result from being able to suddenly expend such a huge amount of power from such a small area. These G5s can go from very low power consumption to very high power consumption very quickly and water has the specific heat capacity to absorb that initial spike without having to continuously keep fans blowing at full speed. If the darn unit didn't look so... so... automotive then it might actually be a compelling piece of technology.
That automotive look is technology nonetheless. Old technology, yes, but well understood and reliable.
The G5s though are just the opposite. Very new technology that frankly has not met the expectations of anybody. That is not to say that the 970s don't work, just that they haven't gotten to where people (JOBS) had expected. What is worse, they didn't get there and where they are now is problematic.
It is all well and nice that IBM has a 90nm process but we shouldn't forget that the process needs a lot of work. Instead of claiming to have hit a wall IBM really should be saying they are working on breaching the wall. Otherwise one is left with the impression that PPC doesn't mean much to them. IBM needs a leading edge 90nm process, not a trailing edge process.
Quote:
What is clear is that I really don't have much information here, the issue is that I'd be surprised if those 8 vector cores are as wide as the Alt-Vec unit in PPC. What I see is 8 cores of very modest width (maybe one word) that are used together in various combinations.
I disagree. I expect 128-bit registers just like AltiVec and the PS2's vector units. Each vector core will have a set of at least 32 of them. Consider, for example, that the latest ATI & nVidia graphics chips have large numbers of vertex and pixel shader engines with 12+ 128-bit 4-way floating point registers each.
Quote:
The problem as I see it is that having that many cores of the Alt-Vec type will take up a huge amount of room. Not just for the cores but for the supporting caches and communications logic. I just don't see current processes supporting that many Alt-Vec type units well on one chip.
Number of transistors is far less of a problem than you think... especially on the 65nm process. Also, many of AltiVec's transistors go to certain kinds of functionality that may not be present in the vector cores.
Quote:
The other problem is the efficiency of that sort of implementation. Making full use of that many wide vector units would be a problem. Thus the thought that the vector units would be narrow devices, very much like many of the DSPs available today.
You've heard all the grousing about how hard the PS3 will be to program for, right...? Sony isn't terribly interested in making it easy on the software guys.
Quote:
I read this and think that maybe you know more about Cell than you are willing to let on. Even so, the implied simple execution units lead me back to thinking narrow vector units.
Word == Mum.
Quote:
64 bits will be huge in the future. In the case of Cell I think it would simplify things more than anything. As you say the cost isn't that high.
Yes, I expect to see 64-bit integer registers on the scalar core. Whether conventional 64-bit addressing is supported on all (or any) of the cores is another matter.
Quote:
I see Cell in Apple's future but maybe not in the way you see it. I see Cell as an opportunity on IBM's part to optimize the execution units within the PPC line. What would be really neat is if Cell led to hand crafted execution units to replace the dense sea of logic that is the 970 we all know and love. The idea here being to replace some of the hot stuff within the 970s.
Possibly, although issues of design IP must be a bit hairy with Sony and Toshiba involved.
Quote:
I'm not sure you meant to say that. Obviously Cell can emulate anything it wants to emulate, that is simply a matter of writing the right code.
Okay, what I meant was "emulate in a useful fashion". The vector cores probably couldn't emulate it at all due to limited/specialized functionality.
Quote:
What I was getting at, and this is only a consideration if Apple is interested in Cell, is whether there has been an attempt to design the Cell vector units so they could emulate, or work in place of, the Alt-Vec units in an efficient manner.
Not without different software.
Quote:
It is all well and nice that IBM has a 90nm process but we shouldn't forget that the process needs a lot of work. Instead of claiming to have hit a wall IBM really should be saying they are working on breaching the wall. Otherwise one is left with the impression that PPC doesn't mean much to them. IBM needs a leading edge 90nm process, not a trailing edge process.
Keep in mind what they actually said -- conventional scaling has hit the wall. They (and everyone else) still have avenues to pursue, but the rapid clock rate scaling of the past is over. They don't have a trailing edge process, and everybody else is having troubles too. I don't see where you get the impression that PPC doesn't mean much to them, aside from being impatient for the next new thing.
Originally posted by Programmer
I think there is a fair bit of wrong-tree-barking going on in this thread.
Tease.
Originally posted by Amorph
Tease.
Not really. If you sift through all of the information available on Cell it appears to come down to something as simple as asymmetric cores on a single chip with an unconventional memory addressing/sharing scheme. The main core is probably fairly traditional, and quite possibly PowerPC. The additional cores are more along the lines of PS2 vector units. There is nothing about sharing execution units or reconfiguring circuitry -- that stuff is all on the distant horizon but considerably farther out than the Cell delivery date. If the main core of a Cell chip is PowerPC then it could include an AltiVec unit, or it might not. The additional vector cores would execute a different instruction set entirely that includes its own set of vector operation primitives.
Could Apple use such a thing? If its main core is a true PPC32 or PPC64, then possibly -- they've had machines with DSPs in them in the past, plus there have been rumours of add-on media processors and vector units in the main chipset. Since this kind of thing is asymmetric you need to code for it specifically, or have some kind of higher level system service that converts your operations to run on the available hardware (a la CoreImage and OpenGL 2). Whether this is better than a multi-core version of the 9x0 series is an open question. High clock rates have been postulated for the Cell (and XBox2, for that matter) but all of that information appeared before the 970FX tripped over its feet and then finally arrived at "only" 2.5 GHz wearing a water tank on its back.
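The "higher level system service" idea can be sketched roughly as follows. This is only an illustration in Python, not how CoreImage or OpenGL actually work: the caller asks for an operation, and a small service hands it to whichever registered backend reports itself available. The names `AccelService`, `register`, `scalar_add` and `fake_vector_add` are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A backend is (name, is_available, implementation). All names here are
# hypothetical; they stand in for real hardware such as a vector core.
AddFn = Callable[[Sequence[float], Sequence[float]], List[float]]
Backend = Tuple[str, Callable[[], bool], AddFn]


class AccelService:
    """Routes an operation to the first available backend, in priority order."""

    def __init__(self) -> None:
        self._backends: List[Backend] = []

    def register(self, name: str, is_available: Callable[[], bool], impl: AddFn) -> None:
        self._backends.append((name, is_available, impl))

    def add(self, a: Sequence[float], b: Sequence[float]):
        for name, is_available, impl in self._backends:
            if is_available():
                return name, impl(a, b)
        raise RuntimeError("no backend available")


def scalar_add(a, b):
    # Plain fallback: one element at a time on the main core.
    return [x + y for x, y in zip(a, b)]


def fake_vector_add(a, b):
    # Stand-in for code that would run on a vector core, 4 lanes at a time.
    out = []
    for i in range(0, len(a), 4):
        out.extend(x + y for x, y in zip(a[i:i + 4], b[i:i + 4]))
    return out


service = AccelService()
service.register("vector-core", lambda: False, fake_vector_add)  # pretend it is absent
service.register("scalar", lambda: True, scalar_add)

backend, result = service.add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(backend, result)
```

The application code never names the hardware; only the registration order and the availability checks change from machine to machine, which is the appeal of hiding asymmetric cores behind a system service.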
Now I guess they should have banned me rather than just shut off posting privileges, because kickaha and Amorph definitely aren't going to like being called to task when they thought they had it all ignored *cough* *cough* I mean under control. Just a couple o' tools.
Don't worry, as soon as my work resetting my posts is done I'll disappear forever.
Originally posted by Programmer
Not really. If you sift through all of the information available on Cell it appears to come down to something as simple as asymmetric cores on a single chip with an unconventional memory addressing/sharing scheme. The main core is probably fairly traditional, and quite possibly PowerPC. The additional cores are more along the lines of PS2 vector units. There is nothing about sharing execution units or reconfiguring circuitry -- that stuff is all on the distant horizon but considerably farther out than the Cell delivery date. If the main core of a Cell chip is PowerPC then it could include an AltiVec unit, or it might not. The additional vector cores would execute a different instruction set entirely that includes its own set of vector operation primitives.
Interesting, because I'd read from various (public) sources, including IBM slides posted at Ars, that IBM was going significantly multicore, first of all, versus dual-core now, and second of all that the Cell cores were incomplete and special purpose (partly so that they could clock really high, another thing I've seen on an IBM slide).
If all it is is a nifty fabric for traditional AMP or SMP on die — and I have no reason to doubt you on this — then... meh. I honestly had the impression from reading around that IBM were trying for a paradigm shift rather than an incremental improvement.
Could Apple use such a thing?
It's much easier to imagine them using this far more conservative design strategy than the more far-out one that I'd imagined, certainly. It'll make a nice competitor to the 8641D coming out from Freescale next year. But that's all.
Originally posted by Amorph
Interesting, because I'd read from various (public) sources, including IBM slides posted at Ars, that IBM was going significantly multicore, first of all, versus dual-core now, and second of all that the Cell cores were incomplete and special purpose (partly so that they could clock really high, another thing I've seen on an IBM slide).
Well the Cell is significantly multi-core (8+ cores is pretty significant), and the Cell cores are specialized vector execution units, and they probably will clock higher than a general purpose core could. But what I'm saying is that they are still only several specialized high clock cores on a single die. That's a far cry from what was being discussed a few posts back.
This is a major paradigm shift for most software to deal with... right now everyone is pretty much coding for one or two SMP threads. That is a lot different.
Originally posted by Programmer
Not really. If you sift through all of the information available on Cell it appears to come down to something as simple as asymmetric cores on a single chip with an unconventional memory addressing/sharing scheme.
In my limited surfing this is what I've come up with. Those cores are apparently PPC or PPC derived with the obvious new cores to support additional functionality.
The main core is probably fairly traditional, and quite possibly PowerPC. The additional cores are more along the lines of PS2 vector units. There is nothing about sharing execution units or reconfiguring circuitry -- that stuff is all on the distant horizon but considerably farther out than the Cell delivery date. If the main core of a Cell chip is PowerPC then it could include an AltiVec unit, or it might not. The additional vector cores would execute a different instruction set entirely that includes its own set of vector operation primitives.
What I wonder is how much room there is to improve AltiVec with respect to doing vector operations. Will 4 cores, each working on a 16- or 32-bit vector, be better than the AltiVec approach?
The bigger problem as I see it is that Cell has to implement a 64 bit processor to remain viable for more than a year or two. I really believe that the addressing range 64 bits offers will be significant in a short time in this market.
Could Apple use such a thing? If its main core is a true PPC32 or PPC64, then possibly -- they've had machines with DSPs in them in the past, plus there have been rumours of add-on media processors and vector units in the main chipset. Since this kind of thing is asymmetric you need to code for it specifically, or have some kind of higher level system service that converts your operations to run on the available hardware (a la CoreImage and OpenGL 2).
The reality here is that Apple has prior experience with DSPs, and frankly it did not go over well. Since Apple has a well received, but partial, DSP facility in AltiVec, it would seem to make more sense to simply extend AltiVec to make use of the additional transistors available to them.
Apple needs to offer a uniform programming environment, and it is pretty obvious that they understand this, as they have switched over to all AltiVec-enabled processors. Now that doesn't mean that Apple couldn't build the software required to offer the AltiVec programming environment on Cell if it doesn't already exist there. I just have a very hard time seeing the payoff for the types of applications that Apple hardware runs. In other words, it wouldn't make much sense to tie up the entire Cell chip just to emulate AltiVec, especially when hardware performance is still moving forward with conventional implementations.
Whether this is better than a multi-core version of the 9x0 series is an open question. High clock rates have been postulated for the Cell (and XBox2, for that matter) but all of that information appeared before the 970FX tripped over its feet and then finally arrived at "only" 2.5 GHz wearing a water tank on its back.
I had to laugh here a bit. The water tank on its back reminded me of the old-fashioned steam engines with their saddle tank water reservoirs.
I'm still of the opinion that the existence of the water cooling system in Apple's hardware is a pretty clear indication of IBM's failure to meet customer expectations. We do not see a whole lot of water cooling going on in the AMD 64-bit world, and very little if any in the rest of the 90nm world. More than anything I see this as a driver at Apple to search out new technologies and vendors. I just don't see Cell being this alternative.
Dave
Originally posted by Amorph
The main obstacle will be that, for historical reasons, game designers are used to writing monolithic, single-threaded apps for single-core CPUs.
Where did you get that idea from? Us console and arcade engineers have been writing games for multi-chip systems for as long as I've been in the business, over a decade. Sound, physics, AI/game logic are all broken up into sub-programs running in parallel on different chips in the machine to various degrees depending on hardware.
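For anyone who hasn't seen that style of design, the shape of it is roughly the sketch below. It is an illustration only: Python threads stand in for the separate chips, the subsystem functions are hypothetical placeholders, and Python threads do not give true parallel execution for CPU-bound work, so this shows the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sound(frame):
    return f"sound mixed for frame {frame}"


def physics(frame):
    return f"physics stepped for frame {frame}"


def game_logic(frame):
    return f"AI/game logic updated for frame {frame}"


SUBSYSTEMS = [sound, physics, game_logic]


def run_frame(frame, pool):
    # Submit all subsystems at once, then wait for every one to finish
    # before the frame is considered done.
    futures = [pool.submit(task, frame) for task in SUBSYSTEMS]
    return [f.result() for f in futures]


with ThreadPoolExecutor(max_workers=len(SUBSYSTEMS)) as pool:
    for frame in range(2):
        for line in run_frame(frame, pool):
            print(line)
```

The per-frame barrier (waiting on every future) is the part that takes the care: each sub-program has to finish its slice of work before the next frame can start.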
Originally posted by Tuttle
Where did you get that idea from? Us console and arcade engineers have been writing games for multi-chip systems for as long as I've been in the business, over a decade. Sound, physics, AI/game logic are all broken up into sub-programs running in parallel on different chips in the machine to various degrees depending on hardware.
My bad, especially since I made exactly the same point elsewhere in the thread! I meant Windows game programmers.
Originally posted by wizard69
What I wonder is how much room there is to improve AltiVec with respect to doing vector operations. Will 4 cores, each working on a 16- or 32-bit vector, be better than the AltiVec approach?
This doesn't jibe with my understanding. If a Cell chip is a main PowerPC core with 8 vector cores this means it will be executing 9 different instruction streams: 1 stream of PowerPC instructions and 8 streams of whatever instructions the vector cores use (probably not PowerPC). Each vector core will be able to do operations similar in nature to what AltiVec currently does on the G4 or G5. In other words the Cell will be (theoretically) capable of 9 times the computation if running at the same clock rate (and the Cell's clock rate will likely be substantially higher). To support this level of computation they have developed a memory scheme that is different than the traditional PowerPC model. It also goes without saying that existing code won't just work on the Cell's vector cores... but this is just as well since these vector cores aren't likely to be out of order superscalar processors like the 970 is, so careful coding and algorithmic redesign will be required to run at all, never mind get peak performance.
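The "9 times" figure is just multiplication, and can be written out as a back-of-envelope sketch. The assumptions are loud ones: every stream issues one full-width vector op per cycle, each op covers four 32-bit lanes of a 128-bit register, and the clock is a placeholder that is the same in both cases. None of these are known Cell specifications.

```python
def peak_ops_per_second(streams, lanes_per_op, clock_hz):
    # Assumes every stream issues one full-width vector op each cycle,
    # which real code rarely sustains.
    return streams * lanes_per_op * clock_hz


LANES = 4          # 128-bit register / 32-bit elements (assumed)
CLOCK_HZ = 2.5e9   # placeholder clock, identical for both cases

one_unit = peak_ops_per_second(1, LANES, CLOCK_HZ)
nine_units = peak_ops_per_second(9, LANES, CLOCK_HZ)
print(nine_units / one_unit)
```

The ratio depends only on the number of streams, so any clock advantage multiplies on top of it. It is a ceiling, and how close real code gets depends on keeping all nine streams fed.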
The bigger problem as I see it is that Cell has to implement a 64 bit processor to remain viable for more than a year or two. I really believe that the addressing range 64 bits offers will be significant in a short time in this market.
Given its target market in the embedded and console space, I don't think this is a requirement. Also, with a substantially different memory addressing scheme it might not be necessary to go to 64-bit pointers to achieve larger memory sizes. The scalar integer units may well be 64-bit, however -- at least on the main core. The cost in terms of processor complexity is not that high.
The reality here is that Apple has prior experience with DSPs, and frankly it did not go over well. Since Apple has a well received, but partial, DSP facility in AltiVec, it would seem to make more sense to simply extend AltiVec to make use of the additional transistors available to them.
On the other hand they still provide a considerable amount of system services which use hardware acceleration internally. The OpenGL shaders, CoreAudio, CoreImage, QuickTime, vector library, network stack, etc could all be re-optimized over time to take advantage of specialized hardware. Nonetheless I tend to think that Apple would rather stick with the traditional PowerPC w/ AltiVec programming model and start adding cores. I don't actually expect to see Cell in Apple's future.
In other words, it wouldn't make much sense to tie up the entire Cell chip just to emulate AltiVec, especially when hardware performance is still moving forward with conventional implementations.
Cell cannot "emulate" AltiVec. They are two different beasts.
I had to laugh here a bit. The water tank on its back reminded me of the old-fashioned steam engines with their saddle tank water reservoirs.
Well I'm glad somebody got the joke.
I'm still of the opinion that the existence of the water cooling system in Apple's hardware is a pretty clear indication of IBM's failure to meet customer expectations. We do not see a whole lot of water cooling going on in the AMD 64-bit world, and very little if any in the rest of the 90nm world. More than anything I see this as a driver at Apple to search out new technologies and vendors.
AMD isn't at 90 nm yet and they recently pushed back their scheduled move to that process. I think Apple's use of water cooling on the 2.5 GHz machines was a direct result of a deep desire to keep the machines quiet and deal with the very significant heat density issues that result from being able to suddenly expend such a huge amount of power from such a small area. These G5s can go from very low power consumption to very high power consumption very quickly, and water has the specific heat capacity to absorb that initial spike without having to continuously keep fans blowing at full speed. If the darn unit didn't look so... so... automotive then it might actually be a compelling piece of technology.
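To put rough numbers on the specific heat point, here is a small sketch using delta T = Q / (m * c). The specific heat values are approximate textbook figures; the spike size, duration and coolant mass are made up purely for illustration and are not measurements of any G5.

```python
WATER_C = 4186.0      # J/(kg*K), approximate specific heat of liquid water
ALUMINIUM_C = 900.0   # J/(kg*K), approximate, for comparison


def temperature_rise(power_w, seconds, mass_kg, specific_heat):
    # Energy in = power * time; delta T = Q / (m * c), ignoring any heat
    # carried away during the spike.
    return (power_w * seconds) / (mass_kg * specific_heat)


spike_w, spike_s, mass_kg = 100.0, 10.0, 0.1   # made-up figures

print(round(temperature_rise(spike_w, spike_s, mass_kg, WATER_C), 1))
print(round(temperature_rise(spike_w, spike_s, mass_kg, ALUMINIUM_C), 1))
```

For the same mass, the water warms by roughly a fifth as much as the aluminium would, which is the buffering effect being described: the fans get time to ramp up instead of having to react instantly.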
Originally posted by Tuttle
Where did you get that idea from? Us console and arcade engineers have been writing games for multi-chip systems for as long as I've been in the business, over a decade. Sound, physics, AI/game logic are all broken up into sub-programs running in parallel on different chips in the machine to various degrees depending on hardware.
While that is true, the coming machines are somewhat different in nature. There are going to be more cores than ever before and they are going to be more general purpose than ever before. This presents some new challenges to get peak performance from this hardware.
Originally posted by Programmer
This doesn't jibe with my understanding. If a Cell chip is a main PowerPC core with 8 vector cores this means it will be executing 9 different instruction streams: 1 stream of PowerPC instructions and 8 streams of whatever instructions the vector cores use (probably not PowerPC). Each vector core will be able to do operations similar in nature to what AltiVec currently does on the G4 or G5.
What is clear is that I really don't have much information here. The issue is that I'd be surprised if those 8 vector cores are as wide as the AltiVec unit in PPC. What I see is 8 cores of very modest width (maybe one word) that are used together in various combinations.
The problem as I see it is that having that many cores of the AltiVec type will take up a huge amount of room. Not just for the cores but for the supporting caches and communications logic. I just don't see current processes supporting that many AltiVec-type units well on one chip.
The other problem is the efficiency of that sort of implementation. Making full use of that many wide vector units would be a problem. Thus the thought that the vector units would be narrow devices, very much like many of the DSPs available today.
In other words the Cell will be (theoretically) capable of 9 times the computation if running at the same clock rate (and the Cell's clock rate will likely be substantially higher). To support this level of computation they have developed a memory scheme that is different than the traditional PowerPC model. It also goes without saying that existing code won't just work on the Cell's vector cores... but this is just as well since these vector cores aren't likely to be out of order superscalar processors like the 970 is, so careful coding and algorithmic redesign will be required to run at all, never mind get peak performance.
I read this and think that maybe you know more about Cell than you are willing to let on.
Given its target market in the embedded and console space, I don't think this is a requirement. Also, with a substantially different memory addressing scheme it might not be necessary to go to 64-bit pointers to achieve larger memory sizes. The scalar integer units may well be 64-bit, however -- at least on the main core. The cost in terms of processor complexity is not that high.
64 bits will be huge in the future. In the case of Cell I think it would simplify things more than anything. As you say, the cost isn't that high.
On the other hand, the embedded and console market is heading towards 64-bit machinery. It is simply a matter of getting costs under control, with memory being the big issue. So we are talking a year or two before large memory systems are cost effective. I can't see this team designing a chip that is only going to be competitive for a year or two.
On the other hand they still provide a considerable amount of system services which use hardware acceleration internally. The OpenGL shaders, CoreAudio, CoreImage, QuickTime, vector library, network stack, etc could all be re-optimized over time to take advantage of specialized hardware. Nonetheless I tend to think that Apple would rather stick with the traditional PowerPC w/ AltiVec programming model and start adding cores. I don't actually expect to see Cell in Apple's future.
I see Cell in Apple's future, but maybe not in the way you see it. I see Cell as an opportunity on IBM's part to optimize the execution units within the PPC line. What would be really neat is if Cell led to hand-crafted execution units to replace the dense sea of logic that is the 970 we all know and love. The idea here being to replace some of the hot stuff within the 970s.
So hopefully Cell becomes a proving or development framework for things that can be extended to the 970 series.
Cell cannot "emulate" AltiVec. They are two different beasts.
I'm not sure you meant to say that. Obviously Cell can emulate anything it wants to emulate, that is simply a matter of writing the right code.
What I was getting at, and this is only a consideration if Apple is interested in Cell, is whether there has been an attempt to design the Cell vector units so they could emulate, or work in place of, the AltiVec units in an efficient manner.
Well I'm glad somebody got the joke.
AMD isn't at 90 nm yet and they recently pushed back their scheduled move to that process. I think Apple's use of water cooling on the 2.5 GHz machines was a direct result of a deep desire to keep the machines quiet and deal with the very significant heat density issues that result from being able to suddenly expend such a huge amount of power from such a small area. These G5s can go from very low power consumption to very high power consumption very quickly, and water has the specific heat capacity to absorb that initial spike without having to continuously keep fans blowing at full speed. If the darn unit didn't look so... so... automotive then it might actually be a compelling piece of technology.
That automotive look is technology nonetheless. Old technology, yes, but well understood and reliable.
The G5s, though, are just the opposite: very new technology that frankly has not met anybody's expectations. That is not to say that the 970s don't work, just that they haven't gotten to where people (JOBS) had expected. What is worse, they didn't get there, and where they are now is problematic.
It is all well and nice that IBM has a 90nm process, but we shouldn't forget that the process needs a lot of work. Instead of claiming to have hit a wall, IBM really should be saying they are working on breaching the wall. Otherwise one is left with the impression that PPC doesn't mean much to them. IBM needs a leading edge 90nm process, not a trailing edge process.
Originally posted by wizard69
What is clear is that I really don't have much information here. The issue is that I'd be surprised if those 8 vector cores are as wide as the AltiVec unit in PPC. What I see is 8 cores of very modest width (maybe one word) that are used together in various combinations.
I disagree. I expect 128-bit registers just like AltiVec and the PS2's vector units. Each vector core will have a set of at least 32 of them. Consider, for example, that the latest ATI & nVidia graphics chips have large numbers of vertex and pixel shader engines with 12+ 128-bit 4-way floating point registers each.
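To make "128-bit 4-way floating point register" concrete, here is a rough sketch that models one such register as 16 raw bytes and does a lane-wise add in software. The helper name `vec_add_f32` is hypothetical, and this only imitates in a loop what a single vector instruction would do across all four lanes at once.

```python
import struct

# One 128-bit register is 16 bytes. Pack four 32-bit floats into it.
reg_a = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
reg_b = struct.pack("<4f", 0.5, 0.5, 0.5, 0.5)


def vec_add_f32(a, b):
    # Lane-wise add: each of the four lanes is added independently, which is
    # what a single 4-way vector instruction would do in hardware.
    lanes = [x + y for x, y in zip(struct.unpack("<4f", a), struct.unpack("<4f", b))]
    return struct.pack("<4f", *lanes)


print(len(reg_a))
print(struct.unpack("<4f", vec_add_f32(reg_a, reg_b)))
# The same 16 bytes can also be viewed as 8 shorts or 16 single bytes.
print(len(struct.unpack("<8h", reg_a)), len(struct.unpack("<16B", reg_a)))
```

A register of that width is not tied to one data type; the instruction decides whether the 16 bytes are treated as four floats, eight shorts or sixteen bytes.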
The problem as I see it is that having that many cores of the AltiVec type will take up a huge amount of room. Not just for the cores but for the supporting caches and communications logic. I just don't see current processes supporting that many AltiVec-type units well on one chip.
Number of transistors is far less of a problem than you think... especially on the 65nm process. Also, many of AltiVec's transistors go to certain kinds of functionality that may not be present in the vector cores.
The other problem is the efficiency of that sort of implementation. Making full use of that many wide vector units would be a problem. Thus the thought that the vector units would be narrow devices, very much like many of the DSPs available today.
You've heard all the grousing about how hard the PS3 will be to program for, right...? Sony isn't terribly interested in making it easy on the software guys.
I read this and think that maybe you know more about Cell than you are willing to let on.
Word == Mum.
64 bits will be huge in the future. In the case of Cell I think it would simplify things more than anything. As you say, the cost isn't that high.
Yes, I expect to see 64-bit integer registers on the scalar core. Whether conventional 64-bit addressing is supported on all (or any) of the cores is another matter.
I see Cell in Apple's future, but maybe not in the way you see it. I see Cell as an opportunity on IBM's part to optimize the execution units within the PPC line. What would be really neat is if Cell led to hand-crafted execution units to replace the dense sea of logic that is the 970 we all know and love. The idea here being to replace some of the hot stuff within the 970s.
Possibly, although issues of design IP must be a bit hairy with Sony and Toshiba involved.
I'm not sure you meant to say that. Obviously Cell can emulate anything it wants to emulate, that is simply a matter of writing the right code.
Okay, what I meant was "emulate in a useful fashion". The vector cores probably couldn't emulate it at all due to limited/specialized functionality.
What I was getting at, and this is only a consideration if Apple is interested in Cell, is whether there has been an attempt to design the Cell vector units so they could emulate, or work in place of, the AltiVec units in an efficient manner.
Not without different software.
It is all well and nice that IBM has a 90nm process, but we shouldn't forget that the process needs a lot of work. Instead of claiming to have hit a wall, IBM really should be saying they are working on breaching the wall. Otherwise one is left with the impression that PPC doesn't mean much to them. IBM needs a leading edge 90nm process, not a trailing edge process.
Keep in mind what they actually said -- conventional scaling has hit the wall. They (and everyone else) still have avenues to pursue, but the rapid clock rate scaling of the past is over. They don't have a trailing edge process, and everybody else is having troubles too. I don't see where you get the impression that PPC doesn't mean much to them, aside from being impatient for the next new thing.