The idea, while possible, doesn't leverage the design of Cell at all. The first thing to realize about Cell as a chip is that it has a full 64-bit PPC implementation on die. It doesn't need a support processor to implement a Mac.
What is unique about this PPC core is that it has the support processors (SPEs) on the same die, allowing extremely fast communication between elements. This fast communication is, in my opinion, what Cell is really offering. It permits the limited-functionality SPEs to be used efficiently.
As to what Apple gets, do realize that I'm not associated with Apple at all; in fact, I live in NY, so I really don't know what they are up to. However, if I were an engineer, or maybe even Jobs, at Apple, I would be positively dripping with excitement about this technology.
I say technology because there is no reason for Apple to use the specific Cell chip we know about. I would expect them to want to implement the chip in the low-cost/desktop end of the business fairly quickly. There is tremendous potential here, but one should not get overly excited, as there are always gotchas. It would be interesting to see how well the Mac performs with the greatly reduced context switching needed to service all the media channels currently in use on a Mac.
Dave
Quote:
Originally posted by Brendon
That has been my thought, but I am just a speculator. How about it, Wiz69 and Programmer, what do you think? To me, in this scenario Apple gets it all: a great processor for the Cores of Tiger, access to much easier porting and better-running games, and something for the Science Core.
Yes, except that IBM also had the system-on-chip expertise and ISA design expertise. So hopefully the SPEs aren't the disaster of an ISA that the PS2 vector units are (which is part of why there aren't any C/C++ compilers for the PS2's vector units).
Well, it already appears that IBM did a good job with the SPE units, but I'd really like to see what the complete instruction set is.
Quote:
This I doubt. They probably didn't even tell Sony they had other customers for this new Power core; it is a product that IBM created to license to anybody interested. It is very likely they have another big customer for the core... and it's not Apple. Apple might be interested, but without the rest of the Cell attached it is much less compelling.
Well, the PPE could be very compelling for Apple's laptops and low-power requirements even without the SPEs.
Quote:
Remember that IBM said explicitly that the 970FX would be a laptop chip. That may have changed since 90nm didn't pan out the way everyone thought, but that doesn't change the fact that they were planning for it to scale down in terms of power.
Yes, I heard that too, but I also heard some time ago that IBM and Apple had a large contract that covered more processors than just the 970 and one generation after that. One of these processors was specifically for the low-power market.
As to the 970 in a laptop, I'm just not sure what happened there. I never was one to see the 970 going into any laptop that Apple would want to sell.
Quote:
"In the loop" is very different than being "involved in design". What Apple needed to contribute to the processor could be done between an Apple and an IBM engineer in about 5 minutes of verbal conversation. IBM didn't consult Apple on the design of the 970's core -- they already had it from the POWER4. Apple said "we need VMX", so IBM tacked it on. The bus design was the one point where Apple may have had more influence because they've designed the only 970 northbridge we know of. In the case of Cell, however, that is provided by RamBus.
Well, you may want to think that verbal communication was enough to get the 970 off the ground, but knowing IBM, I suspect that every little thing had to be documented to a T. As to the actual development of the processor, I would not be surprised to find Apple engineers stationed with IBM engineers, or at least doing a lot of travel back and forth.
Quote:
Probably more. Have I said otherwise? My point is merely that Apple wasn't involved in the chip's development, not that they aren't a likely customer.
Well, I guess part of the problem is that maybe we have a different understanding of "involved". Considering IBM's past with respect to AltiVec/VMX, I would have to suggest that Apple was heavily involved. As to the SPEs, I wonder how much of them is even the work of Sony/Toshiba rather than an answer to Apple's wants.
Obviously that is a bit of a stretch, but Apple has got to realize that VMX/AltiVec is getting a little long in the tooth and that moving forward will require new capabilities here.
This article and the one at ArsTechnica should tell you what you need to know. It seems to me that many people that read these articles miss half their content, so go back and read it again... it is packed with useful information.
Yeah, I've read just about everything I can find on the net related to Cell. Unfortunately, much of it is useless rambling. The good stuff, such as Ars, is just not detailed enough to fully grasp what an SPE is capable of doing.
Quote:
These things are vector processors. It is probably going to be possible to run scalar code on them (IBM said that they were working with "open source compiler writers", i.e. GCC, about providing compilers for SPE), but that doesn't mean it'll run fast.
Yes, I understand they are SIMD machines, but they are also capable of running complete programs, or at least code fragments. If that is the case, then they must be able to handle some scalar code. The question is: is this a simple extension to the vector unit that gives it a program counter and a branch unit, or is it a more involved core, such as an enhanced 440?
For a general-purpose computer such as a Mac the question is important, because it indicates just how flexible the SPEs are. Let's face it, even a 440 core running at 4 GHz will do a lot of general-purpose work. Maybe the thing to do is to rummage through GCC's change logs to see if anything has been committed yet.
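To make the scalar-versus-SIMD point concrete, here is a toy Python model of a 4-lane, vector-only unit. It is a hypothetical illustration, not the real SPE instruction set; `splat`, `vadd` and `vmul` are invented names. Scalar code still runs, because a scalar simply occupies one lane, but the other three lanes do wasted work, which is why "it runs" does not mean "it runs fast".

```python
# Toy model of a 4-lane, vector-only unit (hypothetical; NOT the SPE ISA).
# Every operation works on all four lanes at once.
from typing import List

LANES = 4
Vec = List[float]


def splat(x: float) -> Vec:
    """Replicate one scalar into every lane."""
    return [x] * LANES


def vadd(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]


def vmul(a: Vec, b: Vec) -> Vec:
    return [x * y for x, y in zip(a, b)]


def scalar_axpy(alpha: float, x: float, y: float) -> float:
    """alpha*x + y for ONE value, built from full-width vector ops.
    Only lane 0 is kept; the other three lanes compute the same thing for nothing."""
    return vadd(vmul(splat(alpha), splat(x)), splat(y))[0]


def vector_axpy(alpha: float, xs: Vec, ys: Vec) -> Vec:
    """The same operation on four independent elements: every lane does useful work."""
    return vadd(vmul(splat(alpha), xs), ys)


print(scalar_axpy(2.0, 3.0, 1.0))                                    # 7.0
print(vector_axpy(2.0, [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))  # [3.0, 5.0, 7.0, 9.0]
```

Both calls cost the same two vector operations; the second simply returns four useful results instead of one.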
Well there was a PPC440 patent acquisition made fairly recently from IBM (to the Cell gang), no?
In other interesting notes, I happened across the inkling not too long ago that 440s actually find use in some of Cisco's switches and routers. Neat, eh?
OK, I guess there are two groups: one that says Apple should utilize this technology in its purest form to reap the full rewards, and another that sees Apple not jumping into this water but taking baby steps. OK, the best question is: would Cell make a better video accelerator chip than what is on the market at that time? If so, why wouldn't Apple transition into Cell rather than jump? Apple could make a daughter card for video acceleration and possibly do it cheaper than what they are paying for video cards. Apple could also request that the current PPCs have the Cell bus installed in them, and that IBM add VMX units and memory.
Quote:
OK, I guess there are two groups: one that says Apple should utilize this technology in its purest form to reap the full rewards, and another that sees Apple not jumping into this water but taking baby steps.
Or we could look at it as two schools, one cautious and the other inhabited by raving madmen. In any event, I think that maybe you and many others miss one important point about Cell: it is modular, and the PPE (in Cell) is a PPC core. The expectation is that many variants of Cell will exist; STI has indicated this. Of interest to Mac users is the PPE, and it should be of interest whether or not it is in Cell.
Quote:
OK, the best question is: would Cell make a better video accelerator chip than what is on the market at that time?
I'm not sure why there is such a strong association between Cell and video hardware. Everyone here should realize that nVidia is making a graphics chip for Sony's PS3. Cell is not a video chip; rather, think of it as a processor that can handle multiple threads in hardware. Some of those threads are specialized onto SPEs. Now, just what those SPEs are capable of doing isn't completely clear; a lot of focus has been placed on the SIMD component, but I've yet to see a definitive explanation of what the SPEs have going for them.
Quote:
If so, why wouldn't Apple transition into Cell rather than jump?
Frankly, I don't know what Apple is up to, and neither do the majority of the people on this list. But look at it this way and see if anything below adds up in your mind.
1. Apple is in desperate need of a low-power processor for both the portables and the small-form-factor machines. The PPE in Cell provides this. Apparently so, anyway, as it does appear that deliberate misinformation has been planted regarding the low-power issue.
2. Apple hardware has been wanting in audio performance for some time. An SPE or two could immediately correct this.
3. The world is moving to parallel processing, either at the application level or at the system level. The PPE fits right in here in two ways: first, it is threaded, and second, it is small enough to allow more than one per die.
4. AltiVec is getting long in the tooth. Alternative technology is needed here, no matter what you hear about backwards compatibility. Cell and the SPEs are a good approach to processor loading.
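Point 2 amounts to dedicating a processor to audio, so the main core never has to context-switch into the mixer. A minimal sketch of that structure, assuming a plain Python worker thread as a stand-in for an SPE; `mix` and `audio_worker` are hypothetical names, not Cell or Core Audio APIs.

```python
# Hypothetical sketch: one dedicated worker owns all audio mixing, the way a
# spare SPE might. A Python thread stands in for the hardware; nothing here is
# a Cell or Core Audio API.
import queue
import threading
from typing import List, Optional, Tuple

# (channel buffers of equal length, one gain per channel)
Job = Tuple[List[List[float]], List[float]]


def mix(channels: List[List[float]], gains: List[float]) -> List[float]:
    """Sum every channel into one buffer, scaling each by its gain."""
    out = [0.0] * len(channels[0])
    for buf, gain in zip(channels, gains):
        for i, sample in enumerate(buf):
            out[i] += gain * sample
    return out


def audio_worker(jobs: "queue.Queue[Optional[Job]]",
                 results: "queue.Queue[List[float]]") -> None:
    """Runs on its own thread: take a block, mix it, hand the result back."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut down
            break
        channels, gains = job
        results.put(mix(channels, gains))


jobs: "queue.Queue[Optional[Job]]" = queue.Queue()
results: "queue.Queue[List[float]]" = queue.Queue()
worker = threading.Thread(target=audio_worker, args=(jobs, results))
worker.start()

# The main thread only submits blocks; the mixing happens on the worker.
jobs.put(([[0.5, 0.5, 0.5], [0.25, 0.0, -0.25]], [1.0, 2.0]))
jobs.put(None)
worker.join()
print(results.get())  # [1.0, 0.5, 0.0]
```

The queue is the only point of contact between the two sides, which is roughly what "fast communication between elements" buys on a single die.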
Quote:
Apple could make a daughter card for video acceleration and possibly do it cheaper than what they are paying for video cards. Apple could also request that the current PPCs have the Cell bus installed in them, and that IBM add VMX units and memory.
Why would Apple want to add a daughter card when Cell can already take the place of just about all of Apple's current processors? Cell makes sense as a system processor for low-cost, high-performance hardware. Now, anything is possible, but who would buy it?
As to current PPCs, why would they do this when Cell already has a PPC in there? A PPC that IBM did add VMX to. As to memory units, Cell was designed right from the start to be modular. It would not surprise me at all to see Cell with different buses. In fact, different buses would be required by STI's goals.
The SPEs in Cell run threads, and Mac OS X has good multithreading support; plus, with Cocoa it is easy to make threaded applications. Also, all of the Core media technologies could be accelerated on the SPEs transparently to other developers; audio performance is a good example.
And even if the PPE is simpler than the G5, don't forget that the PlayStation 3 will have 4 Cells inside: can a Power Mac have fewer?
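Spreading media work across SPEs transparently can be sketched as a plain data-parallel split: cut a buffer into one slice per worker, process the slices concurrently, and stitch the results back together in order. This is a hypothetical Python illustration with invented function names. In CPython the global interpreter lock keeps these threads from speeding up pure-Python arithmetic, so it shows the structure of the split rather than its speed.

```python
# Hypothetical sketch: split one media buffer into slices and process each
# slice on its own worker, as work might be spread across several SPEs.
# ThreadPoolExecutor is only a stand-in for the hardware.
from concurrent.futures import ThreadPoolExecutor
from typing import List


def apply_gain(chunk: List[float], gain: float) -> List[float]:
    return [gain * s for s in chunk]


def parallel_gain(samples: List[float], gain: float, workers: int = 4) -> List[float]:
    if not samples:
        return []
    size = -(-len(samples) // workers)  # ceiling division: no sample left over
    chunks = [samples[i:i + size] for i in range(0, len(samples), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so reassembly is a concatenation.
        parts = list(pool.map(apply_gain, chunks, [gain] * len(chunks)))
    return [s for part in parts for s in part]


print(parallel_gain([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], 0.5))
# [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
```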
Quote:
Or we could look at it as two schools, one cautious and the other inhabited by raving madmen. In any event, I think that maybe you and many others miss one important point about Cell: it is modular, and the PPE (in Cell) is a PPC core. The expectation is that many variants of Cell will exist; STI has indicated this. Of interest to Mac users is the PPE, and it should be of interest whether or not it is in Cell.
Raving, I think, could describe one, yes. But the rest are overzealous, IMHO. I'm a cautious one, and I just don't have enough information about CELL and its variants to say "Yes, Apple should build a computer around those" yet.
Quote:
I'm not sure why there is such a strong association between Cell and video hardware. Everyone here should realize that nVidia is making a graphics chip for Sony's PS3. Cell is not a video chip; rather, think of it as a processor that can handle multiple threads in hardware. Some of those threads are specialized onto SPEs. Now, just what those SPEs are capable of doing isn't completely clear; a lot of focus has been placed on the SIMD component, but I've yet to see a definitive explanation of what the SPEs have going for them.
Vector processing and video cards.
Quote:
Frankly, I don't know what Apple is up to, and neither do the majority of the people on this list. But look at it this way and see if anything below adds up in your mind.
1. Apple is in desperate need of a low-power processor for both the portables and the small-form-factor machines. The PPE in Cell provides this. Apparently so, anyway, as it does appear that deliberate misinformation has been planted regarding the low-power issue.
2. Apple hardware has been wanting in audio performance for some time. An SPE or two could immediately correct this.
3. The world is moving to parallel processing, either at the application level or at the system level. The PPE fits right in here in two ways: first, it is threaded, and second, it is small enough to allow more than one per die.
4. AltiVec is getting long in the tooth. Alternative technology is needed here, no matter what you hear about backwards compatibility. Cell and the SPEs are a good approach to processor loading.
Ding, ding, ding.
Quote:
Why would Apple want to add a daughter card when Cell can already take the place of just about all of Apple's current processors? Cell makes sense as a system processor for low-cost, high-performance hardware. Now, anything is possible, but who would buy it?
OK, maybe more clearly: in this case you say CELL can replace the 970, and I hear SHELL can replace the 970. From what I have heard, the PPC in CELL is a shell of the 970, so there are some things it does not do well. Double precision, for one, if I remember correctly. So maybe, I think, Apple would like certain aspects of Cell but not all, especially if there is a 980 or a 970MP that Apple has its eyes on. So how does Apple benefit from Cell and still keep the option of a PPC upgrade, now that IBM has had some time to address its manufacturing issues? You see, I see a deep valley with CELL on the other side. Some say jump, and that is fine, but I don't think Apple will jump. I feel that they will build a bridge, and that is the argument: what, how, and when will Apple build this bridge? So yes, this may be great for a PowerBook, an iBook, and the Mini, but Cell has some things that would be of interest to the high end as well. How does Apple get there? Excuse my ignorance, but a daughter card springs to mind, or some kind of support chip that shares the same bus. At least this way the bridge is being built, and even if you say "hey, I still can't use that to get to the other side", we are at least closer.
Quote:
As to current PPCs, why would they do this when Cell already has a PPC in there? A PPC that IBM did add VMX to. As to memory units, Cell was designed right from the start to be modular. It would not surprise me at all to see Cell with different buses. In fact, different buses would be required by STI's goals.
Thanks
Dave
As I already mentioned, to get that 4 GHz speed they stripped a PPC down of the things that would make adding speed more difficult, the bare essentials if you will. So is it really a good substitute for a 970, or a 970MP if we ever see one, which I think we will? For certain tasks, yes, but then a video accelerator chip is also better at doing certain things than a general processor is.
Quote:
OK, maybe more clearly: in this case you say CELL can replace the 970, and I hear SHELL can replace the 970. From what I have heard, the PPC in CELL is a shell of the 970, so there are some things it does not do well. Double precision, for one, if I remember correctly.
Well, I haven't heard anything at all about the PPE and its double-precision capabilities. The SPEs do take a hit, but that is relative to their single-precision performance; the combined double-precision performance of the Cell as a whole is still hard to match.
The bigger question in my mind is just how important this is to Apple's low-cost line. A PPE-based processor would be a big win on any of Apple's lower-end equipment today. A complete Cell implementation would be an even bigger win, if software support is available.
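Single- versus double-precision claims usually come down to the standard peak-FLOPS product: units x SIMD lanes x flops per lane per instruction (2 for a fused multiply-add) x vector instructions issued per cycle x clock. A sketch, assuming 8 SPEs with 128-bit registers (four 32-bit or two 64-bit lanes) at 4 GHz. The reduced double-precision issue rate in the last case is a hypothetical placeholder, since nobody in this thread has a real figure for it, and all of these are theoretical peaks, not sustained throughput.

```python
def peak_gflops(units: int, lanes: int, ghz: float,
                flops_per_lane: int = 2, issue_rate: float = 1.0) -> float:
    """Theoretical peak: units x lanes x flops per lane per instruction
    (2 for a fused multiply-add) x vector instructions per cycle x clock (GHz)."""
    return units * lanes * flops_per_lane * issue_rate * ghz


sp = peak_gflops(units=8, lanes=4, ghz=4.0)        # 4 x 32-bit lanes per 128-bit register
dp_bound = peak_gflops(units=8, lanes=2, ghz=4.0)  # 2 x 64-bit lanes, IF issued every cycle
dp_hypo = peak_gflops(units=8, lanes=2, ghz=4.0, issue_rate=0.25)  # hypothetical slower DP issue
print(sp, dp_bound, dp_hypo)  # 256.0 128.0 32.0
```

Halving the lanes alone only halves the peak; any further double-precision "hit" has to come from the issue rate, which is exactly the number still missing.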
Quote:
So maybe, I think, Apple would like certain aspects of Cell but not all, especially if there is a 980 or a 970MP that Apple has its eyes on. So how does Apple benefit from Cell and still keep the option of a PPC upgrade, now that IBM has had some time to address its manufacturing issues? You see, I see a deep valley with CELL on the other side. Some say jump, and that is fine, but I don't think Apple will jump.
Maybe, maybe not, but Apple would be absolutely foolish not to pursue a PPE-based processor for their own needs. Remember, Cell is a derivation of IBM's PPE technology and is itself modular, so a custom arrangement for Apple should be a snap. Maybe Apple won't call it Cell, but it will be similar technology.
Quote:
I feel that they will build a bridge, and that is the argument: what, how, and when will Apple build this bridge? So yes, this may be great for a PowerBook, an iBook, and the Mini, but Cell has some things that would be of interest to the high end as well. How does Apple get there? Excuse my ignorance, but a daughter card springs to mind, or some kind of support chip that shares the same bus. At least this way the bridge is being built, and even if you say "hey, I still can't use that to get to the other side", we are at least closer.
Still, you seem to be missing an important element: Cell has a PPC in it, so there is no bridge to be fashioned. The only things Apple needs are the willingness to use the technology and a variant from IBM that is better suited to the memory systems PCs use.
Quote:
As I already mentioned, to get that 4 GHz speed they stripped a PPC down of the things that would make adding speed more difficult, the bare essentials if you will. So is it really a good substitute for a 970, or a 970MP if we ever see one, which I think we will?
I'm not sure it is accurate to describe Cell in this manner. In any event, we are all waiting for more details on the PPE in Cell, which should clear things up a bit with respect to its performance. It should be noted, though, that much of what is in the 970 goes to generating heat, not performance, so IBM may have eliminated a liability more than anything.
Quote:
For certain tasks, yes, but then a video accelerator chip is also better at doing certain things than a general processor is.
Quote:
It should be noted, though, that much of what is in the 970 goes to generating heat, not performance
Err. This statement doesn't make sense. Everything that went into the 970 is for generating performance. That it requires a lot of power is a consequence of how it generates performance. The 970 has OOOE, instruction grouping, instruction "micro" code, 8 instruction fetch, 4+1 dispatch+branch, and lots of execution resources all in an effort to extract more performance per cycle.
You already know Cell PPE and SPEs take the opposite approach and achieve performance through high clock rate.
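The "more performance per cycle" argument for out-of-order execution can be illustrated with a toy single-issue scheduler. This is a hypothetical model, not the 970's pipeline: one instruction issues per cycle, every destination register is unique (as if already renamed), and the OoO version may pick the oldest ready instruction from a small window. The in-order core stalls behind a long-latency result; the OoO core fills those cycles with independent work. The window search itself is extra hardware that is busy every cycle, which is the crux of the power argument in this thread.

```python
from typing import Dict, List, Tuple

# (destination register, source registers, latency in cycles)
Instr = Tuple[str, Tuple[str, ...], int]
NEVER = float("inf")


def run(program: List[Instr], out_of_order: bool, window: int = 8) -> int:
    """Return the cycle at which the last result becomes available.
    Issues at most one instruction per cycle; assumes unique destinations."""
    # Values produced by the program are unavailable until their producer issues;
    # any other register (a program input) is ready at cycle 0.
    ready_at: Dict[str, float] = {dest: NEVER for dest, _, _ in program}
    pending = list(range(len(program)))
    cycle = 0
    finish = 0
    while pending:
        # In-order may only look at the oldest instruction; OoO scans a window.
        candidates = pending[:window] if out_of_order else pending[:1]
        for idx in candidates:
            dest, srcs, latency = program[idx]
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                ready_at[dest] = cycle + latency
                finish = max(finish, cycle + latency)
                pending.remove(idx)
                break  # single issue
        cycle += 1
    return finish


program: List[Instr] = [
    ("a", ("x",), 4),  # long-latency operation, e.g. a load
    ("b", ("a",), 1),  # depends on a
    ("c", ("y",), 1),  # independent
    ("d", ("z",), 1),  # independent
]
print(run(program, out_of_order=False))  # 7
print(run(program, out_of_order=True))   # 5
```

With a window of 1 the OoO version degenerates to in-order and also takes 7 cycles, so the gain comes entirely from the lookahead hardware.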
Quote:
Err. This statement doesn't make sense. Everything that went into the 970 is for generating performance.
Well I would agree that was the goal. It was not however the result.
Quote:
That it requires a lot of power is a consequence of how it generates performance. The 970 has OOOE, instruction grouping, instruction "micro" code, 8 instruction fetch, 4+1 dispatch+branch, and lots of execution resources all in an effort to extract more performance per cycle.
Ahh yes, the power needed to generate the performance. The problem is that the very power required to generate this performance is what limits the performance of the processor. Let's face it: going to water cooling was a glaring indication that all that "POWER" was going out the window as heat.
One could look at the problems of scaling the 970 as a clear indication that everything that went into the 970 did not achieve the goals performance-wise.
Quote:
You already know Cell PPE and SPEs take the opposite approach and achieve performance through high clock rate.
Well, not really high clock rate per se. The elimination of a great deal of overhead allows for much better performance per watt. As a result of the simpler, lower-power design, they are able to achieve much higher clock rates. I do wonder if we will be able to make direct comparisons between the two technologies on Apple hardware.
Quote:
Well I would agree that was the goal. It was not however the result.
I really don't understand how you can come to this conclusion. The result is higher performance, not just "heat". The 970fx's power consumption and performance are in line with x86 processors of the same vintage. It does well on integer, fantastic on floating point, and well on SIMD ops for its class of CPU.
The only thing it really is short on are integer resources and a large backside L3 cache. IBM can fix that if they choose to. They could also optimize the 970 circuit design for better power efficiency as well, if they choose to. I'll hazard a guess that IBM was concentrating all its resources on producing processors for a couple of big customers other than Apple, and thus chose not to optimize the 970.
Quote:
Ahh yes, the power needed to generate the performance. The problem is that the very power required to generate this performance is what limits the performance of the processor. Let's face it: going to water cooling was a glaring indication that all that "POWER" was going out the window as heat.
Apple could have easily used air cooling. They just preferred to do something more elegant in their minds. Otherwise, all those 100 Watt x86 processors must be using some sort of magic heat sink to cool by air alone. Not only that, those x86 processors are all going dual core too. And let's remember those air blowers for the 2.3 GHz 970fx in the Xserves: that's two 2.3 GHz 970fx in a 1.75" enclosure using just air cooling.
The only technical reason I can think of for Apple needing to use water cooling is some sort of problem with the 970fx ceramic packaging. It couldn't take the thermal cycling or something. If it is so, IBM should fix the packaging. Otherwise, the 970fx power profile of 100 Watts max at 2.5 GHz and 1.3V is in line with other processors of today, and Apple chose water cooling based on artistic rather than technical reasons.
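Whether the 970's power is "limiting" can be framed with the usual first-order rule that dynamic power scales with switched capacitance x voltage squared x frequency. A rough sketch, assuming constant capacitance and ignoring static (leakage) power; the 1.4 V figure is hypothetical, included only to show how quickly a voltage bump dominates a clock bump.

```python
def scaled_power(p0_w: float, f0_ghz: float, v0: float,
                 f1_ghz: float, v1: float) -> float:
    """First-order dynamic power, P ~ C * V^2 * f, with C held constant.
    Ignores static (leakage) power entirely."""
    return p0_w * (f1_ghz / f0_ghz) * (v1 / v0) ** 2


# Baseline from the thread: 100 W max at 2.5 GHz and 1.3 V.
print(round(scaled_power(100.0, 2.5, 1.3, 3.0, 1.3), 1))  # 120.0: 3 GHz at the same voltage
print(round(scaled_power(100.0, 2.5, 1.3, 3.0, 1.4), 1))  # if 3 GHz needed a hypothetical 1.4 V
```

Frequency enters linearly and voltage quadratically, which is why process tricks that raise speed at constant voltage matter so much.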
Quote:
One could look at the problems of scaling the 970 as a clear indication that everything that went into the 970 did not achieve the goals performance-wise.
I'm pinning most of the blame on IBM's optimistic marketing of their underperforming 90 nm fab. I'll grant you that the 970 burns hotter than average for a 52 million transistor chip at 90 nm. I don't think the hotter-than-average power consumption is limiting, however.
Quote:
Well, not really high clock rate per se. The elimination of a great deal of overhead allows for much better performance per watt. As a result of the simpler, lower-power design, they are able to achieve much higher clock rates. I do wonder if we will be able to make direct comparisons between the two technologies on Apple hardware.
I'm taking a wait-and-see attitude toward Cell. They are making a lot of compromises for that clock rate, and there has been no accounting of how those compromises affect performance. So performance per watt is still up in the air.
Quote:
I really don't understand how you can come to this conclusion. The result is higher performance, not just "heat". The 970fx's power consumption and performance are in line with x86 processors of the same vintage. It does well on integer, fantastic on floating point, and well on SIMD ops for its class of CPU.
Well, this is the area of contention, that is, "higher performance". From my point of view, the only thing the 970 series does well is floating point; its integer and SIMD wins are mostly due to higher clock rate. Yeah, there are some codes that the 970 performs well on, but that certainly doesn't justify the huge amount of real estate devoted to out-of-order execution.
One should just ask if all of that hardware is worth it when the only wins come from higher clock rates.
Quote:
The only thing it really is short on are integer resources and a large backside L3 cache. IBM can fix that if they choose to.
Again you prove my point! The perception is that the 970 lacks integer resources yet it has a huge amount of die area devoted to trying to make use of the resources it has. In effect you have part of the chip spinning its wheels (producing heat) and accomplishing nothing.
Quote:
They could also optimize the 970 circuit design for better power efficiency as well, if they choose to. I'll hazard a guess that IBM was concentrating all its resources on producing processors for a couple of big customers other than Apple, and thus chose not to optimize the 970.
Well, this I tend to agree with, at least from the standpoint that much more could be done for the 970. I suspect, though, that that energy went into the PPE, as it would solve many of the problems with the 970 right off the bat.
Quote:
I'm pinning most of the blame on IBM's optimistic marketing of their underperforming 90 nm fab. I'll grant you that the 970 burns hotter than average for a 52 million transistor chip at 90 nm. I don't think the hotter-than-average power consumption is limiting, however.
I'd say it is pretty limiting if they didn't reach the speeds they projected. I really don't understand how people can deny that Apple/IBM failed to make 3 GHz and that heat was a problem there.
The reality is that the 970 is very hot for a chip its size. Further, it really doesn't perform as well as some would like. Sure, it excels at some tasks, but it doesn't lead the pack by most measures.
Quote:
I'm taking a wait-and-see attitude toward Cell. They are making a lot of compromises for that clock rate, and there has been no accounting of how those compromises affect performance. So performance per watt is still up in the air.
Well, we certainly don't have much info on Cell or the PPEs, but I'd take a bunch of PPEs on one chip over the current 970s any day. Yes, I'm saying that based on little information, only what is public. The reality is that the industry is moving to SMP to support systems with a large number of processes and threads, and this is where the PPEs would work wonders! Performance per watt is still important, but keeping power dissipation under control is vital to allow placing large numbers of processors per die.
Quote:
. . . I'd take a bunch of PPEs on one chip over the current 970s any day. . .
From a photo of the Cell chip, it looks like IBM could replace the SPEs with two more PPE cores with 512KB L2 cache. (I remember rumors about a 3 core chip for Xbox.)
Quote:
Well, this is the area of contention, that is, "higher performance". From my point of view, the only thing the 970 series does well is floating point; its integer and SIMD wins are mostly due to higher clock rate. Yeah, there are some codes that the 970 performs well on, but that certainly doesn't justify the huge amount of real estate devoted to out-of-order execution.
Out of order execution is necessary for extracting more scalar performance out of the deeper pipelined, higher clock rate architecture. With just 2 integer units, it has about the same integer performance at the same clock rate as the 744x which have 4 integer units. The expenditure of transistors for OOOE is justified. Having a deeper pipeline allowing it to clock higher, at the cost of more power consumption, allows it to have better overall integer and SIMD performance. The expenditure of transistors for a deeper pipeline is justified.
Quote:
One should just ask if all of that hardware is worth it when the only wins come from higher clock rates.
It's worth it. Apple should be using the 970 in all of its desktops and at least a high-end laptop. At minimum, a 1.4 GHz G5 should be in the Mac Mini and a 1.6 GHz G5 should be in the eMac. They simply choose not to.
Quote:
Again you prove my point! The perception is that the 970 lacks integer resources yet it has a huge amount of die area devoted to trying to make use of the resources it has. In effect you have part of the chip spinning its wheels (producing heat) and accomplishing nothing.
Again, the 970 is extracting about the same integer performance out of 2 integer units as the 744x does with 4 integer units at the same clock rate. That's a win for OOOE. Not only that it delivers more performance by having higher clock rate.
The perception that it lacks integer resources is because the instruction issue, fetch and dispatch can deliver more instructions than the current integer units can handle. This is mostly due to the lineage of the 970 with the Power4. Power4's market doesn't require absolute integer performance. The desktop market does. In the creation of the 970, IBM should have added 2 add-only integer units, or maybe a full one, along with the VMX unit, but the 970 integer performance was acceptable to Apple, and the only real change required was the addition of the SIMD unit.
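Fetch and dispatch outrunning the integer units is a classic bottleneck calculation: sustained throughput can be no higher than the tightest of the dispatch width and each unit class's capacity divided by its share of the instruction stream. A minimal sketch using the 4 non-branch dispatch slots and 2 integer units mentioned in this thread; the 60% integer mix is a hypothetical example, and real code also loses cycles to dependencies, cache misses and mispredicts, which this ignores.

```python
from typing import Dict, Tuple


def throughput_bound(dispatch_width: float, units: Dict[str, int],
                     mix: Dict[str, float]) -> Tuple[float, str]:
    """Best-case sustained instructions per cycle and which resource caps it.
    A class that is `frac` of the instruction stream and has `n` units can
    sustain at most n / frac instructions per cycle overall."""
    bounds: Dict[str, float] = {"dispatch": float(dispatch_width)}
    for cls, frac in mix.items():
        if frac > 0:
            bounds[cls] = units[cls] / frac
    limiter = min(bounds, key=lambda name: bounds[name])
    return bounds[limiter], limiter


# 4 dispatch slots and 2 integer units; the 60% integer mix is hypothetical.
ipc, limiter = throughput_bound(4, {"int": 2}, {"int": 0.6})
print(round(ipc, 2), limiter)  # 3.33 int
```

Whenever the integer share exceeds half the stream, the two integer units, not the dispatch width, set the ceiling.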
Quote:
I'd say it is pretty limiting if they didn't reach the speeds they projected. I really don't understand how people can deny that Apple/IBM failed to make 3 GHz and that heat was a problem there.
IBM is using a conservative fab technique for the 970fx, in terms of performance, not yield. They'll get to 3 GHz with dual stress liners (a strained silicon technique) and/or low-k. Dual stress liners are purported to give a 24% improvement in transistor speed at the same power levels. The low-k should yield a transistor speed improvement of the same order.
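Taking the quoted 24% transistor-speed gain at face value, a naive projection assumes the clock rate scales one-for-one with transistor speed. That linear scaling is an assumption for illustration only; wire delay and power limits usually make the real gain smaller, and the low-k contribution is left out because no figure was given for it.

```python
def projected_clock(base_ghz: float, transistor_speedup: float) -> float:
    """Naive projection: clock scales 1:1 with transistor switching speed.
    Real designs rarely get the full gain (wire delay, power limits)."""
    return base_ghz * (1.0 + transistor_speedup)


# 2.5 GHz 970fx plus the quoted 24% dual-stress-liner improvement.
print(round(projected_clock(2.5, 0.24), 2))  # 3.1
```

Even this optimistic estimate clears 3 GHz by only about 0.1 GHz, so most of the quoted gain would have to survive intact.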
Quote:
The reality is that the 970 is very hot for a chip its size. Further, it really doesn't perform as well as some would like. Sure, it excels at some tasks, but it doesn't lead the pack by most measures.
It's in the same class as the Athlon 64 and Prescott in performance, die size and power consumption. That's doing pretty well. It clocks higher than the G4 on the same process and therefore has higher overall performance than the G4. That's a win for Apple.
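Both sides of this exchange are really arguing over the two factors in performance = instructions per clock x clock rate, and for the power question, that product divided by watts. A small sketch with purely hypothetical numbers, not measurements, shows how a clock advantage and an IPC deficit can cancel, which is exactly the point in dispute.

```python
def relative_performance(ipc_a: float, ghz_a: float,
                         ipc_b: float, ghz_b: float) -> float:
    """Throughput ratio of A to B, using performance = IPC x clock."""
    return (ipc_a * ghz_a) / (ipc_b * ghz_b)


def perf_per_watt(ipc: float, ghz: float, watts: float) -> float:
    """Billions of instructions per second per watt."""
    return ipc * ghz / watts


# Hypothetical numbers, not measurements.
print(round(relative_performance(1.0, 2.0, 1.0, 1.5), 2))  # 1.33: same IPC, higher clock wins
print(relative_performance(0.75, 2.0, 1.0, 1.5))            # 1.0: a 25% IPC deficit cancels the clock lead
```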
In the future, well, we'll see what happens. There are many paths for the same result.
Quote:
Well, we certainly don't have much info on Cell or the PPEs, but I'd take a bunch of PPEs on one chip over the current 970s any day. Yes, I'm saying that based on little information, only what is public. The reality is that the industry is moving to SMP to support systems with a large number of processes and threads, and this is where the PPEs would work wonders!
The Cell processor has 230-some million transistors. That's enough transistors to create a 970-based quad-core processor with the same die size as the Cell. There is too much unknown about the PPE to make any judgements yet.
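The transistor-budget argument is simple division; here is a sketch using the two figures quoted in this thread (roughly 230 million for Cell, 52 million per 970). Treat the result as an upper bound: the leftover budget would have to cover the on-chip interconnect, memory controller and I/O a real multicore part needs, none of which this counts.

```python
from typing import Tuple


def cores_that_fit(budget_m: int, per_core_m: int) -> Tuple[int, int]:
    """How many identical cores fit in a transistor budget (in millions),
    and how much is left over. Ignores interconnect, memory controller and I/O."""
    cores = budget_m // per_core_m
    return cores, budget_m - cores * per_core_m


# Figures from the thread: ~230 million for Cell, ~52 million per 970.
print(cores_that_fit(230, 52))  # (4, 22)
```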
Quote:
Out of order execution is necessary for extracting more scalar performance out of the deeper pipelined, higher clock rate architecture. With just 2 integer units, it has about the same integer performance at the same clock rate as the 744x which have 4 integer units.
Yes and which one runs cooler?
Quote:
The expenditure of transistors for OOOE is justified. Having a deeper pipeline, which lets it clock higher at the cost of more power consumption, gives it better overall integer and SIMD performance. The expenditure of transistors for a deeper pipeline is justified.
Well, at least you recognize the power usage now. The problem is that at a given clock rate the overall performance of the 970 series isn't better, which you seem to miss. The OOOE engine did not increase performance enough to justify the expense in transistors and heat. In other words, it was UN-RISCy.
Quote:
It's worth it. Apple should be using the 970 in all of its desktops and in at least a high-end laptop. At minimum, a 1.4 GHz G5 should be in the Mac mini and a 1.6 GHz G5 should be in the eMac. They just choose not to.
I can't honestly believe that you said that!!!!!!! First off, how would you stick a 970 in a Mini? Second, the performance would not be there; the 970 relies to a great extent on its high-speed I/O bus for performance. Beyond the floating point unit, the 970 is rather lackluster.
Considering that Apple will have G4 follow-ons available soon with integrated memory controllers, the G4's bandwidth limitations will be gone. All of this will be available in a package that will actually run at a higher clock rate and much cooler than any 970 offering.
Quote:
Again, the 970 is extracting about the same integer performance out of 2 integer units as the 744x does with 4 integer units at the same clock rate. That's a win for OOOE. Not only that, it delivers more performance by having a higher clock rate.
Well, that is one way to look at it, but do realize I see the opposite. The OOOE execution engine is a heat producer, one that doesn't lead to work being done.
Quote:
The perception that it lacks integer resources comes from the fact that instruction fetch, issue and dispatch can deliver more instructions than the current integer units can handle. This is mostly due to the 970's lineage from the Power4.
Again, how can you see the OOE as successful if the units being fed can't handle what is being presented to them?
Quote:
Power4's market doesn't require absolute integer performance. The desktop market does. In the creation of the 970, IBM should have added 2 add-only integer units, or maybe a full one, along with the VMX unit, but the 970 integer performance was acceptable to Apple, and the only real change required was the addition of the SIMD unit.
I'm not sure it was acceptable to Apple at all! They got what could be produced in the fastest acceptable time frame. It is as simple as that: Apple was desperate for more power, and they didn't have time for IBM to redo the entire architecture of the Power4.
Quote:
IBM is using a fab process for the 970fx that is conservative in terms of performance, not yield. They'll get to 3 GHz with dual stress liners (a strained silicon technique) and/or low-k. Dual stress liners are purported to give a 24% improvement in transistor speed at the same power level, and low-k should yield a transistor speed improvement of the same order.
Yes, they may very well do this, and it is likely that such a chip will be produced for Apple. But like I said before, I'd much rather have the technology applied to a chip with multiple PPEs. Two or four would be a nice start.
Quote:
It's in the same class as the Athlon 64 and Prescott in performance, die size and power consumption. That's doing pretty well. It clocks higher than the G4 on the same process and therefore has higher overall performance than the G4. That's a win for Apple.
Since when are the current G4s on 90 nm? Also, relative to AMD's chips the 970 is very power hungry, especially if you count up the number of transistors on the chip.
Quote:
In the future, well, we'll see what happens. There are many paths for the same result.
The Cell processor has 230-some million transistors. That's enough transistors to create a 970-based quad-core processor with the same die size as the Cell. There is too much unknown about the PPE to make any judgements yet.
Yep, quad cores would be fantastic. By the same token, though, PPEs would allow four cores on a much smaller die. These are likely to run cooler and faster than the 970s.
As to what Apple is getting on the next go-around, it is likely to be a dual-core 970 variant. That should be the path of least resistance, given that process technology can be improved. But I would not be surprised to see IBM/Apple switch over to PPE-based processors rather quickly.
The 744x processor, obviously. That doesn't negate the fact that the 970 has higher performance than the 744x.
Quote:
Well, at least you recognize the power usage now. The problem is that at a given clock rate the overall performance of the 970 series isn't better, which you seem to miss. The OOOE engine did not increase performance enough to justify the expense in transistors and heat. In other words, it was UN-RISCy.
I've always recognized the power usage. The only thing I object to is saying that "much of what is in the 970 goes to generating heat not performance." That's simply an untrue statement. It generates heat yes, but it is in service of generating performance.
The 970 is Apple's highest-performance processor. The lowest clock rate Apple ships the 970 at is 1.6 GHz, and it is on average as fast as or faster than any single-processor G4 ever shipped. That's the lowest-performing 970 Apple uses. Apple has nearly another GHz of clock rate to work with above that. That's a win for Apple.
Quote:
I can't honestly believe that you said that!!!!!!! First off, how would you stick a 970 in a Mini? Second, the performance would not be there; the 970 relies to a great extent on its high-speed I/O bus for performance. Beyond the floating point unit, the 970 is rather lackluster.
I wouldn't stick a 970 in a Mac mini. I would put a 970 in a larger case than the mini, but it would still be $500. How much larger a case, I don't know. Maybe Cube-sized. The mini's main attraction is its price, not its form factor. Otherwise, the G4 Cube would have been more successful.
The 970 with a 1/3 bus ratio is fine. A 1.4 GHz CPU would get a 467 MHz FSB, faster than any G4 FSB ever shipped.
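The 467 MHz figure is just the core clock divided by the bus ratio. A minimal sketch follows; the 167 MHz G4 bus clock is an assumption used for comparison, and the comparison is of clock rate only, since the two buses differ in width and signaling, so this is not a bandwidth comparison.

```python
def fsb_mhz(core_mhz: float, divisor: int) -> float:
    """970-style front-side bus: core clock divided by a fixed ratio."""
    return core_mhz / divisor


G4_FSB_MHZ = 167  # assumed fastest shipping G4 bus clock

bus = fsb_mhz(1400, 3)
print(f"1.4 GHz at a 1/3 ratio -> {bus:.0f} MHz FSB "
      f"({bus / G4_FSB_MHZ:.1f}x the assumed G4 bus clock)")
```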
Quote:
Considering that Apple will have G4 follow-ons available soon with integrated memory controllers, the G4's bandwidth limitations will be gone. All of this will be available in a package that will actually run at a higher clock rate and much cooler than any 970 offering.
Yes, but I bet when those e600 chips ship, Apple will have 970 chips that will have higher clock rate and higher performance than those e600 chips. And I bet they would be cheaper as well.
Quote:
Again, how can you see the OOE as successful if the units being fed can't handle what is being presented to them?
It is successful because the 970 manages the same integer IPC as the 744x with fewer integer execution resources and a pipeline with twice as many stages. Deeper pipelines result in a higher clock and therefore higher performance at each process node.
Integer performance is much more important for desktop uses. The perceived lack of integer resources is the result of that. If it had 4 integer units to match the dispatch width, it would be dominating (as opposed to competitive) in integer performance, no?
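The "deeper pipeline, higher clock" argument can be illustrated with a toy analytical model: splitting a fixed amount of logic across more stages shortens the cycle, but each stage adds latch overhead and each extra stage raises the average stall cost. Every constant below is hypothetical, OOOE hiding some of those stalls is not modelled, and the 7 vs 14 stage depths only mirror the "twice as many stages" ratio above, not actual stage counts.

```python
def clock_ghz(depth: int, logic_ns: float = 8.0, latch_ns: float = 0.1) -> float:
    """Cycle time = logic split across stages + fixed per-stage latch overhead."""
    return 1.0 / (logic_ns / depth + latch_ns)


def ipc(depth: int, stall_per_stage: float = 0.05) -> float:
    """Each extra stage adds to the average refill cost after mispredicts
    and hazards. OOOE hiding some of this is NOT modelled."""
    return 1.0 / (1.0 + stall_per_stage * depth)


def performance(depth: int) -> float:
    """Billions of instructions per second = clock (GHz) x IPC."""
    return clock_ghz(depth) * ipc(depth)


for d in (7, 14, 40, 100):
    print(f"{d:3d} stages: {clock_ghz(d):.2f} GHz, IPC {ipc(d):.2f}, "
          f"perf {performance(d):.2f}")
```

With these made-up constants, doubling the depth from 7 to 14 stages wins despite the lower IPC, but returns eventually reverse. Where the peak falls depends entirely on the assumed constants.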
Quote:
I'm not sure it was acceptable to Apple at all! They got what could be produced in the fastest acceptable time frame. It is as simple as that: Apple was desperate for more power, and they didn't have time for IBM to redo the entire architecture of the Power4.
It's strange that IBM had the time to integrate an SIMD unit on it. That's a whole lot more complex than adding another integer unit.
In the end, they did get a nice chip out of it that has the same integer performance per clock as the G4, yet clocks higher. More optimization of the 970 core will make it better. IBM has had another 2 years to optimize since the G5 systems first shipped, but you and I agree that they have higher priorities than the 970.
Quote:
Since when are the current G4s on 90 nm? Also, relative to AMD's chips the 970 is very power hungry, especially if you count up the number of transistors on the chip.
Apple shipped 2 GHz 130 nm 970 systems 1.5 years ago. Motorola/Freescale has been shipping 130 nm 7447 chips for almost a year now. Where are those 2 GHz G4 chips? Likewise at 90 nm, the 970 will still have better performance than the G4-based chips.
The 130 nm Athlon 64 has 105 million transistors, a 190 sq mm die and a TDP of 90 W. The 130 nm Athlon (Barton) has 54 million transistors, a 101 sq mm die and something like 80 W TDP at 2.1 GHz. The 130 nm 970 has 52 million transistors, a ~120 sq mm die and something like 90 W. Comparable, to me.
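To put these figures side by side on both of the yardsticks being argued over (per unit of die area and per transistor), here is a small sketch. The numbers are just the approximate ones from the post; TDP is a design ceiling rather than measured draw, so treat the output as rough.

```python
# Figures as quoted in the post: (million transistors, die mm^2, TDP watts)
CHIPS = {
    "Athlon 64 (130 nm)": (105, 190, 90),
    "Athlon Barton (130 nm)": (54, 101, 80),
    "970 (130 nm)": (52, 120, 90),
}


def power_density(watts: float, die_mm2: float) -> float:
    """Watts per square millimetre of die."""
    return watts / die_mm2


def watts_per_mtr(watts: float, mtr: float) -> float:
    """Watts per million transistors."""
    return watts / mtr


for name, (mtr, mm2, w) in CHIPS.items():
    print(f"{name:24s} {power_density(w, mm2):.2f} W/mm^2  "
          f"{watts_per_mtr(w, mtr):.2f} W/Mtransistor")
```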
"Very power hungry" is not the correct term.
Quote:
Yep Quad cores would be fantastic. By the same token though the PPE's would allow four cores on a much smaller die. These are likely to run cooler and faster than the 970's.
If the PPE performance is in the end greater than the 970, I'm all for it, but there is zero information on it.
The Cell processor has 230-some million transistors. That's enough transistors to create a 970-based quad-core processor with the same die size as the Cell. There is too much unknown about the PPE to make any judgements yet.
Bear in mind that when you say 230 million transistors for a Cell processor, it's not just another giant monolithic RISC monstrosity (in the same spirit as an Intel P4, for instance). What you get is a single "fancy/superscalar processor" teamed with 8 smaller coprocessors, each with its own SIMD pipeline. Therein lies the compelling design you get for that big lump of transistors. So it's not a comparison of "1" Cell vs. a 4-core G5 chip for the silicon space available. It is more like a 1 + 8 coprocessor/SIMD Cell team vs. a 4-core G5 chip. That's certainly not something to scoff at, either.
Think of it this way: if you have an office business with a large pool of office work (ranging from complex managerial tasks to simple, repetitive tasks) to get done, what's the most efficient way to configure your staff to get that work done as fast as possible? Do you hire a single "super manager" who can do all tasks at an accelerated pace (and who would demand a kingly salary, since he can), or do you hire a more moderate manager to handle the complex jobs and oversee a team of "smaller" workers who can handle mostly simple tasks, but in large volumes? If the workload isn't that great, then perhaps the single super manager is not such a bad choice. On a certain level, though, it still doesn't make sense to have that super manager chewing away on a pile of very simple work (using a paltry 10% of his sheer processing potential) that could have been done by a smaller "associate" or two. At some point, as the volume of simple, repetitive tasks scales up considerably, it makes more sense to go with a medium manager and a team of associates. The team of associates also has the flexibility of being scaled in size to more accurately match the workload. You can get quite a bit more bang for your buck with a large, scalable team than by blowing the budget on four super managers to do all the work. Therein lies the "magic" of this Cell topology, imo.
It would not surprise me a bit if OoOE capability ends up still existing in practice in such a setup (albeit not at the integrated hardware level, but perhaps at a software scheduling level). The 8 coprocessors could be seen somewhat as "extended pipelines" of the main CPU. They aren't logically in series with the main CPU so much as logically concurrent. However, this does not eliminate the possibility that task xyz2 could be sent off to a coprocessor ahead of xyz1 because it fits into a "processing bubble" more nicely. It's just OoOE happening at a different level of granularity.
Whew, that was a lot of typing! I guess a lot of it is fanciful thinking out loud, so don't take it as gospel for the way things will be (as if I were particularly authoritative on the topic in the slightest). I just wanted to wax philosophical on the perks of a "processor team", and on how the loss of conventional integrated OoOE does not necessarily mean that some level of OoO-ness cannot still exist. If all that OoOE plumbing can be traded for additional functional-unit real estate, with additional clock speed headroom as a perk, I don't think the suggested tradeoff is so crucial. It might be just a wash, or it may yield some considerable advantages not possible before with more conventional approaches. Granted, this is not to say it will be invulnerable to every pathological computing scenario, but all processors are subject to that to one degree or another. Ideally, you come out ahead for the type of work you are expecting, with the non-ideal cases few and far between.
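The office analogy can be turned into a toy scheduling model. Everything here is hypothetical: the worker speeds, the job sizes and the simple greedy dispatch rule are illustrative assumptions, not Cell parameters, and the model measures only finishing time, not salary cost.

```python
def makespan(workers, tasks):
    """Greedy list scheduling.

    workers: list of (speed, can_do_complex) tuples.
    tasks:   list of (work_units, is_complex) tuples, dispatched in order.
    Each task goes to the eligible worker that would finish it soonest.
    Returns the time at which the last worker finishes.
    """
    free_at = [0.0] * len(workers)
    for work, is_complex in tasks:
        best_finish, best_worker = None, None
        for i, (speed, can_do_complex) in enumerate(workers):
            if is_complex and not can_do_complex:
                continue  # associates only take simple work
            finish = free_at[i] + work / speed
            if best_finish is None or finish < best_finish:
                best_finish, best_worker = finish, i
        if best_worker is None:
            raise ValueError("no worker can take this task")
        free_at[best_worker] = best_finish
    return max(free_at)


# Hypothetical workload: 4 complex jobs of 8 units, 64 simple jobs of 1 unit.
tasks = [(8, True)] * 4 + [(1, False)] * 64

super_manager = [(4.0, True)]                # one very fast generalist
team = [(2.0, True)] + [(1.0, False)] * 8    # medium manager + 8 associates

print("super manager: ", makespan(super_manager, tasks))
print("manager + team:", makespan(team, tasks))
```

With this particular mix the team finishes first because the associates soak up the simple work in parallel. Shrink the pile of simple jobs and the single fast generalist wins instead, which is the "if the workload isn't that great" case in the analogy.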
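A minimal sketch of "OoOE at a different granular level": a software dispatcher that hands any task whose inputs are ready to the next idle coprocessor, so an independent task can start ahead of an earlier one that is still blocked. The task names mirror xyz1/xyz2 from the post; the dispatcher, durations and dependency list are hypothetical, and nothing here reflects how Cell software actually schedules SPE work.

```python
import heapq


def dispatch(tasks, n_workers):
    """Greedy task-level dispatcher for a pool of identical coprocessors.

    tasks: list of (name, duration, deps) in program order.
    Whenever a worker goes idle it takes the earliest task, in program order,
    whose dependencies have all finished. A later independent task can
    therefore start before an earlier one that is still blocked.
    Returns a list of (start_time, name) in dispatch order.
    """
    finish_time = {}                              # name -> completion time
    pending = list(tasks)
    idle = [(0.0, w) for w in range(n_workers)]   # (time worker is free, id)
    heapq.heapify(idle)
    log = []
    while pending:
        now, worker = heapq.heappop(idle)
        ready = [t for t in pending
                 if all(finish_time.get(d, float("inf")) <= now for d in t[2])]
        if not ready:
            # Nothing runnable yet: this worker waits for the next completion.
            later = [f for f in finish_time.values() if f > now]
            if not later:
                raise ValueError("unsatisfiable dependencies")
            heapq.heappush(idle, (min(later), worker))
            continue
        name, duration, _ = ready[0]
        pending.remove(ready[0])
        finish_time[name] = now + duration
        log.append((now, name))
        heapq.heappush(idle, (now + duration, worker))
    return log


program = [
    ("load", 4, []),        # long-running producer
    ("xyz1", 2, ["load"]),  # must wait for "load"
    ("xyz2", 1, []),        # independent, free to jump ahead of xyz1
    ("xyz3", 1, []),
]
for start, name in dispatch(program, n_workers=2):
    print(f"t={start:.0f}: {name}")
```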
The idea, while possible, doesn't leverage the design of Cell at all. The first thing to realize about Cell as a chip is that it has a full 64-bit PPC implementation on die. It doesn't need a support processor to implement a Mac.
What is unique about this PPC core is that it has the support processors (SPEs) on the same die, allowing extremely fast communication between elements. This fast communication, in my opinion, is really what Cell is offering. It permits the limited-functionality SPEs to be used in an efficient manner.
As to what Apple gets, do realize that I'm not associated with Apple at all, and in fact I live in NY. So I really don't know what they are up to. However, if I were an engineer, or maybe even Jobs, at Apple I would be positively dripping with excitement about this technology.
I say technology because there is no reason for Apple to use the specific Cell chip we know about. I would expect them to want to implement the chip in the low-cost/desktop end of the business fairly quickly. There is tremendous potential here, but one should not get overly excited, as there are always gotchas. It would be interesting to see how well the Mac performs with the greatly reduced context switching needed to service all the media channels currently in use on a Mac.
Dave
Originally posted by Brendon
That has been my thought, but I am just a speculator. How about it, Wiz69 and Programmer, what do you think? To me, in this scenario Apple gets it all: a great processor for the Cores of Tiger, access to much easier porting and better-running games, as well as something for the Science Core.
Originally posted by Programmer
Yes, except that IBM also had the system-on-chip expertise and ISA design expertise. So hopefully the SPEs aren't the disaster of an ISA that the PS2 vector units are (which is part of why there aren't any C/C++ compilers for the PS2's vector units).
Well, it already appears that IBM did a good job with the SPE units, but I'd really like to see what the complete instruction set is.
Quote:
This I doubt. They probably didn't even tell Sony they had other customers for this new Power core; it is a product that IBM created to license to anybody interested. It is very likely they have another big customer for the core... and it's not Apple. Apple might be interested, but without the rest of the Cell attached it is much less compelling.
Well, the PPE could be very compelling for Apple, given its laptop and low-power requirements, even without the SPEs.
Quote:
Remember that IBM said explicitly that the 970FX would be a laptop chip. That may have changed since 90nm didn't pan out the way everyone thought, but that doesn't change the fact that they were planning for it to scale down in terms of power.
Yes, I heard that too, but I also heard some time ago that IBM and Apple had a large contract that covered more processors than just the 970 and one generation after that. One of these processors was specifically for the low-power market.
As to the 970 in a laptop, I'm just not sure what happened there. I never was one to see the 970 going into any laptop that Apple would want to sell.
Quote:
"In the loop" is very different than being "involved in design". What Apple needed to contribute to the processor could be done between an Apple and an IBM engineer in about 5 minutes of verbal conversation. IBM didn't consult Apple on the design of the 970's core -- they already had it from the POWER4. Apple said "we need VMX", so IBM tacked it on. The bus design was the one point where Apple may have had more influence because they've designed the only 970 northbridge we know of. In the case of Cell, however, that is provided by RamBus.
Well, you may want to think that verbal communication was enough to get the 970 off the ground, but knowing IBM, I suspect that every little thing had to be documented to a T. As to the actual development of the processor, I would not be surprised to find Apple engineers stationed with IBM engineers, or at least doing a lot of travel back and forth.
Quote:
Probably more. Have I said otherwise? My point is merely that Apple wasn't involved in the chip's development, not that they aren't a likely customer.
Well, I guess part of the problem is that maybe we have different understandings of "involved". Considering IBM's past with respect to AltiVec/VMX, I would have to suggest that Apple was heavily involved. As to the SPEs, I wonder how much of them is even the work of Sony/Toshiba rather than an answer to Apple's wants.
Obviously that is a bit of a stretch, but Apple has got to realize that VMX/AltiVec is getting a little long in the tooth and that moving forward will require new capabilities here.
Quote:
http://www.realworldtech.com/page.cf...WT021005084318
This article and the one at ArsTechnica should tell you what you need to know. It seems to me that many people that read these articles miss half their content, so go back and read it again... it is packed with useful information.
Yeah I've read just about everything I can find on the net related to Cell. Unfortunately much of it is useless ramblings. The good stuff, such as Ars, is just not detailed enough to grasp fully what a SPE is capable of doing.
Quote:
These things are vector processors. It is probably going to be possible to run scalar code on them (IBM said that they were working with "open source compiler writers", i.e. GCC, about providing compilers for SPE), but that doesn't mean it'll run fast.
Yes, I understand they are SIMD machines, but they are also capable of running complete programs, or at least code fragments. If that is the case, then they must be able to handle some scalar-type code. The question is whether this is a simple extension to the vector unit, giving it a program counter and a branch unit, or a more involved core such as an enhanced 440.
For a general-purpose computer such as a Mac the question is important because it indicates just how flexible the SPEs are. Let's face it, even a 440 core running at 4 GHz will do a lot of general-purpose work. Maybe the thing to do is to rummage through GCC's change logs to see if anything has been committed yet.
Dave
In other interesting notes, I happened across the inkling not too long ago that the 440's actually find use in some of Cisco's switches and routers. Neat, eh?
Originally posted by wizard69
Hi Brendon,
The idea, while possible, doesn't leverage the design of Cell at all. The first thing to realize about Cell as a chip is that it has a full 64-bit PPC implementation on die. It doesn't need a support processor to implement a Mac.
What is unique about this PPC core is that it has the support processors (SPEs) on the same die, allowing extremely fast communication between elements. This fast communication, in my opinion, is really what Cell is offering. It permits the limited-functionality SPEs to be used in an efficient manner.
As to what Apple gets, do realize that I'm not associated with Apple at all, and in fact I live in NY. So I really don't know what they are up to. However, if I were an engineer, or maybe even Jobs, at Apple I would be positively dripping with excitement about this technology.
I say technology because there is no reason for Apple to use the specific Cell chip we know about. I would expect them to want to implement the chip in the low-cost/desktop end of the business fairly quickly. There is tremendous potential here, but one should not get overly excited, as there are always gotchas. It would be interesting to see how well the Mac performs with the greatly reduced context switching needed to service all the media channels currently in use on a Mac.
Dave
OK, I guess there are two groups: one that says Apple should use this technology in its purest form to reap the full rewards, and another that sees Apple not jumping into this water but taking baby steps. OK, the best question is: would Cell make a better video accelerator chip than what is on the market at that time? If so, why wouldn't Apple transition into Cell rather than jump? Apple could make a daughter card for video acceleration and possibly do it more cheaply than what they are paying for video cards. Apple could also request that the current PPCs have the Cell bus installed in them and that IBM add VMX units and memory.
Originally posted by Brendon
OK, I guess there are two groups: one that says Apple should use this technology in its purest form to reap the full rewards, and another that sees Apple not jumping into this water but taking baby steps.
Or we could look at it as two schools, one cautious, the other inhabited by raving madmen. In any event, I think that maybe you and many others miss one important point about Cell: it is modular, and the PPE (in Cell) is a PPC core. The expectation is that many variants of Cell will exist; this has been indicated by STI. Of interest to Mac users is the PPE, and it should be of interest whether it is in Cell or not.
Quote:
OK, the best question is: would Cell make a better video accelerator chip than what is on the market at that time?
I'm not sure why there is such a strong association between Cell and video hardware. Everyone here should realize that nVidia is making a graphics chip for Sony's PS3. Cell is not a video chip; rather, think of it as a processor that can handle multiple threads in hardware. Some of those threads run on the specialized SPEs. Now, just what those SPEs are capable of doing isn't completely clear; a lot of focus has been placed on the SIMD component, but I've yet to see a definitive explanation of what the SPEs have going for them.
Quote:
If so, why wouldn't Apple transition into Cell rather than jump?
Frankly, I don't know what Apple is up to, and neither do the majority of the people on this list. But look at it this way and see if anything below adds up in your mind.
1.
Apple is in desperate need of a low-power processor for both the portables and the small-form-factor machines. The PPE in Cell provides this, apparently anyway, as it does appear that deliberate misinformation has been planted with respect to the low-power issue.
2.
Apple hardware has been lacking in audio performance for some time. An SPE or two could immediately correct this.
3.
The world is moving to parallel processing, either at the application level or at the system level. The PPE fits right in here in two ways. First, it is threaded, and second, it is small enough to allow more than one to a die.
4.
AltiVec is getting long in the tooth. Alternative technology is needed here, no matter what you hear about backwards compatibility. Cell and the SPEs are a good approach to processor loading.
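Point 3 is where Amdahl's law (a standard rule of thumb, not something raised in this thread) is worth keeping in mind: the speedup from n processors is capped by the fraction of the work that stays serial. A minimal sketch; the parallel fractions below are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_processors < 1:
        raise ValueError("need at least one processor")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)


for p in (0.5, 0.9, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (2, 4, 8))
    print(f"{p:.0%} parallel -> {row}")
```

The takeaway for small multi-core dies is that extra cores pay off in proportion to how much of the workload (system threads, media channels, applications) can actually run concurrently.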
Quote:
Apple could make a daughter card for video acceleration and possibly do it more cheaply than what they are paying for video cards. Apple could also request that the current PPCs have the Cell bus installed in them and that IBM add VMX units and memory.
Why would Apple want to add a daughter card when Cell can already take the place of just about all of Apple's current processors? Cell makes sense as a system processor for low-cost, high-performance hardware. Now, anything is possible, but who would buy it?
As to current PPCs, why would they do this when Cell already has a PPC in there? A PPC that IBM did add VMX to. As to memory units, Cell was designed right from the start to be modular. It would not surprise me at all to see Cell with different buses. In fact, different buses would be required by STI's goals.
Thanks
Dave
The SPEs in Cell are threads, and Mac OS X has good multithreading support; plus, with Cocoa it is easy to make threaded applications. Also, all of the Core media frameworks could be accelerated with SPEs transparently to other developers, audio performance being a good example.
And even if the PPE is simpler than the G5, don't forget that the PlayStation 3 will have 4 Cells inside: can a Power Mac have less?
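A minimal sketch of the "farm the media work out to threads" idea: split an audio buffer into chunks and hand them to a pool of workers, much as a framework might hand blocks to SPEs. ThreadPoolExecutor is only a stand-in for SPE dispatch (CPython threads do not give true CPU parallelism because of the GIL), and apply_gain is a hypothetical placeholder effect; nothing here is a Core Audio or Cell API.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_gain(chunk, gain):
    """Hypothetical stand-in for a per-buffer audio effect: scale every sample."""
    return [sample * gain for sample in chunk]


def process_buffer(samples, gain, n_workers=4, chunk_size=256):
    """Split a buffer into fixed-size chunks and hand them to a worker pool.

    pool.map yields results in submission order, so the processed chunks
    come back in their original sequence and can simply be concatenated.
    """
    chunks = [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        processed = list(pool.map(apply_gain, chunks, [gain] * len(chunks)))
    return [sample for chunk in processed for sample in chunk]


out = process_buffer([0.1 * i for i in range(1000)], gain=0.5)
print(len(out), "samples processed")
```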
Originally posted by wizard69
Or we could look at it as two schools, one cautious, the other inhabited by raving madmen. In any event, I think that maybe you and many others miss one important point about Cell: it is modular, and the PPE (in Cell) is a PPC core. The expectation is that many variants of Cell will exist; this has been indicated by STI. Of interest to Mac users is the PPE, and it should be of interest whether it is in Cell or not.
Raving, I think, could describe one, yes. But the rest are overzealous, IMHO. I'm a cautious one, and I just don't have enough information about Cell and its variants to say "Yes, Apple should build a computer around those" yet.
Quote:
I'm not sure why there is such a strong association between Cell and video hardware. Everyone here should realize that nVidia is making a graphics chip for Sony's PS3. Cell is not a video chip; rather, think of it as a processor that can handle multiple threads in hardware. Some of those threads run on the specialized SPEs. Now, just what those SPEs are capable of doing isn't completely clear; a lot of focus has been placed on the SIMD component, but I've yet to see a definitive explanation of what the SPEs have going for them.
Vector processing and video cards.
Quote:
Frankly, I don't know what Apple is up to, and neither do the majority of the people on this list. But look at it this way and see if anything below adds up in your mind.
1.
Apple is in desperate need of a low-power processor for both the portables and the small-form-factor machines. The PPE in Cell provides this, apparently anyway, as it does appear that deliberate misinformation has been planted with respect to the low-power issue.
2.
Apple hardware has been lacking in audio performance for some time. An SPE or two could immediately correct this.
3.
The world is moving to parallel processing, either at the application level or at the system level. The PPE fits right in here in two ways. First, it is threaded, and second, it is small enough to allow more than one to a die.
4.
AltiVec is getting long in the tooth. Alternative technology is needed here, no matter what you hear about backwards compatibility. Cell and the SPEs are a good approach to processor loading.
Ding, ding, ding.
Quote:
Why would Apple want to add a daughter card when Cell can already take the place of just about all of Apple's current processors? Cell makes sense as a system processor for low-cost, high-performance hardware. Now, anything is possible, but who would buy it?
OK, maybe more clearly: in this case you say CELL can replace the 970, and I hear SHELL can replace the 970. From what I have heard, the PPC in CELL is a shell of the 970, so there are some things it does not do well, double precision for one, if I remember correctly. So maybe, I think, Apple would like certain aspects of Cell but not all, especially if there is a 980 or a 970MP that Apple has its eyes on. So how does Apple benefit from Cell and still keep the option of a PPC upgrade, now that IBM has had some time to address its manufacturing issues? You see, I see a deep valley with Cell on the other side. Some say jump, and that is fine, but I don't think that Apple will jump. I feel that they will build a bridge, and that is the argument: what, how, and when will Apple build this bridge? So yes, this may be great for a PowerBook, an iBook and the Mini, but Cell has some things that would be of interest to the high end as well. How does Apple get there? Excuse my ignorance, but a daughtercard springs to mind, or some kind of support chip that shares the same bus. At least this way the bridge is being built; even if you say "hey, I still can't use that to get to the other side", we are, at least, closer.
Quote:
As to current PPCs, why would they do this when Cell already has a PPC in there? A PPC that IBM did add VMX to. As to memory units, Cell was designed right from the start to be modular. It would not surprise me at all to see Cell with different buses. In fact, different buses would be required by STI's goals.
Thanks
Dave
As I already mentioned, to get that 4 GHz speed they stripped down a PPC, removing the things that would make adding speed more difficult: the bare essentials, if you will. So is it really a good substitute for a 970, or a 970MP if we ever see one, which I think we will? For certain tasks, yes, but then a video accelerator chip is also better at certain things than a general-purpose processor is.
Originally posted by Brendon
OK, maybe more clearly: in this case you say CELL can replace the 970, and I hear SHELL can replace the 970. From what I have heard, the PPC in CELL is a shell of the 970, so there are some things it does not do well, double precision for one, if I remember correctly.
Well, I haven't heard anything at all about the PPE and its double-precision capabilities. The SPEs do take a hit, but that is relative to their single-precision performance; the combined double-precision performance of the Cell as a whole is still hard to match.
The bigger question in my mind is just how important this is to Apple's low-cost line. A PPE-based processor would be a big win on any of Apple's lower-end equipment today. A complete Cell implementation would be an even bigger win, if software support is available.
So maybe, I think, Apple would like certain aspects of Cell but not all, especially if there is a 980 or a 970MP that Apple has its eyes on. So how does Apple benefit from Cell and still keep the option of a PPC upgrade, now that IBM has had some time to address its manufacturing issues? You see, I see a deep valley with Cell on the other side. Some say jump, and that is fine, but I don't think that Apple will jump.
Maybe, maybe not, but Apple would be absolutely foolish not to pursue a PPE-based processor for their own needs. Remember, Cell is a derivation of IBM's PPE technology and is itself modular, so a custom arrangement for Apple should be a snap. Maybe Apple won't call it Cell, but it will be similar technology.
I feel that they will build a bridge, and that is the argument: what, how, and when will Apple build this bridge? So yes, this may be great for a PowerBook, an iBook and the Mini, but Cell has some things that would be of interest to the high end as well. How does Apple get there? Excuse my ignorance, but a daughtercard springs to mind, or some kind of support chip that shares the same bus. At least this way the bridge is being built; even if you say "hey, I still can't use that to get to the other side", we are, at least, closer.
Still, you seem to be missing an important element: Cell has a PPC in it, so there is no bridge to be fashioned. The only things Apple needs are the willingness to use the technology and a variant from IBM that is better suited to the memory systems PCs use.
As I already mentioned, to get that 4 GHz speed they stripped down a PPC, removing the things that would make adding speed more difficult: the bare essentials, if you will. So is it really a good substitute for a 970, or a 970MP if we ever see one, which I think we will?
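Why the SPEs "take a hit" in double precision can be seen from the usual peak-throughput formula: units x SIMD lanes x flops per lane per cycle x clock. A minimal sketch under stated assumptions: 128-bit SIMD registers (4 single-precision or 2 double-precision lanes), a fused multiply-add counted as 2 flops, the ~4 GHz clock mentioned in this thread, and a hypothetical double-precision issue rate of one op every 7 cycles. None of these are figures from the posts above.

```python
def peak_gflops(units: int, lanes: int, flops_per_lane: int,
                clock_ghz: float, issue_rate: float = 1.0) -> float:
    """Theoretical peak = units x SIMD lanes x flops per lane per cycle
    x clock (GHz) x fraction of cycles on which a new op can issue."""
    return units * lanes * flops_per_lane * clock_ghz * issue_rate


SPES = 8
CLOCK_GHZ = 4.0   # the ~4 GHz figure mentioned in this thread
FMA = 2           # a fused multiply-add counts as 2 flops

single = peak_gflops(SPES, lanes=4, flops_per_lane=FMA, clock_ghz=CLOCK_GHZ)
# Assumption: 2 doubles per 128-bit register, and DP ops issue only every
# 7th cycle. The 1/7 rate is an illustrative placeholder, not a spec value.
double = peak_gflops(SPES, lanes=2, flops_per_lane=FMA, clock_ghz=CLOCK_GHZ,
                     issue_rate=1 / 7)
print(f"single precision: {single:.0f} GFLOPS, double precision: {double:.1f} GFLOPS")
```

Fewer lanes and a slower issue rate compound, which is how a large relative drop can still leave an aggregate double-precision number that is respectable for a single chip.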
I'm not sure it is accurate to describe Cell in this manner. In any event we are all waiting for more details on the PPE in Cell. This should clear things up a bit with respect to its performance. It should be noted though that much of what is in the 970 goes to generating heat not performance, so IBM may have eliminated a liability more than anything.
For certain tasks, yes, but a video accelerator chip is also better at doing certain things than a general-purpose processor is.
Originally posted by wizard69
It should be noted, though, that much of what is in the 970 goes to generating heat, not performance
Err. This statement doesn't make sense. Everything that went into the 970 is for generating performance. That it requires a lot of power is a consequence of how it generates performance. The 970 has OOOE (out-of-order execution), instruction grouping, instruction "micro" code, 8-instruction fetch, 4+1 dispatch+branch, and lots of execution resources, all in an effort to extract more performance per cycle.
You already know Cell PPE and SPEs take the opposite approach and achieve performance through high clock rate.
Originally posted by THT
Err. This statement doesn't make sense. Everything that went into the 970 is for generating performance.
Well, I would agree that was the goal. It was not, however, the result.
That it requires a lot of power is a consequence of how it generates performance. The 970 has OOOE (out-of-order execution), instruction grouping, instruction "micro" code, 8-instruction fetch, 4+1 dispatch+branch, and lots of execution resources, all in an effort to extract more performance per cycle.
Ahh yes, the power needed to generate the performance. The problem is that the very power required to generate this performance is what limits the performance of the processor. Let's face it, going to water cooling was a glaring indication that all that "POWER" was going out the window as heat.
One could look at the problems of scaling the 970 as a clear indication that what went into the 970 did not achieve its performance goals.
You already know Cell PPE and SPEs take the opposite approach and achieve performance through high clock rate.
Well, not really high clock rate per se. The elimination of a great deal of overhead allows for much better performance per watt. As a result of the simpler, lower-power design, they are able to achieve much higher clock rates. I do wonder if we will be able to make direct comparisons between the two technologies on Apple hardware.
Dave
Originally posted by wizard69
Well, I would agree that was the goal. It was not, however, the result.
I really don't understand how you can come to this conclusion. The result is higher performance, not just "heat". The 970fx's power consumption and performance are in line with x86 processors of the same vintage. It does well on integer, fantastically on floating point, and well on SIMD ops for its class of CPU.
The only things it is really short on are integer resources and a large backside L3 cache. IBM can fix that if they choose to. They could also optimize the 970 circuit design for better power efficiency, if they choose to. I'll hazard a guess that IBM was concentrating all its resources on producing processors for a couple of big customers other than Apple, and thus chose not to optimize the 970.
Ahh yes, the power needed to generate the performance. The problem is that the very power required to generate this performance is what limits the performance of the processor. Let's face it, going to water cooling was a glaring indication that all that "POWER" was going out the window as heat.
Apple could easily have used air cooling; they just preferred to do something they considered more elegant. Otherwise, all those 100-watt x86 processors must be using some sort of magic heat sink to cool by air alone. Not only that, those x86 processors are all going dual-core too. And let's remember those air blowers for the 2.3 GHz 970fx in the Xserves: that's two 2.3 GHz 970fx chips in a 1.75" enclosure using just air cooling.
The only technical reason I can think of for Apple needing to use water cooling is some sort of problem with the 970fx ceramic packaging, e.g., it couldn't take the thermal cycling. If so, IBM should fix the packaging. Otherwise, the 970fx power profile of 100 watts max at 2.5 GHz and 1.3 V is in line with other processors of today, and Apple chose water cooling for artistic rather than technical reasons.
One could look at the problems of scaling the 970 as a clear indication that what went into the 970 did not achieve its performance goals.
I'm pinning most of the blame on IBM's optimistic marketing of their underperforming 90 nm fab. I'll grant you that the 970 burns hotter than average for a 52-million-transistor chip at 90 nm. I don't think the hotter-than-average power consumption is limiting, however.
Well, not really high clock rate per se. The elimination of a great deal of overhead allows for much better performance per watt. As a result of the simpler, lower-power design, they are able to achieve much higher clock rates. I do wonder if we will be able to make direct comparisons between the two technologies on Apple hardware.
I'm taking a wait-and-see attitude toward Cell. They are making a lot of compromises for that clock rate, and there has been no accounting of how those compromises affect performance. So performance per watt is still up in the air.
New Powermacs to use Cell Processor?
NO, not this year. Topic answered.
Originally posted by onlooker
NO, not this year. Topic answered.
The ones in 2006 will be "new" as well.
Originally posted by THT
I really don't understand how you can come to this conclusion. The result is higher performance, not just "heat". The 970fx's power consumption and performance are in line with x86 processors of the same vintage. It does well on integer, fantastically on floating point, and well on SIMD ops for its class of CPU.
Well, this is the area of contention, that is, "higher performance". From my point of view, the only thing the 970 series does well is floating point; its integer and SIMD wins come mostly from higher clock rate. Yeah, there are some codes that the 970 performs well on, but that certainly doesn't justify the huge amount of real estate devoted to out-of-order execution.
One should just ask if all of that hardware is worth it when the only wins come from higher clock rates.
The only things it is really short on are integer resources and a large backside L3 cache. IBM can fix that if they choose to.
Again, you prove my point! The perception is that the 970 lacks integer resources, yet it has a huge amount of die area devoted to trying to make use of the resources it has. In effect, you have part of the chip spinning its wheels (producing heat) and accomplishing nothing.
They could also optimize the 970 circuit design for better power efficiency, if they choose to. I'll hazard a guess that IBM was concentrating all its resources on producing processors for a couple of big customers other than Apple, and thus chose not to optimize the 970.
Well, this I tend to agree with, at least from the standpoint that much more could be done for the 970. I suspect, though, that the energy went into the PPE, as it would solve many of the problems with the 970 right off the bat.
I'm pinning most of the blame on IBM's optimistic marketing of their underperforming 90 nm fab. I'll grant you that the 970 burns hotter than average for a 52-million-transistor chip at 90 nm. I don't think the hotter-than-average power consumption is limiting, however.
I'd say it is pretty limiting if they didn't reach the speeds they projected. I really don't understand how people can deny that Apple/IBM failed to make 3 GHz and that heat was a problem there.
The reality is that the 970 is very hot for a chip its size. Further, it really doesn't perform as well as some would like. Sure, it excels at some tasks, but it doesn't lead the pack by most measures.
I'm taking a wait-and-see attitude toward Cell. They are making a lot of compromises for that clock rate, and there has been no accounting of how those compromises affect performance. So performance per watt is still up in the air.
Well, we certainly don't have much info on Cell or the PPEs, but I'd take a bunch of PPEs on one chip over the current 970s any day. Yes, I'm saying that based on little information, only what is public. The reality is that the industry is moving to SMP to support systems with a large number of processes and threads, and this is where the PPEs would work wonders! Performance per watt is still important, but keeping power dissipation under control is vital to allow printing large numbers of processors per die.
Dave
Originally posted by wizard69
. . . I'd take a bunch of PPEs on one chip over the current 970s any day. . .
From a photo of the Cell chip, it looks like IBM could replace the SPEs with two more PPE cores with 512 KB L2 cache. (I remember rumors about a 3-core chip for Xbox.)
Edit: Oops!
Originally posted by wizard69
Well, this is the area of contention, that is, "higher performance". From my point of view, the only thing the 970 series does well is floating point; its integer and SIMD wins come mostly from higher clock rate. Yeah, there are some codes that the 970 performs well on, but that certainly doesn't justify the huge amount of real estate devoted to out-of-order execution.
Out-of-order execution is necessary for extracting more scalar performance out of the deeper-pipelined, higher-clock-rate architecture. With just 2 integer units, it has about the same integer performance at the same clock rate as the 744x, which has 4 integer units. The expenditure of transistors for OOOE is justified. Having a deeper pipeline lets it clock higher, at the cost of more power consumption, which gives it better overall integer and SIMD performance. The expenditure of transistors for a deeper pipeline is justified.
One should just ask if all of that hardware is worth it when the only wins come from higher clock rates.
It's worth it. Apple should be using the 970 in all of its desktops and at least a high-end laptop. At minimum, a 1.4 GHz G5 should be in the Mac Mini and a 1.6 GHz G5 should be in the eMac. They simply choose not to.
Again, you prove my point! The perception is that the 970 lacks integer resources, yet it has a huge amount of die area devoted to trying to make use of the resources it has. In effect, you have part of the chip spinning its wheels (producing heat) and accomplishing nothing.
Again, the 970 is extracting about the same integer performance out of 2 integer units as the 744x does with 4 integer units at the same clock rate. That's a win for OOOE. Not only that, it delivers more performance by having a higher clock rate.
The perception that it lacks integer resources arises because instruction fetch, issue, and dispatch can deliver more instructions than the current integer units can handle. This is mostly due to the 970's lineage from the Power4. The Power4's market doesn't require absolute integer performance; the desktop market does. In creating the 970, IBM should have added 2 add-only integer units, or maybe a full one, along with the VMX unit, but the 970's integer performance was acceptable to Apple, and the only real change required was the addition of the SIMD unit.
I'd say it is pretty limiting if they didn't reach the speeds they projected. I really don't understand how people can deny that Apple/IBM failed to make 3 GHz and that heat was a problem there.
IBM is using a fab technique for the 970fx that is conservative in terms of performance, not yield. They'll get to 3 GHz with dual stress liners (a strained-silicon technique) and/or low-k dielectrics. Dual stress liners are purported to give a 24% improvement in transistor speed at the same power levels. Low-k should yield a transistor speed improvement of the same order.
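As a back-of-envelope check on the 3 GHz claim, here is a minimal Python sketch. It assumes, as a simplification, that core clock scales one-for-one with transistor speed, which real chips only approximate. The 2.5 GHz starting point is the top 970fx clock quoted elsewhere in this thread, and `projected_clock_ghz` is a hypothetical helper.

```python
# Back-of-envelope projection. ASSUMPTION: core clock scales in direct
# proportion to transistor speed; wire delay, pipeline balance and yield
# also matter on real chips, so treat the result as a rough upper guess.

def projected_clock_ghz(base_clock_ghz: float, speedup_pct: float) -> float:
    """Scale a base clock by a percentage transistor-speed improvement."""
    return base_clock_ghz * (1.0 + speedup_pct / 100.0)

# 2.5 GHz: top 970fx clock quoted in the thread.
# 24%: purported dual-stress-liner gain at the same power level.
projection = projected_clock_ghz(2.5, 24.0)
print(f"{projection:.2f} GHz")
```

Under that assumption, 24% on top of 2.5 GHz lands at about 3.1 GHz, just past the 3 GHz target. The low-k gain is left out because no figure is given for it.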
The reality is that the 970 is very hot for a chip its size. Further, it really doesn't perform as well as some would like. Sure, it excels at some tasks, but it doesn't lead the pack by most measures.
It's in the same class as the Athlon 64 and Prescott in performance, die size and power consumption. That's doing pretty well. It clocks higher than the G4 on the same process and therefore has higher overall performance than the G4. That's a win for Apple.
In the future, well, we'll see what happens. There are many paths for the same result.
Well, we certainly don't have much info on Cell or the PPEs, but I'd take a bunch of PPEs on one chip over the current 970s any day. Yes, I'm saying that based on little information, only what is public. The reality is that the industry is moving to SMP to support systems with a large number of processes and threads, and this is where the PPEs would work wonders!
The Cell processor has 230-some million transistors. That's enough transistors to create a 970-based quad-core processor with the same die size as the Cell. There is too much unknown about the PPE to make any judgements yet.
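The transistor-budget side of the quad-core claim can be checked with the numbers used in this thread: 230 million as a floor for Cell's "230-some" million, and 52 million per 970 core. This sketch counts transistors only. `cores_that_fit` is a hypothetical helper, and it ignores the interconnect, I/O, and extra cache a real quad-core would also need.

```python
# Transistor-budget check using only figures quoted in this thread.
CELL_TRANSISTORS_M = 230   # "230-some million", taken as a lower bound
PPC970_TRANSISTORS_M = 52  # per-core figure quoted for the 970

def cores_that_fit(budget_m: int, per_core_m: int) -> int:
    """Whole cores that fit in a transistor budget (ignores interconnect, I/O, cache)."""
    return budget_m // per_core_m

cores = cores_that_fit(CELL_TRANSISTORS_M, PPC970_TRANSISTORS_M)
leftover = CELL_TRANSISTORS_M - cores * PPC970_TRANSISTORS_M
print(cores, leftover)
```

Four cores take 208 million transistors, leaving about 22 million for everything else. Transistor count is not the same as die area (dense cache and random logic pack differently), so this alone does not confirm the "same die size" part.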
Originally posted by THT
Out-of-order execution is necessary for extracting more scalar performance out of the deeper-pipelined, higher-clock-rate architecture. With just 2 integer units, it has about the same integer performance at the same clock rate as the 744x, which has 4 integer units.
Yes and which one runs cooler?
The expenditure of transistors for OOOE is justified. Having a deeper pipeline lets it clock higher, at the cost of more power consumption, which gives it better overall integer and SIMD performance. The expenditure of transistors for a deeper pipeline is justified.
Well, at least you recognize the power usage now. The problem, which you seem to miss, is that at a given clock rate the overall performance of the 970 series isn't better. The OOOE engine did not increase performance enough to justify the expense in transistors and heat. In other words, it was UN-RISCy.
It's worth it. Apple should be using the 970 in all of its desktops and at least a high-end laptop. At minimum, a 1.4 GHz G5 should be in the Mac Mini and a 1.6 GHz G5 should be in the eMac. They simply choose not to.
I can't honestly believe that you said that!!!!!!! First off, how would you stick a 970 in a Mini? Second, the performance would not be there; the 970 relies to a great extent on its high-speed I/O bus for performance. Beyond the floating point unit, the 970 is rather lackluster.
Considering that Apple will soon have G4 follow-ons available with integrated memory controllers, the bandwidth limitations will be gone with respect to the G4. All of this will be available in a package that will actually run at a higher clock rate and much cooler than any 970 offering.
Again, the 970 is extracting about the same integer performance out of 2 integer units as the 744x does with 4 integer units at the same clock rate. That's a win for OOOE. Not only that, it delivers more performance by having a higher clock rate.
Well, that is one way to look at it, but do realize I see the opposite. The OOOE execution engine is a heat producer, one that doesn't lead to work being done.
The perception that it lacks integer resources arises because instruction fetch, issue, and dispatch can deliver more instructions than the current integer units can handle. This is mostly due to the 970's lineage from the Power4.
Again, how can you see the OOOE as successful if the units being fed can't handle what is being presented to them?
The Power4's market doesn't require absolute integer performance; the desktop market does. In creating the 970, IBM should have added 2 add-only integer units, or maybe a full one, along with the VMX unit, but the 970's integer performance was acceptable to Apple, and the only real change required was the addition of the SIMD unit.
I'm not sure it was acceptable at all to Apple! They got what could be produced in the fastest acceptable time frame. It is as simple as that: Apple was desperate for more power, and there wasn't time for IBM to redo the entire architecture of the Power4.
IBM is using a fab technique for the 970fx that is conservative in terms of performance, not yield. They'll get to 3 GHz with dual stress liners (a strained-silicon technique) and/or low-k dielectrics. Dual stress liners are purported to give a 24% improvement in transistor speed at the same power levels. Low-k should yield a transistor speed improvement of the same order.
Yes, they may very well do this, and it is likely that such a chip will be produced for Apple. But like I said before, I'd much rather have the technology applied to a chip with multiple PPEs. Two or four would be a nice start.
It's in the same class as the Athlon 64 and Prescott in performance, die size and power consumption. That's doing pretty well. It clocks higher than the G4 on the same process and therefore has higher overall performance than the G4. That's a win for Apple.
Since when are the current G4s on 90 nm? Also, relative to AMD's chips, the 970 is very power hungry, especially if you count up the number of transistors on the chip.
In the future, well, we'll see what happens. There are many paths for the same result.
The Cell processor has 230-some million transistors. That's enough transistors to create a 970-based quad-core processor with the same die size as the Cell. There is too much unknown about the PPE to make any judgements yet.
Yep, quad cores would be fantastic. By the same token, though, the PPEs would allow four cores on a much smaller die. These are likely to run cooler and faster than the 970s.
As to what Apple is getting on the next go-around, it is likely to be a dual-core 970 variant. That should be the path of least resistance, given that process technology can be improved. But I would not be surprised to see IBM/Apple switch over to PPE-based processors rather quickly.
Dave
Originally posted by wizard69
Yes and which one runs cooler?
The 744x processor obviously. That doesn't negate the fact that 970 has higher performance than the 744x.
Well, at least you recognize the power usage now. The problem, which you seem to miss, is that at a given clock rate the overall performance of the 970 series isn't better. The OOOE engine did not increase performance enough to justify the expense in transistors and heat. In other words, it was UN-RISCy.
I've always recognized the power usage. The only thing I object to is saying that "much of what is in the 970 goes to generating heat, not performance." That's simply an untrue statement. It generates heat, yes, but in service of generating performance.
The 970 is Apple's highest performance processor. The lowest clock rate Apple ships the 970 at is 1.6 GHz, and it is on average as fast or faster than any single processor G4 ever shipped. That's the lowest performing 970 Apple uses. Apple has nearly another GHz of clock rate to work with above that. That's a win for Apple.
I can't honestly believe that you said that!!!!!!! First off, how would you stick a 970 in a Mini? Second, the performance would not be there; the 970 relies to a great extent on its high-speed I/O bus for performance. Beyond the floating point unit, the 970 is rather lackluster.
I wouldn't stick a 970 in a Mac mini. I would stick a 970 in a larger case than the mini, but it would still be $500. How much of a larger case, I don't know. Maybe cube sized. The mini's main attraction is its price, not form factor. Otherwise, the G4 cube would have been more successful.
The 970 with a 1/3 bus ratio is fine. A 1.4 GHz CPU would get a 467 MHz FSB, faster than any G4 FSB ever shipped.
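The bus arithmetic works out as below. This is a sketch: the 167 MHz figure for the fastest G4 bus is an assumption added for comparison, `fsb_mhz` is a hypothetical helper, and it compares bus clocks only, not bandwidth (bus width and signaling differ between the G4 and 970 buses).

```python
from fractions import Fraction

def fsb_mhz(core_clock_mhz: int, bus_ratio: Fraction) -> Fraction:
    """Front-side-bus clock implied by a core clock and a core:bus ratio."""
    return core_clock_mhz * bus_ratio

# ASSUMPTION: 167 MHz as the fastest G4 bus clock Apple shipped.
G4_FASTEST_FSB_MHZ = 167

# 1.4 GHz core with the 1/3 bus ratio mentioned above (exact arithmetic via Fraction).
mini_fsb = fsb_mhz(1400, Fraction(1, 3))
print(round(mini_fsb))  # bus clock in MHz, rounded
print(f"{float(mini_fsb) / G4_FASTEST_FSB_MHZ:.1f}x the assumed G4 bus clock")
```

1400/3 is about 466.7 MHz, which rounds to the 467 MHz in the post, roughly 2.8 times the assumed G4 bus clock.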
Considering that Apple will soon have G4 follow-ons available with integrated memory controllers, the bandwidth limitations will be gone with respect to the G4. All of this will be available in a package that will actually run at a higher clock rate and much cooler than any 970 offering.
Yes, but I bet when those e600 chips ship, Apple will have 970 chips that will have higher clock rate and higher performance than those e600 chips. And I bet they would be cheaper as well.
Again, how can you see the OOOE as successful if the units being fed can't handle what is being presented to them?
It is successful because the 970 manages the same integer IPC as the 744x with fewer integer execution resources and a pipeline with twice as many stages. Deeper pipelines result in a higher clock and therefore higher performance at each process node.
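Much of this disagreement can be framed with the simple model performance ≈ IPC × clock. A minimal sketch follows. It assumes equal integer IPC, as claimed above, and the 2.5 GHz and 1.5 GHz clocks are hypothetical examples, not specific shipping parts. `relative_performance` is a hypothetical helper.

```python
def relative_performance(ipc_a: float, clock_a_ghz: float,
                         ipc_b: float, clock_b_ghz: float) -> float:
    """Throughput of A relative to B under the simple model perf = IPC x clock."""
    return (ipc_a * clock_a_ghz) / (ipc_b * clock_b_ghz)

# Equal IPC (as claimed for integer code); clocks are hypothetical examples.
print(f"{relative_performance(1.0, 2.5, 1.0, 1.5):.2f}")  # different clocks
print(f"{relative_performance(1.0, 2.0, 1.0, 2.0):.2f}")  # same clock
```

At equal clocks the model gives 1.00, which is wizard69's point. Any advantage comes entirely from clock rate, which is THT's point. Real IPC varies by workload, so this is illustrative only.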
Integer performance is much more important for desktop uses. The perceived lack of integer resources is the result of that. If it had 4 integer units to match the dispatch width, it would be dominating (as opposed to competitive) in integer performance, no?
I'm not sure it was acceptable at all to Apple! They got what could be produced in the fastest acceptable time frame. It is as simple as that: Apple was desperate for more power, and there wasn't time for IBM to redo the entire architecture of the Power4.
It's strange that IBM had the time to integrate an SIMD unit on it. That's a whole lot more complex than adding another integer unit.
In the end, they did get a nice chip out of it that has the same integer performance per clock as the G4, yet clocks higher. More optimization of the 970 core will make it better. IBM has had another 2 years to optimize since the G5 systems first shipped, but you and I agree that they have higher priorities than the 970.
Since when are the current G4s on 90 nm? Also, relative to AMD's chips, the 970 is very power hungry, especially if you count up the number of transistors on the chip.
Apple shipped 2 GHz 130 nm 970 systems 1.5 years ago. Motorola/Freescale has been shipping 130 nm 7447 chips for almost a year now. Where are those 2 GHz G4 chips? Likewise at 90 nm, the 970 will still have better performance than the G4-based chips.
130 nm Athlon 64 has 105 million transistors with 190 sq mm die size and a TDP of 90 Watts. 130 nm Athlon Barton has 54 million transistors with 101 sq mm die and something like 80 watts TDP for the 2.1 GHz. 130 nm 970 has 52 million transistors with ~120 sq mm die and something like 90 watts. Comparable to me.
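The two sides are really dividing power by different things: per transistor, as wizard69 suggests, or per unit of die area. Using only the approximate figures quoted in the post above, a short sketch computes both. The figures are the post's "something like" estimates, not datasheet values.

```python
# TDP (watts), transistor count (millions) and die size (mm^2) as quoted above.
chips = {
    "Athlon 64": {"tdp_w": 90, "transistors_m": 105, "die_mm2": 190},
    "Athlon Barton": {"tdp_w": 80, "transistors_m": 54, "die_mm2": 101},
    "PPC 970": {"tdp_w": 90, "transistors_m": 52, "die_mm2": 120},
}

metrics = {}
for name, c in chips.items():
    metrics[name] = {
        "w_per_mtransistor": c["tdp_w"] / c["transistors_m"],  # power per million transistors
        "w_per_mm2": c["tdp_w"] / c["die_mm2"],                  # power density over the die
    }
    m = metrics[name]
    print(f"{name:<14} {m['w_per_mtransistor']:.2f} W/Mtransistor  {m['w_per_mm2']:.2f} W/mm^2")
```

On those numbers the 970 draws the most power per transistor (about 1.73 W per million, versus 1.48 for Barton and 0.86 for the Athlon 64). Its power density over the die (0.75 W/mm²) sits close to Barton's (0.79 W/mm²) and above the Athlon 64's (0.47 W/mm²). Which denominator matters is exactly what is being argued.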
"Very power hungry" is not the correct term.
Yep, quad cores would be fantastic. By the same token, though, the PPEs would allow four cores on a much smaller die. These are likely to run cooler and faster than the 970s.
If the PPE performance is in the end greater than the 970, I'm all for it, but there is zero information on it.
Originally posted by THT
The Cell processor has 230-some million transistors. That's enough transistors to create a 970-based quad-core processor with the same die size as the Cell. There is too much unknown about the PPE to make any judgements yet.
Bear in mind that when you say 230 mil transistors for a Cell processor, it's not like it is just another giant monolithic RISC monstrosity (in the same spirit as an Intel P4, for instance). What you get is a single "fancy/superscalar processor", teamed with 8 smaller coprocessors, each with their own SIMD pipeline. Therein lies the compelling design that you get for that big lump of transistors. So it's not a comparison of "1" Cell vs. a 4-core G5 chip for the silicon space you have available. It is more like a 1 + 8 coprocessor/SIMD cell team vs. a 4-core G5 chip. That's certainly not something to scoff at, either.
Think of it this way- if you have an office business with a large pool of office work (ranging from complex managerial tasks to simple, repetitive tasks) to get done, what's the most efficient way to configure your staff to get that work done as fast as possible? Do you hire a single "super manager" that can do all tasks at an accelerated pace (which would also demand a kingly premium salary, since he can do that), or do you hire a more moderate manager to handle the complex jobs as well as oversee a team of "smaller" workers that can handle mostly simple tasks, but can do large volumes of it? If the workload isn't that great, then perhaps the "single super manager" is not such a bad choice. On a certain level, it still doesn't make sense if you have that super manager chewing away on a pile of very simple work (thereby using a paltry 10% of his sheer processing potential) that could have been done by a smaller "associate"(s). At some point as the volume of simple, repetitive tasks scales up considerably, it then makes more sense to go with a medium manager with a team of associates. The team of associates also have the flexibility of being scaled in size to more accurately match the workload. You can get quite a bit more bang for your buck with a large, scaleable team, rather than blowing the budget on 4-super managers to do all the work. Therein lies the "magic" of this Cell topology, imo.
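The office analogy can be turned into a toy throughput model. Everything here is hypothetical: each task takes one time unit at speed 1, the "super manager" runs at 4x, and the team is a 1x manager on complex tasks plus 8 associates on simple ones. Both helpers are illustrative names, not a model of real Cell hardware.

```python
import math

# Toy model of the office analogy. All speeds and counts are HYPOTHETICAL.
# Each task costs 1 time unit at speed 1.0.

def super_manager_time(complex_tasks: int, simple_tasks: int, speed: float = 4.0) -> float:
    """One fast worker does every task, one after another."""
    return (complex_tasks + simple_tasks) / speed

def team_time(complex_tasks: int, simple_tasks: int,
              associates: int = 8, speed: float = 1.0) -> float:
    """Manager takes complex tasks while associates split simple ones in parallel.
    The slower of the two groups sets the finish time."""
    manager = complex_tasks / speed
    team = math.ceil(simple_tasks / associates) / speed  # associates do whole tasks
    return max(manager, team)

for simple in (10, 400):
    print(simple, super_manager_time(10, simple), team_time(10, simple))
```

With 10 complex and 10 simple tasks, the lone fast worker finishes first (5.0 versus 10.0 time units). With 400 simple tasks the team wins (102.5 versus 50.0). That is the crossover the analogy describes.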
It would not surprise me a bit if the OoOE capability ends up still existing in practice in such a setup (albeit not at the integrated hardware level, but maybe at a software scheduling level). The 8 coprocessors could somewhat be seen as "extended pipelines" of the main CPU. They aren't logically in series with the main CPU (so much as logically concurrent). However, this does not eliminate the possibility that task xyz2 could be sent off to a coprocessor ahead of xyz1 because it may fit in a "processing bubble" more nicely. It's just OoOE happening at a different level of granularity.
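Task-level reordering of the kind described above can be sketched with a single coprocessor. Each hypothetical task has a ready time, standing in for its input data arriving, and a duration. The in-order dispatcher stalls on a task that isn't ready yet, while the out-of-order one runs the oldest ready task first. This is a toy model, not how any actual Cell scheduler is known to work.

```python
from typing import List, Tuple

Task = Tuple[str, int, int]  # (name, ready_time, duration); values are HYPOTHETICAL

def in_order(tasks: List[Task]) -> Tuple[List[str], int]:
    """Dispatch strictly in program order: a task that isn't ready stalls everything behind it."""
    now, order = 0, []
    for name, ready, duration in tasks:
        now = max(now, ready) + duration
        order.append(name)
    return order, now

def out_of_order(tasks: List[Task]) -> Tuple[List[str], int]:
    """Dispatch the oldest ready task; sit idle only when nothing is ready."""
    pending, now, order = list(tasks), 0, []
    while pending:
        ready_now = [t for t in pending if t[1] <= now]
        if not ready_now:
            now = min(t[1] for t in pending)  # nothing ready: jump to the earliest ready time
            continue
        task = ready_now[0]  # oldest ready task in program order
        pending.remove(task)
        now += task[2]
        order.append(task[0])
    return order, now

tasks = [("xyz1", 5, 2), ("xyz2", 0, 3)]
print(in_order(tasks))
print(out_of_order(tasks))
```

Here xyz2 slips into the bubble while xyz1 waits for its data, so everything finishes at time 7 instead of 10.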
Whew, that was a lot of typing! I guess a lot of it is fanciful thinking out loud, so don't take it as the gospel of the way things will be (as if I were particularly authoritative on the topic in the slightest). I just wanted to wax philosophical on the perks of a "processor team" and note that the loss of conventional integrated OoOE does not necessarily mean that some level of OoO-ness cannot still exist. If all that OoOE plumbing can be traded for additional functional-unit real estate, with additional clock speed headroom as a perk, I think the suggested tradeoff is not so crucial. It might be just a "wash", or it may yield some considerable advantages not possible before with more conventional approaches. Granted, this is not to say it will be invulnerable to every pathological computing scenario, but then all processors are subject to this to one degree or another. Ideally, you come out ahead for the type of work you are expecting, with the nonideal cases few and far between.