And the Magic Number is... 3.2 GHz


Comments

  • Reply 21 of 64
    xmoger Posts: 242 member
    Quote:

    Originally posted by imiloa

    i feel the same way. there are far more game consoles sold each year than personal computers. so the market for IBM's PPC tech just increased dramatically, and the profit margin on the console chips is likely much higher (volume sales per R&D dollar).



    all this means more future R&D money for the PPC architecture, from which apple will benefit, even if as a small fish in the market pie.




    There are 30-40 million PCs sold every quarter. I think there have been 60-80 million PS2s sold since it was introduced. Granted, very few personal computers have IBM chips, so it may provide a decent boost to some CPU division. IIRC one of the issues Microsoft thinks will help costs of the xbox is they own the rights to their processor design and will not be beholden to IBM.
  • Reply 22 of 64
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    "going wrong at Fishkill" was perhaps hyperbole, but Cell was marketed in February to run at 4 GHz, not 3.2 GHz, on 90 nm with relatively low power consumption (50 to 80 Watts). Actually, I don't even remember 3.2 GHz even being in the low end of the clock range thrown about at the time. The high clock rate was a requirement for its uber-performance, no? Now at 3.2 GHz, it's lost nearly a quarter of its slated clock rate and something close to that in performance.



    Well it took their performance from something like 280 GFLOPS down to their new number of 218 GFLOPS. Plus that was the Cell being introduced at a technical conference, not the PS3 being marketed. This is the first announcement of the PS3, so it hasn't "lost" anything except question marks in the public's eyes.
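For anyone who wants to check the arithmetic in this exchange, here is a minimal sketch. It assumes that a theoretical peak scales linearly with clock rate, which is a simplification, and it takes the "something like 280 GFLOPS" figure from the post at face value. The function name is made up for illustration.

```python
def scale_peak(peak_gflops: float, old_ghz: float, new_ghz: float) -> float:
    """Scale a theoretical peak linearly with clock rate (an assumption)."""
    return peak_gflops * new_ghz / old_ghz


# Fraction of the clock rate given up going from 4 GHz to 3.2 GHz.
clock_drop = 1 - 3.2 / 4.0

# 280 GFLOPS is the rough 4 GHz figure quoted in the thread, not an official number.
scaled = scale_peak(280.0, 4.0, 3.2)

print(f"clock reduction: {clock_drop:.0%}")
print(f"linearly scaled peak: {scaled:.0f} GFLOPS")
```

The linearly scaled value lands in the same neighborhood as the 218 GFLOPS quoted above but does not match it exactly, which is expected given that the 280 figure is only approximate.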



    Quote:

    As far as a 200+ million transistor part at 3.2+ GHz, Intel is already doing that, shipping now, albeit in very low quantities. But that's really not that important to me, performance is what really matters. It doesn't matter how they get it (most of the time). We'll see how the Cell and Xenon performance are when the boxes are near shipping.



    Sure, and those Intel parts burn something like 130+ watts. Let's see you fit that in a game console. Oh, and let's see it deliver anything close to the Cell's (or even XBox 360's) quoted performance. Sure it'll run messy generic code as well or better, but these boxes are built for media processing and they're going to kick Intel butt when doing that.
  • Reply 23 of 64
    fieldor Posts: 213 member
  • Reply 24 of 64
    tht Posts: 5,447 member
    Quote:

    Originally posted by Programmer

    Well it took their performance from something like 280 GFLOPS down to their new number of 218 GFLOPS. Plus that was the Cell being introduced at a technical conference, not the PS3 being marketed. This is the first announcement of the PS3, so it hasn't "lost" anything except question marks in the public's eyes.



    You know that the 218 GFLOPS is a purely theoretical number, aggregate across the SPEs, and any sort of software that can get close to that is quite specialized. Single threaded performance is going to be like a 1.6-ish GHz 970 or 1.6-ish GHz 7455. Cell would be analogous to a 9 node cluster of 1.6 or 1.8 GHz 970 CPUs. Only certain types of software can take advantage of hardware like that.



    I'll accept your point about conflating what IBM says and what Sony says, and beg forgiveness.



    But as far as 3.2 GHz versus 4 GHz, I'm still disappointed. I'm also disappointed that xBox couldn't get 3.5 GHz as well. I'll be curious to see what power numbers it has at 3.2 GHz, and I still have a lot of other question marks about it, such as what the performance penalties really are. The power numbers speculated for a 4 GHz Cell were on the low end of the 50 to 80 Watts, so if the PS3 form factor could only take a 3.2 GHz Cell, then its power consumption is at the high end. And Fishkill still isn't doing a good job at 90 nm.



    Quote:

    Sure, and those Intel parts burn something like 130+ watts. Let's see you fit that in a game console. Oh, and let's see it deliver anything close to the Cell's (or even XBox 360's) quoted performance. Sure it'll run messy generic code as well or better, but these boxes are built for media processing and they're going to kick Intel butt when doing that.



    We're not talking game consoles. I'm sure the Cell and Xenon CPUs are great for game consoles and dedicated media hardware. But for Macintoshes, the jury is still out. This is where single threaded, messy generic code is the majority of code available. I keep on reminding myself that a 3.2 GHz Cell or PPE will be only as fast as a 1.6 to 2 GHz 970 in single threaded code. I think it would be best if everyone else did too.



    And yes, a dual-core P4 840 would destroy Cell in single threaded code, possibly even compete in double precision floating point. A 2+ GHz 970mp would be much the same.
  • Reply 25 of 64
    sunilraman Posts: 8,133 member
    Quote:

    Originally posted by fieldor

    http://xbox360.ign.com/articles/615/615667p1.html



    Just to let you know.




    holy sh1t that article completely never mentions the word 'powermac' or 'apple' or 'g5' despite showing a picture of what is obviously an apple powermac g5...!!!11!! the article just keeps referring to it as "the xbox alpha unit" that is giving shitty frame rates ..!!11!!



    oh man if i was in apple PR or marketing i'd have a bloody heart attack after reading that article.
  • Reply 26 of 64
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    You know that the 218 GFLOPS is a purely theoretical number, aggregate across the SPEs, and any sort of software that can get close to that is quite specialized. Single threaded performance is going to be like a 1.6-ish GHz 970 or 1.6-ish GHz 7455. Cell would be analogous to a 9 node cluster of 1.6 or 1.8 GHz 970 CPUs. Only certain types of software can take advantage of hardware like that.



    Yes, but we can compare purely theoretical numbers from the Intel and G5 chips as well... and they are only in the 30-40 GFLOPS range, at much higher power consumption levels. And yes, only certain kinds of software can take advantage of this, but I submit that this class is larger than you think and this is the design target for this chip.
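A rough sketch of how "purely theoretical" peak numbers like these are usually computed: execution units times SIMD lanes times floating-point operations per lane per cycle times clock rate. The unit counts below are assumptions for illustration (8 SPEs, 4 single-precision lanes each, 2 flops per lane per cycle from a fused multiply-add, and a hypothetical desktop core with one 4-lane vector unit at 2.5 GHz). It does not try to reproduce the 218 GFLOPS figure, which presumably counts other units as well.

```python
def peak_gflops(units: int, simd_lanes: int, flops_per_lane: int, ghz: float) -> float:
    """Theoretical peak: every unit issues a full-width SIMD op every cycle."""
    return units * simd_lanes * flops_per_lane * ghz


# Assumed configuration: 8 SPEs, 4 lanes, multiply-add counted as 2 flops, 3.2 GHz.
spe_total = peak_gflops(units=8, simd_lanes=4, flops_per_lane=2, ghz=3.2)

# Hypothetical desktop core: one 4-lane vector unit with multiply-add at 2.5 GHz.
desktop_core = peak_gflops(units=1, simd_lanes=4, flops_per_lane=2, ghz=2.5)

print(f"SPEs only: {spe_total:.1f} GFLOPS")
print(f"one desktop core: {desktop_core:.1f} GFLOPS, two cores: {2 * desktop_core:.1f} GFLOPS")
```

Under these assumptions a pair of desktop cores lands in the 30-40 GFLOPS range mentioned in the post, while the SPEs alone come out several times higher. Whether real code gets anywhere near either number is a separate question.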



    Quote:

    But as far as 3.2 GHz versus 4 GHz, I'm still disappointed. I'm also disappointed that xBox couldn't get 3.5 GHz as well. I'll be curious to see what power numbers it has at 3.2 GHz, and I still have a lot of other question marks about it, such as what the performance penalties really are. The power numbers speculated for a 4 GHz Cell were on the low end of the 50 to 80 Watts, so if the PS3 form factor could only take a 3.2 GHz Cell, then its power consumption is at the high end. And Fishkill still isn't doing a good job at 90 nm.



    I'm disappointed as well, but not terribly surprised. 50-80 watts is just the CPU remember, it doesn't include the memory, the I/O system, or (most importantly) the GPU. Those little enclosures have to be quiet and reliable.



    Fishkill is getting terrific yields now... it is the 90 nm wall that is giving everyone problems.



    Quote:

    We're not talking game consoles. I'm sure the Cell and Xenon CPUs are great for game consoles and dedicated media hardware. But for Macintoshes, the jury is still out. This is where single threaded, messy generic code is the majority of code available. I keep on reminding myself that a 3.2 GHz Cell or PPE will be only as fast as a 1.6 to 2 GHz 970 in single threaded code. I think it would be best if everyone else did too.



    I'm fully aware of that (believe me -- painfully aware of it), but for most code I think that a 1.6-2.0 GHz 970 is fast enough. It is the media-intensive code where we need gobs of performance, and the Cell delivers that in spades. This is the direction hardware has been headed for a while, and there are no signs of it changing, so software needs to change direction with it. That is going to take some time, unfortunately.
  • Reply 27 of 64
    cubist Posts: 954 member
    So if Steve Jobs were to trot out a Cell at 3.2GHz, he could be seen as keeping his 3GHz promise - even tho the performance of the machine on existing code was reduced. Sounds like a good deal
  • Reply 28 of 64
    sunilraman Posts: 8,133 member
    Quote:

    Originally posted by cubist

    So if Steve Jobs were to trot out a Cell at 3.2GHz, he could be seen as keeping his 3GHz promise - even tho the performance of the machine on existing code was reduced. Sounds like a good deal



    good marketing deal my friend. ghz myth is alive and well...
  • Reply 29 of 64
    Quote:

    Originally posted by cubist

    So if Steve Jobs were to trot out a Cell at 3.2GHz, he could be seen as keeping his 3GHz promise - even tho the performance of the machine on existing code was reduced. Sounds like a good deal



    As long as he can ship them. Will the muscle of Sony and Microsoft mean Apple is third in line for Cell handouts?
  • Reply 30 of 64
    webmail Posts: 639 member
    Uhhh NO. His deadline for 3ghz was up months and months and months ago...





    Quote:

    Originally posted by sunilraman

    good marketing deal my friend. ghz myth is alive and well...



  • Reply 31 of 64
    tht Posts: 5,447 member
    Quote:

    Originally posted by Programmer

    Yes, but we can compare purely theoretical numbers from the Intel and G5 chips as well... and they are only in the 30-40 GFLOPS range, at much higher power consumption levels. And yes, only certain kinds of software can take advantage of this, but I submit that this class is larger than you think and this is the design target for this chip.



    I'm sure that many applications today could be made to be properly multithreaded, but that goes against economics. Developers will do the minimum necessary to get an application running with a minimum of bugs. [That's the way it is with hardware too.] I'm not quite sure how much extra development time it takes to make an application multithreaded, but I wouldn't be surprised if the development time increased more than linearly per thread.



    So processors with good single threaded performance offer a high reward for poor development practices. That will be difficult to overcome.



    If we are to pit a dual 2.5 GHz 970mp (4 total cores, 4 threads) versus a dual 4-core 4 GHz PPE system (8 total cores, 16 threads) given equal memory systems, I think the 970mp system wins for most usages. 4 threads are enough (maybe too much?) and single thread performance will be better.
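The trade-off in this comparison can be sketched with Amdahl's law, which the thread does not name but which fits the argument. The numbers are assumptions for illustration: four cores at relative speed 1.0 stand in for the dual 970mp system, and eight cores at relative speed 0.8 stand in for the dual 4-core PPE system, using the rule of thumb from earlier in the thread that a PPE runs single threaded code roughly like a 970 at half the clock (so 4 GHz behaves like about 2 GHz against 2.5 GHz). SMT and memory effects are ignored.

```python
def relative_throughput(parallel_fraction: float, cores: int, core_speed: float) -> float:
    """Amdahl's law with a per-core speed factor.

    The serial part runs on one core; the parallel part is split evenly
    across all cores. Result is relative to one core of speed 1.0.
    """
    serial = 1.0 - parallel_fraction
    return core_speed / (serial + parallel_fraction / cores)


for p in (0.5, 0.9):
    fast4 = relative_throughput(p, cores=4, core_speed=1.0)  # stand-in for dual 970mp
    slow8 = relative_throughput(p, cores=8, core_speed=0.8)  # stand-in for dual 4-core PPE
    winner = "4 fast cores" if fast4 > slow8 else "8 slow cores"
    print(f"parallel fraction {p:.0%}: {fast4:.2f} vs {slow8:.2f} -> {winner}")
```

With these assumed ratios, the four faster cores come out ahead when half the work is parallel, and the eight slower cores come out ahead when 90% of it is. That is the crux of the disagreement: how much everyday code sits on which side of the crossover.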



    If there is a killer multithreaded application like something greater than HD encoding/decoding that becomes prominent in everyday computing, perhaps it can tip the scales. Then again, Apple can just move to PPE-derived systems by management fiat too.



    The power consumption debate will have to wait until we actually get actual power consumption numbers. I do think there will be engineering solutions (sleeper transistors, wire/voltage isolation et al) to make brainiac microarchitectures run cooler. So, if IBM wanted to put in the effort to make the 970 cooler, they could. They could also make the integer units truly pipelined too, and add additional ones, but who knows why they do things.



    Quote:

    Fishkill is getting terrific yields now... it is the 90 nm wall that is giving everyone problems.



    Terrific yields for FSG. We've been waiting on low-k and DSL for a long time now which should push the 970 to 3 GHz and make the 970mp power consumption reasonable. Maybe Cell and Xenon are only at 3.2 GHz because Fishkill is only using FSG?



    I suppose it could also be because they need to be conservative since the Cell is also to be fabbed on Sony and Toshiba's unproven 65 nm fab.



    Quote:

    I'm fully aware of that (believe me -- painfully aware of it), but for most code I think that a 1.6-2.0 GHz 970 is fast enough.



    Is compilation time important to you? If so, can compilers be made multithreaded? (It would seem a very difficult problem, and I don't think there is a mass market compiler available that is parallelized.)
  • Reply 32 of 64
    cubist Posts: 954 member
    Quote:

    Originally posted by webmail

    Uhhh NO. His deadline for 3ghz was up months and months and months ago...



    Better late than never.



    Quote:

    Is compilation time important to you? If so, can compilers be made multithreaded? (It would seem a very difficult problem, and I don't think there is a mass market compiler available that is parallelized.)



    Compilers can be multithreaded. Parsing could be separated from code generation, etc. In addition, "make" (or whatever you're using) can be multiprocess - launch multiple steps in parallel. Links can be multithreaded, with different threads looking in different libraries. This stuff is quite easy to do, but because there is only a limited audience for them, they don't seem to ever get done.
  • Reply 33 of 64
    Quote:

    Originally posted by THT

    We've been calling it PPE or PPE-derived, which actually comes from the Cell ISSCC conference presentation in February. The PPE is an acronym for the PowerPC Processing Element in the Cell. The microarchitecture of the PPE is said to be derived from a high CPU clock speed research project IBM had about 4 to 5 years ago.



    We know that PPE is a 2 instruction issue, 2-way SMT PowerPC core with a 128 bit VMX unit. We do not know if the VMX unit can execute all 162 AltiVec/Velocity Engine/VMX instructions or not, but it is likely to have custom SIMD instructions per Microsoft and Sony requests.



    We believe that Xenon (xBox 360 CPU) has 3 PPE cores because the Xenon cores are described to be 2 instruction issue, 2-way SMT custom PowerPC cores. That has to be the Cell PPE or something very close to it. They also both have very high clock rates (3+ GHz), which would be further evidence for Xenon having 3 PPEs.



    The PPE has no known relationship to the 970 microarchitecture. The 970 is a 4 instruction + 1 branch issue architecture with a lot of instruction-ordering logic and execution units. The PPE is "dumb" but clocks really fast.



    Cell has 1 PPE core and 8 SPE cores. Xenon has 3 PPE cores. They both have been announced to be at 3.2 GHz when the respective game consoles ship. At this point, we don't know if the cores are identical or slightly customized for each customer, but they definitely are in the same genus of the PowerPC Family or PowerPC Order. (Which reminds me that I need to update my PowerPC cladogram.)



    Nintendo is also using a PowerPC CPU to power their game console, but we have no idea what it is yet, just that it is backwards compatible w/all Nintendo games. The Nintendo GameCube used a ~450 MHz PPC 750 CPU with a few custom instructions. So, it may be a 1.6 GHz 970 for all we know.






    THT



    I would like to see that cladogram, but that leads to a bigger question: when will PAUP be OS X native (and finish its never-ending beta cycle)?
  • Reply 34 of 64
    ishawn Posts: 364 member
    Where can I get some information on cell technology and the new kind of processors that these companies are coming out with? I am a little lost...
  • Reply 35 of 64
    pb Posts: 4,255 member
    Quote:

    Originally posted by iShawn

    Where can I get some information on cell technology and the new kind of processors that these companies are coming out with? I am a little lost...



    You can start here for example.
  • Reply 36 of 64
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    I'm sure that many applications today could be made to be properly multithreaded, but that goes against economics. Developers will do the minimum necessary to get an application running with a minimum of bugs. [That's the way it is with hardware too.] I'm not quite sure how much extra development time it takes to make an application multithreaded, but I wouldn't be surprised if the development time increased more than linearly per thread.



    It's not just multi-threading that I'm thinking of -- it is using AltiVec/SSE/SPE as well. In fact, that is probably the bigger part of the equation. The tools for this kind of thing are likely to improve over the next couple of years as the potential payoff increases due to the new hardware.



    Quote:

    So processors with good single threaded performance offer a high reward for poor development practices. That will be difficult to overcome.



    Yes it will, but for a lot of code performance doesn't really matter so things don't need to change. Many apps spend all their time in support libraries anyhow, so the system providers can optimize those without changing the apps. The remaining apps will have to move forward to stay competitive.



    Quote:

    If we are to pit a dual 2.5 GHz 970mp (4 total cores, 4 threads) versus a dual 4-core 4 GHz PPE system (8 total cores, 16 threads) given equal memory systems, I think the 970mp system wins for most usages. 4 threads are enough (maybe too much?) and single thread performance will be better.



    Except that you'll have a bunch of SPEs as well. And since Cell seems to be reasonably modular, perhaps Apple will commission a design with a 970 derived core instead of a PPE (probably with a clock rate divider so that the SPEs can run fast and not be limited by the 970's frequency/power limits).



    Quote:

    If there is a killer multithreaded application like something greater than HD encoding/decoding that becomes prominent in everyday computing, perhaps it can tip the scales. Then again, Apple can just move to PPE-derived systems by management fiat too.



    There are plenty of these things already -- just accelerating CoreAudio, QuickTime, Quartz and OpenGL will affect everyone.



    Quote:

    The power consumption debate will have to wait until we actually get actual power consumption numbers. I do think there will be engineering solutions (sleeper transistors, wire/voltage isolation et al) to make brainiac microarchitectures run cooler. So, if IBM wanted to put in the effort to make the 970 cooler, they could. They could also make the integer units truly pipelined too, and add additional ones, but who knows why they do things.



    Because there are much deeper issues than we are qualified to discuss.



    Quote:

    Terrific yields for FSG. We've been waiting on low-k and DSL for a long time now which should push the 970 to 3 GHz and make the 970mp power consumption reasonable. Maybe Cell and Xenon are only at 3.2 GHz because Fishkill is only using FSG? I suppose it could also be because they need to be conservative since the Cell is also to be fabbed on Sony and Toshiba's unproven 65 nm fab.



    For game consoles shipping in volume over the next year I suspect they will have the most conservative process possible to get the yields. That means well established and reliable. 90nm FSG. The Sony/Toshiba fabs will likely start with 90nm, just like Fishkill started with 130nm.



    Quote:

    Is compilation time important to you? If so, can compilers be made multithreaded? (It would seem a very difficult problem, and I don't think there is a mass market compiler available that is parallelized.)



    XCode is already multi-threaded. The compiler doesn't need to be multithreaded, you just have to be able to launch multiple instances of it. Code comes nicely divided into separately compilable units.
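A minimal sketch of the "launch multiple instances" idea, in the spirit of a parallel make. Everything here is illustrative: `compile_unit` is a hypothetical stand-in that only derives an object file name, where a real build would start the compiler as a separate process for each source file. The file names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_unit(source_name: str) -> str:
    """Hypothetical stand-in for one compiler invocation.

    A real build would launch the compiler as its own process here;
    this version only computes the object file name.
    """
    return source_name.rsplit(".", 1)[0] + ".o"


def build(sources: list[str], jobs: int = 4) -> list[str]:
    """Compile independent units concurrently, at most `jobs` at a time."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() returns results in input order, even if jobs finish out of order.
        return list(pool.map(compile_unit, sources))


objects = build(["main.c", "parser.c", "codegen.c"])
print(objects)
```

The compiler itself stays single threaded in this scheme. The parallelism comes entirely from the fact that each compilation unit can be built without waiting for the others, with only the final link needing all of them.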
  • Reply 37 of 64
    tht Posts: 5,447 member
    Quote:

    Originally posted by Programmer

    It's not just multi-threading that I'm thinking of -- it is using AltiVec/SSE/SPE as well. In fact, that is probably the bigger part of the equation. The tools for this kind of thing are likely to improve over the next couple of years as the potential payoff increases due to the new hardware.

    ...

    Yes it will, but for a lot of code performance doesn't really matter so things don't need to change. Many apps spend all their time in support libraries anyhow, so the system providers can optimize those without changing the apps. The remaining apps will have to move forward to stay competitive.




    You're quite optimistic about this, Programmer. I think Apple is going to do their tradeoffs, and I see it as definitely possible that they would rather choose dual 970mp machines than PPE-derived machines, simply because they know that the payoff for Cell-type architectures is a long way off, and perhaps in the end, not advantageous enough to outweigh the loss of performance in current applications.



    Quote:

    Except that you'll have a bunch of SPEs as well. And since Cell seems to be reasonably modular, perhaps Apple will commission a design with a 970 derived core instead of a PPE (probably with a clock rate divider so that the SPEs can run fast and not be limited by the 970's frequency/power limits).



    I'm explicitly using a prospective 4-core PPE only chip for comparison. No SPEs. A hypothetical 4-core PPE Macintosh could at least make use of all of the cores right away with current software. If you are talking about a 970 core with SPEs? Yes, they could do that to make the transition easier.



    Quote:

    There are plenty of these things already -- just accelerating CoreAudio, QuickTime, Quartz and OpenGL will affect everyone.



    I'm still going to make the argument that 4 threads may already be enough for 95% of the code available and 95% of computer usage going into the future.



    Quote:

    XCode is already multi-threaded. The compiler doesn't need to be multithreaded, you just have to be able to launch multiple instances of it. Code comes nicely divided into separately compilable units.



    You don't have a desire or a need for faster single threaded compiler performance?
  • Reply 38 of 64
    pb Posts: 4,255 member
    Quote:

    Originally posted by THT

    If you are talking about a 970 core with SPEs? Yes, they could do that to make the transition easier.





    Add my vote too for that. I firmly believe that the next move for Apple is to add one or two SPEs to accelerate HD multimedia processing. Not as a standard CELL configuration (that is with a PPE), but rather as satellites to a 97x CPU.
  • Reply 39 of 64
    rickag Posts: 1,626 member
    Wouldn't adding an SPE, even one, to a 970 derived PPE be too hot? I would think that Apple/IBM will have to significantly modify the core of a 970 in order to add any SPEs, to the point that it might not even be recognizable as derived from a 970.

    I mean, the current 970 has what, 8 execution units capable of retiring 5 instructions/cycle, and can manage somewhere around 216 in-flight instructions. It almost seems that in the effort to execute instructions, the 970 was designed from the beginning to blow a lot of bubbles internally in order to feed the execution units.

    These two designs seem so at odds I doubt we'll see any SPEs attached to a 970 derived core.

    But then again, I'm really, really ignorant of the technical issues involved, but I do thoroughly enjoy both THT's and Programmer's posts.
  • Reply 40 of 64
    groover Posts: 29 member
    Yeah, where did 3.2 GHz come from? It is staggering that the PS3 runs at 2 TFLOPS and the Cell has 7 cores. This is a console that will be out next year in the spring. This could make for some exciting surprises by Apple before the PS3 is released. If not, I would love it if you could get a Linux kit for the PS3 like the one that was available for the PS2.