Cell XServe Render Stations?


Comments

  • Reply 21 of 37
    onlookeronlooker Posts: 5,252member
    Duh! Where have you people been? IBM has been touting their customer-customizable chips program for a number of years now. Why do you think they got contracts with Microsoft and Sony for their new game consoles?
  • Reply 23 of 37
    smalmsmalm Posts: 677member
    Blade Prototype from IBM.
  • Reply 24 of 37
    programmerprogrammer Posts: 3,458member
    Quote:

    Originally posted by onlooker

    Which is why I'm liking the idea of the co-processor.



    This makes for a really convoluted system when it just isn't necessary. With a pair of Cells at 4-5 GHz, Apple would have a killer PowerMac. Performance on existing scalar PowerPC code would be in approximately the same ballpark as the current 970-based machines, and anything bandwidth limited would improve significantly. Overall machine performance would jump up because Apple could offload all sorts of things (graphics, audio, networking for starters) to the SPEs.



    These SPEs are extremely fast -- apparently in most cases on vector code each SPE is faster than a Pentium 4 of the same clock rate. The terrain rendering demo that IBM/Sony were showing at E3 was real-time ray-traced. It did not use the GPU, and it ran at the display's refresh rate. I heard one claim that, if run on a 4-5 GHz Cell, it would be literally 100x faster than the optimized VMX version on a 2 GHz 970. And this demo left the PPE virtually unused!



    The SPEs are well suited to most of the things that take lots of processing time in the system, and Apple has provided OS-level support for many of these tasks.



    To heck with co-processing, these things are the heart of the system. A 970 would just be a drain on resources and extra cost/heat.
  • Reply 25 of 37
    eric_zeric_z Posts: 175member
    Quote:

    Originally posted by Programmer

    These SPEs are extremely fast -- apparently in most cases on vector code each SPE is faster than a Pentium 4 of the same clock rate. The terrain rendering demo that IBM/Sony were showing at E3 was real-time ray-traced. It did not use the GPU, and it ran at the display's refresh rate. I heard one claim that, if run on a 4-5 GHz Cell, it would be literally 100x faster than the optimized VMX version on a 2 GHz 970. And this demo left the PPE virtually unused!



    The SPEs are well suited to most of the things that take lots of processing time in the system, and Apple has provided OS-level support for many of these tasks.



    To heck with co-processing, these things are the heart of the system. A 970 would just be a drain on resources and extra cost/heat.




    While I agree that the SPEs are nice and all, there are some downsides to them, like the non-IEEE-standard SP floats and the significant penalty for writing branchy code.



    Also, I've got one question about the SPEs that I've not been able to find any information about yet: can they do "regular" code and not just SIMD code? I've skimmed through the RWT and Ars articles, but didn't manage to find anything.
  • Reply 26 of 37
    The PPE is not a full-blown processor. It's a stripped-down, in-order, high-clock PowerPC (not based on POWER4 or POWER5) with a simplified Altivec (VMX), and as such the PPE, even with 8 SPEs, is not suitable for Macs. That's because a PPE at 3.2 GHz is roughly equivalent to a pair of G4s (NOT G4+s) at 1.4-1.6 GHz. The original G4 (7400 series) was more or less a G3 plus Altivec, and that's what a PPE can kind of be compared to, not that the originals ever got up to 1.4 GHz. Modern G4s are G4+s, which are quite different from the original G4s. The reason the PPE more or less matches a pair of G4s is SMT (simultaneous multithreading, like Intel's Hyper-Threading but a better design). G3s and G4s are fairly in-order processors, so code written for them (like the vast majority of current Mac code) will run fairly well on a PPE. However, newer code written for G4+s and G5s will not run that well, since the PPE is weak at out-of-order code, has poor branch prediction, and suffers a fairly large branch misprediction penalty. Therefore, while it can be used, it wouldn't be the best solution in its current form. True, stuff can be offloaded to the SPEs, but that will require additional programming effort, so you want the PPE by itself to get very close to 970 performance when introduced so that current programs still run at the same speed. The nightmare of brand new systems running way slower than the just-replaced models? Shudder. Luckily Cell is more of an architecture than a set of processors, and so the PPE can be modified, as can the number of SPEs.





    Apple has a bit of a problem with upcoming processors, as I'm sure everyone has noticed. The 970 series hasn't seen an improvement since the FX model, and neither the GX nor the MP is on the scene yet. IBM seems to be unable or unwilling to make a low-power laptop processor, at least until all process improvements are up and running at 90nm. While the 7448 and the e600 series are on the horizon, Freescale is half a generation behind IBM in manufacturing, may fall further behind given the difference in resource levels and targeted markets, and doesn't have the best track record. The e600 does look promising, but again I wonder how well it will scale. Freescale's embedded customers have little need for constant improvements like Apple does.



    IBM's chip division uses a vast amount of the company's resources, but the return on investment sucks. Therefore they may have little to no interest in continuing to improve the 970s, or in producing a POWER5 lite, because it simply doesn't match up to their current, already poor ROI. Or Apple is unwilling to invest the money required for them to continue.



    On the other hand, Cell is going into probably 100 million PS3s, and the PPE core will be in something like half that many Xbox 360s. Plus Nintendo is also using an IBM processor (could be a 970GX/MP if they come out, a PPE, a 440, a 300 if they're out, or (doubtfully) another 750). Although Sony and Toshiba will be using their own fabs when they come online at 65nm, and Microsoft will be sourcing their PPE production as soon as they find some 65nm fabs, IBM will probably be making a whole lot of Cells/PPEs at 90nm and some at 65nm. Cash flow. Additionally, if IBM wants to win the processors for the next-next-gen systems (and I'm sure they do), they'll be investing lots of money in the Cell and PPE and derivatives while selling the improved models to as many people as possible to keep the cash coming in while they wait for the really big orders from the next-next-gen systems.



    Therefore Apple jumping on a custom Cell will make IBM happy, because it means moving upwards of 2 million Cells a year for some time once all Apple systems are using variations of the Cell. Now, say, 6-10 million isn't a huge amount versus 200 million for Sony/Microsoft (just the PPEs)/maybe Nintendo, but IBM won't be making most of those. Add in some workstation sales and the money they've already made designing it for various customers, and they should have plenty of money to keep working on the Cell/PPE for the next iteration of console wars around 2010.



    Jumping on the Cell will make Apple happy because the Cell is going to have dozens of times the resources thrown at it compared with the 970s or some Apple custom POWER5 lite (remember, Apple is basically the only customer for 970s, and it would probably be the same with a POWER5 lite; IBM sells a handful of 970s in a blade server and that's it). I'm sure a beefed-up PPE with more out-of-order capability and better branch prediction, OR a few PPEs clustered together, will make a fine future system once you toss in a few SPEs as well. Additionally, Cell would scale very easily from just a single PPE with maybe 1 SPE in a Mac mini or iBook (heat budget) to multiple PPEs and SPEs in PowerMacs.



    By late 2006/early 2007 Apple would probably be able to buy Cells running at 4+ GHz at 65nm with their own heavily beefed-up PPE, since IBM prides itself on custom creations.



    Although a 970MP or two would probably be able to match the performance at that point in time, going forward it would probably lose out real fast as it hits scaling limits (which the 970FX has run smack into right now) and the Cell climbs towards 5.6 GHz.



    That gives Apple one to two years to begin rewriting/optimizing for in-order code and adding code for those SPEs (Core Image/Video/Audio would translate easily, and just imagine decoding a dozen H.264 1080p streams at once), plus all the new internal bandwidth. Independent developer code will see little or no hit, depending on how beefed up the PPE(s) are, and lots of apps will be able to benefit from the SPEs once some time has been spent rewriting them.



    Add the Sony-Toshiba vision of automatically networking Cell systems and sharing power, and you can start hooking up your PS3 and Sony or Toshiba HDTV for extra power. Sony has already mentioned that if you rip your DVDs into the PS3 it will use idle cycles to upconvert them to HD quality levels. Imagine that kind of thing with your entire living room.



    To summarize: Cell is seriously cool. It's like Seymour Cray designing a scalable architecture.
  • Reply 27 of 37
    From what a couple of people who work in the science field were saying over at Ars, IEEE compliance is so useless that there is very little harm in not having it. That's the gist anyway; it was totally not my field. All I remember is that it's somewhere in the last 30-odd pages of the second perpetual Apple CPU thread, or the first couple of the third one. Sorry, but their search is almost useless so I couldn't narrow it down.
  • Reply 28 of 37
    eric_zeric_z Posts: 175member
    Quote:

    Originally posted by Electric Monk

    From what a couple of people who work in the science field were saying over at Ars, IEEE compliance is so useless that there is very little harm in not having it. That's the gist anyway; it was totally not my field. All I remember is that it's somewhere in the last 30-odd pages of the second perpetual Apple CPU thread, or the first couple of the third one. Sorry, but their search is almost useless so I couldn't narrow it down.



    Perhaps, but mixing rounding standards still seems like a bad idea to me. Then again, perhaps I just worry too much, and code where you have to keep rounding errors etc. in check isn't all that common? And where it is needed you use DP floats anyway?
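
    To make the rounding worry concrete, here is a minimal Python sketch that compares round-to-nearest with round-toward-zero in single precision over a long running sum. It assumes, purely for illustration, that the non-standard mode is truncation; the thread itself does not say which rounding the SPEs use, and all function names are hypothetical.

```python
import struct


def to_f32_nearest(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f32_truncate(x: float) -> float:
    """Round a double toward zero to a single-precision value (assumed mode)."""
    y = to_f32_nearest(x)
    if abs(y) > abs(x):
        # Nearest rounding moved away from zero, so step back by one ulp.
        # Floats are sign-magnitude, so decrementing the bit pattern
        # shrinks the magnitude for both positive and negative values.
        bits = struct.unpack("<I", struct.pack("<f", y))[0]
        y = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return y


def accumulate(step: float, count: int, rounder) -> float:
    """Add `step` to a running total `count` times, rounding after every add."""
    total = 0.0
    single_step = rounder(step)
    for _ in range(count):
        total = rounder(total + single_step)
    return total


if __name__ == "__main__":
    nearest = accumulate(0.1, 10000, to_f32_nearest)
    truncated = accumulate(0.1, 10000, to_f32_truncate)
    print("round to nearest :", nearest)
    print("round toward zero:", truncated)
    print("difference       :", nearest - truncated)
```

    Truncation always errs in the same direction, so the error in a long accumulation builds up instead of partly cancelling. That is the sort of code where the rounding mode could matter.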
  • Reply 29 of 37
    programmerprogrammer Posts: 3,458member
    Quote:

    Originally posted by Eric_Z

    Perhaps, but mixing rounding standards still seems like a bad idea to me. Then again, perhaps I just worry too much, and code where you have to keep rounding errors etc. in check isn't all that common? And where it is needed you use DP floats anyway?



    This isn't really an issue -- the non-standard floating point is only on the SPEs so if you must have compliance, don't move your code to the SPEs. As mentioned above, this really isn't an issue much of the time.





    Quote:

    Originally posted by Electric Monk

    The PPE is not a full-blown processor. It's a stripped-down, in-order, high-clock PowerPC (not based on POWER4 or POWER5) with a simplified Altivec (VMX), and as such the PPE, even with 8 SPEs, is not suitable for Macs.



    Not true -- the PPE is a fully capable processor that can run anything any other PowerPC can run. It is not a wide issue out-of-order core like the POWER4/5 machines, but as you point out neither are the G4/G4+.



    3.2 GHz is the speed that the PS3 will ship at, and the PS3 has much more stringent thermal constraints than a PowerMac does. More in line with the iMac, eMac, and Mac mini -- which means that those machines could use a Cell at ~3 GHz with 4-8 SPEs, which is a huge step up from where they are currently.



    It also means that the PowerMac could be clocked higher... more like a pair of 2 GHz G4s, with four very important differences:



    1) Huge memory bandwidth. The G4's biggest problem is a bus that supports less than about 1.5 GB/sec at the top end. The Cell starts at 16x that much.

    2) The SPEs can offload the expensive calculations (see my previous message).

    3) It is 64-bit.

    4) SMT can be turned off, resulting in having one potentially much higher speed thread (depending on the kind of code being run).



    Quote:

    has poor branch prediction, and suffers a fairly large branch misprediction penalty



    A lot of people claim it has poor branch prediction, but there is no actual released information about this. It is clear that many people misinterpreted IBM's statement that the SPEs have no dynamic prediction as applying to the PPE. Really IBM has said nothing about the PPE's branch predictor, so it isn't currently possible to say much about it. It is probably better than the G4s, and may be as good as the G5's.



    The branch mis-predict penalty is bad, however, no doubt about it (more reason for them to predict correctly). On the other hand, those cycles are whizzing by really fast so compared to a G4 or G5 in terms of absolute time (i.e. nanoseconds instead of cycles) it isn't so bad.



    Quote:

    The nightmare of brand new systems running way slower than the just-replaced models? Shudder. Luckily Cell is more of an architecture than a set of processors, and so the PPE can be modified, as can the number of SPEs.



    Always an issue... but fortunately Apple provides a fair bit of the software for its own system and it is all the kind of stuff that can benefit from SPE acceleration. Imagine Final Cut Pro running 48 MPEG-2 streams like that Toshiba demo, and not using the PPE while doing it.



    Quote:

    By late 2006/early 2007 Apple would probably be able to buy Cells running at 4+ GHz at 65nm with their own heavily beefed-up PPE, since IBM prides itself on custom creations.



    The modularity of the Cell comes from its on-chip bus. Customizing within a core is a long and expensive process, even on the Cell. Perhaps especially on the Cell because those cores are carefully hand-designed to achieve high clock rates, low power, and small footprint. So I wouldn't count on major changes to the PPE for Apple... especially since the PPE isn't nearly as bad as you think it is.



    I don't disagree with your dates though. I'd be surprised to see Apple adopt Cell before mid-2006. It might start at 90nm though since mass production (for PS3) will likely have a good effect on pricing. And remember that the Cell has a huge I/O bandwidth capability so it is going to benefit much more from having 2+ Cells in one machine than the 970 does. If Apple goes this route, one wonders if they'll get access to nVidia technology which is designed to connect to the Cell's I/O port.
  • Reply 30 of 37
    mikenapmikenap Posts: 94member
    Programmer/everyone, what do you make of this? Would server performance give any indication of app performance in programs like PS or FCP? I don't have the main article link; I think it's on MacSurfer somewhere. IBM was talking about putting 7 of these boards in one rack-mountable enclosure... Seems like a lot of performance, but what could it mean in the real world?



    IBM yesterday demonstrated a blade server prototype using dual Cell processors running at 2.4 to 2.8 GHz, along with 1 GB of RAM. We've highlighted in blue one of the two "processors" (which are really more like daughterboards), with all ten APUs per processor visible. IBM estimates the performance of this board at about 400 gigaflops (400 billion floating point operations per second).
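
    As a rough sanity check on the 400 gigaflops estimate, peak throughput can be sketched as chips x SPEs x flops per cycle x clock. The figures of 8 SPEs per Cell and 8 single-precision flops per SPE per cycle (a 4-wide multiply-add) are assumptions that the article does not state, and the function name is made up for illustration.

```python
def peak_gflops(cells: int, spes_per_cell: int,
                flops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak in gigaflops, counting only the SPEs."""
    return cells * spes_per_cell * flops_per_cycle * clock_ghz


if __name__ == "__main__":
    # Clock range quoted for the blade prototype.
    for clock in (2.4, 2.8):
        estimate = peak_gflops(cells=2, spes_per_cell=8,
                               flops_per_cycle=8, clock_ghz=clock)
        print(f"{clock} GHz: {estimate:.1f} GFLOPS (SPEs only)")
```

    Under those assumptions the SPEs alone give roughly 307 to 358 gigaflops across the quoted clock range, so the 400 figure presumably counts something more, such as the PPEs or a higher clock.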









    The other two light-coloured square ICs to the right are 512 MB XDR RAM chips. This particular blade can run Linux 2.6.11, although much of the optimisation necessary to run either the kernel or userland at peak theoretical speed is incomplete, given Cell's complex architecture. Here's another photo of the board, this time with its massive heatsinks installed. Not quite a form factor ready for the rackmount blade market now, is it?



    The good news is that we finally have some real, workable, recognisable Cell hardware, with development on the architecture rapidly progressing at IBM's research labs. And yes, I wouldn't be the first analyst to predict that Apple's portable line might be treated to Cell, skipping over the bulky, hot G5 entirely. Cell is just downright more efficient and evolving quickly. Apple had better jump onboard soon.
  • Reply 31 of 37
    Quote:

    Originally posted by Programmer

    Not true -- the PPE is a fully capable processor that can run anything any other PowerPC can run. It is not a wide issue out-of-order core like the POWER4/5 machines, but as you point out neither are the G4/G4+.



    Yes, it can run all PowerPC instructions; I didn't say it couldn't. Just that it's a lot leaner than anything since the G3 or the original G4, which imposes a fair amount of limits.



    Well, the G4+ made a few strides towards being more out of order, but yeah.



    Quote:

    3.2 GHz is the speed that the PS3 will ship at, and the PS3 has much more stringent thermal constraints than a PowerMac does. More in line with the iMac, eMac, and Mac mini -- which means that those machines could use a Cell at ~3 GHz with 4-8 SPEs, which is a huge step up from where they are currently.



    I'm hearing 50-80 watts at 4 GHz/1 PPE/8 SPE's and the rest of the overhead.



    An SPE seems to take about 4 watts at 4 GHz, the PPE is equal to 8 SPE's (i.e. 32 watts) and the overhead takes around 12 watts. None of the numbers are definitive or anything, but they're the most reasonable.
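
    The rumoured figures above add up as follows. This is only a sketch of the arithmetic in the post (4 watts per SPE, a PPE equal to 8 SPEs, 12 watts of overhead); none of the numbers are confirmed, and the function and parameter names are hypothetical.

```python
def cell_power_watts(spes: int = 8, spe_watts: float = 4.0,
                     ppe_in_spes: int = 8,
                     overhead_watts: float = 12.0) -> float:
    """Estimate chip power from the per-unit figures quoted in the post."""
    ppe_watts = ppe_in_spes * spe_watts  # PPE assumed to equal 8 SPEs
    return spes * spe_watts + ppe_watts + overhead_watts


if __name__ == "__main__":
    for spe_count in (1, 4, 7, 8):
        print(f"{spe_count} SPEs: {cell_power_watts(spes=spe_count):.0f} W")
```

    With all 8 SPEs the total comes to 76 watts, which sits inside the 50-80 watt range mentioned above. A 4-SPE part would be about 60 watts on the same assumptions.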



    The Xbox is water cooled (heatpipes or some such thing of course), but 3 PPEs plus overhead (buses, etc.) has got to be 25% or so hotter than a Cell, and the PS3 is air cooled which provides a little bit of support anyway.





    Quote:

    It also means that the PowerMac could be clocked higher... more like a pair of 2 GHz G4s, with four very important differences:



    1) Huge memory bandwidth. The G4's biggest problem is a bus that supports less than about 1.5 GB/sec at the top end. The Cell starts at 16x that much.

    2) The SPEs can offload the expensive calculations (see my previous message).

    3) It is 64-bit.

    4) SMT can be turned off, resulting in having one potentially much higher speed thread (depending on the kind of code being run).



    Assuming a 4 GHz Cell, right? And I have to note (just so everybody's clear) that the G4s it would be equivalent to would not be like a 7447, but rather a 7400.



    1) Absolutely. That's a huge Cell advantage.

    2) True, but again requires programming time. I'm talking, to some extent, about the transition period. I suppose that if Apple's code was all done for Cell then we could get by with a regular PPE.

    3) Agree.

    4) True, but I doubt it would work out to a single 4 GHz G4, so keeping SMT would likely result in better performance overall most of the time. Can SMT be turned off on the fly with little penalty? If so that would probably work.





    Quote:

    A lot of people claim it has poor branch prediction, but there is no actual released information about this. It is clear that many people misinterpreted IBM's statement that the SPEs have no dynamic prediction as applying to the PPE. Really IBM has said nothing about the PPE's branch predictor, so it isn't currently possible to say much about it. It is probably better than the G4s, and may be as good as the G5's.



    The branch mis-predict penalty is bad, however, no doubt about it (more reason for them to predict correctly). On the other hand, those cycles are whizzing by really fast so compared to a G4 or G5 in terms of absolute time (i.e. nanoseconds instead of cycles) it isn't so bad.



    I'm thinking that branch prediction is one of the easiest sacrifices to make for a primarily in order processor. But you're right, I shouldn't assume.



    The branch mis-predict looks like 12 cycles from the released diagrams, but STI is saying 8 cycles. There seem to be a few ways to account for the difference, but again it's one of those waiting-for-more-information things. But yeah, speed is life.
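
    To put the two figures in absolute terms, a cycle count converts to nanoseconds by dividing by the clock in GHz. A small sketch using only the 8-cycle and 12-cycle numbers from this post and the 3.2 and 4 GHz clocks discussed in the thread; the function name is made up.

```python
def penalty_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a pipeline penalty in cycles to nanoseconds."""
    # One cycle lasts 1 / clock_ghz nanoseconds.
    return cycles / clock_ghz


if __name__ == "__main__":
    for clock in (3.2, 4.0):
        for cycles in (8, 12):
            print(f"{cycles} cycles at {clock} GHz = "
                  f"{penalty_ns(cycles, clock):.2f} ns")
```

    So the 8-versus-12 disagreement amounts to roughly a nanosecond per mispredicted branch at these clocks.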



    Quote:

    Always an issue... but fortunately Apple provides a fair bit of the software for its own system and it is all the kind of stuff that can benefit from SPE acceleration. Imagine Final Cut Pro running 48 MPEG-2 streams like that Toshiba demo, and not using the PPE while doing it.



    Yep, now think H.264 and 1080p, and all will worship the Cell's floating-point power. I tend to agree with you here, but I'm trying to think of big independent developer stuff that would take a hit until the SPE transition. Photoshop? Maya? I don't use them so I don't know, but how much stuff can be offloaded? I think Photoshop is old enough that most of the code would be in-order anyway, though. I just don't know how much non-Apple software is out there that would take noticeable hits from just running on a PPE until the transition.



    Quote:

    The modularity of the Cell comes from its on-chip bus. Customizing within a core is a long and expensive process, even on the Cell. Perhaps especially on the Cell because those cores are carefully hand-designed to achieve high clock rates, low power, and small footprint. So I wouldn't count on major changes to the PPE for Apple... especially since the PPE isn't nearly as bad as you think it is.



    I like the PPE. I think it's a brilliant design for what it's supposed to do. Like I mentioned, it's totally something Cray would have come up with, and I'm surprised how hand-tuned it is given IBM's typical practice. But I worry about the transition period. Still, a couple of PPEs would probably take care of most of my worries, but then you're running into heat budget problems again.



    Quote:

    I don't disagree with your dates though. I'd be surprised to see Apple adopt Cell before mid-2006. It might start at 90nm though since mass production (for PS3) will likely have a good effect on pricing. And remember that the Cell has a huge I/O bandwidth capability so it is going to benefit much more from having 2+ Cells in one machine than the 970 does. If Apple goes this route, one wonders if they'll get access to nVidia technology which is designed to connect to the Cell's I/O port.



    I just don't think Apple will be willing to pay the 90nm costs since Cell was very specifically designed for 65nm.



    If Apple's willing to work with STI they'll get some of the other tech. Especially if Sony or Toshiba wants to make a Cell-based computer and has to choose between OS X and Linux. If Apple is willing to do limited licensing, I'd think Sony/Toshiba would be willing to pay the fees to get OS X on the desktop versus Linux on the desktop.







    We really need to make a Perpetual Apple future CPU thread, rather than having a dozen threads going on. Mods?
  • Reply 32 of 37
    programmerprogrammer Posts: 3,458member
    Quote:

    Originally posted by mikenap

    Programmer/everyone, what do you make of this? Would server performance give any indication of app performance in programs like PS or FCP? I don't have the main article link; I think it's on MacSurfer somewhere. IBM was talking about putting 7 of these boards in one rack-mountable enclosure... Seems like a lot of performance, but what could it mean in the real world?



    This is what IBM meant by a "Cell workstation". They meant these blades. Yes, it is a lot of performance.



    Quote:

    IBM yesterday demonstrated a blade server prototype using dual Cell processors running at 2.4 to 2.8 GHz, along with 1 GB of RAM. We've highlighted in blue one of the two "processors" (which are really more like daughterboards), with all ten APUs per processor visible. IBM estimates the performance of this board at about 400 gigaflops (400 billion floating point operations per second).



    APUs? You mean the SPEs? Those are on the Cell chip and not visible on the board. No, those 10 chips lined up next to each Cell will be the 1 GB worth of XDRAM. The other two big white chips are most likely the southbridges (I/O mostly).



    Quote:

    The good news is that we finally have some real, workable, recognisable Cell hardware, with development on the architecture rapidly progressing at IBM's research labs. And yes, I wouldn't be the first analyst to predict that Apple's portable line might be treated to Cell, skipping over the bulky, hot G5 entirely. Cell is just downright more efficient and evolving quickly. Apple had better jump onboard soon.



    I'm a bit doubtful that the Cell would appear first in a notebook (okay, I'm a lot doubtful). Even at 3.2 GHz like the game consoles it is probably too much for a portable, at least on the current 90nm process.
  • Reply 33 of 37
    programmerprogrammer Posts: 3,458member
    Quote:

    Originally posted by Electric Monk

    Yes, it can run all PowerPC instructions; I didn't say it couldn't. Just that it's a lot leaner than anything since the G3 or the original G4, which imposes a fair amount of limits

    ....

    Assuming a 4 GHz Cell, right? And I have to note (just so everybody's clear) that the G4s it would be equivalent to would not be like a 7447, but rather a 7400.




    "Stripped down" kind of implies that something functional is missing. And remember, the original G4's performance was better per clock than the G4+. There are enough differences between the G4/G4+ and the PPE, however, that the analogy is pretty weak.



    Quote:

    2) True, but again requires programming time. I'm talking, to some extent, about the transition period. I suppose that if Apple's code was all done for Cell then we could get by with a regular PPE.



    Fortunately the old programmer's rule of thumb says that 80% of the time is spent in 20% of the code. So not nearly all of the code needs to be moved, and IBM/Sony keep talking about how it is easy to write code for these things.
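
    The 80/20 rule of thumb can be turned into a rough estimate with Amdahl's law: if a fraction p of the run time moves to the SPEs and runs s times faster there, the overall speedup is 1 / ((1 - p) + p / s). A small sketch follows; the SPE speedup factors are chosen arbitrarily and the function name is hypothetical.

```python
def overall_speedup(offloaded_fraction: float, spe_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of it gets faster."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if spe_speedup <= 0:
        raise ValueError("spe_speedup must be positive")
    remaining = 1.0 - offloaded_fraction  # time still spent on the PPE
    return 1.0 / (remaining + offloaded_fraction / spe_speedup)


if __name__ == "__main__":
    for factor in (2, 5, 10, 100):
        print(f"hot 80% runs {factor}x faster -> "
              f"{overall_speedup(0.8, factor):.2f}x overall")
```

    Moving the hot 20% of the code is worth a lot, but the remaining 20% of the run time caps the gain at 5x no matter how fast the SPEs are. That is why PPE performance still matters for the rest.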



    Quote:

    4) True, but I doubt it would work out to a single 4 GHz G4, so keeping SMT would likely result in better performance overall most of the time. Can SMT be turned off on the fly with little penalty? If so that would probably work.



    Well if it is anything like the POWER5's SMT (and it probably is) then the threads are given a priority value that controls how often instructions are dispatched to each thread. This means not only can it be turned off on the fly, it can be tuned on the fly.
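
    A toy model of priority-driven dispatch, to show what "tuned on the fly" could mean in practice. This is not how the POWER5 or the PPE actually arbitrates between threads; the proportional credit scheme and every name below are assumptions made for illustration.

```python
def dispatch_schedule(priority_a: int, priority_b: int, cycles: int) -> list:
    """Return which hypothetical thread ("A" or "B") dispatches each cycle.

    Dispatch slots are shared in proportion to the two priority values
    using a smooth weighted round-robin. A priority of 0 turns that
    thread off, leaving every slot to the other thread.
    """
    if priority_a < 0 or priority_b < 0:
        raise ValueError("priorities must be non-negative")
    total = priority_a + priority_b
    if total == 0:
        raise ValueError("at least one thread needs a non-zero priority")

    credit_a = credit_b = 0
    schedule = []
    for _ in range(cycles):
        # Each thread earns credit equal to its priority every cycle.
        credit_a += priority_a
        credit_b += priority_b
        # The thread with the most credit dispatches and pays for the slot.
        if credit_a >= credit_b:
            schedule.append("A")
            credit_a -= total
        else:
            schedule.append("B")
            credit_b -= total
    return schedule


if __name__ == "__main__":
    print("".join(dispatch_schedule(1, 1, 8)))  # equal sharing
    print("".join(dispatch_schedule(3, 1, 8)))  # thread A favoured
    print("".join(dispatch_schedule(1, 0, 8)))  # thread B switched off
```

    In a model like this, changing the priorities between calls is all it takes to retune the split, and setting one priority to zero gives the single fast thread described earlier.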



    Quote:

    I'm thinking that branch prediction is one of the easiest sacrifices to make for a primarily in order processor. But you're right, I shouldn't assume.



    No, branch prediction is more desirable for an in-order processor. It is easy to sacrifice in a short pipeline machine, and the PPE is definitely not a short pipe machine.



    Quote:

    The branch mis-predict looks like 12 cycles from the released diagrams, but STI is saying 8 cycles. There seem to be a few ways to account for the difference, but again it's one of those waiting-for-more-information things. But yeah, speed is life.



    Probably doesn't matter a lot in the end anyhow... main memory latency will completely dominate at this clock rate.



    Quote:

    Yep, now think H.264 and 1080p, and all will worship the Cell's floating-point power. I tend to agree with you here, but I'm trying to think of big independent developer stuff that would take a hit until the SPE transition. Photoshop? Maya? I don't use them so I don't know, but how much stuff can be offloaded? I think Photoshop is old enough that most of the code would be in-order anyway, though. I just don't know how much non-Apple software is out there that would take noticeable hits from just running on a PPE until the transition.



    The PPE probably does fine on any code that is composed of unrolled loops, software pipelining, and/or VMX. It is the spaghetti code that the PPE won't do so well with, and (fortunately) that kind of code usually isn't the bottleneck. No doubt Word and Excel will suffer, but they do okay on a 1 GHz G4 so a 4 GHz PPE should do fine. Put 2 Cells into a PowerMac and you've got 4 PowerPC hardware threads.



    Quote:

    I like the PPE. I think it's a brilliant design for what it's supposed to do. Like I mentioned, it's totally something Cray would have come up with, and I'm surprised how hand-tuned it is given IBM's typical practice. But I worry about the transition period. Still, a couple of PPEs would probably take care of most of my worries, but then you're running into heat budget problems again.



    Not any worse than the current 2.7 GHz dual. Probably better, actually, since the chip is physically larger so it likely doesn't suffer from such a severe heat density problem.



    Quote:

    I just don't think Apple will be willing to pay the 90nm costs since Cell was very specifically designed for 65nm.



    It was designed to scale to 65nm from an initial 90nm version. That hardly means that the 90nm isn't an attractive option. Apple ran for quite a while on the 130nm 970 before switching to the 90nm 970FX. The Cell design will probably yield very well at 90nm since that's what Sony is putting in the PS3. Sony's use of 7 out of the 8 SPEs "to improve yields" clearly shows that this is a useful strategy -- and it means they now have a 2-dimensional yield chart rather than just the old clock rate scale. Apple could sell you Macs which have 1-8 SPEs per Cell, 1-2 Cells, and clock rates from 2.4 to 4.6 GHz... all depending on what you are willing to pay and how big a box & cooling system you are willing to put up with. That's a lot of BTO possibilities.
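
    To get a feel for how many build-to-order combinations that is, the ranges above can simply be enumerated. The 0.2 GHz clock step is an assumption (the post only gives the 2.4 to 4.6 GHz range), and the function name is made up for illustration.

```python
from itertools import product


def bto_configurations(clock_step_tenths: int = 2) -> list:
    """List (SPEs per Cell, Cells, clock in GHz) for the ranges in the post."""
    spes_per_cell = range(1, 9)   # 1-8 working SPEs per Cell
    cells = range(1, 3)           # 1-2 Cells per machine
    # Clocks are kept in tenths of a GHz so the steps stay exact integers.
    clocks_tenths = range(24, 47, clock_step_tenths)  # 2.4 to 4.6 GHz
    return [(spes, cell_count, tenths / 10)
            for spes, cell_count, tenths
            in product(spes_per_cell, cells, clocks_tenths)]


if __name__ == "__main__":
    configs = bto_configurations()
    print(len(configs), "possible configurations")
    print("smallest:", configs[0])
    print("largest: ", configs[-1])
```

    With 12 clock grades that is 8 x 2 x 12 = 192 combinations, far more than anyone would actually sell, but it shows how much room the yield strategy leaves for binning parts.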
  • Reply 34 of 37
    Quote:

    Originally posted by Programmer

    "Stripped down" kind of implies that something functional is missing. And remember, the original G4's performance was better per clock than the G4+. There are enough differences between the G4/G4+ and the PPE, however, that the analogy is pretty weak.



    My bad on the terminology, I'll admit - I was just trying to imply that it was dissimilar to the 970's design (wide out-of-order) and as such was saving a lot of transistors. A weak analogy, it is true, but at least it's a fairly functional one, and we just don't know enough about the PPE. When we start getting some benchmarks in I'll drop it, but for now I think it's usable given how little most people know about the Cell. It's not like even those of us who've read a lot about the Cell know a huge amount either. I'd kill for some real-world performance numbers.



    Quote:

    Well if it is anything like the POWER5's SMT (and it probably is) then the threads are given a priority value that controls how often instructions are dispatched to each thread. This means not only can it be turned off on the fly, it can be tuned on the fly.



    Very nice. And I agree with you; I'm sure IBM is grabbing features from whatever they already have to put into it, despite it not being a POWER5 itself. I've heard it's based on IBM's upcoming PowerPC 300 series, but I haven't heard much (or anything) about them. In fact there seems to be the odd debate about their existence, so who knows what IBM's up to.



    Quote:

    No, branch prediction is more desirable for an in-order processor. It is easy to sacrifice in a short pipeline machine, and the PPE is definitely not a short pipe machine.



    You're right, I'm an idiot. For some reason I forgot that it was deeply pipelined - for a while the PPE was looking like it had a fairly short pipeline and I've still been thinking about it like that. If STI would just release all the info on the thing already.





    Quote:

    The PPE probably does fine on any code that is composed of unrolled loops, software pipelining, and/or VMX. It is the spaghetti code that the PPE won't do so well with, and (fortunately) that kind of code usually isn't the bottleneck. No doubt Word and Excel will suffer, but they do okay on a 1 GHz G4 so a 4 GHz PPE should do fine. Put 2 Cells into a PowerMac and you've got 4 PowerPC hardware threads.



    It does have that simplified VMX though. AFAIK the 970's VMX doesn't support some stuff that the G4+'s supports (IBM's docs are amusing in that respect), so it might not be too much of a problem. And if it is, IBM stuck the VMX-128 onto the X360's PPE, so they should be able to do the same for Apple if they cough up a little dough. Speaking of which, the Cell would be a golden opportunity to deploy a new AltiVec.



    Quote:

    Not any worse than the current 2.7 GHz dual. Probably better, actually, since the chip is physically larger so it likely doesn't suffer from such a severe heat density problem.



    I was thinking more of portables than the PowerMacs - Apple has enough space to play around with any cooling system they want in those, but the 'books and the Mini have a lot less space to play around with. The 12" gets hot with the current 7447s, and they sure haven't been able to fit a 970 in there.





    Quote:

    It was designed to scale to 65nm from an initial 90nm version. That hardly means that the 90nm isn't an attractive option. Apple ran for quite a while on the 130nm 970 before switching to the 90nm 970FX. The Cell design will probably yield very well at 90nm since that's what Sony is putting in the PS3. Sony's use of 7 out of the 8 SPEs "to improve yields" clearly shows that this is a useful strategy -- and it means they now have a 2-dimensional yield chart rather than just the old clock rate scale. Apple could sell you Macs which have 1-8 SPEs per Cell, 1-2 Cells, and clock rates from 2.4 to 4.6 GHz... all depending on what you are willing to pay and how big a box & cooling system you are willing to put up with. That's a lot of BTO possibilities.



    I was thinking that with the various motherboard expenses (and boy, Apple needs new motherboards in almost everything) Apple would prefer to save some money on the chip itself. Also there are at least two 65nm Cell fabs (Sony's and Toshiba's) versus (I think) just IBM's line at Fishkill at 90nm.



    The 90nm in the PS3 (and the X360) is just until they can get it at 65nm. After all Sony's spent a billion and change on Nagasaki so I'd think they'd want their money's worth



    I interpret the use of 7 out of 8 SPE's to improve yields as saying it either doesn't yield well at 90nm or by disabling one SPE they're saving money via a disproportionate increase in yields; but I agree with you on the new ability to manipulate yields.



    Oh yes, I'm just as excited about the possibilities for Cell in Macs as you are - we just disagree on a few details



    Heck with any luck the next Powerbook I buy will be running Cell. C'mon Apple, I can wait about 3 years.
  • Reply 35 of 37
    programmerprogrammer Posts: 3,458member
    Quote:

    Originally posted by Electric Monk

    A weak analogy, it is true, but at least it's a fairly functional one and we just don't know enough about the PPE. When we start getting some benchmarks in I'll drop it, but for now I think it's usable given how little most people know about the Cell. It's not like even those of us who've read a lot about the Cell know a huge amount either. I'd kill for some real world performance.



    "real world performance"... like what? Benchmarks don't tell you much. Measuring processor performance is always a problem, unless you have a specific piece of software to evaluate. The real world performance of the PPE on non-vector code is going to be about half that of a 970 at the same clock rate... but it'll be able to run two threads at almost that speed. On reasonably pipelined code it'll do better.
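    As a back-of-the-envelope version of that estimate, here is a rough sketch. The 0.5 per-clock ratio is the guess from the paragraph above; the 0.9 SMT efficiency standing in for "almost that speed" and the function name are assumptions made up for illustration, not measurements.

```python
def ppe_relative_throughput(ppe_ghz, g5_ghz, threads=2,
                            per_clock_ratio=0.5, smt_efficiency=0.9):
    """Estimate PPE throughput relative to one 970 core (1.0 = parity).

    per_clock_ratio: assumed PPE speed per clock on non-vector code,
                     as a fraction of a 970 (the "about half" guess).
    smt_efficiency:  assumed per-thread slowdown when both hardware
                     threads are busy (placeholder for "almost").
    """
    if threads not in (1, 2):
        raise ValueError("the PPE has two hardware threads")
    per_thread = per_clock_ratio
    if threads == 2:
        per_thread *= smt_efficiency
    return threads * per_thread * ppe_ghz / g5_ghz


if __name__ == "__main__":
    # Same clock, one thread versus two threads.
    print(ppe_relative_throughput(2.0, 2.0, threads=1))
    print(ppe_relative_throughput(2.0, 2.0, threads=2))
    # A 4 GHz PPE against a 2.7 GHz 970, both threads busy.
    print(ppe_relative_throughput(4.0, 2.7))
```

    Swap in your own guesses for the two ratios; the point is only that clock rate and the second thread multiply together.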



    Quote:

    It does have that simplified VMX though. AFAIK the 970's VMX doesn't support some stuff that the G4+'s supports (IBM's docs are amusing in that respect), so it might not be too much of a problem. And if it is, IBM stuck the VMX-128 onto the X360's PPE, so they should be able to do the same for Apple if they cough up a little dough. Speaking of which, the Cell would be a golden opportunity to deploy a new AltiVec.



    The VMX is only simplified in that they removed the out-of-order capability and made it 2-way issue. That's not really all that different from the 970 though and since vector code typically pipelines well the PPE shouldn't do too badly given the clock rate and/or SMT advantage.



    Forget about VMX128 -- you will never see it on anything non-Microsoft. It consists of Microsoft's design changes: it removes things from the ISA that Apple needs, and it has to address its 128 registers in a hack-ish way.



    Quote:

    The 90nm in the PS3 (and the X360) is just until they can get it at 65nm. After all Sony's spent a billion and change on Nagasaki so I'd think they'd want their money's worth



    Yeah, but I don't expect to see mass production at 65nm until at least late 2006.



    Quote:

    I interpret the use of 7 out of 8 SPE's to improve yields as saying it either doesn't yield well at 90nm or by disabling one SPE they're saving money via a disproportionate increase in yields; but I agree with you on the new ability to manipulate yields.



    The only yield problem it has at 90 nm is the fact that it is so big (221 sq mm, IIRC). Certainly it yields better than the 970 did on 90 nm until recently. If it didn't, Sony & Toshiba couldn't afford to use it in their consumer products. Saving a few watts of power/heat is a good thing as well.
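    To show why shipping 7 of 8 SPEs helps a die this big, here is a minimal sketch using the textbook Poisson yield model (yield = exp(-area x defect density)). Only the 221 sq mm figure comes from the post above; the SPE area and the defect density are placeholder assumptions picked for illustration, not real fab data.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """Probability that a region of the given area has no defects."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


def cell_yields(die_mm2=221.0, spe_mm2=14.5, n_spe=8, defects_per_cm2=0.5):
    """Return (yield needing all SPEs, yield allowing one dead SPE).

    spe_mm2 and defects_per_cm2 are assumed values. The non-SPE part
    of the die (PPE, bus, I/O) must be defect-free in both cases.
    """
    rest_mm2 = die_mm2 - n_spe * spe_mm2
    p_rest = poisson_yield(rest_mm2, defects_per_cm2)
    p_spe = poisson_yield(spe_mm2, defects_per_cm2)

    all_good = p_rest * p_spe ** n_spe
    # Exactly one bad SPE: choose which one fails, the others work.
    one_bad = p_rest * n_spe * p_spe ** (n_spe - 1) * (1.0 - p_spe)
    return all_good, all_good + one_bad


if __name__ == "__main__":
    need_8, need_7 = cell_yields()
    print(f"all 8 SPEs working: {need_8:.1%}")
    print(f"at least 7 of 8:    {need_7:.1%}")
```

    Under this model the benefit of the spare SPE grows with defect density, so it matters most early in a process's life.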
  • Reply 36 of 37
    webmailwebmail Posts: 639member
    Those shots have been pre-rendered, and enhanced with anti-aliasing.. So those aren't what you will see when you play the actual game on your console ;-)



    There was kinda an uproar about this at E3





    Quote:

    Originally posted by onlooker

    I am so sold on that PS3 it's not even funny.



    Look at the trailer for MotorStorm on the PS3 here

    ( sometimes there is a short advertisement commercial first. If you see mud bogs that's it! )



    http://www.gametrailers.com/player.php?id=5841&type=mov



    This is freaking outrageous.



    I hope Apple does something spectacular like use 2x 970MP's, and a CELL processor for..... whatever... That game is freaking so awesome looking. The closest thing I've seen on the 360 was Ghost Recon 3, and that doesn't look much different from Republic Commando.




  • Reply 37 of 37
    programmerprogrammer Posts: 3,458member
    Quote:

    Originally posted by webmail

    Those shots have been pre-rendered, and enhanced with anti-aliasing.. So those aren't what you will see when you play the actual game on your console ;-)





    Well, considering there is no final PS3 hardware yet, it is a little premature to say that you won't see that on the console. The same applies, to a lesser extent, to the Xbox 360.