Cell-based server revealed.

Posted:
in Future Apple Hardware edited January 2014
Here is a page showing photographs of an IBM Cell-based server. Apparently it is running at 2.8 GHz, but the story says that 3 GHz has been achieved in lab conditions. There are no benchmarks, so it really gives us no clue as to how interesting this would be to us.



WWDC could be interesting.

Comments

  • Reply 1 of 43
    newnew Posts: 3,244member
    "If operated at 3 GHz, Cell's theoretical performance reaches about 200 GFLOPS, which works out to about 400 GFLOPS per board."

    doesn't the G5 do 8 Gflops?
  • Reply 2 of 43
    henriokhenriok Posts: 537member
    Quote:

    Originally posted by New

    doesn't the G5 do 8 Gflops?



    A G5 at 2.7 GHz peaks at 11 Gflops on scalar operations but 22 Gflops on SIMD operations (AltiVec).



    Remember that the G5 is quite strong at floating-point operations compared to other mainstream processors like the Athlon/Opteron or Pentium/Xeon. But these Cell processors are just insanely strong at single-precision floating-point operations due to their 8 SPUs. Most applications seem to favour double precision though, and Cell isn't as good there.
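    Those peak figures fall out of simple arithmetic. A rough sketch, assuming the 970's two scalar FPUs and one 4-wide AltiVec unit can each retire one fused multiply-add (2 flops) per cycle:

    ```python
    # Rough peak-Gflops arithmetic for a 2.7 GHz G5 (PowerPC 970).
    # Assumes 2 scalar FPUs and one 4-wide AltiVec unit, each issuing
    # one fused multiply-add (2 flops) per cycle; the post rounds up.

    clock_hz = 2.7e9
    flops_per_fma = 2  # multiply + add

    scalar_peak = clock_hz * 2 * flops_per_fma  # 2 scalar FPUs
    simd_peak = clock_hz * 4 * flops_per_fma    # 4 floats per AltiVec op

    print(scalar_peak / 1e9)  # 10.8, quoted as "11 Gflops"
    print(simd_peak / 1e9)    # 21.6, quoted as "22 Gflops"
    ```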
  • Reply 3 of 43
    newnew Posts: 3,244member
    Quote:

    Originally posted by Henriok

    A G5 at 2.7 GHz peaks at 11 Gflops on scalar operations but 22 Gflops on SIMD operations (AltiVec).



    Remember that the G5 is quite strong at floating-point operations compared to other mainstream processors like the Athlon/Opteron or Pentium/Xeon. But these Cell processors are just insanely strong at single-precision floating-point operations due to their 8 SPUs. Most applications seem to favour double precision though, and Cell isn't as good there.




    so what tasks will the cell be strong at?
  • Reply 4 of 43
    Encoding/decoding, for example: HD stuff. Imagine eight 1080p H.264 streams at once using just the SPEs. Or 48 MPEG-2 streams.



    And it's not bad at double-precision floating point, still better than the G5, but single precision is where it shines.



    It's perfectly fine at regular stuff using the PPE, though. Well, OK, it sucks at out-of-order code and suffers a big branch-mispredict penalty, but using SMT a 3.2 GHz PPE runs about as well as a pair of 1.4-1.6 GHz G4s. Run it at 4 GHz, stick a couple of PPEs together, toss in some SPEs, and Cell will match or better the G5 on regular stuff and make it scream for mercy at floating-point operations. Not to mention it has a heck of a lot more memory bandwidth, so anything that's bottlenecked there loves the Cell.
  • Reply 5 of 43
    programmerprogrammer Posts: 3,457member
    Quote:

    Originally posted by New

    so what tasks will the cell be strong at?



    Video/image/audio/signal processing, 2D & 3D graphics, ray tracing, anything AltiVec does well, numerical processing, etc. In other words, most things that take a lot of compute time. The main trick is that software developers will have to code specifically for the Cell because it is a new execution model and existing software won't "just work".
  • Reply 6 of 43
    newnew Posts: 3,244member
    Quote:

    Originally posted by Programmer

    Video/image/audio/signal processing, 2D & 3D graphics, ray tracing, anything AltiVec does well, numerical processing, etc. In other words, most things that take a lot of compute time. The main trick is that software developers will have to code specifically for the Cell because it is a new execution model and existing software won't "just work".



    so could an Apple Cell server complement, say, the Virginia Tech supercluster in any effective way?
  • Reply 7 of 43
    mi0immi0im Posts: 8member
    Quote:

    Originally posted by Henriok

    Most applications seem to favour double precision though and Cell isn't as good there.



    The Cell design team wouldn't agree with you.

    http://www.epcc.ed.ac.uk/scicomp/abs...ay.php?Abst=77
  • Reply 8 of 43
    henriokhenriok Posts: 537member
    Quote:

    Originally posted by mi0im

    The Cell design team wouldn't agree with you.

    http://www.epcc.ed.ac.uk/scicomp/abs...ay.php?Abst=77




    That link of yours... was that supposed to mean anything?

    I didn't say Cell was bad; it seems to excel in that respect too. I said that it wasn't _as_good_ on double floats.



    Quoting from IBM's Cell pages:

    Peak performance (single precision): >256 GFlops

    Peak performance (double precision): >26 GFlops




    I stand by the claim that Cell isn't as good at double-precision floats as it is at single precision. Don't you agree?



    But on the other hand, you were perhaps contesting my other statement, that applications seem to favour double-precision floats. That's just an observation I've made; I can't back it up with a quote.
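    For what it's worth, IBM's single-precision figure is consistent with the SPE count. A sketch, assuming 8 SPEs each retiring one 4-wide fused multiply-add per cycle, at the 4 GHz clock IBM's early Cell material used for peak numbers:

    ```python
    # Where a ">256 Gflops" single-precision peak would come from
    # (assumed: 8 SPEs, 4-wide SIMD, one fused multiply-add per cycle, 4 GHz).
    spes = 8
    simd_width = 4
    flops_per_fma = 2
    clock_hz = 4.0e9

    sp_peak = spes * simd_width * flops_per_fma * clock_hz
    print(sp_peak / 1e9)  # 256.0
    ```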
  • Reply 9 of 43
    mi0immi0im Posts: 8member
    Quote:

    Originally posted by Henriok

    I stand by the claim that Cell isn't as good at double-precision floats as it is at single precision. Don't you agree?



    It seems you love big numbers blindly. But HPC applications need an adequate B/F (bandwidth-to-flops) ratio. Cell's DP floating-point peak is balanced against its memory bandwidth.
  • Reply 10 of 43
    programmerprogrammer Posts: 3,457member
    Quote:

    Originally posted by mi0im

    It seems you love big numbers blindly. But HPC applications need an adequate B/F (bandwidth-to-flops) ratio. Cell's DP floating-point peak is balanced against its memory bandwidth.



    This is extremely dependent on the algorithm(s) involved. Some algorithms do very little computation per memory operation (or they are just poorly coded to work that way). Better are those that manage to do a lot of ops per memory fetch/store. These machines can typically manage a theoretical peak rate of 50-100 single precision FLOPS per cacheline read from memory, assuming proper streaming and prefetching... that is a lot of computation and typically programmers don't come even remotely close to this.
  • Reply 11 of 43
    boemaneboemane Posts: 311member
    Quote:

    Originally posted by Programmer

    Video/image/audio/signal processing, 2D & 3D graphics, ray tracing, anything AltiVec does well, numerical processing, etc. In other words, most things that take a lot of compute time. The main trick is that software developers will have to code specifically for the Cell because it is a new execution model and existing software won't "just work".



    So if the Cell does everything that AltiVec does, and does it well (better than AltiVec), could Apple utilise the Cell as a co-processor and divert code that the Cell excels at to it (kind of like how the AltiVec unit is used now)?



    I know the Cell CPU is probably a lot more expensive than the AltiVec unit in the G5, but I'm thinking of top-of-the-line PowerMacs. Having this kind of processing power would make them a LOT faster than their competitors on the Wintel side...



    Dual 3 GHz G5s, where each G5 has its own Cell co-processor running at 2.8-3 GHz.



    I know this is probably way too expensive, but on the other hand the PS3 and Xbox 360 throw in 6 of these Cell CPUs in a package for under 400 dollars...



    Just a thought
  • Reply 12 of 43
    programmerprogrammer Posts: 3,457member
    Quote:

    Originally posted by BoeManE

    So if the Cell does everything that AltiVec does, and does it well (better than AltiVec), could Apple utilise the Cell as a co-processor and divert code that the Cell excels at to it (kind of like how the AltiVec unit is used now)?



    Why is everyone so fixated on the Cell as a coprocessor? It would be a lot better as a Mac central processor.



    Quote:

    I know the Cell CPU is probably a lot more expensive than the AltiVec unit in the G5, but I'm thinking of top-of-the-line PowerMacs. Having this kind of processing power would make them a LOT faster than their competitors on the Wintel side...



    Actually the Cell is designed to go into low cost game consoles, TVs and other consumer electronics.
  • Reply 13 of 43
    brendonbrendon Posts: 642member
    Quote:

    Originally posted by Programmer

    Why is everyone so fixated on the Cell as a coprocessor? It would be a lot better as a Mac central processor.



    For me it is the fear that if Apple does use Cell as a main processor, it will be like AltiVec, or worse: great technology that few outside of Apple will use for quite some time. It would be great if there were a way for Apple to put a 970 front end on a Cell-like processor. Barring that, it would be great for Apple to use Cell technology to add AltiVec-like SPEs to a 970. That way, programmers who had the time to learn the Cell features could utilize them, which would be great for them; and if time did not permit, all would not be lost, since they would only have to worry about whether the program worked well given their existing 970/PPC programming knowledge. For me this is a transition step: a coprocessor, or better yet incorporating Cell-like technology into the 970 line, to get the ball rolling.
  • Reply 14 of 43
    programmerprogrammer Posts: 3,457member
    Quote:

    Originally posted by Brendon

    For me it is the fear that if Apple does use Cell as a main processor, it will be like AltiVec, or worse: great technology that few outside of Apple will use for quite some time.



    Three points:



    1) There is no worry about nobody programming for Cell, given that Sony, Toshiba and IBM are pushing it. Game consoles and supercomputers... that will attract a fair bit of developer interest. Add Apple and you've got desktops as well.



    2) Even if it is only used by the OS (i.e. only Apple codes for it), it will be a big win.



    3) The developers who aren't going to use SPEs are the same ones that won't bother to learn how to make the 970 go fast, so 4-5 GHz PPE vs. 2-2.5 GHz 970 won't make much difference.

  • Reply 15 of 43
    mi0immi0im Posts: 8member
    Quote:

    Originally posted by Programmer

    These machines can typically manage a theoretical peak rate of 50-100 single precision FLOPS per cacheline read from memory, assuming proper streaming and prefetching... that is a lot of computation and typically programmers don't come even remotely close to this.



    Two points:

    (1) CELL SPE has no cache.

    (2) I'm talking about DP math used for scientific applications. This presentation describes the typical B/F ratios of such applications.
  • Reply 16 of 43
    Quote:

    Originally posted by mi0im

    (1) CELL SPE has no cache.



    Technically, yes. What they have are local stores, which are similar to, but not exactly like, a cache.
  • Reply 17 of 43
    programmerprogrammer Posts: 3,457member
    Quote:

    Originally posted by mi0im

    Two points:

    (1) CELL SPE has no cache.

    (2) I'm talking about DP math used for scientific applications. This presentation describes typical B/F ratio of such applications.




    Right -- slide 13 is what I'm talking about. The B/F of an algorithm. This can be strongly affected by how the algorithm is encoded, and often lazy/naive programmers over-utilize bandwidth. By improving how efficiently bandwidth is used, the B/F ratio is shifted toward FLOPS.



    Keep in mind as well that the Cell's SPEs have an extremely high bandwidth local store, and inter-SPE bus. For algorithms that can be arranged to stream through these local stores, the B/F ratio of the hardware can be shifted significantly toward bandwidth... as long as the aggregate bandwidth to main memory doesn't exceed the XDRAM's capability (or the I/O ports, depending on where data is going to/from).





    Oh, and the lack of cache in SPEs is considered a good thing. If bandwidth is what you're worried about then finer control over it lets you take maximum advantage of what you have. Caches do not give you fine control. An asynchronous DMA engine to local memory does.
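    The pattern that makes this work is double buffering: kick off the transfer for the next chunk while computing on the current one. A hypothetical sketch, using a thread pool to stand in for the SPE's asynchronous DMA engine (the real thing would use mfc_get/mfc_put-style DMA into the 256 KB local store):

    ```python
    # Double-buffered "DMA into local store" pattern, simulated in plain
    # Python. fetch() stands in for an async DMA read; compute() stands in
    # for SPU work on the local store. Illustrative only.
    from concurrent.futures import ThreadPoolExecutor

    def fetch(chunk):      # async "DMA" transfer of one chunk
        return list(chunk)

    def compute(buf):      # work done on the buffer once it has landed
        return sum(x * x for x in buf)

    def process(chunks):
        total = 0
        with ThreadPoolExecutor(max_workers=1) as dma:
            pending = dma.submit(fetch, chunks[0])   # kick off first transfer
            for i in range(len(chunks)):
                buf = pending.result()               # wait for current buffer
                if i + 1 < len(chunks):
                    # prefetch the next chunk while we compute on this one
                    pending = dma.submit(fetch, chunks[i + 1])
                total += compute(buf)
        return total

    print(process([[1, 2], [3, 4]]))  # 1 + 4 + 9 + 16 = 30
    ```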

  • Reply 18 of 43
    mikenapmikenap Posts: 94member
    sounds like a bunch of B/F to me! hahahahahaha!
  • Reply 19 of 43
    brendonbrendon Posts: 642member
    Quote:

    Originally posted by Programmer

    Right -- slide 13 is what I'm talking about. The B/F of an algorithm. This can be strongly affected by how the algorithm is encoded, and often lazy/naive programmers over-utilize bandwidth. By improving how efficiently bandwidth is used, the B/F ratio is shifted toward FLOPS.



    Keep in mind as well that the Cell's SPEs have an extremely high bandwidth local store, and inter-SPE bus. For algorithms that can be arranged to stream through these local stores, the B/F ratio of the hardware can be shifted significantly toward bandwidth... as long as the aggregate bandwidth to main memory doesn't exceed the XDRAM's capability (or the I/O ports, depending on where data is going to/from).





    Oh, and the lack of cache in SPEs is considered a good thing. If bandwidth is what you're worried about then finer control over it lets you take maximum advantage of what you have. Caches do not give you fine control. An asynchronous DMA engine to local memory does.




    OK, FP is important to me, and I think to Apple's life in the scientific community. Double precision is where that's at, and it would be great news for me if the SPEs could do double-precision FP.
  • Reply 20 of 43
    programmerprogrammer Posts: 3,457member
    Quote:

    Originally posted by Brendon

    OK, FP is important to me, and I think to Apple's life in the scientific community. Double precision is where that's at, and it would be great news for me if the SPEs could do double-precision FP.



    Well then be happy -- it is 100% certain that the SPEs and the PPE do double-precision floating point. The Cell's aggregate performance at DP, however, is only about 2.5x that of a single 2.7 GHz 970, so it's nothing to write home about. On the other hand, a great many algorithms that currently use double precision do so out of laziness rather than any actual need for that level of precision.
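    That 2.5x figure roughly checks out against the numbers quoted earlier in the thread. A sketch, treating IBM's >26 Gflops DP figure for Cell and the 970's scalar peak as comparable:

    ```python
    # Rough ratio of Cell's quoted double-precision peak (>26 Gflops)
    # to a single 2.7 GHz 970's scalar peak (~10.8 Gflops). Peak figures
    # only; real workloads will differ.
    cell_dp_gflops = 26.0
    g5_dp_gflops = 2.7 * 2 * 2   # 2 FPUs x fused multiply-add (2 flops)
    print(cell_dp_gflops / g5_dp_gflops)  # ~2.4
    ```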