Power5


Comments

  • Reply 81 of 106
    bigc Posts: 1,224 member
    Is that why the new cases have four memory slots?
  • Reply 82 of 106
    [quote]Originally posted by Programmer:

    <strong>It's worth pointing out here, I think, that just because the FSB will be capable of 6.4 GB/sec doesn't mean the memory subsystem needs to be. Yes, it would be nice if it were, but frankly if the economics and logistics of such a system don't make sense then so be it.



    Apple has used dual-banked architectures in several past machines though, so it's not out of the question.</strong><hr></blockquote>



    Well, this forum is constantly on about the deficiencies of MaxBus causing a bottleneck for the G4. It supposedly can't get data fast enough to show its true colours at times. Not providing the full bandwidth of the FSB in the memory system will do the same to the 970. Sure, it would work and be a little cheaper, but *not much* cheaper. Chipsets are cheap compared to the rest of the motherboard and the whole system. At least in the Wintel world.



    And just because Apple is probably developing their own chipset doesn't mean anything - they would have to develop their own single channel chipset too. For them the cost diff would be worth it if it meant getting the most out of the Whiz Bang New Thing they get to show the crew. :cool:



    MM
  • Reply 83 of 106
    Since this is the Power5 thread:

    The mention of the Power5 being up to 4 times faster than the Power4 may be misleading. The Power5 is supposed to have some new hardware to help in network tasks and other things that are usually done in software. These may be the areas of "4x faster". For general purpose computing such as what we are interested in I doubt the Power5 is going to maintain the 4x lead. I expect probably some more modest increase in instructions per cycle (say, 1.3x) and greater clock speeds. From what I remember of the early details, the rest of the chip wasn't changing much from the Power4.



    Of course, most of this goes out the window if they have a good SMT implementation, i.e. better than the Pentium 4's, which only gives a ~20% average increase for multithreaded apps.



    MM
  • Reply 84 of 106
    [quote]Originally posted by MartianMatt:

    <strong>Well, this forum is constantly on about the deficiencies of MaxBus causing a bottleneck for the G4. It supposedly can't get data fast enough to show its true colours at times. Not providing the full bandwidth of the FSB in the memory system will do the same to the 970. Sure, it would work and be a little cheaper, but *not much* cheaper. Chipsets are cheap compared to the rest of the motherboard and the whole system. At least in the Wintel world.

    </strong><hr></blockquote>



    But if building a dual bank system requires double the number of traces on the motherboard, for example, then it does impact overall cost. I freely admit that I don't know. That wasn't my point. My point was simply that just because the 970 can handle 6.4 GB/sec of bandwidth doesn't mean that Apple has to deliver 6.4 GB/sec of bandwidth. Dual banked DDR333 would give us 5.4 GB/sec and would be just fine. In a dual processor 970 they are very unlikely to give us 12.8 GB/sec but that's what the processors could handle.
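
    A rough sketch of the arithmetic behind those figures, assuming a 64-bit memory channel and the nominal 333 MT/s rate of DDR333. The function name and its defaults are only illustrative:

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in GB/sec (1 GB = 10**9 bytes).

    Assumes every transfer moves a full bus width of data and that
    the channels can be driven in parallel with no overhead.
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels
    return bytes_per_sec / 1e9


# DDR333 on an assumed 64-bit channel, one bank versus two.
single_channel = peak_bandwidth_gb_s(333)
dual_channel = peak_bandwidth_gb_s(333, channels=2)

print(f"single: {single_channel:.2f} GB/sec, dual: {dual_channel:.2f} GB/sec")
```

    The nominal rate works out to a little under 2.7 GB/sec per channel, so the 5.4 GB/sec above is the rounded PC2700 figure doubled. Either way it lands below the 6.4 GB/sec the FSB could take.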
  • Reply 85 of 106
    [quote]Originally posted by MartianMatt:

    <strong>Since this is the Power5 thread:

    The mention of the Power5 being up to 4 times faster than the Power4 may be misleading. The Power5 is supposed to have some new hardware to help in network tasks and other things that are usually done in software. These may be the areas of "4x faster". For general purpose computing such as what we are interested in I doubt the Power5 is going to maintain the 4x lead. I expect probably some more modest increase in instructions per cycle (say, 1.3x) and greater clock speeds. From what I remember of the early details, the rest of the chip wasn't changing much from the Power4.



    Of course, most of this goes out the window if they have a good SMT implementation, i.e. better than the Pentium 4's, which only gives a ~20% average increase for multithreaded apps.

    </strong><hr></blockquote>



    The original "4x faster" quote didn't say anything about in what way it was four times faster. It could include SMT, multiple cores, higher clock rate, more work per clock, VMX, and FastPath. We don't know yet, but you're right not to interpret it as "we can do 4x as much per clock cycle!". That's just foolish.
  • Reply 86 of 106
    [quote]Originally posted by Programmer:

    <strong>



    The original "4x faster" quote didn't say anything about in what way it was four times faster. It could include SMT, multiple cores, higher clock rate, more work per clock, VMX, and FastPath. We don't know yet, but you're right not to interpret it as "we can do 4x as much per clock cycle!". That's just foolish.</strong><hr></blockquote>



    I don't doubt what you're saying, but I would have thought that VMX alone could provide much greater than 4X speed on SIMD-friendly operations. Or perhaps the FP units in the 970 are so good that the SIMD factor is diminished a little?
  • Reply 87 of 106
    [quote]Originally posted by Programmer:

    <strong>

    In a dual processor 970 they are very unlikely to give us 12.8 GB/sec but that's what the processors could handle.</strong><hr></blockquote>



    The usual disclaimer: I am not an engineer.



    With that out of the way, I presume that you're just extrapolating to a best-possible case peak throughput here. For sustained throughput figures, I would have thought that there's a law of diminishing returns in effect, a kind of tax on additional processors. Is 10% per extra processor a reasonable ball-park figure?



    If so, we might suppose that a dual 1.8GHz 970 system would make good use of up to 11.5 GB/s bandwidth. Yowza!



    Come to think of it, I seem to recall seeing analyses to the effect that system overheads would limit the 6.4 GB/s figure to a mere (ahem) 5.6GB/s or thereabouts. In which case we could talk about a dual 1.8 970 wanting about 10GB/s to feel satisfied.
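
    To make the guesswork above explicit, here is a small sketch of that "tax" model. The flat 10% per extra processor is purely the ball-park guess from this post, not a measured figure, and the function name is made up for illustration:

```python
def usable_bandwidth_gb_s(per_cpu_gb_s, cpus, tax_per_extra_cpu=0.10):
    """Aggregate bandwidth after a flat tax for each processor beyond the first.

    This is a guessed linear model, not a real SMP scaling law: with a
    10% tax it would reach zero at 11 processors, so it can only be a
    rough approximation for small systems.
    """
    if cpus < 1:
        raise ValueError("need at least one processor")
    efficiency = max(1.0 - tax_per_extra_cpu * (cpus - 1), 0.0)
    return per_cpu_gb_s * cpus * efficiency


# Dual 970 using the 6.4 GB/s peak figure, then the 5.6 GB/s derated one.
peak_case = usable_bandwidth_gb_s(6.4, cpus=2)
derated_case = usable_bandwidth_gb_s(5.6, cpus=2)

print(f"peak: {peak_case:.2f} GB/s, derated: {derated_case:.2f} GB/s")
```

    That gives roughly 11.5 GB/s and 10 GB/s respectively, which is where the two numbers above come from.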
  • Reply 88 of 106
    [quote]Originally posted by boy_analog:

    <strong>Come to think of it, I seem to recall seeing analyses to the effect that system overheads would limit the 6.4 GB/s figure to a mere (ahem) 5.6GB/s or thereabouts. In which case we could talk about a dual 1.8 970 wanting about 10GB/s to feel satisfied.</strong><hr></blockquote>

    Keep in mind that's 6.4GB/s aggregate bandwidth. The FSB consists of two 3.2GB/s one-way buses so unless you are saturating the bus both ways you don't get the aggregate throughput.



    Just like "full duplex" 100BaseT does not actually give you 200Mb/s throughput -- both because of signalling overhead and because of the nature of how data is processed (sustained throughput is relatively rare).
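
    A simplified way to picture this, assuming the only limit is 3.2 GB/s in each direction and ignoring the address and command traffic that shares the outbound link (the function name is hypothetical):

```python
def max_total_throughput_gb_s(read_fraction, per_direction_gb_s=3.2):
    """Best-case combined throughput for a given read/write mix.

    read_fraction is the share of all bytes that are reads (0.0 to 1.0).
    Whichever direction carries the larger share saturates first and
    caps the total.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    busiest_share = max(read_fraction, 1.0 - read_fraction)
    return per_direction_gb_s / busiest_share


balanced = max_total_throughput_gb_s(0.5)     # both directions saturated
read_only = max_total_throughput_gb_s(1.0)    # only one direction in use
mostly_reads = max_total_throughput_gb_s(0.75)

print(f"{balanced:.2f} / {mostly_reads:.2f} / {read_only:.2f} GB/s")
```

    Only the perfectly balanced mix reaches the 6.4 GB/s aggregate; a pure read or pure write stream tops out at 3.2 GB/s.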
  • Reply 89 of 106
    [quote]Originally posted by Tomb of the Unknown:

    <strong>

    Keep in mind that's 6.4GB/s aggregate bandwidth. The FSB consists of two 3.2GB/s one-way buses so unless you are saturating the bus both ways you don't get the aggregate throughput.



    Just like "full duplex" 100BaseT does not actually give you 200Mb/s throughput -- both because of signalling overhead and because of the nature of how data is processed (sustained throughput is relatively rare).</strong><hr></blockquote>



    True, true.



    That aspect did cross my mind, but I presumed that when we're talking about RAM (as earlier respondents were), it is more relevant to talk about the aggregate figure. Once again, me no engineer nosireebob, so I could be misunderstanding the situation.



    P.S. BTW, my guesstimate of a 10% per-processor overhead is obviously directed at relatively conventional SMP setups. Obviously, this couldn't be an absolute limitation or it would make massively parallel systems slower than simpler setups. Or perhaps the overhead doesn't scale linearly with the number of processors?



    [ 02-23-2003: Message edited by: boy_analog ]
  • Reply 90 of 106
    [quote]Come to think of it, I seem to recall seeing analyses to the effect that system overheads would limit the 6.4 GB/s figure to a mere (ahem) 5.6GB/s or thereabouts. In which case we could talk about a dual 1.8 970 wanting about 10GB/s to feel satisfied. <hr></blockquote>

    I believe the 6.4 GB/s number is the true throughput after factoring in system overhead. The total raw throughput of the 970's buses is somewhere around 7.2 GB/s.
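
    Taking those two figures at face value, the implied overhead is easy to work out. This is only a sketch of the arithmetic, with an illustrative function name:

```python
def overhead_fraction(raw_gb_s, effective_gb_s):
    """Share of the raw bus bandwidth lost to protocol overhead."""
    if raw_gb_s <= 0:
        raise ValueError("raw bandwidth must be positive")
    return 1.0 - effective_gb_s / raw_gb_s


# 7.2 GB/s raw and 6.4 GB/s effective, as quoted in this thread.
fsb_overhead = overhead_fraction(7.2, 6.4)

print(f"overhead: {fsb_overhead:.1%}")
```

    That comes to about 11% of the raw bandwidth.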
  • Reply 91 of 106
    Stoo Posts: 1,490 member
    [quote]The thing is does the Power5 have a SIMD unit or do we care.<hr></blockquote>



    Power5: maybe not; PowerPC 9x0: we (desktop users) definitely care, so it will definitely have SIMD. Apple probably won't use the Power5 itself but rather its close PowerPC relative (980?). Of course, we'll have to wait and see how the Power5 line is organised to see what distinctions there are between Power and PowerPC.



    [quote]By the way, folks, I've already ordered the glass table with chrome legs for my Uber 970 POWERMac.<hr></blockquote>



    Such preparation is vital.



    [quote]Apple knows full well that the first 970 machines are going to sell like mad, and so the first release of such chips will probably be single CPU's.<hr></blockquote>



    I'd expect that lower GHz/single 970s will ship first, then duals a few weeks later, like the current 12"/17" PowerBook situation.



    More bake-offs at MacWorlds/WWDC?

    Return of the Apple Hype-er drive?



    [quote]So I predict that Apple's first 970 system will have dual channel DDR support. This is not much more expensive than single channel. Look at the NVidia nForce chipset, which has dual DDR. It is expensive for a chipset but is less than ~$70 and includes good graphics and great sound. <hr></blockquote>



    I suspect that the nForce has two separate DDR channels (one for the graphics chipset, one for the CPU), rather than two going to the same place. If you have two sets of traces in the same place that can carry 3.2GB/s each then the motherboard will cost more (more traces require PCBs with more layers).



    That doesn't mean that Apple won't use it. And while they're at it, 5.1 sound would be nice too.
  • Reply 92 of 106
    [quote]Originally posted by boy_analog:

    <strong>I don't doubt what you're saying, but I would have thought that VMX alone could provide much greater than 4X speed on SIMD-friendly operations. Or perhaps the FP units in the 970 are so good that the SIMD factor is diminished a little?</strong><hr></blockquote>



    IBM doesn't usually express much interest in VMX -- if they say a "4x speedup" the last thing I'd expect is that they are talking about VMX. They are probably referring to SPECmarks or some other standard server/transaction benchmark.



    FYI: the floating point speedup from using AltiVec instead of the FPU will probably be about half as large on the 970 as it is on the G4. The 970 has two floating point units compared to the G4's single one. This means that while the AltiVec unit is still doing 4 multiply-add operations per cycle, the FPU(s) are now doing 2 instead of 1. This will only be true to a first approximation, mind you, because of a bunch of other factors like deeper pipelines, better completion/retirement resources, inter-instruction stalls, etc.
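
    The first-approximation ratio can be sketched like this, assuming every unit sustains one multiply-add per lane per cycle at the same clock. Real code won't behave this neatly, and the function name is just for illustration:

```python
def vector_advantage(vector_fmadds_per_cycle, scalar_fpus):
    """Peak multiply-adds per cycle on the vector unit relative to the FPUs.

    Assumes each scalar FPU issues one multiply-add per cycle and that
    nothing else (pipelines, stalls, memory) limits either side.
    """
    if scalar_fpus < 1:
        raise ValueError("need at least one scalar FPU")
    return vector_fmadds_per_cycle / scalar_fpus


g4_advantage = vector_advantage(4, scalar_fpus=1)      # one FPU
ppc970_advantage = vector_advantage(4, scalar_fpus=2)  # two FPUs

print(f"G4: {g4_advantage:.0f}x, 970: {ppc970_advantage:.0f}x")
```

    So the peak AltiVec advantage drops from 4x on the G4 to 2x on the 970, which is the "about half" above.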
  • Reply 93 of 106
    [quote]Originally posted by boy_analog:

    <strong>The usual disclaimer: I am not an engineer.



    With that out of the way, I presume that you're just extrapolating to a best-possible case peak throughput here. For sustained throughput figures, I would have thought that there's a law of diminishing returns in effect, a kind of tax on additional processors. Is 10% per extra processor a reasonable ball-park figure?



    If so, we might suppose that a dual 1.8GHz 970 system would make good use of up to 11.5 GB/s bandwidth. Yowza!



    Come to think of it, I seem to recall seeing analyses to the effect that system overheads would limit the 6.4 GB/s figure to a mere (ahem) 5.6GB/s or thereabouts. In which case we could talk about a dual 1.8 970 wanting about 10GB/s to feel satisfied.</strong><hr></blockquote>



    The 970's bus isn't shared so, assuming the memory controller(s) can handle it, the net aggregate bandwidth is 6.4 GB/sec per processor. Since the processors are "SMP capable" it's probably a good bet that they have an efficient protocol to exchange data, most likely with the "companion chip" acting as a router.



    As mentioned above, the actual bandwidth the FSB has is 7.2 GB/sec but the overhead cuts it down to a paltry 6.4 GB/sec. :cool:



    And the point about it actually being 3.2 GB/sec in each direction simultaneously is a good one as well -- some algorithms won't see much better than 3.2 GB/sec and thus will get almost the full bandwidth from a single DDR333 channel. A dual channel setup could not be fully taxed by a program that was only reading or only writing.



    All good points, but oh what a happy circumstance to find oneself in after being stuck at ~1 GB/sec for the last couple of years.
  • Reply 94 of 106
    sc_markt Posts: 1,402 member
    <a href="http://www.eetimes.com/semi/news/OEG20030224S0052" target="_blank">IBM weaves multithreading into Power5</a>



    [ 02-24-2003: Message edited by: sc_markt ]
  • Reply 95 of 106
    [quote]Originally posted by Stoo:

    <strong>



    I suspect that the nForce has two separate DDR channels (one for the graphics chipset, one for the CPU), rather than two going to the same place. If you have two sets of traces in the same place that can carry 3.2GB/s each then the motherboard will cost more (more traces require PCBs with more layers).



    That doesn't mean that Apple won't use it. And while they're at it, 5.1 sound would be nice too. </strong><hr></blockquote>



    The nForce has 2 channels from the memory controller to the memory. It then has the standard 200-333 MHz FSB with 1.6-2.7 GB/s to the processor, plus other connections to the system. The channels are not split out in the way you are implying.
  • Reply 96 of 106
    [quote]Originally posted by Tomb of the Unknown:

    <strong>

    Keep in mind that's 6.4GB/s aggregate bandwidth. The FSB consists of two 3.2GB/s one-way buses so unless you are saturating the bus both ways you don't get the aggregate throughput.

    </strong><hr></blockquote>



    Good point. I was going from memory and forgot that point. Thanks for reminding me/us. <img src="embarrassed.gif" border="0">



    MM
  • Reply 97 of 106
    [quote]Originally posted by Programmer:

    <strong>

    As mentioned above, the actual bandwidth the FSB has is 7.2 GB/sec but the overhead cuts it down to a paltry 6.4 GB/sec. :cool:

    </strong><hr></blockquote>



    Yes, but I just realised after getting the reminder from TotU that the P4 has from 3.2 to 4.26 GB/s now and is going to 6.4 GB/s (800 MHz) later this year. Is that each way, or is that also an aggregate figure? If that is each way then IBM need to release the 3.2 GHz 970s really fast to match FSB speed!



    [quote]

    <strong>

    All good points, but oh what a happy circumstance to find oneself in after being stuck at ~1 GB/sec for the last couple of years.</strong><hr></blockquote>



    Rather! Still, I'm writing this from an Athlon system and haven't extensively used an Apple since the late 80's at school (Apple IIc). I'm looking to be a "switcher" though, starting with a PB and then perhaps a 970.



    MM



    [ 02-25-2003: Message edited by: MartianMatt ]
  • Reply 98 of 106
    "This will come as interesting news to Apple fanatics too. IBM is producing a cut down version of the Power5 with just a single core. However, that single core should be capable of multithreading, another boost for Apple quite apart from the advantages of going to 64bit."



    <a href="http://www.theinquirer.net/?article=7973" target="_blank">inquirer</a>
  • Reply 99 of 106
    [quote]Originally posted by Producer:

    <strong>"This will come as interesting news to Apple fanatics too. IBM is producing a cut down version of the Power5 with just a single core. However, that single core should be capable of multithreading, another boost for Apple quite apart from the advantages of going to 64bit."



    <a href="http://www.theinquirer.net/?article=7973" target="_blank">inquirer</a></strong><hr></blockquote>



    Oh ... My ... God ...
  • Reply 100 of 106
    costique Posts: 1,084 member
    [quote]Originally posted by Producer:

    <strong>"IBM is producing a cut down version of the Power5 with just a single core."</strong><hr></blockquote>

    I must be going crazy. :eek: Did you notice the present continuous tense in "IBM is producing"?