New Ars Technica Write-up on 970

Posted in Future Apple Hardware (edited January 2014)
Just wanted to pass along Hannibal's write-up of the PPC970.



Linkage: http://arstechnica.com/cpu/02q2/ppc970/ppc970-1.html



He's probably one of the better technical writers around; he has a rare gift for making complex problems and concepts simple to understand.



Keep in mind that this is different from his initial news item after the MPF (Microprocessor Forum).

Comments

  • Reply 1 of 77
    marcuk Posts: 4,442 member
    Great article, can't wait for pt. 2.



    BTW, also check out the Viagra article. WTF are people doing with seal penises (peni?) anyway?
  • Reply 2 of 77
    bigc Posts: 1,224 member
    Looks like it now has two AltiVec (SIMD) units.
  • Reply 3 of 77
    amorph Posts: 7,112 member
    So does the G4e.



    However, I don't believe that either one is a complete AltiVec unit on either chip. One unit is designed to handle one subset of related AltiVec instructions, and the other is designed to handle another. To anything outside the CPU, they look and act like one unit.
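Amorph's picture of one logical AltiVec unit built from specialised sub-units can be modelled with a toy dispatcher. This is a sketch under assumptions, not Motorola's or IBM's actual issue logic: it assumes a two-way split into a permute unit and a vector ALU, and the mnemonic tables are a small, simplified sample rather than a complete classification.

```python
# Toy model: one architectural "AltiVec unit" backed by two sub-units.
# ASSUMPTION: a simplified two-way split (permute vs. arithmetic/logical);
# the mnemonic sets are a partial sample, for illustration only.
PERMUTE_OPS = {"vperm", "vsldoi", "vspltw", "vmrghw", "vmrglw"}
ALU_OPS = {"vaddubm", "vadduwm", "vaddfp", "vmaddfp", "vand"}


def route(mnemonic: str) -> str:
    """Return the sub-unit that would execute this vector instruction."""
    if mnemonic in PERMUTE_OPS:
        return "permute"
    if mnemonic in ALU_OPS:
        return "alu"
    raise ValueError(f"unknown vector instruction: {mnemonic}")


def schedule(program):
    """Group a stream of vector instructions by the sub-unit that runs them.
    From the outside, the program still just 'uses AltiVec'."""
    units = {"permute": [], "alu": []}
    for op in program:
        units[route(op)].append(op)
    return units
```

For example, `schedule(["vperm", "vaddfp", "vsldoi", "vmaddfp"])` sends the two shuffles to the permute unit and the two floating-point operations to the ALU, while software only ever sees a single vector instruction set.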



    Similarly, outside the CPU, the CPU seems to be executing instructions one at a time, in order, while inside it's cracking them, assigning them to any number of execution units, executing them out of order and speculatively and in parallel, and then recombining them.



    However, since the 970 is designed with SMP in mind, it's quite possible that a 970-powered machine would have two whole AltiVec units - or four, or sixteen...
  • Reply 4 of 77
    [quote]Originally posted by Bigc:

    Looks like it now has two AltiVec (SIMD) units.[/quote]



    It does... just like the current G4. The catch is that they are not *complete* AltiVec units; together, they make up one "fully operational" VMX unit.



    edit: Oh sure... the moderator beats me to the punch [Laughing]



    [ 10-28-2002: Message edited by: visigothe ]
  • Reply 5 of 77
    outsider Posts: 6,008 member
    [quote]Originally posted by visigothe:

    [...] edit: Oh sure... the moderator beats me to the punch [Laughing][/quote]



    Don't you know moderators can insert their posts ahead of others'?
  • Reply 6 of 77
    amorph Posts: 7,112 member
    [quote]Originally posted by Outsider:

    Don't you know moderators can insert their posts ahead of others'?[/quote]



    Since when?
  • Reply 7 of 77
    outsider Posts: 6,008 member
    [quote]Originally posted by Amorph:

    Since when?[/quote]



    Since I willed it into being. (And since I was joking)
  • Reply 8 of 77
    ompus Posts: 163 member
    From the article:



    [quote]On a final note, I should point out one reason that the 970's pipeline is a little shorter than the P4's: the 970's pipeline lacks the "drive" stages, which the P4 inserts in order to allow signals to propagate across the chip. Inserting whole pipeline stages just to account for wire delay is necessary only if you plan to push a design to insanely high clock speeds. The PowerPC 970's GHz rating, then, will top out at a much lower number than will the P4's.

    This difference in clockspeed headroom reflects a fundamental difference in the approaches of the PPC 970 and the Pentium 4. As is evidenced by the enormous power requirements that go along with its high clock speed, the P4 is really aimed at single-processor desktop systems. Sure, Intel sells the P4 Xeon for use in 2-way and 4-way server setups, but the price and power consumption of such machines restrict them primarily to the server closet. The PowerPC 970, on the other hand, is designed from the ground up with multiprocessing in mind--IBM intends to see the 970 used in 4-way or higher desktop SMP systems. So instead of increasing the performance of desktop and server systems by using a few narrow, very high clockspeed, high power CPUs IBM would rather see multiple, slower but wider, lower power CPUs ganged together via very high-bandwidth connections.[/quote]



    Will anyone care that a 1.8 GHz 970 is slower than a 3.6 GHz P4 if you've got the ability to string 970s together like popcorn?
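A back-of-the-envelope way to frame Ompus's question: throughput is roughly clock rate × instructions completed per cycle (IPC) × number of processors. The sketch below uses made-up IPC values purely to show the arithmetic (the thread gives none), and it assumes perfect scaling, ignoring memory contention and code that can't be split across CPUs, both of which eat into multiprocessor gains.

```python
def throughput_gips(clock_ghz: float, ipc: float, cpus: int = 1) -> float:
    """Rough throughput in billions of instructions per second.
    Assumes perfect scaling across CPUs, so this is an optimistic upper bound."""
    return clock_ghz * ipc * cpus


# HYPOTHETICAL IPC values, chosen only to illustrate the trade-off.
p4 = throughput_gips(clock_ghz=3.6, ipc=1.0)
ppc970_single = throughput_gips(clock_ghz=1.8, ipc=2.0)
ppc970_dual = throughput_gips(clock_ghz=1.8, ipc=2.0, cpus=2)
```

With these invented numbers, a chip that does twice the work per cycle matches the P4 at half the clock, and a pair of them doubles that; real IPC figures would have to come from benchmarks.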
  • Reply 9 of 77
    bigc Posts: 1,224 member
    hmm... like popcorn
  • Reply 10 of 77
    [quote]Originally posted by Ompus:

    From the article: [...]

    Will anyone care that a 1.8 GHz 970 is slower than a 3.6 GHz P4 if you've got the ability to string 970s together like popcorn?[/quote]



    I think that is exactly it. Currently the G4, while slower than the P4, isn't anywhere near an order of magnitude slower, as some benchmarks would have you believe. The 970 seems [based on what little we know] to be in a similar position. IBM has a history of being rather conservative with its estimates, and we can only wait until actual product ships to get real benchmarks and real-world performance figures.



    Also, this thing [the 970] will top out *initially* at 1.8 GHz. That says nothing about what the chip will do in the future. Will this chip ever beat the P4 in terms of clock speed? Of course not. Will it beat the P4 in terms of throughput and calculation? Well, that has yet to be discovered, but I wouldn't be surprised if the 970 initially gives the P4 a run for its money, maybe besting it in some areas.



    Soon we'll see a 0.09-micron (90 nm) process, as well as multiple 970s in a machine [note: I don't believe the 970 will ever be dual core. The dual core will be another processor... a "generation 2" 970], so I don't think we'll have any problem keeping up with the Intel crowd.



    The future's so bright, I gotta wear shades :cool:



    [ 10-28-2002: Message edited by: visigothe ]
  • Reply 11 of 77
    pfflam Posts: 5,053 member
    I'm a complete ignoranti . . . but I have a question. Does the 970's 64-bit processing mean that it will excel at rendering large amounts of data, as in video effects rendering, etc., as opposed to simply opening things faster?

    Is that the benefit of 64-bit?
  • Reply 12 of 77
    matsu Posts: 6,558 member
    Popular logic suggests that the G4 would be a whole world better if it had a faster FSB, and that especially for the demands of serving two chips, the difference would be dramatic.



    This could be the reason behind the PPC970's huge-big FSB bandwidth. Stringing two or four of these guys together, even where memory technology can't quite keep up, should still allow for fast communication between the separate CPUs?? I think. So maybe no multi-core chips, but fast multi-CPU cards instead? Cards that take much better advantage of multi-CPU arrangements than the current G4 allows, maybe? iDunno.
  • Reply 13 of 77
    mrmister Posts: 1,095 member
    I just want to take a moment to recognize what a truly kick-ass article that was... man, I feel like the first 100 questions I had have been answered.
  • Reply 14 of 77
    [quote]Originally posted by pfflam:

    I'm a complete ignoranti . . . but I have a question. Does the 970's 64-bit processing mean that it will excel at rendering large amounts of data, as in video effects rendering, etc., as opposed to simply opening things faster?

    Is that the benefit of 64-bit?[/quote]



    The benefits of 64-bit have been talked about thoroughly in several other topics, including the monster thread [30+ pages and counting]. But since I don't want to bother linking, it goes like this:



    64-bit addressing will allow *HUGE* amounts of RAM to be addressed: a 32-bit address tops out at 4 GB, while a 64-bit one could in principle reach 16 exabytes. Think larger numbers than you can think about without imploding.
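To put numbers on "*HUGE*": an n-bit address can name 2^n distinct bytes. A minimal sketch of that arithmetic, assuming byte-addressable memory; real chips and operating systems implement fewer physical address bits than the full 64, so this is the theoretical ceiling only.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with an n-bit address."""
    return 2 ** address_bits


GIB = 2 ** 30  # one gibibyte

print(addressable_bytes(32) // GIB)  # 4 -> the 4 GB ceiling of 32-bit addressing
print(addressable_bytes(64) // GIB)  # 17179869184 GiB, i.e. 16 EiB
```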



    64-bit only means "faster" if your calculations previously needed larger integer values than a 32-bit word could hold [dark mojo is required to make a 64-bit word out of two 32-bit words].
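Most of that "dark mojo" is carry handling. As a sketch (Python integers are unbounded, so 32-bit registers are simulated here by masking), adding two 64-bit values on a 32-bit machine takes two adds plus carry propagation instead of one instruction; on PowerPC this is the job of the `addc`/`adde` (add carrying / add extended) pair.

```python
MASK32 = 0xFFFF_FFFF


def add64_with_32bit_halves(a: int, b: int) -> int:
    """Add two unsigned 64-bit integers using only 32-bit-wide pieces,
    the way a 32-bit CPU has to."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                   # 1 if the low words overflowed
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32    # wraps like a real 64-bit register
    return (hi << 32) | lo
```

A 64-bit chip does the same job with a single add on one register, which is where the speedup for big-integer work comes from; code that never exceeds 32-bit values gains nothing.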



    This chip will be fast, but that has very little to do with its 64-bit word size (PowerPC instructions themselves stay 32 bits long). The speed comes from the clock, the throughput, and the additional floating-point and load/store units it has over the G4.



    I hope this helps.
  • Reply 15 of 77
    [quote]Originally posted by Matsu:

    Popular logic suggests that the G4 would be a whole world better if it had a faster FSB, and that especially for the demands of serving two chips, the difference would be dramatic.

    This could be the reason behind the PPC970's huge-big FSB bandwidth. Stringing two or four of these guys together, even where memory technology can't quite keep up, should still allow for fast communication between the separate CPUs?? I think. So maybe no multi-core chips, but fast multi-CPU cards instead? Cards that take much better advantage of multi-CPU arrangements than the current G4 allows, maybe? iDunno.[/quote]





    While popular logic would indeed suggest the G4 would be worlds better, I'd disagree. Yes, it would be faster than it currently is, but even with the small FSB increase [to 166.66 MHz] we haven't seen more than a linear speed increase in compute-intensive operations [note: this is different from I/O and bus operations]. Would we see a 100% improvement if it were on a true DDR bus? No. We need more ALUs and FPUs to pull that off. The SIMD stuff would get faster, but not everything is SIMD-friendly.
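The distinction between compute-bound and bandwidth-bound work can be captured with a simple bound, often called a roofline model: sustained speed is the lesser of what the execution units can do and what the bus can feed them. The peak and bus figures below are placeholders, not measured G4 numbers, and the model ignores caches and latency.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Upper bound on sustained speed: capped either by the execution
    units (peak) or by how fast the bus delivers operands."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


PEAK = 4.0  # PLACEHOLDER: GFLOPS the execution units could sustain

# Doubling a (placeholder) bus from 1 to 2 GB/s, for a streaming kernel
# (0.5 flops per byte) and a compute-heavy kernel (16 flops per byte).
results = {
    bus: (attainable_gflops(PEAK, bus, 0.5), attainable_gflops(PEAK, bus, 16.0))
    for bus in (1.0, 2.0)
}
# results[1.0] == (0.5, 4.0); results[2.0] == (1.0, 4.0)
```

Under these assumptions the faster bus doubles the streaming kernel but leaves the compute-heavy kernel stuck at the units' peak, which only more ALUs and FPUs could raise.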



    As to your other questions: yes. The 970 *IS* meant to live in a multiprocessor environment. The huge bandwidth available to the chip allows multiple chips to coexist without compromise. The bottleneck then shifts back to RAM technology, rather than the processor bus, as is currently the case with the G4.



    The 970 is *all* about bandwidth and making sure it doesn't screw up its predictions. Huge numbers of transistors are spent on making sure the correct branches are predicted, so the IOPs can complete as fast as they can float down the pipe [rather than stalling and having to be flushed and re-fetched].
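For anyone wondering what branch prediction involves at its simplest, the textbook building block is a 2-bit saturating counter per branch. This is a generic sketch of that classic scheme, not the 970's actual predictor, which the thread doesn't describe; the branch address used below is arbitrary.

```python
class TwoBitPredictor:
    """Classic 2-bit saturating-counter branch predictor, one counter per
    branch address. Counter values 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}  # branch address -> counter in 0..3

    def predict(self, addr: int) -> bool:
        return self.counters.get(addr, 1) >= 2  # start weakly not-taken

    def update(self, addr: int, taken: bool) -> None:
        c = self.counters.get(addr, 1)
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(outcomes, addr=0x100):
    """Fraction of correct predictions over one branch's outcome history."""
    p = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        hits += p.predict(addr) == taken
        p.update(addr, taken)
    return hits / len(outcomes)
```

For a loop branch taken nine times and then falling through, repeated ten times, `accuracy(([True] * 9 + [False]) * 10)` returns 0.89: one miss at each loop exit, plus one warm-up miss. Every miss means flushing the work already in the pipe, which is why long pipelines pay so much for better predictors.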
  • Reply 16 of 77
    krassy Posts: 595 member
    [quote]Originally posted by visigothe:

    While popular logic would indeed suggest the G4 would be worlds better, I'd disagree. Yes, it would be faster than it currently is, but even with the small FSB increase [to 166.66 MHz] we haven't seen more than a linear speed increase in compute-intensive operations [note: this is different from I/O and bus operations]. Would we see a 100% improvement if it were on a true DDR bus? No. We need more ALUs and FPUs to pull that off. The SIMD stuff would get faster, but not everything is SIMD-friendly.

    [...][/quote]



    I haven't seen any benchmarks of the 1.25 GHz, but the new 1 GHz dual doesn't perform much better than the 133 MHz-bus 1 GHz dual because it has only 1 MB of L3 cache instead of the older 1 GHz dual's 2 MB. However, the 1.25 GHz G4 with the 166 MHz FSB also has 2 MB of L3 cache, so I would be very interested in any benchmarks of this machine.
  • Reply 17 of 77
    I liked the arse article.



    Looks like the 970 will be a real powerhouse for multimedia.



    lemon Bon bon
  • Reply 18 of 77
Am I the only one bothered that Intel is releasing chips with SSE3 next year while we still have AltiVec 1? Will it ever be updated? I remember reading internal Apple documents surveying the scientific community about which additional AltiVec-like instructions would be helpful to them. That was years ago, so it seems they planned to update it at the time.
  • Reply 19 of 77
    barto Posts: 2,246 member
    OK, so instructions (well, IOPs anyway) are dispatched in groups of 5. The PowerPC 970 has enough execution units (8, excluding the CCR and branch units) to leave a few of them idle. We will have to wait until more details emerge to get an actual picture of the average IOPs per cycle.



    The Pentium 4 is geared towards spitting out 1 µop a cycle, with very few cycles unused.



    The PowerPC 970 has 2 Velocity Engine units, like the G4. The G4e has 4 units, btw: it splits the simple integer, complex integer and FP vector work into three separate units.



    However, the PowerPC 970 has 6.4 GB/s (12.8 GB/s in a DP system, and so on) of usable bandwidth, so, unlike the G4e's, the 970's SIMD units should have better access to the data waiting to be processed.
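To see what 6.4 GB/s means per clock tick, divide it by the core clock. A minimal worked sketch, assuming decimal gigabytes (10^9 bytes), the 1.8 GHz top clock mentioned earlier in the thread, and the 16-byte (128-bit) width of an AltiVec register; it optimistically treats the whole figure as read bandwidth for one CPU and ignores latency.

```python
def bytes_per_cycle(bandwidth_gb_s: float, clock_ghz: float) -> float:
    """Bus bandwidth expressed as bytes delivered per core clock cycle."""
    return bandwidth_gb_s / clock_ghz  # (1e9 B/s) / (1e9 cycles/s)


VECTOR_BYTES = 16  # one 128-bit AltiVec register

bpc = bytes_per_cycle(6.4, 1.8)          # about 3.56 bytes per cycle
cycles_per_vector = VECTOR_BYTES / bpc   # about 4.5 cycles per vector from RAM
```

So even at full bus speed, streaming from memory supplies roughly one 16-byte vector every four to five cycles, which is why bandwidth (and cache) matters so much for keeping SIMD units busy.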



    It sounds like the PowerPC 970 should be at least 2-3 times faster clock-for-clock than the Pentium 4.



    2003, where trinity shall return.



    Barto
  • Reply 20 of 77
    [quote]Originally posted by Producer:

    Am I the only one bothered that Intel is releasing chips with SSE3 next year while we still have AltiVec 1? Will it ever be updated?[/quote]



    Well, one of the major differences between SSE and AltiVec/VMX is that the IBM/Moto folks actually throw hardware at the problem. The P4 has SIMD instructions and SIMD registers, but it doesn't actually have a separate unit on the chip. This allows Intel to release new SSE instructions whenever it likes, because little on the chip needs to change. Also, the AltiVec/VMX instructions are *quite* good for what they do. The only real "flaw" is that they don't support double precision. Quite frankly, I am not too sure the additional chip space needed for DP would be cost-effective.
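To make the single- vs. double-precision point concrete: an AltiVec register is 128 bits wide, so it holds four 32-bit floats, or would hold only two 64-bit doubles. The sketch below uses Python's standard `struct` module to round values to IEEE single precision, showing where 32-bit floats run out of digits.

```python
import struct

REGISTER_BITS = 128
lanes_single = REGISTER_BITS // 32  # 4 single-precision floats per vector
lanes_double = REGISTER_BITS // 64  # 2 doubles, if DP were supported


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Single precision has a 24-bit significand (about 7 decimal digits),
# so a tiny increment to 1.0 is lost entirely...
print(to_float32(1.0 + 1e-8) == 1.0)  # True
# ...while double precision (53-bit significand) keeps it.
print(1.0 + 1e-8 == 1.0)              # False
```

That lost precision is harmless for pixels and audio samples but matters for many scientific codes; DP support would also halve the number of values processed per instruction.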



    If it ain't broke, don't fix it.