x86 vs. PPC

pbpb
Posted:
in Future Apple Hardware edited January 2014
I am not sure if this is the correct place to post, but I just found an interesting article discussing some aspects of the x86 vs. PPC controversy. What I find interesting for future processor designs is that the author states that x86 (and especially Intel) will soon run head-first into the heat wall. This comes from a report by the publishers of Microprocessor Report. In this respect at least, the PPC architecture seems much more promising for the future. What do you think?

Comments

  • Reply 1 of 36
    gamblorgamblor Posts: 446member
    People thought the Pentium, then the Pentium Pro would be the end of the line for x86, too. Those chips were what, three and four generations ago? Don't underestimate Intel's ability to squeeze even more blood out of the turnip that is x86...
  • Reply 2 of 36
    I read through that article when it came out and felt it was missing a lot. It seemed to be a mostly hand-waving "you will now believe in the MHz Myth" kind of article, lacking in attention to crucial differences in instruction set, cache size and organization, bus architecture, and many other factors which make up the most significant differences between the architectures.



    * The reason x86 has been so successful is not despite the ugliness of the instruction set, but because of it. As he correctly points out, the original 80386 instruction set is not what the Pentium 4, Athlon or any other modern x86 processor speaks on the inside, but this actually adds to the flexibility of the design. Rather than designing the processor directly to the ISA (as is essentially done with the PPC), the x86 vendors design their cores around their own micro-operations and translate the x86 instructions into them - thus gaining more flexibility for deep pipelining à la the Pentium 4.

    * The 8 GPRs of the x86 ISA are an abomination, but his article ignores the existence of rename registers, which allow the processor to keep data in registers without going to L1 or L2 cache. These 8 GPRs are also why x86 is so easy to emulate on other architectures, and why a PowerPC emulator on x86 would be more difficult - and they also lead back into the point about x86 giving processor designers more flexibility.

    * Power consumption is a red herring in this article. It's absolutely irrelevant to a discussion about performance. It is relevant to a discussion of the marketplace; however, there are more factors in the marketplace than just power consumption.

    * The instruction sets in question make a huge difference in terms of performance. For instance, the floating-point multiply-add, which should be familiar to DSP types as "multiply and accumulate", is present in the PPC but not in x86. If all floating-point operations took an equal number of cycles to complete on x86 and PPC, a 2GHz PPC could perform twice the number of floating-point multiply-adds that a 2GHz x86 could. The dual floating-point units of the 970 give it a further boost in floating-point operations.

    * AltiVec was crippled by bus bandwidth when working with large data sets, but the level 3 cache of the 7450 offset this. Even though the 970 has more bandwidth, the latency involved in a memory transfer greatly exceeds that of a transfer from level 3 cache. Thus, if your AltiVec code works on a data set between 0 and 256K, you should expect to see equal performance assuming equal clock rates. Between 256K and 512K, expect the 970 to have a slight lead per clock cycle. Between 512K and 1-2MB (depending on the size of your L3 cache), the 7450 has a slight per-clock lead, and after that it depends on the willingness of the voodoo spirits in your memory controller to transfer your data with high enough bandwidth and low enough latency.

    * AltiVec code is far easier to write than SSE/SSE2 code. The AltiVec programming model is really quite well designed and a pleasure to code for. SSE2, on the other hand, is annoying to write for and much less capable than AltiVec. If you can afford to AltiVec your code, expect to see the 970 (especially dual 970 systems) far exceed P4 and Athlon systems in price/performance.

    * Using SPEC as a benchmark is always unfair to somebody. It's a dicksize war, to be frank, and the goal is to create the compiler that spits out the closest thing to hand-tuned assembly from the SPEC benchmarks. Nobody cares. What's more important to me is the combination of programmer time and processor time - if I need to do a one-shot simulation, and it takes 18 hours to run on a P4 3.2 and 22 on a 2.0GHz 970, but in four hours I can AltiVec it to run in 10 on the 970 (and in four hours I might just have figured out how to use SSE2), then the 970 is the clear winner.

    * Simultaneous Multi-Threading will never, ever offer a 100% speed improvement - the best SMT is simply a dual-core processor (à la POWER4), and even going SMP doesn't offer 100%. There's always some overhead. In SMT, however, you're cutting down on the number of functional units available to each thread in order to accommodate multiple threads - thus slowing things down wherever there's instruction-level parallelism.

    * If the trend towards fanless, quiet, low-power, and small x86 systems continues, in a year's time the average Mac may well be faster than the average PC.
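    The multiply-add arithmetic a few bullets up is easy to make concrete. A back-of-envelope count (not a benchmark) of FP instructions for a length-n multiply-accumulate loop, with and without a fused multiply-add in the ISA:

```python
# Back-of-envelope check of the fused multiply-add claim: a length-n
# multiply-accumulate loop (e.g. a dot product) costs n multiplies plus
# n adds as separate instructions, but only n fmadd instructions when
# the ISA fuses them, as PPC's fmadd does.

def fp_instructions(n, fused):
    """FP instruction count for a length-n multiply-accumulate loop."""
    return n if fused else 2 * n

n = 1000
print(fp_instructions(n, fused=False))  # 2000: separate mul + add per element
print(fp_instructions(n, fused=True))   # 1000: one fmadd per element
```

At equal clocks and equal per-instruction cost, that factor of two is exactly the "twice the multiply-adds" claim above.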



    Any other points I missed?
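    The working-set argument in the AltiVec bullet can also be sketched as a toy cost model. All the capacities and relative costs below are placeholders for illustration, not measured 7450/970 numbers; only the shape of the curves matters:

```python
# Toy memory-hierarchy cost model for the 7450-vs-970 working-set
# argument. Capacities and relative costs are made up for illustration:
# the point is that below the L3 capacity the 7450's on-chip path wins
# per clock, while beyond it everything rides on the memory controller.

KB = 1024

def access_cost(working_set, caches):
    """Cost of the smallest level that holds the working set.
    caches: list of (capacity_bytes, relative_cost), sorted by capacity,
    with the last level (main memory) given infinite capacity."""
    for capacity, cost in caches:
        if working_set <= capacity:
            return cost

# (capacity, relative cost) -- hypothetical figures for illustration only
g4_7450 = [(256*KB, 1), (2048*KB, 5), (float("inf"), 40)]   # L2, L3, RAM
ppc_970 = [(512*KB, 1), (float("inf"), 25)]                 # L2, RAM

for ws in (128*KB, 384*KB, 1024*KB, 8192*KB):
    print(ws // KB, "KB:", access_cost(ws, g4_7450), access_cost(ws, ppc_970))
```

The crossover pattern matches the post: equal in L2, the 970 ahead in the mid range, the 7450 ahead while its L3 still holds the data, then the memory controller decides.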
  • Reply 3 of 36
    zapchudzapchud Posts: 844member
    Good post, Karma



    As an addition on why the G5 is the best desktop CPU implementing the PPC ISA: not only does it have both the fastest/best FPUs and the fastest/best SIMD tech, it can also execute both kinds of operations simultaneously. This is different from, at least, the Pentium 4. A shared FPU/SSE2 unit is much slower than dedicated units for each type of operation. In fact, Intel wants developers to do FP ops in SIMD (because of the notorious nature of x87/x86 FP), and as Anonymous Karma points out, that is not cool to write. Extra developer hassle.
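    A toy model of the shared-vs-dedicated-units point (unit counts are illustrative, not real P4 or 970 figures):

```python
# Toy issue model: a core with dedicated FP and SIMD units can retire
# one of each per cycle, so the two instruction streams overlap; a core
# where FP and SIMD share one unit serializes them. Counts are
# illustrative, not actual P4/970 microarchitecture numbers.

def cycles(fp_ops, simd_ops, shared):
    if shared:
        return fp_ops + simd_ops    # one shared pipe: the ops serialize
    return max(fp_ops, simd_ops)    # dedicated pipes: they overlap

print(cycles(100, 100, shared=True))    # shared unit: 200 cycles
print(cycles(100, 100, shared=False))   # dedicated units: 100 cycles
```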
  • Reply 4 of 36
    mr. memr. me Posts: 3,219member
    Quote:

    Originally posted by Anonymous Karma

    ....



    * The reason X86 has been so successful is not despite the ugliness of the instruction set, but because of it. As he correctly points out, the original 80386 instruction set is not what the Pentium 4, Athlon or any other modern x86 processor speaks on the inside, but this actually adds to the flexibility of the design. Rather than designing the processor directly to the ISA (as is essentially done with the PPC), the x86 vendors write to their own micro-operations and translate the x86 instructions into micro-operations - thus giving more flexibility for deep pipelining ala the Pentium 4.



    .....



    Any other points I missed?




    It seems that you missed the early years of processor design. What you described is the very definition of CISC. You ought to read up on the Motorola 88000 if you want to know how quickly a clean-sheet RISC processor can be designed. Look at how quickly Apple, IBM, and Motorola reduced the original POWER chip set to a single chip. Since then, they have added AltiVec and gone through POWER2, POWER3, POWER4, and five generations of the PowerPC, including the PPC 970. All the while, they have maintained ISA compatibility with the original POWER chip set.



    Look around at the other RISC architectures. You see similar progress with MIPS and SPARC. Flexibility, thy name is RISC.
  • Reply 5 of 36
    amorphamorph Posts: 7,112member
    Also, the idea that the CPU should not run on microcode is a RISC design philosophy, not something intrinsic to the PPC's ISA. IBM has had no trouble "cracking" PPC instructions for internal processing in the POWER4 and the 970, so nothing about the PPC's ISA limits flexibility in the sense you're describing. It's just that RISC sought to greatly clean up and simplify CPU design by (among other things) eliminating microcode. And, I should add, they succeeded.
  • Reply 6 of 36
    addisonaddison Posts: 1,185member
    As little as 4-5 months ago there were people on these boards, and some journalists, who were urging Apple to go x86. Now all I can hear is .....





    silence.
  • Reply 7 of 36
    placeboplacebo Posts: 5,767member
    Don't say that too loud, Addison...any moment now Existence will come and tell us about how Centrino has a better clockspeed/performance ratio than the G5...
  • Reply 8 of 36
    stoostoo Posts: 1,490member
    Quote:

    In fact, Intel wants developers to do FP-ops in SIMD



    That's why SSE2 supports 128-bit vectors composed of 2x 64-bit floating-point numbers...
  • Reply 9 of 36
    sc_marktsc_markt Posts: 1,393member
    Regarding SPECint2000 and SPECfp2000 testing, why not use SIMD (SSE/SSE2 or AltiVec) if it improves either of these scores? Seems fair to me. The way I look at it, however you can get a given processor to improve these scores, that is what the processor is capable of.
  • Reply 10 of 36
    programmerprogrammer Posts: 3,409member
    Quote:

    Originally posted by Anonymous Karma

    I read through that article when it came out and felt it was missing a lot. It seemed to be a mostly hand-waving "you will now believe in the MHz Myth" kind of article, lacking in attention to crucial differences in instruction set, cache size and organization, bus architecture, and many other factors which make up the most significant differences between the architectures.





    Good post.



    Quote:

    * The reason X86 has been so successful is not despite the ugliness of the instruction set, but ...



    IMHO the reason X86 has been so successful can be summarized in one word: money.



    Quote:

    * The 8 GPRs of the x86 ISA are an abomination...



    The article does talk about rename registers, although it doesn't do a very good job of it, IMO. The lack of registers (and various other features of the x86 ISA), however, directly impacts how concisely a program can implement an algorithm. If a bunch of move instructions are required to juggle things in registers, they are just noise; they are not really contributing to the solution of the problem. The same applies to the setting of the condition code register(s), and various other low-level activities.
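    For what it's worth, the rename-register idea can be sketched in a few lines. A toy model (sizes and names invented for illustration) where each write to one of the 8 architectural registers gets a fresh physical register, so reusing the same architectural name creates no false dependency:

```python
# Toy register-renaming sketch: 8 architectural names mapped onto a
# larger physical register file, as out-of-order x86 cores do. Each new
# write allocates a fresh physical register, so an older in-flight value
# of the same architectural register is untouched (no false dependency).
# Purely illustrative -- real renamers also free and recycle registers.

class Renamer:
    def __init__(self, n_physical=32):
        self.free = list(range(n_physical))
        self.map = {}              # architectural name -> physical register

    def write(self, arch_reg):
        """Allocate a fresh physical register for a new value of arch_reg."""
        phys = self.free.pop(0)
        self.map[arch_reg] = phys
        return phys

    def read(self, arch_reg):
        return self.map[arch_reg]

r = Renamer()
p1 = r.write("eax")    # first value of eax
p2 = r.write("eax")    # eax reused: different physical register, so the
                       # first value can still complete in flight
print(p1, p2, p1 != p2)
```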



    Quote:

    * Power consumption is a red herring in this article. It's absolutely irrelevant to a discussion about performance.



    Except for one thing -- given a fixed power budget (e.g. any more and you'll melt your chip) how much performance do you get? So far only the x86 crowd has been trying to find that out. The 970 is the first PowerPC since the 604e which is approaching that particular budget limit.



    Quote:

    * The instruction sets in question make a huge difference in terms of performance.



    This is very similar to my first point.



    Quote:

    * Using SPEC as a benchmark is always unfair to somebody.



    Especially since pretty much everybody cheats, and even if they don't they are accused of cheating.



    Quote:

    * Simultaneous Multi-Threading will never, ever offer 100% speed improvement.



    Not true. You are assuming that the processor can be fully utilized in single-threaded mode. In many, many cases this is not only false, it is very false. Imagine a situation where you have a thread that is waiting on memory 75% of the time, but a core capable of tracking 4 threads. This would let you have 100% processor utilization and run all 4 threads at the same speed as they would each have run separately on a single-threaded processor.
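    The arithmetic of that example is easy to check with a toy model (idealized on purpose: it ignores all sharing overheads, which is exactly the best case being argued):

```python
# Toy SMT model for the counter-example above: a thread stalled on
# memory 75% of the time keeps the core only 25% busy, so four such
# threads can, in this ideal model, fill the core completely.

def core_utilization(n_threads, busy_fraction):
    """Idealized SMT: utilization is additive and saturates at 100%."""
    return min(1.0, n_threads * busy_fraction)

print(core_utilization(1, 0.25))   # 0.25 -- single thread, mostly stalled
print(core_utilization(4, 0.25))   # 1.0  -- four threads fill the core
```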



    Quote:

    * If the trend towards fanless, quiet, low-power, and small x86 systems continues, in a year's time the average Mac may well be faster than the average PC



    PowerPC is already well established in the low power market, so adding the G5 gets them back in the high-end game. With the advantages of a better instruction set and architecture this should allow IBM/Apple to compete while spending far less money than Intel and AMD. Intel can afford this. AMD can't.
  • Reply 11 of 36
    overtoastyovertoasty Posts: 439member
    Quote:

    Originally posted by Anonymous Karma





    * The reason X86 has been so successful is not despite the ugliness of the instruction set, but because of it. As he correctly points out, the original 80386 instruction set is not what the Pentium 4, Athlon or any other modern x86 processor speaks on the inside, but this actually adds to the flexibility of the design. Rather than designing the processor directly to the ISA (as is essentially done with the PPC), the x86 vendors write to their own micro-operations and translate the x86 instructions into micro-operations - thus giving more flexibility for deep pipelining ala the Pentium 4.





    OK, I'm no guru, but I still don't see how enforced code cracking actually adds to the speed of the processor vs. a processor whose instruction set doesn't require (much) cracking.



    So we have a whole bunch of symbols that come in, which we then have to translate into something else ... sure, that "something else" may be a whole lot faster than what might have been done with the original symbols - but unless that "something faster" makes up for breaking both Moore's and Amdahl's laws, I don't see how this is of any real benefit.



    Moore's - instead of using all those transistors for code cracking, they could be better used for actual processing elsewhere, thus speeding things up.



    Amdahl's - let's assume we had two infinitely fast processors, one with the fastest code cracking front end that intel could come up with, and one that just accepted instructions directly ... the one without the code cracking front end would still be infinitely faster.
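    One way to pin down that intuition without the infinities is Amdahl's law itself: if a fixed fraction f of the work is instruction cracking and only the rest gets sped up, the overall speedup is bounded by 1/f no matter how fast the back end becomes. A minimal sketch (the 5% decode fraction is a made-up illustration, not a measured figure):

```python
# Amdahl's law applied to the decode-front-end intuition: with a serial
# fraction f spent cracking and the remaining (1 - f) sped up by a
# factor s, overall speedup = 1 / (f + (1 - f) / s). Even as s -> inf,
# the speedup is capped at 1/f -- large, but not infinite.

def amdahl(f, s):
    return 1.0 / (f + (1.0 - f) / s)

f = 0.05                          # hypothetical: 5% of work spent cracking
print(round(amdahl(f, 10), 2))    # modest back-end speedup
print(round(amdahl(f, 1e9), 2))   # near-infinite back end: capped near 20
```

In a real pipelined core the decode stage overlaps execution rather than running as a serial time slice, so even this bound is pessimistic; the serial-fraction model is only a way to tame the infinities in the paragraph above.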



    Now I think I understand the logic A.K. was getting at, that if enforced code cracking allowed the chip much more freedom to take advantage of improved methods of processing - above and beyond what a chip that couldn't code crack could do, in other words, enough to make up for the problems with Moore's and Amdahl's law that code cracking introduces - then yes, code cracking is a huge advantage: and in the case of x86 instructions, it certainly does more than make up for what is lost to Mr. Moore and Mr. Amdahl ...



    The problem with this way of looking at things is that it doesn't do much when the incoming code is close to as good as (or perhaps even better than) what any cracker could have come up with anyway ... then there's no escaping Moore's (original) law ... and any transistors not needed for cracking can be better put to use elsewhere, actually processing.



    Now I'm aware there's a fine line between "actually processing" and "re-ordering or translating incoming instructions into the most efficient means possible" ... but if the incoming instructions are a whole lot closer to the ideal in one system over another (as in PPC's over x86), I fail to see how even this fuzzy line offers much of an advantage.



    Hell, maybe I should just stick to philosophy?



    http://homepage.mac.com/charliemacchia/



  • Reply 12 of 36
    ast3r3xast3r3x Posts: 5,012member
    haha nice video...i want to see more like it!





    what did you make it with?
  • Reply 13 of 36
    Quote:

    Originally posted by ast3r3x

    haha nice video...i want to see more like it!





    what did you make it with?




    Thanks



    It's just me, FCP3, my PowerBook, and bicycle ...
  • Reply 14 of 36
    ast3r3xast3r3x Posts: 5,012member
    Quote:

    Originally posted by OverToasty

    Thanks



    It's just me, FCP3, my PowerBook, and bicycle ...




    well it's nice, make more, i found myself laughing every once in a while
  • Reply 15 of 36
    Quote:

    Originally posted by OverToasty

    [snip]

    Amdahl's - let's assume we had two infinitely fast processors, one with the fastest code cracking front end that intel could come up with, and one that just accepted instructions directly ... the one without the code cracking front end would still be infinitely faster.



    [snip]

    Hell, maybe I should just stick to philosophy?







    Heh, I'm a philosophy student too. What's that thing sticking out of your head at the start of the video?



    (Just teasing, what I've seen so far looks nice.)



    But wrt your statement of Amdahl's law, there's a few more infinities there than I'm comfortable with. Surely this can be remedied by one of the engineers here?
  • Reply 16 of 36
    programmerprogrammer Posts: 3,409member
    Quote:

    Originally posted by OverToasty

    OK, I'm no guru, but I still don't see how enforced code cracking actually adds to the speed of the processor vs a processor who's instruction set doesn't require (much)cracking?



    The x86 makers gained an advantage from enforced code cracking because it forced them to spend the money to keep getting faster. The RISC guys haven't had to work as hard to stay competitive until relatively recently. As a result, the x86 crowd could suddenly leverage this sophisticated deeply pipelined, speculative, superscalar, OoOE core and push it a lot farther, and do it faster, than the RISC crowd. Now that IBM has put in the same effort they are once again very competitive, because they aren't saddled with the baggage of decoding a stupid, backward, hacked-together instruction set.
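    To make the cracking idea concrete, here is a toy sketch (purely illustrative: real decoders work on bit encodings, and the instruction and micro-op formats below are invented) of how a CISC instruction with a memory operand gets split into simple load/execute/store micro-ops:

```python
# Toy sketch of x86-style "cracking": a CISC instruction that operates
# directly on memory is split into RISC-like micro-ops (load, ALU op,
# store), while a register-register instruction passes through as one
# micro-op. Formats here are invented for illustration.

def crack(insn):
    """Split one (op, dst, src) instruction into micro-operations."""
    op, dst, src = insn
    if dst.startswith("["):                       # memory-operand destination
        return [("load",  "tmp", dst),            # bring the value into a temp
                (op,      "tmp", src),            # do the ALU work on registers
                ("store", dst,   "tmp")]          # write the result back
    return [(op, dst, src)]                       # reg-reg ops pass through

# 'add [mem], eax' becomes three simple micro-ops; 'add ebx, ecx' stays one.
print(crack(("add", "[mem]", "eax")))
print(crack(("add", "ebx", "ecx")))
```

The scheduling flexibility comes from the fact that the core only ever sees the simple micro-ops, whatever shape the incoming ISA has.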
  • Reply 17 of 36
    overtoastyovertoasty Posts: 439member
    Quote:

    Originally posted by Programmer

    The x86 makers gained an advantage from enforced code cracking because it forced them to spend the money to keep getting faster. The RISC guys haven't had to work as hard to stay competitive until relatively recently. As a result, the x86 crowd could suddenly leverage this sophisticated deeply pipelined, speculative, superscalar, OoOE core and push it a lot farther, and do it faster, than the RISC crowd. Now that IBM has put in the same effort they are once again very competitive, because they aren't saddled with the baggage of decoding a stupid, backward, hacked-together instruction set.



    Wow!



    There's a principle in here someplace isn't there?



    A - Money makes a big difference.



    B - If you've got the resources, sometimes a big disadvantage can actually force you to innovate.



    So what are the odds that Microsoft will try to write an emulator for Windows for the G5 (or 6), and attempt to migrate their user base over?
  • Reply 18 of 36
    overtoastyovertoasty Posts: 439member
    Quote:

    Originally posted by boy_analog

    What's that thing sticking out of your head at the start of the video?





    Proof that re-shoots suck?

  • Reply 19 of 36
    overtoastyovertoasty Posts: 439member
    ... oh, and FWIW ...



    I thought the original article was very well written, especially for me (a coder who doesn't know all that much about chips); I certainly found it to be an excellent explanation of the performance diffs between x86 and PPC. I'd probably point anybody who's confused, or who's going on and on about how Apple forged the SPEC marks, to just check it out.



    Now, after reading a few things in here, I agree that it didn't cover everything, but if anybody can point to a better, more concise explanation of the diffs between x86 and PPC (that is, under ten pages) - that includes the G5 - hey, please let me know.
  • Reply 20 of 36
    rickagrickag Posts: 1,626member
    Quote:

    Originally posted by OverToasty

    Wow!



    There's a principle in here someplace isn't there?



    A - Money makes a big difference.



    B - If you've got the resources, sometimes a big disadvantage can actually force you to innovate.



    So what are the odds that Microsoft will try to write an emulator for Windows for the G5 (or 6), and attempt to migrate their user base over?




    First off, remind me not to argue with you; your reasoning power is impressive.



    The odds of Microsoft writing an emulator, I don't know, but look to A. Microsoft is in a pickle because of the sheer size of their market stranglehold. If they break compatibility with their old software, well, no explanation needed.



    I do have a question though. I have been told that Windows XP is based on the NT operating system which at one time did run on PPC, is this true, somewhat true or complete hogwash?