PowerPC v. Intel CPU performance

nak · June 15, 2005 9:57AM

One thing I'm confused about is the speed of Intel's CPUs.

On the PowerPC, if it says 700MHz or 1.67GHz, it runs at those speeds. We all saw Steve's PowerMac running at 3.6GHz during his keynote, but does an Intel CPU actually run at that speed? Is it really that much faster?

Edit: PowerPC v. Intel, not Inter. clumsy fingers...

the one to rescue · June 15, 2005 11:31AM

Quote:

Originally posted by Nak

One thing I'm confused about is the speed of Intel's CPUs.

On the PowerPC, if it says 700MHz or 1.67GHz, it runs at those speeds. We all saw Steve's PowerMac running at 3.6GHz during his keynote, but does an Intel CPU actually run at that speed? Is it really that much faster?

Edit: PowerPC v. Intel, not Inter. clumsy fingers...

Intel CPUs actually run at that speed. But frequency is not the key factor. Current Intel x86 chips have crappy FPUs (Floating Point Units) and SIMD units (Single Instruction Multiple Data... that's the vector unit of the CPU. In the PPC, it is called Altivec/VMX/Velocity Engine and on x86 it's called SSE3).

Beyond performance, one of the hottest (no pun intended) problems in the chip industry is to reduce the heat that is dissipated by the chips. And the Pentium 4 is damn bad when it comes to this (so is the G5, by the way).

Hope it'll help you...

programmer · June 15, 2005 11:34AM

Quote:

Originally posted by Nak

One thing I'm confused about is the speed of Intel's CPUs.

On the PowerPC, if it says 700MHz or 1.67GHz, it runs at those speeds. We all saw Steve's PowerMac running at 3.6GHz during his keynote, but does an Intel CPU actually run at that speed? Is it really that much faster?

Edit: PowerPC v. Intel, not Inter. clumsy fingers...

Yes, the Pentium 4 is available at up to about 3.6 GHz. The current G5 is available at up to 2.7 GHz. The Pentium 4 really does run on a faster clock. This doesn't mean as much as you might think it does, however. Different chips get different amounts of work done each clock cycle, so you cannot just compare GHz ratings. This is the basis of the "MHz Myth".

The "MHz Myth" is not an Apple invention trying to excuse lower PPC clock ratings. Consider, for example, that a 2 GHz Pentium-M can outperform a 3.6 GHz Pentium 4 in many cases. Similarly, the AMD Athlon and Opteron run at around 2 to 2.4 GHz and in many tests absolutely stomp the Pentium 4 @ 3.6 GHz. For the same reason the Power core in a 4 GHz Cell will not perform as well as a 2 GHz G5 in most cases.

Obligatory car analogy: you can't compare car performance based solely on the engine RPM. All sorts of other factors like torque and gearing affect the performance, and there are a variety of kinds of performance to consider: off-the-line acceleration, acceleration when passing on the highway, braking, top speed, fuel economy, reliability, etc.
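
To put rough numbers on the point about work per clock: a common first-order model is throughput = clock rate x average work completed per cycle. The sketch below uses only that model. The per-cycle figures are made-up placeholders chosen to show the arithmetic, not measurements of any real chip.

```python
# First-order model: throughput = clock (GHz) x average instructions per cycle.
# The IPC values below are hypothetical placeholders, NOT measured figures.

def throughput(clock_ghz: float, ipc: float) -> float:
    """Billions of instructions completed per second under the simple model."""
    return clock_ghz * ipc

# Two hypothetical designs: one clocks high but does less per cycle,
# the other clocks lower but does more per cycle.
chips = {
    "high-clock chip": {"clock_ghz": 3.6, "ipc": 1.0},
    "low-clock chip": {"clock_ghz": 2.0, "ipc": 2.0},
}

results = {name: throughput(**spec) for name, spec in chips.items()}

for name, value in results.items():
    print(f"{name}: {value:.1f} billion instructions/s")
```

With these placeholder values the lower-clocked chip comes out ahead, which is all the "MHz Myth" claims: the GHz rating alone does not settle the comparison.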

onlooker · June 15, 2005 11:39AM

Quote:

Originally posted by Nak

One thing I'm confused about is the speed of Intel's CPUs.

On the PowerPC, if it says 700MHz or 1.67GHz, it runs at those speeds. We all saw Steve's PowerMac running at 3.6GHz during his keynote, but does an Intel CPU actually run at that speed? Is it really that much faster?

Edit: PowerPC v. Intel, not Inter. clumsy fingers...

If you mean, will the CPU at that higher clock frequency (3.6GHz) perform faster than a 2.7GHz PPC, the answer is no, but I'm surprised you never researched the MHz myth before.

Besides, Apple is betting on future CPUs from Intel. I don't think the current ones are what compelled Apple to switch, because their performance per clock cycle is not that impressive. Look at AMD smoke'en'em at 2.6 GHz.

nak · June 15, 2005 11:59AM

Thanks to all for clearing that up.

No, I actually never even heard of the MHz Myth before.

I did, however, know that Intel chips and the G5 are notorious for their heat output (which is why I recommend AMD to people looking for a non-Mac computer configuration).

sillyfool · June 15, 2005 1:34PM

Quote:

I don't think the current ones are what compelled Apple to switch, because their performance per clock cycle is not that impressive. Look at AMD smoke'en'em at 2.6 GHz.

I wish that we would avoid these kinds of overgeneralizations. It's simply not true that AMD 'smokes' Intel or that

Quote:

Current Intel x86 chips have crappy FPUs (Floating Point Units) and SIMD units (Single Instruction Multiple Data...

These supposedly 'crappy' chips are in fact powering most of the servers on the internet and most of the servers that run the software that is used to design all of the chips that we use today. And I can tell you that those of us who earn our living designing chips are very happy with their performance.

We really don't need to keep bashing on Intel for problems they just don't have. This is the kind of thing that tends to make Apple fans look silly.

chevaliermalfet · June 15, 2005 2:49PM

While I agree with the not "hating" sentiment, the internet server analogy isn't a great one for your argument, since it's much more heavily dependent on integer performance, which traditionally has been the Pentium's strong point.

And from a technical standpoint at least, MMX, SSE, and SSE2 were much less powerful than Altivec.

atomicham · June 15, 2005 3:56PM

Quote:

Originally posted by The One to Rescue

Intel CPUs actually run at that speed. But frequency is not the key factor. Current Intel x86 chips have crappy FPUs (Floating Point Units) and SIMD units (Single Instruction Multiple Data... that's the vector unit of the CPU. In the PPC, it is called Altivec/VMX/Velocity Engine and on x86 it's called SSE3).

Well, I wouldn't say that the current P4s have a crappy FPU. In fact, the SPECfp_base2000 for a 3.46 GHz P4 is 1714 (from spec.org). Apple's published number for the 2.0 GHz G5 (I can't find anything about the 2.7 GHz) is 840. The P4 has over twice the FPU power of the 2.0 GHz G5 (at least for those tests performed by SPEC).

Now, I realize everyone is going to say SPECmarks are BS, etc., but, of course, we were all trumpeting them when the G5 was the FPU leader.

EDIT: These numbers are from last summer. Also, the POWER5 has an amazing FPU number: SPECfp_base2000 = 2576 (!!!)

nak · June 15, 2005 3:59PM

Quote:

Originally posted by atomicham

we were all trumpeting them when the G5 was the FPU leader.

of course

sillyfool · June 15, 2005 5:16PM

People who try to dismiss SPEC clearly don't design processors for a living. The people who design CPUs use SPEC as the gold standard for comparing performance.

The SPEC report for FP performance for the IBM PowerPC 970 (the chip that Apple calls the G5) is here: http://www.spec.org/osg/cpu2000/resu...011-03438.html

CPU designers pay attention to the 'base' numbers, not the peak numbers. Base numbers are less dependent on compilers.

The SPECfp2000 base number was: 1178

That's about the same as a 2.8 GHz Intel CPU.

The POWER5 numbers are in the 2500 range. That's about the same as the Intel Itanium. Of course the POWER5 is a HUGELY different SET OF CHIPS. The POWER5 is not one chip. It's multiple chips housed on a very complex and expensive ceramic MCM (multi-chip module).

POWER5 has as much to do with PowerPC as Itanium has to do with Pentium.

sillyfool · June 15, 2005 5:23PM

Quote:

Originally posted by ChevalierMalFet

While I agree with the not "hating" sentiment, the internet server analogy isn't a great one for your argument, since it's much more heavily dependent on integer performance, which traditionally has been the Pentium's strong point.

And from a technical standpoint at least, MMX, SSE, and SSE2 were much less powerful than Altivec.

My point was that in terms of BOTH integer and FP performance, the x86 performs very well. But you're right, I could have been clearer about how I said that.

Altivec is more like a general purpose vector unit than any of the SSE units. And as such, you can always find things that Altivec will do better than SSE. But in the end, it doesn't seem to matter all that much. Itanium and x86 pretty much dominate the Engineering and Scientific applications markets.

wmf · June 15, 2005 7:09PM

Quote:

Originally posted by sillyfool

Of course the POWER5 is a HUGELY different SET OF CHIPS. The POWER5 is not one chip. It's multiple chips housed on a very complex and expensive ceramic MCM (multi-chip module).

Well, you get 8 cores and lots of L3 cache on that MCM. Alternatively, the low-end POWER5 machines use a more conventional package that has one chip (two cores).

Quote:

POWER5 has as much to do with PowerPC as Itanium has to do with Pentium.

I don't think that's fair given that POWER5 and the various PowerPCs all implement the same instruction set and thus are all compatible. Meanwhile Itanium and x86 are totally different instruction sets.

sillyfool · June 15, 2005 11:51PM

I'm not aware of any single chip POWER5s. The lowest end POWER5 is a DCM (dual chip module). That's basically the Celeron of POWER5s.

It's not really true to say that the POWER5 and the PowerPC are ISA compatible. For example, the POWER5 has SMT, but it relies on special features of IBM's operating system for processor-level execution resource allocation. That's one of the big reasons why IBM was very reluctant to even discuss the concept of what other people were calling 'POWER5 lite', or give any hint that POWER5 could possibly be used by Apple without major reworking of OS X.

Nor does POWER5 have a vector unit. You can see IBM's thinking in Jon Stokes' article on the POWER5: http://arstechnica.com/articles/paedia/cpu/POWER5.ars/5

Quote:

I asked Pattnaik about the possibility of bringing vector computing to the POWER5, and more specifically about a DOE paper that occasioned considerable speculation in the hardware community. Regarding the general issue of VMX + POWER5, his answer can be summarized as follows: they're reviewing the possible addition of vector technology to see if it makes sense for their customers, "because [IBM's] goal is to build a system that is balanced." But this isn't really saying much, because he went on to note they've been reviewing it in some form or other since the POWER2 days. So no specific plans to add VMX to the POWER5 were revealed or even really hinted at. And with respect to any potential POWER5 derivative with VMX added, see the question about POWER5 derivatives.

Regarding the DOE paper in specific, the answer was slightly more involved. For those not familiar with the paper, it mentions ganging together eight POWER5 cores in software to implement a sort of virtual vector processor on the POWER5. What's going on here is that the DOE was migrating their software from an architecture that supported vector processing, so they had to work with IBM to implement a custom, one-off vector processing implementation for their particular POWER5 system. Hence the use of vector processing on this supercomputer is a DOE-related issue, and not evidence of some grand strategy of IBM's to bring VMX to the POWER5.

Given the tremendous differences in design goals, micro-architecture, and implementation between the POWER5 and the PowerPC, I don't feel that I was off base.

junkyard dawg · June 16, 2005 1:35AM

Does the PPC 970 have an FPU superior to the Intel chips? Do AMD CPUs have better FPUs?

I'm just curious where this "Intel's FPU sucks" meme got started. I remember reading in some gaming publication that Intel's FPU totally rocked, and now I read at Mac forums pretty often that it actually sucks.

henriok · June 16, 2005 5:10AM

MHz is not a measure of speed (actual work done), it's a measure of rate (how often it does stuff). A chip that does small things very fast might not be as productive as one that does large things more slowly.

The SPEC scores for the 970 are from code compiled with gcc, which is not a highly optimizing compiler. The scores for the P4 are from code compiled with Intel's compilers, which are specifically tuned to make benchmark applications like SPEC run faster.

Apple did (by commissioning an independent institute) an apples-to-apples comparison, measuring the G5 and x86 scores with the same compilers (gcc and NAGWare Fortran) while applying the appropriate tuning (like disabling Hyper-Threading), and found that the G5 was more powerful. And it's known that gcc does a better job compiling for x86 than it does for PowerPC. I don't know the quality of NAGWare code.

If one wanted to measure the optimal performance of the processors, it would be more appropriate to use IBM's compilers (xlc and xlf) instead, and perhaps try to make use of AltiVec, since the SPEC tests run on the P4 used SSE.

This _is_ known by people who design processors for a living.

tht · June 16, 2005 9:01AM

The SPEC scores being talked about are all for single cores. This includes the Power5's 2500+ SPECfp2000 score. Every single SPECint2000 or SPECfp2000 run is single threaded and only runs on one core of a dual-core processor or only runs on one processor of a multiprocessor system. If you want to compare multiprocessor or multicore performance, you need to look at the SPECint_rate/fp_rate scores (I think).

A Power5 core is just simply an evolution of the Power4 core. The 970 is essentially a Power4 core with an SIMD unit. They all have the same 2 ALUs, the same 2 FPUs, and the same 2 load/store units. The 970 has an additional SIMD unit while the Power4/5 are dual-core processors. But as far as SPECint/fp2000 are concerned, it only runs on one core or one processor. In terms of execution units and SPEC, the three are nearly identical.

Why the huge difference in SPEC scores between Power and PPC 970 processors? Memory subsystem (assuming the same compiler).

Power4 has an inline L2 cache, L3 cache and main memory system. Data from main memory has to go through an off-die 32 MB L3 cache, then a shared on-die 1.5 MB L2 cache, then L1 and the registers. A Power4 system running SPEC could have 32 MBytes or upwards of 128 MBytes of L3 cache to use. That's a huge amount of memory resources and memory performance.

Power5 improves on the Power4's memory system with an on-die main memory controller, a backside L3 cache, lower latency cache at all levels, and more register resources. Imagine Power4 memory performance but 2x as fast.

PPC 970 has none of that. It's just a Power4 stripped down to 1 core and 0.5 MB L2 cache with a high latency FSB.

SPEC is hugely memory subsystem sensitive (and compiler sensitive) and tends to exaggerate real world differences in performance, and the SPEC scores reflect this even though all of these PPC processors have the same LSUs, ALUs and FPUs.
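
A rough way to see how much the cache hierarchy alone can matter is the textbook average memory access time (AMAT) recurrence: the cost at each level is its hit latency plus its miss rate times the cost of the next level out. The sketch below applies that recurrence to two hypothetical hierarchies. Every latency and miss rate in it is an invented placeholder, not a figure for the Power4, Power5 or 970.

```python
# Textbook AMAT recurrence: cost(level) = hit_latency + miss_rate * cost(next level).
# All latencies (in cycles) and miss rates below are hypothetical placeholders.

def amat(levels, memory_latency):
    """levels: list of (hit_latency_cycles, miss_rate), ordered from L1 outward."""
    cost = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        cost = hit_latency + miss_rate * cost
    return cost

# Hypothetical core with L1 + L2 only, in front of slow main memory.
two_level = amat([(3, 0.05), (15, 0.4)], memory_latency=300)

# Same hypothetical core with a large L3 added between L2 and memory.
three_level = amat([(3, 0.05), (15, 0.4), (80, 0.2)], memory_latency=300)

print(f"L1+L2 only : {two_level:.2f} cycles per access")
print(f"L1+L2+L3   : {three_level:.2f} cycles per access")
```

The execution units are identical in both cases; only the memory path differs, which is the kind of gap a memory-sensitive benchmark will pick up.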

When you look at the P4 and PPC 970 SPECfp2000 scores and see the huge difference, take it with a little perspective. It does not always reflect real world performance, and in terms of FPU between the P4 and 970 this is probably the case. The P4 has 1 FPU. The 970 has 2 FPUs, both capable of doing multiply-adds, while the P4's single FPU is not (I think).

The 2.3 GHz 970 FPU performance is better than or the same as a 3.6 GHz P4 given the right optimizations. The P4 is not 1.5 to 2 times faster as the SPECfp2k scores suggest. The 970 simply has more FPU resources than the P4 does. This is why the Xserve is very nice for LINPACK-type application clusters, like the Virginia Tech one, and makes up for the clock rate differential pretty well.
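
The resource argument can be sketched as a peak-rate calculation: peak floating-point operations per second = number of FPUs x operations per FPU per cycle x clock. The unit counts below are just the ones claimed in this post (two multiply-add-capable FPUs for the 970, one FPU without multiply-add for the P4, which the post itself flags as uncertain). SSE2 and AltiVec are ignored, and a peak figure says nothing about how often real code reaches it.

```python
# Peak scalar FP rate under the assumptions stated in the post above.
# A multiply-add is counted as 2 operations per cycle; a plain FPU as 1.

def peak_gflops(fpus: int, flops_per_fpu_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak in billions of floating-point operations per second."""
    return fpus * flops_per_fpu_per_cycle * clock_ghz

# PPC 970: 2 FPUs, each assumed capable of one multiply-add per cycle.
ppc970_peak = peak_gflops(fpus=2, flops_per_fpu_per_cycle=2, clock_ghz=2.3)

# P4: 1 FPU without multiply-add (the post's own, hedged, assumption).
p4_peak = peak_gflops(fpus=1, flops_per_fpu_per_cycle=1, clock_ghz=3.6)

print(f"970 @ 2.3 GHz peak: {ppc970_peak:.1f} GFLOPS")
print(f"P4  @ 3.6 GHz peak: {p4_peak:.1f} GFLOPS")
```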

Integer is a different story though. The SPECint2000 scores between the P4 and PPC 970 may reflect reality, at least for some specific codes. They both have the same integer resources, so it'll boil down to memory and compiler performance. But again it isn't going to be 2x as fast as the SPECint2k scores suggest.

The 970 is a competitive architecture. If it had one more ALU, fully pipelined ALUs, and a better memory subsystem - 1 MB L2 cache and backside L3 - I think it would outperform Opteron and Xeon hands down. But of course, IBM doesn't seem to really care about it, so you have to do what you have to do.

sillyfool · June 16, 2005 9:55AM

Henriok, did you even read the SPEC report that I posted? That report was submitted 15 months AFTER the VeriTest report.

The SPEC report was submitted by IBM. IBM chose the compilers. And unless you think that they're idiots, you should start with the assumption that they chose good compilers. OK, so what compilers did they use:

Quote:

Compiler: XL C/C++ Enterprise Edition V7.0 for AIX

XL Fortran Enterprise Edition V9.1 for AIX

Other Software: ESSL for AIX V4.2

And who makes those compilers? Well that would be IBM: http://www-306.ibm.com/software/awdtools/ccompilers/

So this is just like the case where Intel publishes SPEC results using Intel compilers.

And if you notice, the IBM SPEC results are 1178 and the VeriTest results are 840.

The IBM SPEC results are for a 2.2 GHz PPC 970, the VeriTest results are for a 2.0 GHz PPC 970.

That means that IBM got much better results than VeriTest did.

The only way that VeriTest got the PPC 970 to look good was by intentionally making the x86 look bad. The VeriTest results might appeal to someone who wants to believe the Apple Marketing Department's spin, but they've been dismissed by the rest of us.

sillyfool · June 16, 2005 10:47AM

THT, I can't imagine why you think that

Quote:

Every single SPECint2000 or SPECfp2000 run is single threaded and only runs on one core of a dual-core processor or only runs on one processor of a multiprocessor system.

That's not even remotely true. There's quite a lot of multithreading in these codes. In fact, this gets brought up all the time in discussions of Intel's Hyperthreading technology.

Quote:

If you want to compare multiprocessor or multicore performance, you need to look at the SPECint_rate/fp_rate scores (I think).

All that SPEC_rate does is run multiple instances of SPEC. It does give some insight into what might happen on a server running multiple jobs, but it tells you very little about interactions between threads within the same process.

Quote:

A Power5 core is just simply an evolution of the Power4 core.

Not even close. POWER5 is a huge step forward. People interested in a more detailed analysis of POWER5 should check out some of Paul DeMone's articles, like this one: http://realworldtech.com/page.cfm?Ar...WT100404214638

Quote:

But of course, IBM doesn't seem to really care about it, so you have to do what you have to do.

Actually, IBM put quite a lot of time, money, and brain power into the PPC and what they discovered was that hardly anyone else cared about it.

IBM and Moto have been key members of SPEC since Day 1. IBM is proud of their SPEC scores for the POWER5 but they're not proud of this:

IBM results for 2.2 GHz PowerPC 970: SPECint_base2000 = 986, SPECfp_base2000 = 1178

Intel results for 3.8 GHz Pentium 4: SPECint_base2000 = 1666, SPECfp_base2000 = 1839
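
One way to read these two result lines next to the MHz Myth discussion earlier in the thread is to divide each score by the clock rate. The sketch below does only that arithmetic on the four numbers quoted above. Score per GHz is a crude illustrative ratio, not a metric that SPEC itself reports.

```python
# The four scores and two clock rates are the ones quoted in this post.
# "Score per GHz" is an illustrative ratio only, not an official SPEC metric.

results = {
    "PowerPC 970 @ 2.2 GHz": {"clock_ghz": 2.2, "int_base": 986, "fp_base": 1178},
    "Pentium 4 @ 3.8 GHz": {"clock_ghz": 3.8, "int_base": 1666, "fp_base": 1839},
}

def per_ghz(score: float, clock_ghz: float) -> float:
    """Benchmark score normalized by clock rate."""
    return score / clock_ghz

table = {
    name: {
        "int_per_ghz": per_ghz(r["int_base"], r["clock_ghz"]),
        "fp_per_ghz": per_ghz(r["fp_base"], r["clock_ghz"]),
    }
    for name, r in results.items()
}

for name, row in table.items():
    print(f"{name}: int/GHz = {row['int_per_ghz']:.1f}, fp/GHz = {row['fp_per_ghz']:.1f}")
```

On these numbers the Pentium 4 posts the higher absolute scores, while the 970 posts somewhat higher scores per GHz. Both statements are true at once, which is why the absolute scores and the clock ratings have to be read together.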

IBM never really liked the Altivec. Jon Stokes' interview with IBM reminds us of this, but you can see evidence of this in many other places. IBM has always believed that the relatively simple vector units in PCs are not the right path for them. IBM designed a more general purpose vector unit than Intel. But that costs them something; it costs them precision. The Altivec simply runs at lower precision than Intel's SSE units. That means that Altivec is not a good solution for many Engineering and Scientific applications.

powerdoc · June 16, 2005 12:43PM

Since both PPC and Intel CPUs already exist, this thread belongs in General Discussion and not Future Hardware.

Moved here

sillyfool · June 16, 2005 1:16PM

Quote:

The 2.3 GHz 970 FPU performance is better than or the same as a 3.6 GHz P4 given the right optimizations. The P4 is not 1.5 to 2 times faster as the SPECfp2k scores suggest. The 970 simply has more FPU resources than the P4 does.

Having more resources really isn't the point. For example, more FP units don't do you any good unless the number of instructions that you can ISSUE and the number of instructions that you can RETIRE can make good use of those resources. And more to the point, there must be a balance between the resources and the requirements. So even if PPC could issue and retire enough instructions to keep the units fully loaded (which it can not), the reality is that there would be very little benefit in doing so. Which is why the architects didn't make it possible to issue and retire enough instructions to keep all the units running in parallel.

Quote:

This is why the Xserve is very nice for LINPACK-type application clusters, like the Virginia Tech one, and makes up for the clock rate differential pretty well.

Just to throw more fuel on the "SPEC doesn't predict 'Real World' performance" fire; let's take a look at SPEC numbers and the TOP500.org numbers.

If you compare the results for the VT cluster with the NCSA 'Tungsten' cluster, you'll see roughly a 40% advantage for the VT cluster. Not bad considering that the NCSA cluster is more than a year older than the VT cluster.

Those Linpack results are about what you'd predict using SPECfp_base2000 results. The SPECfp_base2000 results for the PowerPC 970 are about 15% better than the SPECfp_base2000 results for the 3.06 GHz Xeon that is used in the NCSA cluster. However, SPECfp_base2000 is actually the geometric mean of 14 sub-test ratios. If you look at the sub-tests, you'll see that in some cases the PPC has a clear advantage (like 179.art, which can make use of Altivec, and 183.equake, and 200.sixtrack). And you'll see that in most other cases, the two CPUs are roughly equal.

So when you're using SPEC as an aid to predicting how well YOUR APPLICATION may run on different platforms, you always need to look at the sub-tests.
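
As a sketch of why the sub-tests matter: the SPEC CPU2000 composite scores are geometric means of per-benchmark ratios, so two machines can land on the same composite with very different per-test profiles. The ratios below are invented for illustration (three made-up sub-tests instead of the real 14) and are not published results.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of a collection of positive per-benchmark ratios."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-sub-test ratios for two hypothetical machines.
machine_a = {"test_1": 2000.0, "test_2": 500.0, "test_3": 1000.0}
machine_b = {"test_1": 1000.0, "test_2": 1000.0, "test_3": 1000.0}

composite_a = geometric_mean(machine_a.values())
composite_b = geometric_mean(machine_b.values())

print(f"composite A = {composite_a:.0f}, composite B = {composite_b:.0f}")

# Per-sub-test view: which machine is ahead on each test, and by how much.
for test in machine_a:
    ratio = machine_a[test] / machine_b[test]
    print(f"{test}: A is {ratio:.2f}x B")
```

Here the composites match, yet machine A is twice as fast on one sub-test and half as fast on another. If your application looks like test_2, the composite alone would mislead you.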

Also, the interconnect technology used in these clusters has a large impact on how well they perform. The VT cluster is using a better interconnect than the NCSA cluster.

And to be fair, Linpack is a useful benchmark but it's more specialized than SPEC. Linpack results are basically testing the performance of a sub-set of your math libraries. SPEC shows the performance of complete software packages. Granted that those packages spend most of their time executing a small fraction of their code, but it's a difference worth noting.

IOW, the Linpack results on a cluster are less 'real world' than SPEC because SPEC is actually running fully functional software and Linpack is testing math libraries.

tht · June 16, 2005 2:16PM

Quote:

Originally posted by sillyfool

THT, I can't imagine why you think that ... That's not even remotely true. There's quite a lot of multithreading in these codes. In fact, this gets brought up all the time in discussions of Intel's Hyperthreading technology.

Take a look at all 3000+ SPEC2000 results published on spec.org. Every single one of them reports single core or single processor results. All 26 CPU2000 bench programs are single threaded.

In the case of P4 HT on versus off, the P4 performed worse when HT was turned on, most likely due to the loss of resources which were shared with some other thread of execution when HT was on.

Quote:

All that SPEC_rate does is run multiple instances of SPEC. It does give some insight into what might happen on a server running multiple jobs, but it tells you very little about interactions between threads within the same process.

It's virtually the only way to assess multi-core, multiprocessor performance using the SPEC CPU bench. If it tells very little, then we should stop using it.

Quote:

Not even close. POWER5 is a huge step forward. People interested in a more detailed analysis of POWER5 should check out some of Paul DeMone's articles,

So, what are the differences exactly between the Power5 and Power4? What was the huge step forward? What does the RealWorldTech article say that I don't say?

I'm really curious about it.

Quote:

Actually, IBM put quite a lot of time, money, and brain power into the PPC and what they discovered was that hardly anyone else cared about it.

No, IBM didn't really care about it. The only perspective on this Appleinsider forum is from the perspective of Apple. Truly, you could tell in 1999 that both IBM and Motorola were never going to put in the resources necessary to compete against Intel. The one true advantage that IBM and Motorola had in the mid-90s was that they beat Intel, or were at parity with it, in getting to 350 and 250 nm fabs. That's over now. In the succeeding 3 or 4 years, Apple had to rely on SIMD to market their computers. Everything else was always an uphill fight for Apple.

That's what I call not caring about it.
