Powerlogix 2x1.6 GHz G4's???


Comments

  • Reply 21 of 49
    Quote:

    Originally posted by Zapchud

    I stand corrected



    I'm fully aware of the CR unit on the G5; I was simplifying the issue, because I've been led to believe that the CR unit does not do a whole lot. But that's probably just rubbish, since they actually dedicated a full unit to it. We'll see, maybe...




    I don't think anyone will be disappointed in the G5 as long as we keep expectations within reason. That's why I want to stress once again that we not make exaggerated performance claims. Some of these claims simply amount to trashing the G4, and it doesn't deserve that. The G5's real performance should be impressive enough without that.
  • Reply 22 of 49
    i, fred Posts: 125 member
    While all of these dueling benchmarks are fine and good, the question is what Apple will do with these faster G4's: use them for another round of G4 stuff, or abandon them as fast as possible for G5's? Or both?
  • Reply 23 of 49
    Quote:

    Originally posted by Powerdoc



    We can say only one thing: even for integer, SPECint and MIPS favor the G5 at equal MHz. Skidmark appears quite irrelevant for a G5 (no optimisation, and remember the 7450 without optimization was pretty lame compared to the old 7400 design).

    So I have no doubt that the G5 will be slightly superior for integer, and will kick ass for FP.



    We don't know that Skidmark GT wasn't optimized for the G5; that's a piece of info no one here is privy to. The 7450 has never achieved the clock-for-clock integer performance of the old 4-stage design, even after "optimisation". Pipeline depth is a significant issue in real-world code. As for SPEC, well, you need a box of iodized salt to use it.
  • Reply 24 of 49
    Quote:

    Originally posted by I, Fred

    While all of these dueling benchmarks are fine and good, the question is what Apple will do with these faster G4's: use them for another round of G4 stuff, or abandon them as fast as possible for G5's? Or both?



    The worst that could happen is that Apple will treat them like it treated the G3: like a red-headed stepchild.



    The consumer machines need a massive upgrade right now. The iMac should have a 1.25 and a 1.42 GHz G4 as we speak, not the pitiful speeds Apple is shipping. They are not likely to go G5 till next year. I would like to see dual and single 1.6 GHz G4 Xserves. The PowerBook should go to 1.4 GHz at least.
  • Reply 25 of 49
    tht Posts: 5,605 member
    Quote:

    Originally posted by I, Fred

    While all of these dueling benchmarks are fine and good, the question is what Apple will do with these faster G4's: use them for another round of G4 stuff, or abandon them as fast as possible for G5's? Or both?



    I think it really depends on how much a 1.3 GHz 7457 costs compared to a 1.3 GHz 970. If a 1.3 GHz 7457 is more expensive than a 1.3 GHz 970, I think Apple will abandon the G4 as quickly as possible.



    If we see a 512 KB L2 cache 970 and a 1 MB L2 cache 970 shipped at the same time, then say goodbye to the G4. It's all good either way, since the existence of the 970 only drives down the price of the 7457 for Apple.
  • Reply 26 of 49
    tht Posts: 5,605 member
    Quote:

    Originally posted by cuneglasus

    We don't know that Skidmark GT wasn't optimized for the G5; that's a piece of info no one here is privy to. The 7450 has never achieved the clock-for-clock integer performance of the old 4-stage design, even after "optimisation". Pipeline depth is a significant issue in real-world code. As for SPEC, well, you need a box of iodized salt to use it.



    The Dhrystone MIPS comparison indicates that the 970 is faster than the 7457 on generic integer code, a nontrivial 25% faster. It's a pure CPU benchmark and does not include the effects of system bandwidth; the code fits in cache. The only way the 7457 would beat the 970, given proper code for both processors, is when an instruction mix consists primarily of integer add ops and that mix is scheduled so the 7457 can issue 3 integer ops at a time. If multiply ops are included in the mix, then the 970 will start to perform better than the 7457. If the mix is multiply-dominated, then the 970 will perform much better.
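
    To make that concrete, here is a minimal sketch of the kind of instruction-mix microbenchmark being described. The loop bounds, the volatile sink, and the absence of timing code are illustrative assumptions, not anything taken from Dhrystone or Skidmark GT:

    /* Sketch only: a real run would time each loop and guard against
       the compiler optimizing the work away entirely. */
    #include <stdio.h>

    #define N 100000000

    int main(void) {
        volatile int sink;
        int a = 1, b = 3, c = 5, m = 7;
        int i;

        /* Add-dominated mix: three independent adds per iteration, the
           case where a 3-issue integer core like the 7457 can shine. */
        for (i = 0; i < N; i++) {
            a += i;
            b += i;
            c += i;
        }

        /* Multiply-dominated mix: integer multiplies have longer latency,
           the case where the 970 is expected to pull ahead. */
        for (i = 1; i < N; i++)
            m *= (i | 1);   /* keep the factor odd so m stays nonzero */

        sink = a + b + c + m;   /* defeat dead-code elimination */
        printf("%d\n", sink);
        return 0;
    }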



    As far as real life is concerned, your warnings are correct, cuneglasus, but mostly because 7450-optimized code will perform terribly on the 970, due to a few instructions that make a 7450-based system perform better but would cause stalls on the 970. Once those apps are optimized for the 970, the app benchmarks should follow the Dhrystone benchmarks for most integer code.
  • Reply 27 of 49
    kenneth Posts: 832 member
    Any upgrade card for MDD/FW800 owners?

  • Reply 28 of 49
    ziploc Posts: 41 member
    PowerLogix discontinued all the dual upgrades, including the dual 800 MHz upgrade for the Cube, because of heat issues. And now they are going to do a dual 1.6 GHz? I doubt it.



    -zip
  • Reply 29 of 49
    Quote:

    Originally posted by THT

    The Dhrystone MIPS comparison indicates that the 970 is faster than the 7457 on generic integer code, a nontrivial 25% faster. It's a pure CPU benchmark and does not include the effects of system bandwidth; the code fits in cache. The only way the 7457 would beat the 970, given proper code for both processors, is when an instruction mix consists primarily of integer add ops and that mix is scheduled so the 7457 can issue 3 integer ops at a time. If multiply ops are included in the mix, then the 970 will start to perform better than the 7457. If the mix is multiply-dominated, then the 970 will perform much better.



    As far as real life is concerned, your warnings are correct, cuneglasus, but mostly because 7450-optimized code will perform terribly on the 970, due to a few instructions that make a 7450-based system perform better but would cause stalls on the 970. Once those apps are optimized for the 970, the app benchmarks should follow the Dhrystone benchmarks for most integer code.




    Here is a warning about MIPS ratings from Ars Technica:

    http://www.arstechnica.com/cpu/2q99/benchmarking-1.html

    The same warning applies to all synthetic benchmarks, like SPEC CPU, and even the Skidmark GT we have been discussing.

    I am afraid you are oversimplifying the performance issue by only looking at the functional units and what they excel at (adds or multiplies). What is much more important to real performance is how those units get fed, and a much longer pipeline has serious drawbacks for clock-for-clock performance. Some here have been rather dismissive of the issue, and I don't know why. Maybe it's because Apple used it in its explanation of the megahertz myth, so people assume it is not true or is overstated, but that is not the case. It is a real engineering tradeoff that makes a significant difference in real-world performance. It's not just a PPC vs. x86 issue; it is the primary reason the P3, Pentium M and Athlon are so much faster per clock than the P4. So the same will apply when comparing the 970 to the G4.

    Code that runs in the ALU is more varied than the code that runs in the FP and vector units. That code tends to be large, repetitive databases of aligned data with few branches, so pipeline depth doesn't have as much of an impact. Some simple integer code can be like that, but mostly it is full of branches, which average about 15% of the instructions. Sometimes you hear that the branch prediction unit of whatever processor can predict 90%-95% of these branches, so everyone assumes that it is not an issue. But even a very small percentage of mispredictions can badly hurt the performance of a long-pipelined processor.

    So the idea that the 970 will be faster at real, complex integer code, like emulation, doesn't seem likely. Its faster FSB won't help that much, because this kind of code is register- and cache-bound; in fact most code spends 90+% of its time in the caches. If the 970 can get about 86% of the ALU performance of the G4, like Skidmark predicts, that would be excellent and nothing to worry about.
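
    To put a rough number on that (the figures here are illustrative assumptions, not measurements of either chip): with branches at 15% of instructions, a predictor missing 5% of them, and a flush costing roughly the pipeline depth in cycles, the added cycles per instruction come out to

    \[
    \text{stall CPI} \approx f_{\text{branch}} \times r_{\text{miss}} \times c_{\text{flush}} = 0.15 \times 0.05 \times 16 = 0.12
    \]

    versus $0.15 \times 0.05 \times 7 \approx 0.05$ for a 7-stage design. Against an assumed base CPI of 0.5, that alone is roughly a 12% clock-for-clock swing.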
  • Reply 30 of 49
    thttht Posts: 5,605member
    Quote:

    Originally posted by cuneglasus

    Here is a warning about MIPS ratings from Ars Technica:

    http://www.arstechnica.com/cpu/2q99/benchmarking-1.html

    The same warning applies to all synthetic benchmarks, like SPEC CPU, and even the Skidmark GT we have been discussing.

    I am afraid you are oversimplifying the performance issue by only looking at the functional units and what they excel at (adds or multiplies).




    Oh, I'm very confident in my statements and understand full well what benchmarks mean. Given the proper optimizations for both processors, properly scheduled code et al., on a per-clock-rate basis: the 7457 will have a slight advantage in add-dominated code, they will be about even in mixed code, the 970 will be faster in multiply-dominated code, and 970 machines running 7450-optimized code will perform poorly. This is, at least, what the two architectures are telling me.



    Quote:

    What is much more important to real performance is how those units get fed and a much longer pipeline has serious drawbacks for clock per clock performance.Some here have been rather dismissive of the issue and I dont know why.Maybe its because Apple used it in its explaination of the megahertz myth so people assume it is not true or overstated but that is not the case.It is a real engineering tradeoff that makes a signifigant difference in real world performance.



    It's dismissed because branch prediction is effective. Most everyone recognizes the issues you're talking about, and IBM addressed them by giving the 970 a better branch prediction unit than the 7457's. It should be good enough to mitigate the disadvantages of the longer pipeline depth.



    In the future, this issue goes away entirely when speculative multithreading and predication become common in consumer desktop processors.
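
    To show what predication buys, here is a minimal C sketch of the idea (the bit-mask trick is just an illustrative stand-in; real predication uses conditional instructions in hardware):

    /* Branchy version: the predictor has to guess the outcome. */
    int max_branchy(int a, int b) {
        return (a > b) ? a : b;
    }

    /* Predicated style: compute both paths and select without a branch,
       so there is nothing to mispredict. */
    int max_branchless(int a, int b) {
        int mask = -(a > b);            /* all ones if a > b, else zero */
        return (a & mask) | (b & ~mask);
    }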



    Quote:

    So the idea that the 970 will be faster at real complex integer code,like emulation,doesnt seem likely.It's faster fsb wont help that much because this kind of code is register and cache bound.



    So, you admit that for "easy-to-predict" integer code, the 970 will be competitive on a per clock rate basis?



    For code that has a lot of branches, yes, I think we all see that it will penalize long pipeline designs. Now, what percentage of the code an Apple user runs would be such code? What other codes are you thinking about?
  • Reply 31 of 49
    Quote:

    Originally posted by THT

    Oh, I'm very confident in my statements and understand full well what benchmarks mean. Given the proper optimizations for both processors, properly scheduled code et al., on a per-clock-rate basis: the 7457 will have a slight advantage in add-dominated code, they will be about even in mixed code, the 970 will be faster in multiply-dominated code, and 970 machines running 7450-optimized code will perform poorly. This is, at least, what the two architectures are telling me.







    It's dismissed because branch prediction is effective. Most everyone recognizes the issues you're talking about, and IBM addressed them by giving the 970 a better branch prediction unit than the 7457's. It should be good enough to mitigate the disadvantages of the longer pipeline depth.



    In the future, this issue goes away entirely when speculative multithreading and predication become common in consumer desktop processors.







    So, you admit that for "easy-to-predict" integer code, the 970 will be competitive on a per clock rate basis?



    For code that has a lot of branches, yes, I think we all see that it will penalize long pipeline designs. Now, what percentage of the code an Apple user runs would be such code? What other codes are you thinking about?




    Well, I think I have said all that can be said about branch prediction. Certainly the 970 has far more branch prediction resources than any other desktop processor... but will it be effective enough to mitigate a pipeline that is over twice as long? It's difficult to say, but I hope so. Remember the P4 has much better branch prediction than the P3, but to no avail: it still can't compete clock for clock with the earlier design, even after 2.5 years on the market. Still, the 970 has shorter pipelines than the P4 AND much better branch prediction. By comparison, the branch history table of the P4 is 4K entries in size, while the 970 uses three 16K-entry BHTs.

    Going back to my original comments, I have never said the 970 would be a dog at integer code, just that statements like "it will be 1.7 times as fast as a G4 clock for clock" were wildly unrealistic. I don't know what you consider competitive, but if the 970 could pull 86% of the performance of the G4 clock for clock on integer code, that certainly sounds competitive to me.

    As for your last question, I guess you mean: what part of the code an Apple user runs will have enough branches to visibly affect the performance of a longer-pipelined design? Most of it. How about the OS, for starters? Lots of branches are the norm in real-world integer code; that's why so much is put into branch prediction. This "simple, easy to predict" code is basically only in benchmarks. I am just going to leave it at that; I think I'm getting repetitive, and I'm sure there is something more interesting for you guys to read.
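
    To illustrate what those BHT sizes actually buy, here is a minimal sketch of a bimodal predictor built from 2-bit saturating counters; the table size and PC indexing are illustrative assumptions, and the 970's actual three-table scheme is considerably more elaborate.

    #include <stdbool.h>

    #define BHT_ENTRIES 16384               /* a 16K-entry table, as discussed */

    static unsigned char bht[BHT_ENTRIES];  /* 2-bit counters, values 0..3 */

    /* Predict taken when the counter is in one of the two "taken" states. */
    bool predict(unsigned long pc) {
        return bht[(pc >> 2) % BHT_ENTRIES] >= 2;
    }

    /* When the branch resolves, nudge the counter toward the real outcome;
       a misprediction on a deep pipeline is what costs the flush. */
    void update(unsigned long pc, bool taken) {
        unsigned char *c = &bht[(pc >> 2) % BHT_ENTRIES];
        if (taken && *c < 3)
            (*c)++;
        else if (!taken && *c > 0)
            (*c)--;
    }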
  • Reply 32 of 49
    smalM Posts: 677 member
    Quote:

    Originally posted by cuneglasus

    I am just going to leave it at that; I think I'm getting repetitive, and I'm sure there is something more interesting for you guys to read.



    Checked the forum - no
  • Reply 33 of 49
    Quote:

    Originally posted by smalM

    Checked the forum - no



    No, actually I got bored there too! Maybe new PowerBooks will come out tomorrow. I read it online, so it must be true!
  • Reply 34 of 49
    Quote:

    Originally posted by cuneglasus

    Remember the P4 has much better branch prediction than the P3, but to no avail: it still can't compete clock for clock with the earlier design, even after 2.5 years on the market. Still, the 970 has shorter pipelines than the P4 AND much better branch prediction.



    I think one thing you are missing by saying the P4 is still not competitive Hz vs. Hz is that the architecture of the P3 limited it to 1 GHz at the time the P4 was released. Since then the P4 has hit 3.2 GHz, while the P3 only got to 1.3 GHz. The Pentium M makes it higher because of some rejigging of the pipeline (it's longer than the P3's). I.e., the longer pipeline enables greater performance via more frequency headroom. The same can be said for the 970 vs. the G4: the G4 is stuck at 1.4 GHz at the moment, while the 970 easily makes 2 GHz and will likely be at 2.4 GHz by January, when the G4 may make it to 1.6 GHz, even though Motorola has only acknowledged that 1.3 GHz 7457s will be sold.
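
    As a worked example of that tradeoff (the IPC figures are made-up illustrations, not measurements of either chip):

    \[
    \text{Perf} \propto \text{IPC} \times f:\qquad
    \underbrace{0.9 \times 1.4\ \text{GHz}}_{\text{short pipe}} = 1.26
    \;<\;
    \underbrace{0.6 \times 3.2\ \text{GHz}}_{\text{long pipe}} = 1.92
    \]

    The longer pipe gives up per-clock efficiency but more than buys it back in frequency headroom, which is exactly the P4-vs-P3 story above.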



    If Motorola can fix any production issues it may have, put the G4 on 90 nm, give it a fast bus like RIO, clock it at 2.5 GHz, and add an extra FPU, all while keeping its pipeline at 7 stages, then it will be competitive with the 970 and its successors. But then it won't be a G4 anymore.



    MM
  • Reply 35 of 49
    big mac Posts: 480 member
    The fact is, in reference to the previous comment, not only would all of those changes need to be made in order to make the G4 competitive, it would most definitely need a pipeline deepening as well. The 4-stage 7400/7410 only reached 500 MHz. Motorola has been able to draw out a good number of MHz from 7 stages, although that's partially artificial (because the 1.42 isn't listed by Mot, as we know). So let's be kind and say the 74xx is going to max out at 1.5 GHz; that's a pretty great 3X return on three extra stages. Perhaps the 970 could have gotten away with 10 or 13 stages instead of 16, but why limit its options that much? The 980 may succeed it after 2.8-3.0 GHz, but then the 970 can move to the consumer line and continue scaling by virtue of the fact that its pipeline is sufficiently deep.
  • Reply 36 of 49
    Quote:

    Originally posted by THT

    It's dismissed because branch prediction is effective. Most everyone recognizes the issues you're talking about and address the issue by giving the 970 a better branch prediction unit than the 7457. It should be good enough to mitigate the disadvantages of the longer pipeline depth.



    In the future, this issue even goes away when speculative multithreading and predication become common in consumer desktop processors.




    Well, branch prediction may be relatively effective, but it's still only good guessing! It seems that making the pipe longer gives you more advantages than disadvantages, simply because every chip maker goes that route. IMO this rule will be questioned more and more as code gets more parallel. The basic problem is that modern CPUs have to execute sequential code in parallel. Maybe we'll see a different approach with more, shorter pipes instead, where pipes share some execution units. This approach puts more burden on the compilers (and hopefully the programmers), but they "know" the code much better than the CPU does. (Isn't it amazing how much effort is put into out-of-order execution, branch prediction, etc., and how little most programmers know about putting their statements in the right order to help the compiler and CPU?) I know there's a lot of legacy code out there that prevents radical changes, but maybe the embedded space will show more variation in VLSI designs.
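
    As a concrete (and purely illustrative) example of helping the compiler and CPU with statement order, compare these two hypothetical summation routines:

    /* One long dependency chain: each add must wait for the previous one,
       so a wide core's extra execution units sit idle. */
    double sum_chained(const double *x, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += x[i];
        return s;
    }

    /* Four independent accumulators: the adds can issue in parallel, and
       the compiler can schedule them without guessing. (Note this reorders
       the floating-point additions, so rounding can differ slightly.) */
    double sum_interleaved(const double *x, int n) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i;
        for (i = 0; i + 3 < n; i += 4) {
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        for (; i < n; i++)              /* remainder */
            s0 += x[i];
        return (s0 + s1) + (s2 + s3);
    }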



    End of Line
  • Reply 37 of 49
    drboardrboar Posts: 477member
    To start picking on the seed of this thread:



    Quote:

    In about six weeks, PowerLogix will follow up with a dual 1.6-GHz processor upgrade card, which will make the Cube faster than Apple's current dual 1.4-GHz Power Macs.



    If there is any truth in the complaints about the G4 bus, a dual 1.6 GHz G4 on a 100 MHz bus ought to be slower than a 1.42 GHz G4 on a 167 MHz bus. That is assuming that the increase in L2 cache from 256 to 512K will not offset the bus disadvantage.



    The 8.5x bus ratio of the DP 1.42 GHz is a record for the G4. The current crop of 1 GHz G4s runs on a 133 MHz bus at 7.5x, and the iBook's 9x bus ratio just edges out the DP 1.42, as well as the oldtimer 9600/350 that had a 7x bus ratio. A 16x bus ratio seems a tad problematic, and having two CPUs on the same bus will not help.



    16x is what you have with a G3/800 in a 7500 with a 50 MHz bus, or a 1 GHz G4 in a beige G3 with a 66 MHz bus. Shouldn't a dual 1.6 in a Cube/G4 AGP be as bus-starved as a single 1.6 GHz G4 in a 7500?
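
    For reference, the ratios being compared work out by simple clock-to-bus division:

    \[
    \frac{1.6\ \text{GHz}}{100\ \text{MHz}} = 16\times, \qquad
    \frac{1.42\ \text{GHz}}{167\ \text{MHz}} \approx 8.5\times, \qquad
    \frac{800\ \text{MHz}}{50\ \text{MHz}} = 16\times, \qquad
    \frac{1\ \text{GHz}}{66\ \text{MHz}} \approx 15\times
    \]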
  • Reply 38 of 49
    Quote:

    Originally posted by Big Mac

    Motorola has been able to draw out a good number of MHz from 7 stages, although that's partially artificial (because the 1.42 isn't listed by Mot, as we know).



    I believe the 1.42 part IS now listed for sale by Moto. I assumed that the >1GHz chips were not listed before because Apple had exclusive access to them for a while. OWC and Giga are currently shipping 1.4 GHz upgrades.



    What I'm still trying to work out is these 1.25 GHz chips. I thought they (and the 1.42s) were a high-voltage variant that Moto produced for Apple, and that they required the massive cooling in the MDD models. But Sonnet and PowerLogix have been shipping 1.25 upgrades for quite a while as well, and with substantially smaller heatsinks. Now I wonder if these weren't in fact the first low-k chips from Moto. Anyone know?



    It's all enough to make your head spin.
  • Reply 39 of 49
    tht Posts: 5,605 member
    Quote:

    Originally posted by User Tron

    Well, branch prediction may be relatively effective, but it's still only good guessing!



    Most of the time, that is all that is needed. This is business, not art.



    Quote:

    It seems that making the pipe longer gives you more advantages than disadvantages, simply because every chip maker goes that route.



    It has several advantages, chiefly a resonant marketing and performance effect as the design is fabbed on more advanced processes. With each new fab process, the clock rate of a long-pipelined design climbs increasingly higher than that of a shorter-pipelined processor. A higher rate of clock improvement means easier marketing combined with better performance. For example, the 180 nm Pentium 4 was barely competitive with the 180 nm Athlon, but the 130 nm Pentium 4 dominated the 130 nm Athlon.



    Or maybe we should have this thought problem: if Motorola fabbed a 7400-based CPU at the same time as a 7450-based CPU, which CPU would be more popular? We have somewhat of a related case with the PPC 750 and the 7450. How much faster is a prospective 1 GHz 750fx (none are shipping yet, as far as we know) compared to a 1 GHz 7455? The general knowledge is that at the same clock rate, the 750 and 7450 have about the same performance, no? And please note the processes at which these processors are fabbed.



    The one big tradeoff is power consumption and yield. Deeper-pipelined processors demand more power and yield fewer CPUs per wafer, due to their larger size and higher clock. Maybe power consumption becomes a problem as we hit 90 nm, with too much leakage, but problems like that can also be mitigated.



    Quote:

    IMO this rule will be questioned more and more as code gets more parallel. The basic problem is that modern CPUs have to execute sequential code in parallel. Maybe we'll see a different approach with more, shorter pipes instead, where pipes share some execution units. This approach puts more burden on the compilers (and hopefully the programmers), but they "know" the code much better than the CPU does.



    This is pretty much the Itanium design: an 8-stage pipeline with massive execution resources, heavily dependent on compiler technology. We have yet to find out if Itanium will be successful; after all these years, only some 10,000 Itanium systems have been sold. Itanium does hold the FPU lead, but just imagine what the SPECfp of the 970 would be if it ran at 3 GHz, or if the P4 had 2 full FPUs. It would be close, but guess which chip would be ten times cheaper to produce.



    Quote:

    (Isn't it amazing how much effort is put into out-of-order execution, branch prediction, etc., and how little most programmers know about putting their statements in the right order to help the compiler and CPU?) I know there's a lot of legacy code out there that prevents radical changes, but maybe the embedded space will show more variation in VLSI designs.



    Programmers are lazy. Or rather, the economics of software prevents software from being optimized, while it is the reverse for hardware. It's simply cheaper to make software faster by using faster hardware; hence OOOE, BP, and, in the future, predication, SMT and speculative SMT are put into processors, while programmers concentrate on more features using higher-level languages.
  • Reply 40 of 49
    tht Posts: 5,605 member
    Quote:

    Originally posted by BrunoBruin

    What I'm still trying to work out is these 1.25 GHz chips. I thought they (and the 1.42s) were a high-voltage variant that Moto produced for Apple, and that they required the massive cooling in the MDD models. But Sonnet and PowerLogix have been shipping 1.25 upgrades for quite a while as well, and with substantially smaller heatsinks. Now I wonder if these weren't in fact the first low-k chips from Moto. Anyone know?



    The 1.42 GHz 7455B is a 1.85V chip, and I think a low-k part as well; as far as I know that hasn't changed. The 1.25 GHz chip was probably a 1.85V chip using the same low-k process, but maybe Moto has gotten enough yield at 1.6V, or even 1.3V, to ship in massive quantities. Apple itself is also using the 1.25 GHz in the PowerMac G4....