[quote]Originally posted by Outsider:
<strong>Yeah, but isn't there an inherent shortcoming to making processors that use longer pipelines (a la the MHz myth animation thing Jobs did last July)? That is, with their shorter pipelines, current G4 chips greatly outpace Intel and AMD chips at similar clock speeds (and even moderately higher clock speeds), because the Intel and AMD chips use much longer pipelines. So if we go in that direction with the G5, aren't we basically shooting ourselves in the foot...creating faster clock speeds at the expense of real-world performance?
This is true... to a point. But the G5 compensates by adding a bigger cache, a faster bus, and more execution units, and also by processing more per cycle. Look at it this way: you win some, you lose some. I think overall the G5 will be at almost parity with the G4 when it comes to MHz-for-MHz processing power. The G5 will just have the high MHz advantage.</strong><hr></blockquote>
I disagree -- I think the G5 will have a larger per-cycle advantage over the G4. If you think of a pipeline as making the chip "longer", then increasing the number of execution units (i.e. the number of pipelines) is making it "wider". The speed of the machine is the overall area, which is "length" times "width".
Increasing pipeline length means each stage of the pipeline does less, and thus is simpler. It is easier to make simpler things run at higher clock rates. The downside of longer pipelines is primarily the problem encountered at branches, as described above. If you have 20 stages and you guess a branch wrong, you have to toss out about 18 stages' worth of instructions... and then the later stages have to wait for up to 18 cycles before the next instruction reaches them.
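To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The function name, the 1-in-5 branch frequency, the 10% miss rate and the 5-cycle "short pipe" penalty are all made up for illustration, and the model assumes one instruction per cycle whenever nothing goes wrong.

```python
def effective_cpi(flush_penalty, branch_fraction, mispredict_rate, base_cpi=1.0):
    """Average cycles per instruction once branch flushes are included.

    Hypothetical toy model: every mispredicted branch costs a fixed
    flush_penalty cycles, and nothing else ever stalls.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Illustrative numbers only: 1 in 5 instructions is a branch,
# and 1 in 10 of those branches is guessed wrong.
short_pipe = effective_cpi(flush_penalty=5, branch_fraction=0.2, mispredict_rate=0.1)
long_pipe = effective_cpi(flush_penalty=18, branch_fraction=0.2, mispredict_rate=0.1)

print(f"short pipe: {short_pipe:.2f} cycles/instruction")
print(f"long pipe:  {long_pipe:.2f} cycles/instruction")
```

With these invented inputs the short pipe averages 1.10 cycles per instruction and the long pipe 1.36, so the long pipe needs roughly 24% more clock just to break even.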
Increasing pipeline length can also run into dependency problems. If one instruction needs the results of a previous instruction, but the previous instruction hasn't gotten far enough through the pipeline to produce results, then a wait has to be introduced and you see a "bubble" in the pipe where the pipe stage is idle.
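Here's a small Python sketch of how those bubbles add up. It's a toy model, not any real chip: one instruction issues per cycle, strictly in order, and `result_latency` (a name I made up) is how many cycles after issue a result becomes usable.

```python
def cycles_in_order(deps, result_latency):
    """Cycles needed to issue every instruction in a single-issue, in-order pipe.

    deps[i] is the index of the earlier instruction that instruction i
    needs a result from, or None if it depends on nothing.
    Hypothetical toy model with no forwarding tricks.
    """
    issue_cycle = []
    cycle = 0
    for dep in deps:
        if dep is not None:
            # Wait (insert bubbles) until the producer's result is ready.
            cycle = max(cycle, issue_cycle[dep] + result_latency)
        issue_cycle.append(cycle)
        cycle += 1
    return cycle


chain = [None, 0, 1, 2]           # each instruction needs the previous one
independent = [None, None, None, None]

print(cycles_in_order(chain, result_latency=1))        # short pipe
print(cycles_in_order(chain, result_latency=3))        # longer pipe
print(cycles_in_order(independent, result_latency=3))  # no dependencies
```

The dependent chain issues in 4 cycles with a 1-cycle result latency but takes 10 with a 3-cycle latency, while four independent instructions still issue in 4 either way. That's why the compiler's instruction ordering matters more as the pipe gets longer.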
The PowerPC has always had a branch prediction unit as well, by the way. Even on shorter pipes it is useful. The longer your pipelines, the more circuitry it's worth throwing at the prediction. Any branch prediction is based on what has happened before, so if your code always tends to branch the same way (looping over lots of data, for example) then you can predict it pretty well. If your decisions are essentially "random", however, the branch prediction doesn't work worth beans. I've seen the P4 fall over real bad on code like this.
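You can see the effect with a textbook 2-bit saturating counter, sketched here in Python. I'm not claiming this is what any PowerPC or the P4 actually implements (real predictors are far more elaborate); it's just the simplest "based on what has happened before" scheme, and the function name and test patterns are my own.

```python
import random


def two_bit_accuracy(outcomes):
    """Fraction of branches a single 2-bit saturating counter guesses right.

    Counter states 0-1 predict "not taken", states 2-3 predict "taken".
    """
    counter = 2  # start at "weakly taken"
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        # Nudge the counter toward what actually happened, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


# A loop branch: taken 99 times, then falls through once, repeated 100 times.
loop_branch = ([True] * 99 + [False]) * 100

# An essentially random branch: a fair coin flip each time (seeded for repeatability).
rng = random.Random(0)
random_branch = [rng.random() < 0.5 for _ in range(10_000)]

print(f"loop branch:   {two_bit_accuracy(loop_branch):.2%}")
print(f"random branch: {two_bit_accuracy(random_branch):.2%}")
```

The loop branch comes out at 99% (it only misses the exit each time round), while the coin-flip branch hovers around 50%, which is the "doesn't work worth beans" case.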
Going "wider" (i.e. more execution units) means you try to get more instructions executing in parallel at the same time. The G4 has a pair of simple integer math units, a complex integer math unit, a branch unit, a load/store unit, a floating point unit, and a couple of vector units. The exact breakdown changed in the 7450 to try to counteract the problems with longer pipes. That largely works, as long as the code is compiled with the 7450 in mind.
My guess about the G5 is that they are going to throw lots of circuitry at making the chip both wider and longer, as well as optimizing how data moves between areas of the chip. If you have lots of transistors to throw around (and they should have at least double the 7450's), then you can start getting extravagant about where you use them. I'd be surprised to see more than 10 pipe stages -- many of the Athlon's stages are just decoding the x86 instructions, which takes far less effort on the PPC. I hope to see another floating point execution unit or two, more integer units, and better load/store throughput.
Still not betting on when we'll see any of this though.

Providing grist for the rumour mill since 2001.