7450 recompile = 25% speed boost
Apparently over at xlr8yourmac there is an article about an RC5 recompile that gives a 25% boost on the 7450 (G4+) processor.
There was a lot of big talk when the 7450 was released that apps would need a recompile to make more efficient use of its 7-stage pipeline. I am wondering if this is the result of that?
But what chance is there of many apps being recompiled when the 7450 has such a low overall percentage of the Mac market? We can't even get developers to write more AltiVec code.
Maybe the G5's true performance will be hampered by devs not recompiling for its 10 stages?
Comments
I would think the amount of cache on the chip (level 1, 2, 3 ...) would have the most effect. How does the on-chip cache change in the 7450?
But, at the end of the day, someone recompiled specifically for the 7450 and got a 25% performance increase, which makes me think that the early G4 733s didn't show the jump we expected over the 533s because the 7400 code was inefficient for the chip.
I've never heard of a compiler caring about the number of pipes in the CPU. But I'm no expert.
I would think the amount of cache on the chip (level 1, 2, 3 ...) would have the most effect. How does the on-chip cache change in the 7450?</strong>
The 7400/7410 has 32 KB of L1 instruction cache and 32 KB of L1 data cache. Its L2 cache is a backside (off-chip) cache of varying sizes up to 2 MB on a 64-bit data bus at varying CPU-to-cache ratios.
The 7450 has 32 KB of L1 instruction cache and 32 KB of L1 data cache. Its L2 cache is an on-die cache with a 256-bit-wide data bus. Its L3 cache is a backside (off-chip) cache of up to 2 MB on a 64-bit data bus at varying CPU-to-cache ratios. (The 7440 has everything but the L3 cache.)
If the RC5 code fits inside the on-die L2 cache of the 7450/7440, then it'll run faster.
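To make the "does it fit" point concrete, here is a rough sketch that reports the smallest cache level a given working set would fit in. The L1 and L3 figures are the ones quoted above. The 256 KB size for the on-die L2 and the function name are assumptions of this sketch, not numbers from this thread, and it says nothing about how big the RC5 working set actually is.

```python
KB = 1024

# (name, capacity in bytes), ordered from fastest to slowest.
CACHE_LEVELS_7450 = [
    ("L1 data", 32 * KB),         # figure quoted in the post above
    ("L2 on-die", 256 * KB),      # ASSUMED size; the post only gives the bus width
    ("L3 backside", 2048 * KB),   # "up to 2 MB" per the post above
]

def smallest_fitting_level(working_set_bytes, levels=CACHE_LEVELS_7450):
    """Return the name of the first (fastest) level that holds the whole working set."""
    for name, capacity in levels:
        if working_set_bytes <= capacity:
            return name
    return "main memory"

if __name__ == "__main__":
    for size_kb in (16, 100, 1024, 4096):
        print(size_kb, "KB ->", smallest_fitting_level(size_kb * KB))
```

This ignores associativity, line size, and the instruction/data split, so treat it as a back-of-the-envelope check only.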
Another thing is that the 7400/7410 has a vector permute unit and a combined vector integer/float/complex unit for its AltiVec implementation. The 7450 has a vector permute unit, a vector integer unit, a vector float unit and a vector complex unit for its AltiVec implementation. Only 2 instructions per cycle can be dispatched to those units in both processors.
The advantage of the 7450 is that it can dispatch a vector integer and a vector float instruction in the same clock cycle. The 7400 needs two clock cycles to dispatch those two instructions, because it computes integer, float and complex instructions in one combined unit, while the units are independent in the 7450.
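The dispatch difference described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: strict in-order dispatch, at most 2 vector instructions per cycle, one instruction per unit per cycle, and no latencies or dependencies. The unit labels and the function name are made up for the sketch and are not Motorola's names.

```python
# Which execution unit handles each kind of AltiVec instruction.
# Labels are illustrative, not official unit names.
UNITS_7400 = {"permute": "permute", "int": "combined", "float": "combined", "complex": "combined"}
UNITS_7450 = {"permute": "permute", "int": "int", "float": "float", "complex": "complex"}

def dispatch_cycles(stream, unit_of, width=2):
    """Count cycles needed to dispatch `stream` in order, with at most `width`
    instructions per cycle and at most one instruction per unit per cycle."""
    cycles = 0
    i = 0
    while i < len(stream):
        busy = set()
        while i < len(stream) and len(busy) < width:
            unit = unit_of[stream[i]]
            if unit in busy:
                break  # that unit is already taken this cycle; wait for the next
            busy.add(unit)
            i += 1
        cycles += 1
    return cycles

if __name__ == "__main__":
    stream = ["int", "float"] * 4  # alternating vector integer and float instructions
    print("7400 mapping:", dispatch_cycles(stream, UNITS_7400))
    print("7450 mapping:", dispatch_cycles(stream, UNITS_7450))
```

In this toy model the alternating stream takes 8 cycles with the 7400 mapping and 4 with the 7450 mapping, which is the best case for the 7450. A stream made up of only integer instructions would take the same number of cycles on both, so how the code is scheduled matters.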
Another compiler-dependent feature is that the 7450 can fetch and dispatch 1 more instruction per clock cycle than the 7400. It also has 2 more simple integer execution units and pretty much double the resources, so it can keep twice as many instructions in flight.
So yes, a bit of assembly magic can increase the speed of certain code on the 7450.
[ 12-14-2001: Message edited by: THT ]</p>
<strong>7450 recompiles aren't to take advantage of deeper pipelines, but to effectively utilize the on chip cache. The 7450 required different microcode to make effective use of the hardware changes. </strong><hr></blockquote>
Microcode? Just what the hell are you talking about? This is a RISC chip -- there is no decoding of the instructions into microcode.
<strong>
Microcode? Just what the hell are you talking about? This is a RISC chip -- there is no decoding of the instructions into microcode.</strong><hr></blockquote>
And what the heck are you talking about? All instructions must be decoded into 'microcode' to be executed by the processor. When you tell the PowerPC processor to 'ADD' two registers, it doesn't just magically happen. There's microcode to fetch the instruction, decode it, add the two registers, and store the result somewhere (and that's a simplification).
The last one I tried was the most recent pre-release. I see they've added a new official release.
It's not pipeline stages or cache. It's all due to the AltiVec unit's ability to issue more instructions per cycle, I think.
EDIT: I seem to have missed THT's comments on this matter.
[ 12-14-2001: Message edited by: Eugene ]</p>
<strong>
And what the heck are you talking about? All instructions must be decoded into 'microcode' to be executed by the processor. When you tell the PowerPC processor to 'ADD' two registers, it doesn't just magically happen. There's microcode to fetch the instruction, decode it, add the two registers, and store the result somewhere (and that's a simplification).</strong><hr></blockquote>
Unless I'm missing something, the whole point of RISC was not to have microcode or µops:
<a href="http://www.arstechnica.com/cpu/4q99/risc-cisc/rvc-4.html" target="_blank">http://www.arstechnica.com/cpu/4q99/risc-cisc/rvc-4.html</a>
[quote]
...First, researchers realized that anything that could be done with microcode instructions could be done with small, fast, assembly language instructions. The memory that was being used to store microcode could be just be used to store assembler, so that the need for microcode would be obviated altogether. Therefore many of the instructions on a RISC machine corresponded to microinstructions on a CISC machine.<hr></blockquote>
And some more:
<a href="http://www.ceng.metu.edu.tr/~e106170/instruction.html" target="_blank">http://www.ceng.metu.edu.tr/~e106170/instruction.html</a>
[quote]Thus, RISC machine instructions should be no more complicated than, and execute about as fast as, microinstructions on CISC machines. With simple, one-cycle instructions, there is little or no need for microcode; the machine instructions can be hardwired. Such instructions should execute faster than comparable machine instructions on other machines, since it is not necessary to access a microprogram control store during instruction execution.
<hr></blockquote>
Some more:
<a href="http://www.heyrick.co.uk/assembler/riscvcisc.html" target="_blank">http://www.heyrick.co.uk/assembler/riscvcisc.html</a>
[quote]By contrast, the Reduced Instruction Set Computer (RISC) concept is to identify the subcomponents and use those. As these are much simpler, they can be implemented directly in silicon, so will run at the maximum possible speed. Nothing is 'translated'.
<hr></blockquote>
[ 12-15-2001: Message edited by: msp ]</p>
<a href="http://e-www.motorola.com/brdata/PDFDB/docs/AN2203.pdf" target="_blank">Motorola MPC7450 RISC Microprocessor Family Software Optimization Guide</a>