Optimizing for G5 might lead to a general optimization for all G5-class processors, while optimizing for ppc970 optimizes for just that one processor. Today those are the same thing, but in the future that probably won't be so.
Comments
Originally posted by PB
g5
Generates code specific to Apple Power Mac G5 processors.
ppc970
Generates code specific to the IBM PowerPC 970 processor.
ppcv
Generates code for generic PowerPC chips with AltiVec vector engine.
This is the default.
Maybe different target markets, hence, different optimizations.
For me the most interesting line is about the ppcv for GENERIC PowerPC chips, hmmm, hmmm again. G3 w/ SIMD soon to appear. Will the mythical PowerPC on IBM's roadmap appear?
1+ GHz
Multicore Superscalar
SMP Capable
Integrated SIMD Engine
Rapid I/O
n-way Crossbar CoreConnect
Naw, can't be, too much good news for Apple in one year.
Originally posted by rickag
Naw, can't be, too much good news for Apple in one year.
My understanding of ppcv from this quote, is that the compiler will automatically generate code for the vector unit of G4 or whatever else has such unit. But, I don't really know...
Originally posted by PB
My understanding of ppcv from this quote, is that the compiler will automatically generate code for the vector unit of G4 or whatever else has such unit. But, I don't really know...
Figures, though I'm still interested in the "whatever else" you mention.
OWNAGE. Let's just hope it is released from beta soon and all the major developers use it. The G5 platform is going to absolutely kick ass once the majority of apps are compiled with this thing.
I just hope APPLE uses it!
peterh
Originally posted by pbkobold
Also, another thing that was kind of odd was the difference in binary sizes. I got about 700k for the XLC binary versus just north of 100k for the gcc one. Weird.
XLC better have done a kickass job unrolling some of the loops.
Sounds like a lot of unrolling and inlining.
I just got an e-mail from Dr. Hunter (genX here on Ars) over at NASA Langley. He says that with xlf in his Jet3D bench he gets a huge improvement over the Absoft FORTRAN.
Type of Code   G4     G5
Scalar         70%    210%   (% improvement)
Vector         40%    70%    (using gcc3.3 for the vectorized code)
If this holds, that takes his single 2.0GHz G5 from 254 to 787 MFLOPs and his dual G5 from 498 to 1544 MFLOPs. That is just for scalar. The vector would go from 2755 to 4684 MFLOPs (single) and 5177 to 8801 (dual).
Even the G4 numbers are much better: 129 to 219 (single, scalar) and 1612 to 2258 (single, vector) for a 1.25GHz G4.
I would say that if these numbers hold, some CFD type applications could really benefit from a G5.
P.S. Craig please correct me if my understanding of your numbers is wrong.
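For anyone who wants to check the arithmetic, here is a minimal sketch of how the projected figures follow from the baselines. It assumes "N% improvement" means the new score is the old score times (1 + N/100); the function name and the case labels are made up for illustration, and only the numbers come from the post.

```python
def projected_mflops(baseline, improvement_pct):
    """Scale a baseline MFLOPs figure by a percentage improvement.

    Assumption: "210% improvement" means new = old * (1 + 2.10).
    """
    return baseline * (1 + improvement_pct / 100)


# (label, baseline MFLOPs, % improvement) as quoted in the post
cases = [
    ("G5 single, scalar", 254, 210),
    ("G5 dual, scalar", 498, 210),
    ("G5 single, vector", 2755, 70),
    ("G5 dual, vector", 5177, 70),
    ("G4 single, scalar", 129, 70),
    ("G4 single, vector", 1612, 40),
]

for label, baseline, pct in cases:
    projected = projected_mflops(baseline, pct)
    print(f"{label}: {baseline} -> {projected:.1f} MFLOPs")
```

Under that reading the results land within a point or two of the figures quoted above, so the post does seem to be applying the percentages this way.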
210% improvement
at least for Fortran
Originally posted by Bigc
Wonder how much the Fortran Compiler is going to cost after it's out of Beta?
The AIX version of the Fortran compiler costs $2,399 and the C++ compiler costs $2,223.
Originally posted by JLL
The AIX version of the Fortran compiler costs $2,399 and the C++ compiler costs $2,223.
Well, I hope they give out deals if you're not going to use them for commercial sales of software.
http://www.heise.de/newsticker/data/as-28.08.03-000/
I found this link through macnews.net.tc. If it says what I think it does, and the figures are real, then the G5, when running software optimized for it, will most likely be faster than a lot of people expected. Of course, that depends on SPEC benchmarks having any resemblance to real applications.
Originally posted by Moogs
You read some of the comments / benchmarks for this at Ars?
OWNAGE. Let's just hope it is released from beta soon and all the major developers use it. The G5 platform is going to absolutely kick ass once the majority of apps are compiled with this thing.
I just hope APPLE uses it!
Will this just be beneficial to apps, or will it also help 10.X(s) run better? Isn't this kind of what happened with the P4 when it debuted? Poor performance at first, then it came into its own when some new code came out for it. (Only the 970 is getting it sooner.)
" ...he says that with xlf in his Jet3D bench he gets a huge improvement over the Absoft FORTRAN.
Type of Code   G4     G5
Scalar         70%    210%   (% improvement)
Vector         40%    70%    (using gcc3.3 for the vectorized code)
If this holds, that takes his single 2.0GHz G5 from 254 to 787 MFLOPs and his dual G5 from 498 to 1544 MFLOPs. That is just for scalar. The vector would go from 2755 to 4684 MFLOPs (single) and 5177 to 8801 (dual).
Even the G4 numbers are much better: 129 to 219 (single, scalar) and 1612 to 2258 (single, vector) for a 1.25GHz G4."
Also, what type of apps will benefit from this type of improvement? A/V? DTP? Games?
Originally posted by dabront
Anyone good at translating German to English?
http://www.heise.de/newsticker/data/as-28.08.03-000/
My German is better than my English, but here's a quick translation:
"Now comes a big boost for the Mac G5 scene: IBM will offer its legendary C/C++ and Fortran compilers, with PowerPC and especially G5 optimization, not only for AIX but also for Linux and Mac OS X. Beta versions of the compilers (C/C++ 6.0 and Fortran 8.1) for Mac OS X are ready to download for a 60-day test.
Comparisons of the compilers using the SPEC CPU benchmarks show, in places, drastic performance improvements compared to gcc/g77 3.3 (admittedly measured on a Power4). But even the very conservatively estimated SPEC values (937 SPECint2000 and 1051 SPECfp2000 for the 1.8 GHz PPC970) that IBM published at the first presentation of the PowerPC 970 processor were already much higher than those mentioned by Steve Jobs at the spectacular G5 introduction. Measured with gcc3.3 and NAG Fortran, VeriTest only showed 800 SPECint2000 and 840 SPECfp2000 for the 2 GHz PPC970.
In particular, the optimizations for the two floating point units available in the Power4 and PPC970 (especially with the Fortran compiler) obviously work much better than with the GNU fellows. With these two FPUs, the 970 should have a definite architectural advantage in floating point compared to the Intel Pentium 4.
AltiVec will also be supported for the PPC970 and "generic PowerPC" (switch ppcv). It's still unclear whether this includes automatic vectorization like in the Intel compilers. Such auto-vectorization could bring another performance jump. GNU has also defined this technique for AltiVec as a project goal.
Interestingly, the IBM compiler even differentiates between G5 and PPC970. Maybe with G5 the compiler optimizes specifically for Apple's crossbar switch and squeezes out another bit of optimization."
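To put the two sets of SPEC figures from the translation side by side, here is a rough sketch that divides each score by its clock speed. This is only a back-of-the-envelope normalization: SPEC scores do not scale linearly with clock, and the dictionary names and the per_ghz helper are made up for illustration. Only the scores and clock speeds come from the article.

```python
def per_ghz(score, clock_ghz):
    """Crude normalization: SPEC score divided by clock speed in GHz."""
    return score / clock_ghz


# IBM's estimate for the 1.8 GHz PPC970 (IBM's own compilers assumed)
ibm_estimate = {"clock_ghz": 1.8, "SPECint2000": 937, "SPECfp2000": 1051}
# VeriTest result for the 2 GHz PPC970 with gcc3.3 and NAG Fortran
veritest = {"clock_ghz": 2.0, "SPECint2000": 800, "SPECfp2000": 840}

for metric in ("SPECint2000", "SPECfp2000"):
    ibm = per_ghz(ibm_estimate[metric], ibm_estimate["clock_ghz"])
    gnu = per_ghz(veritest[metric], veritest["clock_ghz"])
    print(f"{metric}: IBM {ibm:.0f}/GHz vs VeriTest {gnu:.0f}/GHz "
          f"(ratio {ibm / gnu:.2f})")
```

Even on the slower chip, IBM's estimates come out well ahead per GHz, which fits the article's point that the compiler accounts for a good part of the gap.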
setenv CC gcc
setenv CFLAGS "-O3 -fstrict-aliasing -fomit-frame-pointer -funroll-loops -finline-functions -mdynamic-no-pic -no-cpp-precomp -mcpu=970 -mtune=970 -faltivec"
10881/10884 (100%)| 2:06/ 2:06| 2:12/ 2:12| 2.2452x| 0:00
average: 157.2 kbps LR: 4409 (40.51%) MS: 6475 (59.49%)
setenv CC xlc
setenv CFLAGS "-O5 -qtune=g5 -qarch=g5 -qnopic -qunroll -qnounwind -qinline -qnoeh -qaltivec"
10881/10884 (100%)| 1:20/ 1:20| 1:24/ 1:24| 3.5340x| 0:00
average: 157.2 kbps LR: 4409 (40.51%) MS: 6475 (59.49%)
The best I can do on a 1 GHz G4 is 1.4x with basically the same GCC settings as above. XLC makes LAME fly on a G5 in comparison. The opposite is true for G4s. GCC produced the fastest encoding LAME binary for my 1 GHz G4. Obviously XLC can't be expected to optimize well for the G4...since IBM never produced one...
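For a quick read on how much faster the XLC build is on the G5, here is a small sketch using the two LAME progress lines above. It assumes the first time column (2:06 and 1:20) is CPU time and that the "x" figure is LAME's play/CPU ratio. The helper name is made up for illustration.

```python
def to_seconds(mmss):
    """Convert a 'M:SS' string from the LAME progress line to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


gcc_time = to_seconds("2:06")   # gcc build, first time column
xlc_time = to_seconds("1:20")   # xlc build, first time column

time_ratio = gcc_time / xlc_time    # speedup judged by encode time
rate_ratio = 3.5340 / 2.2452        # speedup judged by the "x" figures

print(f"gcc {gcc_time}s vs xlc {xlc_time}s, time ratio {time_ratio:.3f}")
print(f"play/CPU ratio xlc over gcc: {rate_ratio:.3f}")
```

Both ways of measuring it agree: the XLC binary encodes roughly one and a half times as fast as the GCC one on this G5.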