IBM: XLC for MacOS X. XLC is a world class C compiler highly tuned for G5!

rickag · August 28, 2003 9:21AM

Quote:

Originally posted by PB

g5

Generates code specific to Apple Power Mac G5 processors.

ppc970

Generates code specific to the IBM PowerPC 970 processor.

ppcv

Generates code for generic PowerPC chips with AltiVec vector engine.

This is the default.

Maybe different target markets, hence, different optimizations.

For me the most interesting line is the one about ppcv for GENERIC PowerPC chips, hmmm, hmmm again. G3 w/ SIMD soon to appear? Will the mythical PowerPC on IBM's roadmap appear?

1+ GHz

Multicore Superscalar

SMP Capable

Integrated SIMD Engine

Rapid I/O

n-way Crossbar CoreConnect

Naw, can't be, too much good news for Apple in one year.

pb · August 28, 2003 9:40AM

Quote:

Originally posted by rickag

Naw, can't be, too much good news for Apple in one year.

My understanding of ppcv from this quote, is that the compiler will automatically generate code for the vector unit of G4 or whatever else has such unit. But, I don't really know...

rickag · August 28, 2003 12:09PM

Quote:

Originally posted by PB

My understanding of ppcv from this quote, is that the compiler will automatically generate code for the vector unit of G4 or whatever else has such unit. But, I don't really know...

Figures, though I'm still interested in the "whatever else" you mention.

moogs · August 28, 2003 12:22PM

You read some of the comments / benchmarks for this at Ars?

OWNAGE. Let's just hope it is released from beta soon and all the major developers use it. The G5 platform is going to absolutely kick ass once the majority of apps are compiled with this thing.

I just hope APPLE uses it!

rickag · August 28, 2003 1:02PM

I think this is what Moogs is referring to.

Quote:

peterh

Smack-Fu Master, in training

Tribus: Atlanta, G

Registered: August 11, 2002

Posts: 114

quote:Originally posted by pbkobold:

Also, another thing that was kind of odd was the difference in binary sizes. I got about 700k for the XLC binary versus just north of 100k for the gcc one. Weird.

XLC better have done a kickass job unrolling some of the loops.

Sounds like a lot of unrolling and inlining.

I just got an e-mail from Dr. Hunter (genX here on Ars) over at NASA Langley; he says that with xlf in his Jet3D bench he gets a huge improvement over the Absoft FORTRAN.

Type of Code    G4     G5
Scalar          70%    210%    (% improvement)
Vector          40%    70%     (using gcc3.3 for the vectorized code)

If this holds that places his 2.0GHz G5 (single)=254 at 787 and his Dual G5 = 498 at 1544 MFLOPs. That is just for Scalar. The vector would go from 2755 to 4684 MFLOPs (single) and 5177 to 8801 (dual).

Even the G4 numbers are much better, 129 to 219 (single, scalar) and 1612 to 2258 (single, vector) for a 1.25GHz G4.

I would say that if these numbers hold, some CFD type applications could really benefit from a G5.

P.S. Craig please correct me if my understanding of your numbers is wrong.

210% improvement

at least for Fortran
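The projected MFLOPs figures in the quoted post can be reproduced by applying each percentage improvement to its baseline. A minimal sketch in Python, assuming "N% improvement" means new = old x (1 + N/100); the function name `projected` is hypothetical, and the baselines and percentages are the ones quoted above:

```python
def projected(baseline_mflops, improvement_pct):
    """Apply a percentage improvement to a baseline MFLOPs figure.

    Assumption: an improvement of N% means new = old * (1 + N/100).
    """
    return baseline_mflops * (1 + improvement_pct / 100)


# (label, baseline MFLOPs, improvement %) as quoted in the post
cases = [
    ("2.0GHz G5 single, scalar", 254, 210),
    ("2.0GHz G5 dual, scalar", 498, 210),
    ("2.0GHz G5 single, vector", 2755, 70),
    ("2.0GHz G5 dual, vector", 5177, 70),
    ("1.25GHz G4 single, scalar", 129, 70),
    ("1.25GHz G4 single, vector", 1612, 40),
]

for label, base, pct in cases:
    print(f"{label}: {base} -> {projected(base, pct):.1f} MFLOPs")
```

Since the percentages in the table are rounded, the results can differ from the quoted numbers by a point or two (the G4 vector case, for instance, comes out just under the quoted 2258).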

bigc · August 28, 2003 1:19PM

Wonder how much the Fortran Compiler is going to cost after it's out of Beta?

jll · August 28, 2003 3:25PM

Quote:

Originally posted by Bigc

Wonder how much the Fortran Compiler is going to cost after it's out of Beta?

The AIX version of the Fortran compiler costs $2,399 and the C++ compiler costs $2,223.

bigc · August 28, 2003 3:45PM

Quote:

Originally posted by JLL

The AIX version of the Fortran compiler costs $2,399 and the C++ compiler costs $2,223.

Well, I hope they give out deals if you're not going to use them for commercial sales of software.

henriok · August 28, 2003 5:02PM

Optimizing for G5 might lead to a general optimization for all G5-class processors, while optimizing for ppc970 just optimizes for that one processor. Today that is the same thing, but in the future it probably won't be.

dabront · August 28, 2003 7:43PM

Anyone good at translating German to English?

http://www.heise.de/newsticker/data/as-28.08.03-000/

I found this link through macnews.net.tc. If it says what I think it does, and the figures are real, then the G5 when running software optimized for it, will most likely be faster than what a lot of people expected. Of course, that depends on Spec benchmarks having any resemblance to real applications.

opuscroakus · August 28, 2003 8:32PM

Quote:

Originally posted by Moogs

You read some of the comments / benchmarks for this at Ars?

OWNAGE. Let's just hope it is released from beta soon and all the major developers use it. The G5 platform is going to absolutely kick ass once the majority of apps are compiled with this thing.

I just hope APPLE uses it!

Will this just be beneficial to apps or will it also help 10.X(s) run better? Isn't this kind of what happened with the P4 when it debuted? Poor performance at first, then it came into its own when some new code came out for it. (Only the 970 is getting it sooner.)

" ...he says that with xlf in his Jet3D bench he gets a huge improvement over the Absoft FORTRAN.

Type of Code    G4     G5
Scalar          70%    210%    (% improvement)
Vector          40%    70%     (using gcc3.3 for the vectorized code)

If this holds that places his 2.0GHz G5 (single)=254 at 787 and his Dual G5 = 498 at 1544 MFLOPs. That is just for Scalar. The vector would go from 2755 to 4684 MFLOPs (single) and 5177 to 8801 (dual).

Even the G4 numbers are much better, 129 to 219 (single, scalar) and 1612 to 2258 (single, vector) for a 1.25GHz G4."

Also, what type of apps will benefit from this type of improvement? A/V? DTP? Games?

gspotter · August 29, 2003 4:26AM

Quote:

Originally posted by dabront

Anyone good at translating German to English?

http://www.heise.de/newsticker/data/as-28.08.03-000/

My German is better than my English, but here's a quick translation:

"Now comes a big oomph to the Mac G5 scene: IBM will offer its legendary C/C++ and Fortran compilers with PowerPC and especially G5 optimization not only for AIX but also for Linux and MacOS X. Beta versions of the compilers (C/C++ 6.0 and Fortran 8.1) for MacOS X are ready to download for a 60-day test.

Comparisons of the compilers using the SPEC CPU benchmarks show, in part, drastic performance improvements compared to gcc/g77 3.3 -- admittedly measured on a Power4. But even the very conservatively estimated SPEC values (937 SPECint2000 and 1051 SPECfp2000 for the 1.8 GHz PPC970) published by IBM at the first presentation of the PowerPC 970 processor were already much higher than those mentioned by Steve Jobs at the spectacular G5 introduction. Measured with gcc3.3 and NAG Fortran, VeriTest only showed 800 SPECint2000 and 840 SPECfp2000 for the PPC 970 at 2 GHz.

The optimizations for the two floating point units available in the Power4 and PPC970 in particular (especially with the Fortran compiler) obviously work much better than with the GNU fellows. With these two FPUs, the 970 should have a definite architectural advantage in the floating point area compared to the Intel Pentium 4.

AltiVec will also be supported for the PPC970 and "generic PowerPC" (switch ppcv). It's still unclear if this includes automatic vectorization like in the Intel compilers. Such auto-vectorization could bring another performance jump. GNU has also defined this technique for AltiVec as a project goal.

Interestingly, the IBM compiler even differentiates between G5 and PPC970. Maybe with G5 the compiler optimizes specifically for Apple's crossbar switch and squeezes out another bit of optimization."
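The SPEC figures in the translation are for different clock speeds (IBM's estimate at 1.8 GHz, VeriTest's measurement at 2 GHz), so a rough way to compare them is to divide each score by the clock. A minimal sketch in Python, assuming a simple per-GHz normalization; SPEC scores do not scale linearly with clock speed, so this is only a crude illustration, and the names `per_ghz` and `scores` are hypothetical:

```python
def per_ghz(score, clock_ghz):
    """Crude normalization: SPEC score divided by clock speed in GHz."""
    return score / clock_ghz


# (source, clock in GHz, SPECint2000, SPECfp2000) as given in the translated article
scores = [
    ("IBM estimate, PPC970", 1.8, 937, 1051),
    ("VeriTest, gcc3.3 / NAG Fortran", 2.0, 800, 840),
]

for source, clock, spec_int, spec_fp in scores:
    print(
        f"{source} @ {clock} GHz: "
        f"{per_ghz(spec_int, clock):.0f} SPECint/GHz, "
        f"{per_ghz(spec_fp, clock):.0f} SPECfp/GHz"
    )
```

On this crude measure IBM's own estimates are noticeably higher per GHz than the VeriTest numbers, which is the gap the article attributes to the compilers.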

eugene · September 7, 2003 4:03AM

Back to the LAME example on a 1.8 GHz G5...

Quote:

setenv CC gcc

setenv CFLAGS "-O3 -fstrict-aliasing -fomit-frame-pointer -funroll-loops -finline-functions -mdynamic-no-pic -no-cpp-precomp -mcpu=970 -mtune=970 -faltivec"

10881/10884 (100%)| 2:06/ 2:06| 2:12/ 2:12| 2.2452x| 0:00

average: 157.2 kbps LR: 4409 (40.51%) MS: 6475 (59.49%)

Quote:

setenv CC xlc

setenv CFLAGS "-O5 -qtune=g5 -qarch=g5 -qnopic -qunroll -qnounwind -qinline -qnoeh -qaltivec"

10881/10884 (100%)| 1:20/ 1:20| 1:24/ 1:24| 3.5340x| 0:00

average: 157.2 kbps LR: 4409 (40.51%) MS: 6475 (59.49%)

The best I can do on a 1 GHz G4 is 1.4x with basically the same GCC settings as above. XLC makes LAME fly on a G5 in comparison. The opposite is true for G4s. GCC produced the fastest encoding LAME binary for my 1 GHz G4. Obviously XLC can't be expected to optimize well for the G4...since IBM never produced one...
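The two LAME runs above can be compared directly. A minimal sketch in Python, assuming the first time column in LAME's progress line is CPU time and the "x" figure is the play/CPU ratio; the helper name `to_seconds` is hypothetical, and the numbers are copied from the two runs:

```python
def to_seconds(mmss):
    """Convert a 'M:SS' string such as '2:06' into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# CPU times and play/CPU ratios from the gcc and xlc runs above
gcc_cpu = to_seconds("2:06")
xlc_cpu = to_seconds("1:20")
gcc_rate = 2.2452
xlc_rate = 3.5340

speedup_from_time = gcc_cpu / xlc_cpu    # how much less CPU time xlc needed
speedup_from_rate = xlc_rate / gcc_rate  # same comparison via the play/CPU ratio

print(f"gcc: {gcc_cpu} s, xlc: {xlc_cpu} s")
print(f"speedup from CPU time:  {speedup_from_time:.3f}x")
print(f"speedup from play/CPU:  {speedup_from_rate:.3f}x")
```

Both ways of computing it give a speedup of a little under 1.6x for the XLC build over the GCC build on this one file.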
