Sorry to harass the UNIX masters again, but do I need the IBM compiler to run these? When I try to (and I don't think I need to "chmod 755" since it looks like the files are executable) with "./" I get the message:
Quote:
dyld: ./utf-g5-O5 can't open library: /opt/ibmcmp/lib/libxlsmp.dylib (No such file or directory, errno = 2)
Trace/BPT trap
C'mon, let's see if we can get a G5 down to 9 secs !
I booted back into Jagwyre to be able to use the IBM compiler, since it requires gcc 3.3 (or so it says). Since the binary runs on my G4, (and I specified a 970 cross-compile), I am suspecting that the 64-bit code isn't in there.
I should be able to boot back into Panther and take a look with Shark.
Hmm, here at work I ran this code using Visual C++'s compiler.
I'm getting: 14 sec.
Seems too good for a Thinkpad PIII 800MHz, running Win98, 192MB RAM.
I think I might have shortened the math by changing the "long long" variables to just "long". I had to do that because VC++ told me that "long long" was illegal.
Yes, that will speed it up, but it won't accurately calculate the factorial, as the result's too large for a 32-bit "long"...
This is a great thread, but the more I read, the more I am convinced that the G5 isn't performing anywhere near its potential. Something isn't quite right with the way it is running applications; it must be either the compiler or the OS, or both.
I am sure that when Panther is released we are going to see a series of performance boosts that are going to move the G5s further from the G4s. I know that Panther is approaching GM, but from what we can see I believe that there are going to be a LOT of point increases just to help the G5. I expect that all the applications we use will also need a re-compile to take advantage of those optimisations, and that means it is likely to be a couple of years before the G5 is going to be really on song for most of us.
All of this excitement for a lousy 9 seconds ! Of course, I really hope that someone can pull off a G5-optimised version. I'm very curious if it will run faster.
Knowing nothing about the technical side of things, maybe this all suggests that optimising for the G5 isn't all that easy or reliable as of yet.
This is a VERY basic benchmark. It only tests how quickly the CPU can go through a loop and do some basic math. In the real world, the ability to transfer info from main memory to the CPU is the real bottleneck, and the G5 will kick rear in that area. Besides, the G5 will scale much better than the G4 in terms of CPU speed.
Comments
Originally posted by hawkman
400MHz B+W G3
IBM XL C Version 6.0, XL C++ Version 6.0 for Mac OS X, Beta
[xxx-Computer:~/desktop] hawk% xlc -O5 -o unthreaded_factorial unthreaded_factorial.c
[xxx-Computer:~/desktop] hawk% ./unthreaded_factorial
Start: 1063170913 End: 1063170958
i= 50000001
Time=45
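For anyone wondering why every result in this thread is a whole number of seconds, the output above looks like it comes from time() stamps taken before and after the loop. Here is a rough C sketch of that kind of harness. It is not the benchmark's actual source: the names `bench_factorial` and `timed_run` are made up, and the assumption that each pass computes 16! is only a guess based on the numbers discussed here.

```c
#include <time.h>

/* Hypothetical stand-in for the benchmark's recursive factorial. */
static long long bench_factorial(long long value)
{
    return value <= 1 ? 1 : value * bench_factorial(value - 1);
}

/* Runs `loops` factorial calls and returns the elapsed wall-clock time
   in whole seconds. time() only has one-second resolution, so two runs
   that differ by less than a second can report the same figure. */
long timed_run(long loops, long long *last_result)
{
    volatile long long arg = 16;   /* volatile so the call is not folded away */
    volatile long long sink = 0;
    time_t start, end;
    long i;

    start = time(NULL);
    for (i = 1; i <= loops; i++)
        sink = bench_factorial(arg);
    end = time(NULL);

    *last_result = sink;
    return (long)difftime(end, start);
}
```

Calling `timed_run(50000000, &result)` would correspond to the 50 million passes implied by the "i= 50000001" line, if the guess about the loop is right.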
That's looking good.
Downloading the IBM compiler now.
http://www.eugenechan.com/~ceugene/g5factorial.tgz
Originally posted by Eugene
G5 binaries built with XLC (-O5 -qtune=g5 -qarch=g5):
http://www.eugenechan.com/~ceugene/g5factorial.tgz
[localhost:~] lundy% /Users/lundy/Desktop/g5factorial/tf-g5-O5
Creating Thread Number: 0
Creating Thread Number: 1
Loop Done; Time=9 secs for thread#:1, Loops=25000000
Loop Done; Time=9 secs for thread#:0, Loops=25000000
It doesn't crash on my G4, with -O3 or -O5.
I found no compiler option in the reference manual pages of the XLC for 64-bitness. Enabling -qlonglong makes no difference either.
Hmm.
Originally posted by Eugene
G5 binaries built with XLC (-O5 -qtune=g5 -qarch=g5):
http://www.eugenechan.com/~ceugene/g5factorial.tgz
Sorry to harass the UNIX masters again, but do I need the IBM compiler to run these? When I try to (and I don't think I need to "chmod 755" since it looks like the files are executable) with "./" I get the message:
dyld: ./utf-g5-O5 can't open library: /opt/ibmcmp/lib/libxlsmp.dylib (No such file or directory, errno = 2)
Trace/BPT trap
C'mon, let's see if we can get a G5 down to 9 secs !
The IBM XLC manual says that "cc" is an alternate invocation command of "xlc"
How do we know if we are invoking the IBM XLC or the GNU compiler when both are resident on the same computer?
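One way to check from inside the program is to look at the macros each compiler predefines. This is only a sketch: `compiler_name` is a made-up helper, and while `__GNUC__` and `_MSC_VER` are well known, I am assuming the XL C beta for Mac OS X defines `__xlC__` or `__IBMC__` the way IBM's compilers on other platforms do. The IBM check comes first in case XL C also defines `__GNUC__` for header compatibility.

```c
/* Returns a label for the compiler that built this translation unit,
   based on predefined macros. The IBM macro names are an assumption
   for the Mac OS X beta. */
const char *compiler_name(void)
{
#if defined(__xlC__) || defined(__IBMC__)
    return "IBM XL C";
#elif defined(__GNUC__)
    return "GNU C (gcc)";
#elif defined(_MSC_VER)
    return "Microsoft Visual C++";
#else
    return "unknown";
#endif
}
```

Printing `compiler_name()` at the start of the benchmark would show which compiler "cc" actually resolved to on a given machine.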
Originally posted by Gabid
Sorry to harass the UNIX masters again, but do I need the IBM compiler to run these? When I try to (and I don't think I need to "chmod 755" since it looks like the files are executable) with "./" I get the message:
C'mon, let's see if we can get a G5 down to 9 secs !
I booted back into Jagwyre to be able to use the IBM compiler, since it requires gcc 3.3 (or so it says). Since the binary runs on my G4, (and I specified a 970 cross-compile), I am suspecting that the 64-bit code isn't in there.
I should be able to boot back into Panther and take a look with Shark.
I'm getting: 14 sec.
Seems too good for a Thinkpad PIII 800MHz, running Win98, 192MB RAM.
I think I might have shortened the math by changing the "long long" variables to just "long". I had to do that because VC++ told me that "long long" was illegal.
That has to be it. Right?
Originally posted by Transcendental Octothorpe
I think I might have shortened the math by changing the "long long" variables to just "long". I had to do that because VC++ told me that "long long" was illegal.
Yes, that will speed it up, but it won't accurately calculate the factorial, as the result's too large for a 32-bit "long"...
-- Mark
Anyone know how to create a 64-bit integer in VC++?
I am sure that when Panther is released we are going to see a series of performance boosts that are going to move the G5s further from the G4s. I know that Panther is approaching GM, but from what we can see I believe that there are going to be a LOT of point increases just to help the G5. I expect that all the applications we use will also need a re-compile to take advantage of those optimisations, and that means it is likely to be a couple of years before the G5 is going to be really on song for most of us.
Originally posted by lundy
I booted back into Jagwyre to be able to use the IBM compiler, since it requires gcc 3.3 (or so it says). Since the binary runs on my G4, (and I specified a 970 cross-compile), I am suspecting that the 64-bit code isn't in there.
I should be able to boot back into Panther and take a look with Shark.
All of this excitement for a lousy 9 seconds ! Of course, I really hope that someone can pull off a G5-optimised version. I'm very curious if it will run faster.
Knowing nothing about the technical side of things, maybe this all suggests that optimising for the G5 isn't all that easy or reliable as of yet.
Originally posted by Transcendental Octothorpe
Soooo...
Anyone know how to create a 64-bit integer in VC++?
__int64 is the name of the type, yes that is two underscores before int64. Welcome to hell.
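If the same source has to build on both sides, a typedef can hide the difference. This is a sketch, not the benchmark's code: `wide_int`, `WIDE_FMT` and `format_wide` are invented names, and it assumes an older Visual C++ that lacks "long long" (newer Microsoft compilers accept it directly).

```c
#include <assert.h>
#include <stdio.h>

#if defined(_MSC_VER)
typedef __int64 wide_int;      /* Visual C++ spelling of a 64-bit integer */
#define WIDE_FMT "%I64d"       /* printf format used by Microsoft's C runtime */
#else
typedef long long wide_int;    /* gcc, xlc and other compilers with long long */
#define WIDE_FMT "%lld"
#endif

/* Recursive factorial; 20! is the largest result that fits in 64 signed bits. */
wide_int factorial(wide_int value)
{
    assert(value >= 0 && value <= 20);
    if (value <= 1)
        return 1;
    return value * factorial(value - 1);
}

/* Writes the decimal digits of v into buf (at least 21 bytes);
   returns the number of characters written. */
int format_wide(char *buf, wide_int v)
{
    return sprintf(buf, WIDE_FMT, v);
}
```

The format macro matters as much as the typedef: passing a 64-bit value to a plain "%d" or "%ld" on a 32-bit platform prints garbage even when the arithmetic is right.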
Originally posted by Addison
This is a great thread, but the more I read, the more I am convinced that the G5 isn't performing anywhere near its potential. Something isn't quite right with the way it is running applications; it must be either the compiler or the OS, or both.
This is a VERY basic benchmark. It only tests how quickly the CPU can go through a loop and do some basic math. In the real world, the ability to transfer info from main memory to the CPU is the real bottleneck, and the G5 will kick rear in that area. Besides, the G5 will scale much better than the G4 in terms of CPU speed.
Besides, we don't have any dual G5 scores yet.
Originally posted by Addison
This is a great thread, but the more I read, the more I am convinced that the G5 isn't performing anywhere near its potential.
Why do you say that? It's faster per-cycle than the G4 on this very simple benchmark, which doesn't take advantage of its greatly improved bandwidth.
Originally posted by Yevgeny
__int64 is the name of the type, yes that is two underscores before int64. Welcome to hell.
Ahh, thank you, kind sir.
With that change, I am now getting 32 seconds.
Which still seems pretty good, for an 800MHz PIII.
Originally posted by Gabid
Sorry to harass the UNIX masters again, but do I need the IBM compiler to run these? When I try to (and I don't think I need to "chmod 755" since it looks like the files are executable) with "./" I get the message
Sorry, should be fixed now.
Originally posted by Eugene
Sorry, should be fixed now.
Thank you. I'll be home in an hour and a half or so and I'll try running it then.
Originally posted by Transcendental Octothorpe
Ahh, thank you, kind sir.
With that change, I am now getting 32 seconds.
Which still seems pretty good, for an 800MHz PIII.
Actually, that is kind of fast considering that the 2.8GHz P4 was finishing in something like 20 seconds.
Originally posted by mark_wilkins
Yes, that will speed it up, but it won't accurately calculate the factorial, as the result's too large for a 32-bit "long"...
-- Mark
Yep. That's why it was written for a long long result and argument.
16! = 20922789888000
2 ^ 32 = 4294967296
2 ^ 64 = 18446744073709551616
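The numbers above can be checked with a few lines of C. This is a sketch with made-up names (`factorial64`, `factorial32`) and a simple loop rather than the benchmark's recursion; it uses the C99 fixed-width types so the result does not depend on how wide "long" happens to be on a given compiler.

```c
#include <assert.h>
#include <stdint.h>

/* n! in 64 bits. 20! is the largest factorial that fits in an
   unsigned 64-bit integer, so anything larger is rejected. */
uint64_t factorial64(unsigned n)
{
    uint64_t result = 1;
    unsigned i;

    assert(n <= 20);
    for (i = 2; i <= n; i++)
        result *= i;
    return result;
}

/* The same loop in 32 bits. It silently wraps modulo 2^32 once n
   exceeds 12, which is what happens when "long long" is replaced by
   a 32-bit "long". */
uint32_t factorial32(unsigned n)
{
    uint32_t result = 1;
    unsigned i;

    for (i = 2; i <= n; i++)
        result *= i;
    return result;
}
```

With these, `factorial64(16)` gives 20922789888000, while `factorial32(16)` gives 2004189184, which is 16! with everything above the low 32 bits thrown away.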
OK, now with Shark I looked at the assembly code that was generated by Xcode (gcc 3.3, -mcpu=970 -mtune=970 -O3 -mpowerpc64)
factorial.c statement 24, "return (value * factorial (value - 1))":
        0x1c84  subic   r4,r31,1
 3.3%   0x1c88  addme   r3,r30
        0x1c8c  bl      $-56 <factorial>
10.9%   0x1c90  mullw   r5,r31,r3
10.9%   0x1c94  mulhwu  r0,r31,r4
 5.3%   0x1c98  mullw   r3,r4,r30
 3.7%   0x1c9c  mullw   r10,r31,r4
 0.0%   0x1ca0  add     r4,r5,r0
        0x1ca4  add     r9,r4,r3
 5.1%   0x1ca8  mr      r3,r9
        0x1cac  mr      r4,r10
The "mullw" opcode is Multiply Low Word, a 32-bit multiply!!
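To show what that instruction sequence is doing, here is a C sketch of a 64-bit multiply built only from 32-bit pieces. The function name `mul64_from_32` is made up, and the comments matching each line to an opcode are my reading of the listing, not something the compiler documents. It assumes a platform where "int" is 32 bits, so the `uint32_t` products wrap modulo 2^32.

```c
#include <stdint.h>

/* Low 64 bits of a * b, using three 32-bit low-word multiplies and one
   high-word multiply, the same mix as in the listing. */
uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t a_hi = (uint32_t)(a >> 32), a_lo = (uint32_t)a;
    uint32_t b_hi = (uint32_t)(b >> 32), b_lo = (uint32_t)b;

    uint32_t lo_lo  = a_lo * b_lo;                                /* like mullw  */
    uint32_t carry  = (uint32_t)(((uint64_t)a_lo * b_lo) >> 32);  /* like mulhwu */
    uint32_t cross1 = a_lo * b_hi;                                /* like mullw  */
    uint32_t cross2 = a_hi * b_lo;                                /* like mullw  */

    /* Two adds build the high word; overflow past 2^32 is discarded,
       which is correct because only the low 64 bits are kept. */
    uint32_t hi = carry + cross1 + cross2;

    return ((uint64_t)hi << 32) | lo_lo;
}
```

A true 64-bit build could, in principle, do the whole thing with a single doubleword multiply, which is why seeing four word-sized multiplies per call suggests the 64-bit code path isn't being used.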
There's a "-fast" switch documented in yesterday's Updater note. I'm off to try that.
Ahh, thank you, kind sir.
Daarrr! I keep forgetting that long is the same size as int on x86 (4 bytes).
Using __int64 I get a much more respectable 20s from my Duron 1300, still using Visual Studio 98.
Which seems a bit too fast.