Comments
Hands down the winner.
And yes, I am just saying that because I am a programmer.
Duron 1300, running Win2K, using the awesome Visual Studio 98 compiler: 37s
PowerBook 12", gcc 3.1: 26s
PowerBook 12", gcc 3.3: 25s
This illustrates that the Visual Studio 98 compiler is not very quick, even when set to fastest. Does anyone know where I can find a faster, free Windows compiler?
unthreaded, no optimization: 32s
unthreaded, -O3: 17s
threaded, no optimization: 16s
threaded, -O3: 9s
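Reading 3.1416's Xserve numbers as ratios (all times in seconds) makes the effect of the optimizer and the effect of threading easier to compare. This is only arithmetic on the four figures above:

```latex
\begin{aligned}
\text{optimization:}\quad & \tfrac{32}{17} \approx 1.88\ \text{(unthreaded)}, && \tfrac{16}{9} \approx 1.78\ \text{(threaded)} \\
\text{threading:}\quad & \tfrac{32}{16} = 2.00\ \text{(no optimization)}, && \tfrac{17}{9} \approx 1.89\ \text{(-O3)}
\end{aligned}
```

Two processors cap the threading speedup at 2x, so the -O3 build gets about 94% of the way there (1.89 / 2).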
Originally posted by 3.1416
Xserve, dual 1.33 GHz, gcc 3.1:
unthreaded, no optimization: 32s
unthreaded, -O3: 17s
threaded, no optimization: 16s
threaded, -O3: 9s
Dual Xserve 1.33 with multithreading and -O3 is lowest result so far!
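Hmm, this is interesting. I just ran it on a G4 350 upgraded to a 1.2 GHz G4 via Sonnet, with a 120 GB ATA133 drive on an ATA133 card in a PCI slot. It actually ran FASTER than the MDD 1.25 we have listed in this thread. It got 18 seconds.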
Originally posted by lundy
Well, top on the Mac shows only one process, which is right (though that process has two threads), and shows processor usage ranging from 0% to 200% on a dualie.
As for fixing the sequential thing, I'm not really sure what is going on. Let's see what the 4-thread results are. When looking at top, note what it says under the "#TH" (threads) column.
4-thread results:
./threaded_factorial
Creating Thread Number: 0
Creating Thread Number: 1
Creating Thread Number: 2
Creating Thread Number: 3
Loop Done; Time=73 secs for thread#:1, Loops=12500000
Loop Done; Time=77 secs for thread#:2, Loops=12500000
Loop Done; Time=78 secs for thread#:0, Loops=12500000
Loop Done; Time=57 secs for thread#:3, Loops=12500000
Top does show 5 threads. The first three (1,2,0) seemed to end at the same time; then, there was a long delay before the last one. Odd.
Now, this morning, with the first file posted (unthreaded_factorial), I got:
Start: 1063156763 End: 1063156773
i= 50000001
Time=10
Now, this means 10 secs, right?
So I got home and followed the instructions for the binaries to set up the G5 one (threaded_g5). I used "chmod 755", typed "cd" to go to my Desktop where the files are, and ran it via the Terminal with ./ (which I'm spelling out just in case I made any obvious mistakes I need corrected). I got:
Creating Thread Number: 0
Creating Thread Number: 1
Loop Done; Time=25 secs for thread#:0, Loops=25000000
Loop Done; Time=25 secs for thread#:1, Loops=25000000
So is this actually SLOWER, as it looks? Why would that be?
For comparison I tried to set up the other 2 binaries, but when I try to run them, I'm told "Permission denied". Huh? I ran "chmod 755" after I did "cd desktop", but that shouldn't matter, should it? Again, I know nothing about UNIX, so any help would be appreciated.
Originally posted by Mount_my_floppy
Hmm, this is interesting. I just ran it on a G4 350 upgraded to a 1.2 GHz G4 via Sonnet, with a 120 GB ATA133 drive on an ATA133 card in a PCI slot. It actually ran FASTER than the MDD 1.25 we have listed in this thread.
It got 18 seconds.
Hey! That's mine!
Who knows? Both have cache running at processor speed (and my assumption is that the benchmark will run entirely in cache). I wasn't running anything else in the background, either.
The only thing I can think of is the version of cc I'm running. cc --version shows:
cc (GCC) 3.1 20020420 (prerelease)
Copyright (C) 2002 Free Software Foundation, Inc.
In any case, enjoy! Your upgraded G4 beat a single-processor Xeon 2.8GHz system in this benchmark!
-John
Originally posted by lundy
Dual Xserve 1.33 with multithreading and -O3 is lowest result so far!
My XLC results don't count?
Originally posted by bangstudios
Hey! That's mine!
Who knows? Both have cache running at processor speed (and my assumption is that the benchmark will run entirely in cache). I wasn't running anything else in the background, either.
The only thing I can think of is the version of cc I'm running. cc --version shows:
cc (GCC) 3.1 20020420 (prerelease)
Copyright (C) 2002 Free Software Foundation, Inc.
In any case, enjoy! Your upgraded G4 beat a single-processor Xeon 2.8GHz system in this benchmark!
-John
Addendum: Sure enough, running a binary compiled with GCC 3.3 (from page 1 of this thread) gave me a result of 18 seconds. Whew!
-John
Originally posted by Eugene
My XLC results don't count?
Wups, I overlooked that part of your post.
Originally posted by Gabid
Some more G5 results (again, from a 1.8 GHz, 512 MB, processor set to highest).
So is this actually SLOWER, as it looks? Why would that be?
OK, that's the first anomalous Mac result. Not having a G5, I can't run it, but I will take a look at the file if Shark will do it. At least we know something is wrong with the compile, because it's the only thing that changed.
EDIT: As far as the permissions, do an "ls -l" on the directory that the files are in and post the output.
EDIT: As far as the permissions, do an "ls -l" on the directory that the files are in and post the output.
I think I see the problem now, though I have no idea how to fix it: the files I can run have the permissions "-rwxr-xr-x", while the ones I can't run have "-rw-r--r--".
Originally posted by lundy
OK, that's the first anomalous Mac result. Not having a G5, I can't run it, but I will take a look at the file if Shark will do it. At least we know something is wrong with the compile, because it's the only thing that changed.
Yeah, I thought my numbers looked strange. I wonder what they will look like if you're able to fix things.
Originally posted by Gabid
I think I see the problem now, though I have no idea how to fix it: the files I can run have the permissions "-rwxr-xr-x", while the ones I can't run have "-rw-r--r--".
Somehow your permissions didn't get applied to those files: there is no "x" (for execute) in the first group (user).
Do the chmod 755 <filename> again. In Terminal, type 'chmod 755 ' (note the trailing space) and then drag the file from the Finder into the Terminal window and hit Return.
Originally posted by lundy
Somehow your permissions didn't get applied to those files: there is no "x" (for execute) in the first group (user).
Do the chmod 755 <filename> again. In Terminal, type 'chmod 755 ' (note the trailing space) and then drag the file from the Finder into the Terminal window and hit Return.
Oops! That was it. So now I've been able to run all four binaries, with the unthreaded still being the fastest (should this be the case?).
(In order of speed, from slowest to fastest. Note that a couple of these results were about 1 sec slower until I ran them a second [and subsequent] times)
"threaded_no-opim":
Creating Thread Number: 0
Creating Thread Number: 1
Loop Done; Time=44 secs for thread#:0, Loops=25000000
Loop Done; Time=45 secs for thread#:1, Loops=25000000
"threaded_g5":
Creating Thread Number: 0
Creating Thread Number: 1
Loop Done; Time=25 secs for thread#:1, Loops=25000000
Loop Done; Time=25 secs for thread#:0, Loops=25000000
"threaded_std":
Creating Thread Number: 0
Creating Thread Number: 1
Loop Done; Time=10 secs for thread#:1, Loops=25000000
Loop Done; Time=10 secs for thread#:0, Loops=25000000
"unthreaded_factorial":
Start: 1063167093 End: 1063167103
i= 50000001
Time=10
There you go: discuss and explain away! This non-techie loves learning what all these numbers mean (though I can appreciate the fun of factorials; back in high school my friends and I used to see whose calculator could compute 69! the fastest, since that was the biggest factorial they could handle).
Another thing of note is that if I had any other apps besides the Terminal running, the times would be longer, but the G5 binary seemed to be affected the most (going as high as 31 or 32 secs per thread).
Originally posted by Gabid
Oops! That was it. So now I've been able to run all four binaries, with the unthreaded still being the fastest (should this be the case?).
(In order of speed, from slowest to fastest. Note that a couple of these results were about 1 sec slower until I ran them a second [and subsequent] times)
"threaded_no-opim":
Loop Done; Time=44 secs for thread#:0, Loops=25000000
Loop Done; Time=45 secs for thread#:1, Loops=25000000
"threaded_g5":
Loop Done; Time=25 secs for thread#:1, Loops=25000000
Loop Done; Time=25 secs for thread#:0, Loops=25000000
"threaded_std":
Loop Done; Time=10 secs for thread#:1, Loops=25000000
Loop Done; Time=10 secs for thread#:0, Loops=25000000
"unthreaded_factorial":
Time=10
There you go: discuss and explain away! This non-techie loves learning what all these numbers mean (though I can appreciate the fun of factorials; back in high school my friends and I used to see whose calculator could compute 69! the fastest, since that was the biggest factorial they could handle).
Another thing of note is that if I had any other apps besides the Terminal running, the times would be longer, but the G5 binary seemed to be affected the most (going as high as 31 or 32 secs per thread).
OK - maybe it's not as bizarre as we thought.
First, the threading should not help your times - you have a single-processor machine. So it is not surprising that the unthreaded result of 10 seconds is the same as the threaded_std result (I think the threaded_std compile was with optimization set to -O3, the setting for speed at the expense of memory use).
Also, with the exception of the "G5-optimized" code, we are seeing that your 1.8 single G5 runs the test at basically the same speed as a 1.0 dual G4. This means to me that integer (non-AltiVec) arithmetic performance is about the same between the G4 and the G5. Where the G5 will kick ass on the G4 is in real-world apps where data has to be loaded from memory and processed.
The G5 optimized run still isn't explained. I think either Apple is wrong about the settings, or there might be a bug in the new G5 code-generation part of the compiler.
What we should do is compile for G5 with IBM's compiler. Eugene got 9 seconds on his run.
Anybody know the IBM link? I can do a cross-compile for the G5 and upload the executable.
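The single-processor point above can be written as a rough model, assuming the work divides evenly and ignoring thread-creation and scheduling overhead. With N loops in total, a per-loop time t, k worker threads and P processors, the wall-clock time is about:

```latex
T \approx \frac{N\,t}{\min(k,\,P)}, \qquad N = 5 \times 10^{7}
```

On Gabid's single G5 (P = 1), two threads give the same T as one, which fits the 10 s seen for both unthreaded_factorial and threaded_std; each thread probably reports the full 10 s because the two take turns on the one CPU. On 3.1416's dual Xserve (P = 2) the model predicts a halving, 17 / 2 = 8.5 s, against the 9 s measured for the threaded -O3 build.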
[xxx-Computer:~/desktop] hawk% cc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.1/specs
Thread model: posix
Apple Computer, Inc. GCC version 1175, based on gcc version 3.1 20020420 (prerelease)
[xxx-Computer:~/desktop] hawk% cc -O3 -o unthreaded_factorial unthreaded_factorial.c
[xxx-Computer:~/desktop] hawk% ./unthreaded_factorial
Start: 1063175301 End: 1063175373
i= 50000001
Time=72
[xxx-Computer:~/desktop] hawk% cc -O3 -lpthread -o unthreaded_factorial unthreaded_factorial.c
[xxx-Computer:~/desktop] hawk% ./unthreaded_factorial
Start: 1063175807 End: 1063175879
i= 50000001
Time=72
IBM XL C Version 6.0, XL C++ Version 6.0 for Mac OS X, Beta
[xxx-Computer:~/desktop] hawk% xlc -O5 -o unthreaded_factorial unthreaded_factorial.c
[xxx-Computer:~/desktop] hawk% ./unthreaded_factorial
Start: 1063170913 End: 1063170958
i= 50000001
Time=45