One of the reasons that the Core Duo as well as the Woodcrest do so well on the factorial test is the 2 MB of level 2 cache. The entire binary is only 40 KB - it can almost fit in level 1 cache. So the CPU doesn't have to go to RAM at all, unless other processes being switched in and out are clobbering the cache.
I might be able to get a faster time then. At the time I ran the script I was running several programs. Then again, I still had plenty of free RAM. Would it make a difference if I ran the script right after a reboot? My guess is no, just because I know next to nothing about how these things work.
If keeping the code in the cache is important, then you should get a faster time the second time you run it.
I'm speaking here only of the C application - the Threaded Factorial. The Monty Hall is written in Perl so the Perl interpreter is what is getting loaded, and it may be too big to fit in the cache. One of these days I should port that to C.
I prefer these kinds of tests because they are 100% threaded with no waits, and there is no disk access, video, or RAM issue to confuse the results.
Look at mikethomson's results for 1, 2 and 4 threads - it scales almost perfectly linearly.
You can give it some ridiculous number of threads like 64 if you want - then you should see worse performance, as Mach has to flip each thread in and out for no purpose. Of course, the times for each of the 64 threads will be zero as each thread has very few iterations to do.
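As a rough illustration of the kind of test described above, here is a minimal sketch in C using POSIX threads. It is not the actual Threaded Factorial source: `loops_per_thread`, `run_benchmark`, the `worker_t` layout, and the arithmetic inside the loop are all hypothetical stand-ins. It only shows the idea of dividing the total iterations evenly among the threads, letting each thread spin on pure arithmetic with no waits, and reporting a per-thread time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

/* Per-thread bookkeeping (hypothetical layout, not from the original app). */
typedef struct {
    int    id;
    long   loops;    /* iterations assigned to this thread        */
    long   secs;     /* wall-clock seconds the loop took          */
    double result;   /* stored so the loop has a visible effect   */
} worker_t;

/* Integer division: any remainder is simply dropped. */
long loops_per_thread(long total, int nthreads)
{
    assert(nthreads > 0);
    return total / nthreads;
}

/* CPU-bound loop: no I/O and no locks until the final report. */
void *worker(void *arg)
{
    worker_t *w = arg;
    time_t start = time(NULL);
    volatile double acc = 1.0;   /* volatile keeps the loop from being optimized out */

    for (long i = 0; i < w->loops; i++)
        acc = acc * 0.5 + 1.0;   /* stand-in arithmetic, not a real factorial */

    w->result = acc;
    w->secs = (long)(time(NULL) - start);
    printf("Loop Done; Time=%ld secs for thread#:%d, Loops=%ld\n",
           w->secs, w->id, w->loops);
    return NULL;
}

/* Returns the total iterations performed, or -1 on error. */
long run_benchmark(long total, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    worker_t  w[MAX_THREADS];
    long done = 0;
    int created = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        w[i].id = i;
        w[i].loops = loops_per_thread(total, nthreads);
        w[i].secs = 0;
        w[i].result = 0.0;
        if (pthread_create(&tid[i], NULL, worker, &w[i]) != 0)
            break;               /* stop creating threads on failure */
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        done += w[i].loops;
    }
    return created == nthreads ? done : -1;
}
```

Calling `run_benchmark(2147483647L, 4)` from a `main` hands each thread 536870911 loops, which matches the per-thread loop counts in the logs posted in this thread. The sketch takes the thread count as a parameter rather than detecting processors itself.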
Folder with Xcode project for threadedFactorialHighRes:Attachment 125
For the threadedFactorialHighRes, you can either just compile the main.c file on the command line, or you can double-click the threadedFactorialHighRes.xcodeproj file and the whole project will open in Xcode. Of course, the paths will not be correct for your machine, so that will require some fixing. If you don't know how to do that in Xcode, or if you do not have Xcode, just use gcc to compile on the command line. Make sure to set whatever flag makes a universal binary - I do not know what it is.
If you DO have Xcode, the simplest way to get around the paths problem is to make a new project (choose command-line tool) and then just copy the whole text of the main.c file into the main.c file of the new project, replacing everything that Xcode put there. Then you can use the Xcode compile and link interface. Be sure to set the architecture of the target to Universal (ppc and i386) in the Info panel of the target.
Nope. It makes two completely separate code generations, so the settings for one do not affect the other.
EDIT: of course, in large programs, if one has no need for one platform or the other, one can make the binary a smaller file by choosing one platform instead of Universal. But this app only generates about a 4K universal binary anyway, so it's moot.
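For the command-line route, Apple's gcc builds a universal binary when it is given one `-arch` flag per architecture, for example `gcc -arch ppc -arch i386 main.c -o threadedFactorialHighRes`, and running `lipo -info` on the result lists the architectures it contains. Because each architecture is compiled in its own pass, preprocessor checks are resolved separately for each slice. The sketch below is only an illustration of that point; `build_arch` and `print_build_arch` are hypothetical names and are not part of the posted source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reports which architecture this copy of the code was compiled for.
 * In a universal build the ppc slice and the i386 slice each get their
 * own answer, because the preprocessor runs once per architecture. */
const char *build_arch(void)
{
#if defined(__ppc64__)
    return "ppc64";
#elif defined(__ppc__) || defined(__POWERPC__)
    return "ppc";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "unknown";   /* any architecture this sketch does not check for */
#endif
}

void print_build_arch(void)
{
    const char *arch = build_arch();
    assert(arch != NULL && strlen(arch) > 0);
    printf("Compiled for: %s\n", arch);
}
```

Calling `print_build_arch()` at startup makes it easy to confirm which slice of a universal binary a given machine actually ran.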
Comments
4 processors detected in system.
2147483647 total iterations will be performed.
Starting to create 4 threads.
Creating Thread Number: 0
Creating Thread Number: 1
Creating Thread Number: 2
Creating Thread Number: 3
Loop Done; Time=52 secs for thread#:3, Loops=536870911
Loop Done; Time=52 secs for thread#:0, Loops=536870911
Loop Done; Time=52 secs for thread#:2, Loops=536870911
Loop Done; Time=52 secs for thread#:1, Loops=536870911
2 processors detected in system.
2147483647 total iterations will be performed.
Starting to create 2 threads.
Creating Thread Number: 0
Creating Thread Number: 1
Loop Done; Time=121 secs for thread#:0, Loops=1073741823
Loop Done; Time=121 secs for thread#:1, Loops=1073741823
Quad Woodcrest: 35 seconds.
Impressive.
/Users/stephend/Desktop $ perl ./OSXAutoThreadMHP.pl -i 10000000
Automatic Processor Detection found 1 processor cores.
Playing 10000000 games across 1 cores...
Thread# 0 created.
Waiting for thread(s) to finish....
Grand totals for all threads:
Sticker has won 3334322 times
Switcher has won 6665678 times
Time elapsed for 10000000 solutions was 400.315786 seconds.
Hmm. I thought this 600 MHz G3 was pretty quick when I got it back in 2001...
--> Stephen
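The Perl script is not reproduced in this thread, so as a hedged illustration of what the totals above reflect, here is a minimal single-threaded Monty Hall simulation in C. It is not a port of OSXAutoThreadMHP.pl: `play_games`, `next_random`, `tally_t`, and the seed handling are assumptions made for this sketch. The switcher wins whenever the first pick was wrong, which should happen in about two games out of three.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Win counts for the two strategies (hypothetical type for this sketch). */
typedef struct {
    long sticker_wins;
    long switcher_wins;
} tally_t;

/* xorshift32: a tiny deterministic generator; the state must be nonzero. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Plays the given number of games and tallies both strategies. */
tally_t play_games(long games, uint32_t seed)
{
    tally_t t = {0, 0};
    uint32_t state = seed ? seed : 1u;   /* avoid the all-zero state */

    assert(games >= 0);
    for (long g = 0; g < games; g++) {
        uint32_t r = next_random(&state) % 9;   /* one draw covers both choices */
        int prize = (int)(r % 3);               /* door hiding the prize        */
        int pick  = (int)(r / 3);               /* contestant's first choice    */

        /* The host always opens a losing door from the other two, so
         * sticking wins only if the first pick was right, and switching
         * wins in every other case. */
        if (pick == prize)
            t.sticker_wins++;
        else
            t.switcher_wins++;
    }
    return t;
}

void print_tally(tally_t t)
{
    printf("Sticker has won %ld times\n", t.sticker_wins);
    printf("Switcher has won %ld times\n", t.switcher_wins);
}
```

A threaded version could give each thread its own seed and its own `tally_t`, then add the tallies together after joining, which avoids any locking inside the loop.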
It also displays the big numbers with thousands separators for readability.
{Executable offline to check some things - use the original executable for now - JL}
2 processors detected in system.
2,147,483,647 total iterations will be performed.
Starting to create 2 threads.
Creating Thread Number: 0
Creating Thread Number: 1
Loop Done; Time=277 secs for thread#:0, Loops=1,073,741,823
Loop Done; Time=278 secs for thread#:1, Loops=1,073,741,823
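The thousands-separator display seen in this output can be produced in plain C without relying on locale support. The following is a sketch under that assumption; `format_thousands` is a hypothetical helper and is not taken from the Threaded Factorial source. Some C libraries also accept an apostrophe flag in `printf` for grouping, but that depends on the platform and locale, so this version builds the string by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes value into out with a comma between each group of three digits.
 * Returns out on success, or NULL if the buffer is missing or too small. */
char *format_thousands(long value, char *out, size_t out_size)
{
    char digits[32];
    /* Work on the magnitude so that LONG_MIN is handled without overflow. */
    unsigned long mag = value < 0 ? 0UL - (unsigned long)value
                                  : (unsigned long)value;
    int n = snprintf(digits, sizeof digits, "%lu", mag);
    assert(n > 0 && (size_t)n < sizeof digits);

    size_t len = strlen(digits);
    size_t commas = (len - 1) / 3;
    size_t need = len + commas + (value < 0 ? 1 : 0) + 1;   /* +1 for '\0' */

    if (out == NULL || out_size < need)
        return NULL;

    char *p = out;
    if (value < 0)
        *p++ = '-';
    for (size_t i = 0; i < len; i++) {
        size_t remaining = len - 1 - i;   /* digits still to be written */
        *p++ = digits[i];
        if (remaining > 0 && remaining % 3 == 0)
            *p++ = ',';
    }
    *p = '\0';
    return out;
}
```

With a 32-byte buffer, `format_thousands(2147483647L, buf, sizeof buf)` yields `2,147,483,647`, the same form as the iteration count in the log above.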
It seems a bit strange that my MacBook outperformed my iMac, when they both have virtually the same processor?
2 processors detected in system.
2147483647 total iterations will be performed.
Starting to create 2 threads.
Creating Thread Number: 0
Creating Thread Number: 1
Loop Done; Time=110 secs for thread#:0, Loops=1073741823
Loop Done; Time=110 secs for thread#:1, Loops=1073741823
Which is more in line with the results from the MacBook.
Lundy: is there a problem with Threaded Factorial 2?
Sure seems to be. Thanks for picking up on that. I will have to re-check the new code to see what the deal is.
How many iterations did you ask it to do?
I'm asking both 'Threaded Factorial' & 'Threaded Factorial2' to do 2,147,483,647 iterations, simply because I want to keep the playing field level.
1 processors detected in system.
2147483647 total iterations will be performed.
Starting to create 1 threads.
Creating Thread Number: 0
Loop Done; Time=353 secs for thread#:0, Loops=2147483647
Quite a big difference between the PowerBook and the MacBook, eh?
I would like to have tested one of the MacBook Pros I returned, because I'm convinced they were unusually slow and that the MacBook ran circles around them...
Anybody out there got a MacBook Pro 17" that they can test?
2 processors detected in system.
2147483647 total iterations will be performed.
Starting to create 2 threads.
Creating Thread Number: 0
Creating Thread Number: 1
Loop Done; Time=128 secs for thread#:1, Loops=1073741823
Loop Done; Time=128 secs for thread#:0, Loops=1073741823
I have a Quad, but the site won't let me download the link.
It keeps asking me to log in, confirms that I have logged in, and then simply returns me to the login screen again.
Any ideas?
I'm having the same problem trying to download the perl script on my Quad G5 as well.
Also, lundy, any chance we can get the source to your factorial application?
Absolutely!
I will post the Monty Hall script and the C source for threadedFactorialHighRes below.
Monty Hall Perl Script:Attachment 124
Folder with Xcode project for threadedFactorialHighRes:Attachment 125