With just a simple "make" and no optimizing or flags set of any kind here are my results:
Dual 533
1GB of RAM
FFT: 200000
37 seconds.
Since I have a dual processor machine and this app is NOT multithreaded I decided to run two at the same time with FFT set to 100000 on each and then combined the times and got a result of:
34 seconds. with the same build as before. ( I added one second to make up for the time it took me to switch terminal windows and press return..)
for comparison to this form of multithreading I ran it once more with only one thread running to 100000 and multiplied by 2 and came up with:
28 Seconds.
Maybe later I will look at adding dual processor support and doing some optimizing. Till then I think this is sufficient.
Interesting, what is the processor in this machine?
Intel P4 (Pentium-M) 1.7Ghz
... unfortunately it's not Centrino... or else i'd have more battery life (and i think more speed too). But that's ok since i have a swapping bay (using 2 batteries)
This is extremely interesting as I am a statistician doing a lot of simulations. Using the Windows binary in your zip file, I got the following on my Celeron 2.5 GHz home computer running Windows XP:
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2af.memsave
I got 13 secs on a stock single 1.8 G5 running the binary provided by SilentEchoes. Would someone more in the know than I know the degree to which this could be improved by targeting the G5 specifically or by using AltiVec?
Well all cross platform tests are seriously flawed because of optimizations. I actually tend to think that the test is less skewed when no optimizations are used at all.
Gabid I am going to pop open the source code now and see if I can't tweak it a bit and run a more advanced compile.
63 secs on a 667 DVI Ti with 768 ram (running adium, safari, firefox, mail, synergy, fuzzy clock, menumeters (cpu and network), various other services, and the finder 10.3 in the background.)
Much obliged. If you post it, I'll run it. Gotta use this G5 for something
This is extremely interesting as I am a statistician doing a lot of simulations. Using the Windows binary in your zip file, I got the following on my Celeron 2.5 GHz home computer running Windows XP:
.
.
.
.
Total 56 sec. (real time), chksum= 3f99
Very interesting. How much L2 cache has this machine?
Of course this test is inherently flawed... compilers used, optimizations used (or not, such as altivec), etc etc etc.
Of course it is inherently flawed. The win32 binaries are compiled with some Intel compiler (I don't remember now the version; running "strings" on the binary will reveal it). But the source code is here. Could anyone recompile for Windows with gcc3? That way we eliminate (at least minimize) the compiler effect.
OK, so it is a Pentium 4-M. Let's see: Pentium-M (Centrino) 1.6 GHz I posted at the beginning: 13 sec; Pentium 4-M 1.7 GHz: 21 sec. So, clock for clock, the Pentium-M is about twice as fast as a Pentium 4-M in this test.
Centrinos were meant to have practically double speed & double battery life
The L2 cache for Celeron is only 128K, which is why it is so slow. But I bought the machine for $400.
That's fine, but the average person would never expect from the "visible" specifications that a 450 MHz G4 can beat by a very measurable margin this 2.5 GHz machine. By the way, is there some L3 cache in winewise's machine?
Comments
Dual 533
1GB of RAM
FFT: 200000
37 seconds.
Since I have a dual processor machine and this app is NOT multithreaded I decided to run two at the same time with FFT set to 100000 on each and then combined the times and got a result of:
34 seconds. with the same build as before. ( I added one second to make up for the time it took me to switch terminal windows and press return..)
for comparison to this form of multithreading I ran it once more with only one thread running to 100000 and multiplied by 2 and came up with:
28 Seconds.
Maybe later I will look at adding dual processor support and doing some optimizing. Till then I think this is sufficient.
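The two-at-once run described above can be scripted, so both copies start together and no allowance is needed for switching terminal windows. This is only a minimal sketch, assuming Python 3 is available. The thread does not give the benchmark's command line, so `BENCH_CMD` is a placeholder that launches a trivial Python child process, and `time_concurrent` is a made-up helper name.

```python
import subprocess
import sys
import time

# Placeholder command: substitute the real benchmark invocation here.
# The actual binary name and its arguments are not stated in the thread.
BENCH_CMD = [sys.executable, "-c", "sum(range(100000))"]


def time_concurrent(cmd, copies=2):
    """Start `copies` instances of cmd together; return wall-clock seconds until all exit."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    for proc in procs:
        proc.wait()
        if proc.returncode != 0:
            raise RuntimeError(f"command exited with status {proc.returncode}")
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_concurrent(BENCH_CMD, copies=2)
    print(f"two concurrent copies finished in {elapsed:.2f} s")
```

Calling it with `copies=1` on the half-size job gives the single-process figure to compare against.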
Originally posted by PB
Interesting, what is the processor in this machine?
Intel P4 (Pentium-M) 1.7Ghz
... unfortunately it's not Centrino... or else i'd have more battery life (and i think more speed too). But that's ok since i have a swapping bay (using 2 batteries)
This is extremely interesting as I am a statistician doing a lot of simulations. Using the Windows binary in your zip file, I got the following on my Celeron 2.5 GHz home computer running Windows XP:
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2af.memsave
nfft= 262144
radix= 10000
error_margin= 0.00460427
mem_alloc_size= 9437304
calculating 1048576 digits of PI...
AGM iteration, time= 3, chksum= ffffdb4c
precision= 48, time= 5, chksum= fffff18d
precision= 80, time= 8, chksum= ffffe3e4
precision= 176, time= 11, chksum= ffffdd72
precision= 352, time= 14, chksum= ffffd78d
precision= 688, time= 17, chksum= ffffd47c
precision= 1392, time= 21, chksum= ffffd90c
precision= 2784, time= 24, chksum= fffff9ad
precision= 5584, time= 27, chksum= ffffc2dd
precision= 11168, time= 30, chksum= ffffcb7a
precision= 22336, time= 34, chksum= ffffe6b3
precision= 44688, time= 37, chksum= fffff96d
precision= 89408, time= 40, chksum= ffffed75
precision= 178816, time= 43, chksum= fffffbe5
precision= 357648, time= 46, chksum= ffffe715
precision= 715312, time= 49, chksum= fffff393
precision= 1430640, time= 52, chksum= ffffe44b
Total 56 sec. (real time), chksum= 3f99
46 sec on a Pentium 3 @ 750 MHz (mobile version?)
Of course this test is inherently flawed... compilers used, optimizations used (or not, such as altivec), etc etc etc.
Gabid I am going to pop open the source code now and see if I can't tweak it a bit and run a more advanced compile.
Originally posted by SilentEchoes
Well all cross platform tests are seriously flawed because of optimizations. I actually tend to think that the test is less skewed when no optimizations are used at all.
Gabid I am going to pop open the source code now and see if I can't tweak it a bit and run a more advanced compile.
Much obliged. If you post it, I'll run it. Gotta use this G5 for something
Originally posted by si_flippant
mmmm, i wasn't...
Yes, you were, not the one you thought I thought, the other one...
Originally posted by alsoRun
PB,
This is extremely interesting as I am a statistician doing a lot of simulations. Using the Windows binary in your zip file, I got the following on my Celeron 2.5 GHz home computer running Windows XP:
.
.
.
.
Total 56 sec. (real time), chksum= 3f99
Very interesting. How much L2 cache has this machine?
Originally posted by chych
Of course this test is inherently flawed... compilers used, optimizations used (or not, such as altivec), etc etc etc.
Of course it is inherently flawed. The win32 binaries are compiled with some Intel compiler (I don't remember now the version; running "strings" on the binary will reveal it). But the source code is here. Could anyone recompile for Windows with gcc3? That way we eliminate (at least minimize) the compiler effect.
Originally posted by chych
18 sec on a Dual 1.25 G4 (only uses one processor though).
46 sec on a Pentium 3 @ 750 MHz (mobile version?)
Hmmm, perhaps not. I tested it in a desktop Pentium III @ 800 MHz under Windows 2000 and it gave 42 sec total.
Originally posted by deunan
Intel P4 (Pentium-M) 1.7Ghz
... unfortunately it's not Centrino...
OK, so it is a Pentium 4-M. Let's see: Pentium-M (Centrino) 1.6 GHz I posted at the beginning: 13 sec; Pentium 4-M 1.7 GHz: 21 sec. So, clock for clock, the Pentium-M is about twice as fast as a Pentium 4-M in this test.
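The clock-for-clock comparison can be made explicit by multiplying each run time by the clock rate, which approximates the cycles each chip spent on the job. The sketch below uses only the figures quoted in the post (13 sec at 1.6 GHz, 21 sec at 1.7 GHz) and assumes both machines ran the same workload; `cycles_giga` is a name made up for the illustration. On these numbers the Pentium-M's per-clock advantage comes out at roughly 1.7x.

```python
def cycles_giga(seconds, clock_ghz):
    """Approximate clock cycles spent, in billions: run time multiplied by clock rate."""
    return seconds * clock_ghz


pentium_m = cycles_giga(13, 1.6)    # Pentium-M (Centrino), 1.6 GHz, 13 sec
pentium_4m = cycles_giga(21, 1.7)   # Pentium 4-M, 1.7 GHz, 21 sec

# How many times more cycles the Pentium 4-M needed for the same job.
ratio = pentium_4m / pentium_m

print(f"Pentium-M:   {pentium_m:.1f} billion cycles")
print(f"Pentium 4-M: {pentium_4m:.1f} billion cycles")
print(f"per-clock advantage of the Pentium-M: {ratio:.2f}x")
```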
Originally posted by PB
OK, so it is a Pentium 4-M. Let's see: Pentium-M (Centrino) 1.6 GHz I posted at the beginning: 13 sec; Pentium 4-M 1.7 GHz: 21 sec. So, clock for clock, the Pentium-M is about twice as fast as a Pentium 4-M in this test.
Centrinos were meant to have practically double speed & double battery life
calculating 1048576 digits of PI...
AGM iteration, time= 3, chksum= ffffdb4c
precision= 48, time= 6, chksum= fffff18d
precision= 80, time= 8, chksum= ffffe3e4
precision= 176, time= 11, chksum= ffffdd72
precision= 352, time= 13, chksum= ffffd78d
precision= 688, time= 16, chksum= ffffd47c
precision= 1392, time= 19, chksum= ffffd90c
precision= 2784, time= 21, chksum= fffff9ad
precision= 5584, time= 24, chksum= ffffc2dd
precision= 11168, time= 26, chksum= ffffcb7a
precision= 22336, time= 29, chksum= ffffe6b3
precision= 44688, time= 31, chksum= fffff96d
precision= 89408, time= 34, chksum= ffffed75
precision= 178816, time= 36, chksum= fffffbe5
precision= 357648, time= 39, chksum= ffffe715
precision= 715312, time= 41, chksum= fffff393
precision= 1430640, time= 44, chksum= ffffe44b
writing pi.dat...
Total 48 sec. (real time), chksum= 3f99
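For anyone comparing runs step by step, the pasted output can be turned into per-step times. This is a rough sketch: it assumes the `time=` values are cumulative seconds since the start of the run (they only ever increase in the logs posted here), and `step_times` is a helper name invented for the example. The sample lines are copied from the output above.

```python
import re

# A few lines copied from the output above.
SAMPLE = """\
AGM iteration, time= 3, chksum= ffffdb4c
precision= 48, time= 6, chksum= fffff18d
precision= 80, time= 8, chksum= ffffe3e4
precision= 176, time= 11, chksum= ffffdd72
"""

# Matches the cumulative-seconds field on each progress line.
TIME_RE = re.compile(r"time=\s*(\d+)")


def step_times(log_text):
    """Return seconds spent in each reported step (differences of cumulative times)."""
    cumulative = [int(m.group(1)) for m in TIME_RE.finditer(log_text)]
    previous = 0
    deltas = []
    for t in cumulative:
        deltas.append(t - previous)
        previous = t
    return deltas


print(step_times(SAMPLE))  # prints [3, 3, 2, 3]
```

Running the same function over two machines' logs shows whether one is slower across the board or only in the later, larger steps.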
Originally posted by PB
Very interesting. How much L2 cache has this machine?
Originally posted by alsoRun
The L2 cache for Celeron is only 128K, which is why it is so slow. But I bought the machine for $400.
That's fine, but the average person would never expect from the "visible" specifications that a 450 MHz G4 can beat by a very measurable margin this 2.5 GHz machine. By the way, is there some L3 cache in winewise's machine?
Originally posted by PB
Yes, you were, not the one you thought I thought, the other one...
mmmm, no... but jeez... if it makes you feel better... FFS.