Comments
Originally posted by si_flippant
mmmm, no... but jeez... if it makes you feel better... FFS.
OK, it's not exactly that, but... ffs (for fun's sake). Thanks for the fun, si_flippant.
200,000 length
Run 1: 60 sec
Runs 2 and 3: 52 sec
100,000 length
Runs 1 and 2: 21 sec
2 x 100,000 length, simultaneous: 43 sec each
Interesting that the second run went faster by 13%. And threading two simultaneously didn't hurt total performance.
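A quick sanity check on those numbers, as a small Python sketch. The timings are the ones from the post; the variable names are just mine for illustration.

```python
# Timings in seconds, copied from the post above.
run1_200k = 60          # first run, 200,000 length
run2_200k = 52          # second and third runs, 200,000 length

# How much faster the later runs were, relative to the first run.
speedup_pct = (run1_200k - run2_200k) / run1_200k * 100
print(f"later runs faster by {speedup_pct:.1f}%")

single_100k = 21        # one 100,000-length run on its own
paired_100k = 43        # each of two simultaneous 100,000-length runs

# Running the two jobs back to back would take 2 x 21 s,
# so compare that with the 43 s wall time of the simultaneous pair.
sequential_total = 2 * single_100k
overhead_pct = (paired_100k - sequential_total) / sequential_total * 100
print(f"back to back: {sequential_total} s, simultaneous: {paired_100k} s "
      f"({overhead_pct:.1f}% difference)")
```

So the 13% figure holds (8 s saved out of 60 s), and the simultaneous pair costs about one second more than running the two jobs back to back.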
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2af.memsave
nfft= 262144
radix= 10000
error_margin= 0.00460427
mem_alloc_size= 9437304
calculating 1048576 digits of PI...
AGM iteration, time= 2, chksum= ffffdb4c
precision= 48, time= 3, chksum= fffff18d
precision= 80, time= 5, chksum= ffffe3e4
precision= 176, time= 6, chksum= ffffdd72
precision= 352, time= 8, chksum= ffffd78d
precision= 688, time= 10, chksum= ffffd47c
precision= 1392, time= 11, chksum= ffffd90c
precision= 2784, time= 13, chksum= fffff9ad
precision= 5584, time= 14, chksum= ffffc2dd
precision= 11168, time= 16, chksum= ffffcb7a
precision= 22336, time= 18, chksum= ffffe6b3
precision= 44688, time= 19, chksum= fffff96d
precision= 89408, time= 21, chksum= ffffed75
precision= 178816, time= 22, chksum= fffffbe5
precision= 357648, time= 24, chksum= ffffe715
precision= 715312, time= 26, chksum= fffff393
precision= 1430640, time= 27, chksum= ffffe44b
Total 29 sec. (real time), chksum= 3f99
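For anyone wondering what "AGM" in the program banner refers to: below is a minimal sketch of the textbook Gauss-Legendre (arithmetic-geometric mean) iteration for pi, written with Python's standard `decimal` module. This is not the benchmark's source and I have not checked how that program implements it; in particular there is no FFT multiplication here, and the function name `agm_pi` and the guard-digit count are my own assumptions.

```python
from decimal import Decimal, getcontext


def agm_pi(digits: int) -> str:
    """Return pi truncated to `digits` decimal places (Gauss-Legendre AGM)."""
    # Work with some guard digits so rounding errors stay out of the result.
    getcontext().prec = digits + 10

    a = Decimal(1)
    b = Decimal(1) / Decimal(2).sqrt()
    t = Decimal(1) / Decimal(4)
    p = Decimal(1)

    # Each pass roughly doubles the number of correct digits,
    # so about log2(digits) passes are enough.
    for _ in range(max(1, digits.bit_length())):
        a_next = (a + b) / 2          # arithmetic mean
        b = (a * b).sqrt()            # geometric mean
        t -= p * (a - a_next) ** 2
        a = a_next
        p *= 2

    pi = (a + b) ** 2 / (4 * t)
    # Keep "3." plus the requested number of decimals.
    return str(pi)[: digits + 2]


print(agm_pi(50))
```

The expensive part at a million digits is the big-number multiplication inside each pass, which is presumably where the FFT comes in for the real program.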
Originally posted by PB
That's fine, but the average person would never expect from the "visible" specifications that a 450 MHz G4 can beat this 2.5 GHz machine by a very measurable margin. By the way, is there some L3 cache in winewise's machine?
No, the slower G4s (350-533 MHz) have 1 MB of 2:1 L2 cache per processor. The 667 MHz and higher models have 256 KB of 1:1 L2 cache, and some models have 1 MB or 2 MB of L3 cache per processor.
Just doesn't sound right.
BTW: a 400 MHz Pismo finishes in 67 seconds.
EDIT: Ran it again on my G5 and got 12 seconds. I think I added a few extra "0" at the end of the 200,000.
Originally posted by Ebby
EDIT: Ran it again on my G5 and got 12 seconds. I think I added a few extra "0" at the end of the 200,000.
Riiiight! Feed the monster with some extra zeros.
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2af.memsave
length of FFT =?
200000
initializing...
nfft= 262144
radix= 10000
error_margin= 0.00587898
mem_alloc_size= 9437304
calculating 1048576 digits of PI...
AGM iteration, time= 1, chksum= ffffdb4c
precision= 48, time= 2, chksum= fffff18d
precision= 80, time= 3, chksum= ffffe3e4
precision= 176, time= 4, chksum= ffffdd72
precision= 352, time= 5, chksum= ffffd78d
precision= 688, time= 6, chksum= ffffd47c
precision= 1392, time= 7, chksum= ffffd90c
precision= 2784, time= 9, chksum= fffff9ad
precision= 5584, time= 10, chksum= ffffc2dd
precision= 11168, time= 11, chksum= ffffcb7a
precision= 22336, time= 12, chksum= ffffe6b3
precision= 44688, time= 13, chksum= fffff96d
precision= 89408, time= 14, chksum= ffffed75
precision= 178816, time= 15, chksum= fffffbe5
precision= 357648, time= 16, chksum= ffffe715
precision= 715312, time= 17, chksum= fffff393
precision= 1430640, time= 18, chksum= ffffe44b
writing pi.dat...
Total 20 sec. (real time), chksum= 3f99
Originally posted by Algol
Wouldn't Xbench be easier? I'm not going to do the FFT-whatever test because I'm too lazy... I get 108.6 on a 1 GHz TiBook with 1 GB RAM and a 5400 rpm HD.
Certainly it is easier, but that's not the point. With the FFT pi calculation test we have the source code, so not only could someone here recompile it for Windows with gcc3 (to minimize the compiler effect and provide a more acceptable cross-platform CPU test), but I also hoped that someone would optimise the code for the AltiVec unit. This would give us a better idea of what happens when the same code runs with the whole G4 (or G5) processor kept busy vs. with only the scalar part working vs. on x86 CPUs. Keep in mind, it would be the same code, not ports or adaptations.
nfft= 262144
radix= 10000
error_margin= 0.00587898
mem_alloc_size= 9437304
calculating 1048576 digits of PI...
AGM iteration, time= 1, chksum= ffffdb4c
precision= 48, time= 2, chksum= fffff18d
precision= 80, time= 4, chksum= ffffe3e4
precision= 176, time= 5, chksum= ffffdd72
precision= 352, time= 6, chksum= ffffd78d
precision= 688, time= 7, chksum= ffffd47c
precision= 1392, time= 9, chksum= ffffd90c
precision= 2784, time= 10, chksum= fffff9ad
precision= 5584, time= 11, chksum= ffffc2dd
precision= 11168, time= 12, chksum= ffffcb7a
precision= 22336, time= 14, chksum= ffffe6b3
precision= 44688, time= 15, chksum= fffff96d
precision= 89408, time= 16, chksum= ffffed75
precision= 178816, time= 17, chksum= fffffbe5
precision= 357648, time= 19, chksum= ffffe715
precision= 715312, time= 20, chksum= fffff393
precision= 1430640, time= 21, chksum= ffffe44b
writing pi.dat...
Total 23 sec. (real time), chksum= 3f99
PI_CS 13 secs
PI_CW 29 secs
PowerBook 12" G4 1.33GHz / 768MB RAM: 24 sec
Intel P4-2.6 @ 3.25GHz / 512MB RAM: 7 sec
AMD Athlon 64 3200+ / 512MB RAM: 9 sec
The PowerBook is running Panther, the Intel box is Win XP, and the AMD box is Knoppix 3.3. Unfortunately my 64 bit Gentoo install just went belly up when we lost power a few days back.
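To put those three results side by side, here is a rough Python sketch that expresses each machine's speed relative to the PowerBook, plus a crude clock-times-seconds figure for the two machines whose clock speed is stated in the post. The labels and variable names are mine, and since the Athlon 64's actual clock is not given above it is left out of the clock-based part.

```python
# Wall-clock times in seconds, as reported in the post above.
timings = {
    "PowerBook G4 1.33GHz": 24,
    "Intel P4 @ 3.25GHz": 7,
    "AMD Athlon 64 3200+": 9,
}

# Speed relative to the PowerBook (higher means faster).
baseline = timings["PowerBook G4 1.33GHz"]
relative = {name: baseline / secs for name, secs in timings.items()}
for name, factor in relative.items():
    print(f"{name}: {factor:.2f}x the PowerBook's speed")

# Crude "billions of clock cycles spent" estimate: clock (GHz) x time (s).
# Only machines with a clock speed stated in the post are included.
clocks_ghz = {"PowerBook G4 1.33GHz": 1.33, "Intel P4 @ 3.25GHz": 3.25}
gigacycles = {name: ghz * timings[name] for name, ghz in clocks_ghz.items()}
for name, cycles in gigacycles.items():
    print(f"{name}: about {cycles:.2f} billion cycles")
```

By that crude measure the overclocked P4 is about 3.4 times faster in wall time, but only about 1.4 times more efficient per clock cycle on this particular binary, and the operating systems and compilers differ between the three boxes anyway.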
B
Wonder if they will notice if I download 100megs of GNU software at work...
Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2af.memsave
nfft= 262144
radix= 10000
error_margin= 0.00460427
mem_alloc_size= 9437304
calculating 1048576 digits of PI...
AGM iteration, time= 1, chksum= ffffdb4c
precision= 48, time= 2, chksum= fffff18d
precision= 80, time= 3, chksum= ffffe3e4
precision= 176, time= 3, chksum= ffffdd72
precision= 352, time= 4, chksum= ffffd78d
precision= 688, time= 5, chksum= ffffd47c
precision= 1392, time= 6, chksum= ffffd90c
precision= 2784, time= 7, chksum= fffff9ad
precision= 5584, time= 8, chksum= ffffc2dd
precision= 11168, time= 9, chksum= ffffcb7a
precision= 22336, time= 10, chksum= ffffe6b3
precision= 44688, time= 11, chksum= fffff96d
precision= 89408, time= 12, chksum= ffffed75
precision= 178816, time= 13, chksum= fffffbe5
precision= 357648, time= 13, chksum= ffffe715
precision= 715312, time= 14, chksum= fffff393
precision= 1430640, time= 15, chksum= ffffe44b
Total 17 sec. (real time), chksum= 3f99
Thanks
B
Running pi_ca
- Xidius
Originally posted by >_>
15 seconds on my 1.5 GHz 17" PowerBook.
Running pi_ca
- Xidius
What gives the pi_cs?
- Xidius