Measure the speed of you's guys'es MACS

silentechoes · April 26, 2004 7:25PM

With just a plain "make" and no optimization flags of any kind, here are my results:

Dual 533

1GB of RAM

FFT: 200000

37 seconds.

Since I have a dual-processor machine and this app is NOT multithreaded, I decided to run two copies at the same time with FFT set to 100000 on each, then combined the times and got a result of:

34 seconds, with the same build as before. (I added one second to make up for the time it took me to switch terminal windows and press return.)

For comparison with this form of multithreading, I ran it once more as a single process with FFT set to 100000, multiplied the time by 2, and came up with:

28 seconds.
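To take the window-switching delay out of the two-at-once timing, both copies could be started from one script. This is a rough sketch in Python, not what was actually used for the numbers above. The command here is only a placeholder, since the benchmark's real binary name and arguments are not given in this thread.

```python
import subprocess
import sys
import time


def time_concurrent(cmd, copies=2):
    """Start `copies` instances of cmd together and return the
    wall-clock seconds until the last one finishes."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder workload (an assumption): swap in the benchmark
    # command you actually want to run on each processor.
    placeholder = [sys.executable, "-c", "sum(range(10**6))"]
    print(f"{time_concurrent(placeholder):.2f} s for 2 concurrent copies")
```

Running the same function with copies=1 gives the single-process time to compare against.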

Maybe later I will look at adding dual processor support and doing some optimizing. Till then I think this is sufficient.

silentechoes · April 26, 2004 7:33PM

Here is the binary I used. Just download it, unzip it and double click it. It will launch the terminal for you.

deunan · April 26, 2004 8:27PM

Quote:

Originally posted by PB

Interesting, what is the processor in this machine?

Intel P4 (Pentium-M) 1.7 GHz

... unfortunately it's not Centrino, or else I'd have more battery life (and I think more speed too). But that's OK since I have a swappable bay (using 2 batteries).

alsorun · April 26, 2004 9:52PM

PB,

This is extremely interesting as I am a statistician doing a lot of simulations. Using the Windows binary in your zip file, I got the following on my Celeron 2.5 GHz home computer running Windows XP:

Calculation of PI using FFT and AGM, ver. LG1.1.2-MP1.5.2af.memsave

nfft= 262144

radix= 10000

error_margin= 0.00460427

mem_alloc_size= 9437304

calculating 1048576 digits of PI...

AGM iteration, time= 3, chksum= ffffdb4c

precision= 48, time= 5, chksum= fffff18d

precision= 80, time= 8, chksum= ffffe3e4

precision= 176, time= 11, chksum= ffffdd72

precision= 352, time= 14, chksum= ffffd78d

precision= 688, time= 17, chksum= ffffd47c

precision= 1392, time= 21, chksum= ffffd90c

precision= 2784, time= 24, chksum= fffff9ad

precision= 5584, time= 27, chksum= ffffc2dd

precision= 11168, time= 30, chksum= ffffcb7a

precision= 22336, time= 34, chksum= ffffe6b3

precision= 44688, time= 37, chksum= fffff96d

precision= 89408, time= 40, chksum= ffffed75

precision= 178816, time= 43, chksum= fffffbe5

precision= 357648, time= 46, chksum= ffffe715

precision= 715312, time= 49, chksum= fffff393

precision= 1430640, time= 52, chksum= ffffe44b

Total 56 sec. (real time), chksum= 3f99
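For anyone wondering about the "AGM" in the banner: it stands for arithmetic-geometric mean. One well-known AGM-based way to compute pi is the Gauss-Legendre iteration, which roughly doubles the number of correct digits on each pass. That would be consistent with the "precision" column roughly doubling in the log, though whether the benchmark uses exactly this variant is an assumption. The sketch below only illustrates the idea with Python's decimal module; it is not the benchmark's code, which relies on FFT-based multiplication to handle a million digits.

```python
from decimal import Decimal, localcontext


def pi_gauss_legendre(digits):
    """Approximate pi with the Gauss-Legendre (AGM) iteration.

    The result carries 10 guard digits beyond `digits`, so the last
    few places should not be trusted.
    """
    with localcontext() as ctx:
        ctx.prec = digits + 10  # working precision with guard digits
        a = Decimal(1)
        b = Decimal(1) / Decimal(2).sqrt()
        t = Decimal(1) / Decimal(4)
        p = Decimal(1)
        # Correct digits roughly double per pass, so about log2(digits)
        # passes are enough.
        for _ in range(max(1, digits.bit_length())):
            a_next = (a + b) / 2       # arithmetic mean
            b = (a * b).sqrt()         # geometric mean
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        return (a + b) ** 2 / (4 * t)


if __name__ == "__main__":
    print(pi_gauss_legendre(30))
```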

gabid · April 26, 2004 10:17PM

I got 13 secs on a stock single 1.8 G5 running the binary provided by SilentEchoes. Could someone more in the know than I am say how much this could be improved by targeting the G5 specifically or by using AltiVec?

chych · April 26, 2004 10:22PM

18 sec on a Dual 1.25 G4 (only uses one processor though).

46 sec on a Pentium 3 @ 750 MHz (mobile version?)

Of course this test is inherently flawed... compilers used, optimizations used (or not, such as AltiVec), etc etc etc.

silentechoes · April 26, 2004 10:38PM

Well, all cross-platform tests are seriously flawed because of optimizations. I actually tend to think that the test is less skewed when no optimizations are used at all.

Gabid, I am going to pop open the source code now and see if I can't tweak it a bit and run a more advanced compile.

paul · April 26, 2004 10:45PM

63 secs on a 667 DVI Ti with 768 MB of RAM (running Adium, Safari, Firefox, Mail, Synergy, Fuzzy Clock, MenuMeters (CPU and network), various other services, and the Finder 10.3 in the background).

gabid · April 26, 2004 10:51PM

Quote:

Originally posted by SilentEchoes

Well, all cross-platform tests are seriously flawed because of optimizations. I actually tend to think that the test is less skewed when no optimizations are used at all.

Gabid, I am going to pop open the source code now and see if I can't tweak it a bit and run a more advanced compile.

Much obliged. If you post it, I'll run it. Gotta use this G5 for something.

ichiban_jay · April 27, 2004 12:05AM

33 seconds on a 12" 1 GHz PowerBook. We are doing 200000 compilations, right?

pb · April 27, 2004 1:16AM

Quote:

Originally posted by si_flippant

mmmm, i wasn't...

Yes, you were, not the one you thought I thought, the other one...

pb · April 27, 2004 1:18AM

Quote:

Originally posted by alsoRun

PB,

This is extremely interesting as I am a statistician doing a lot of simulations. Using the Windows binary in your zip file, I got the following on my Celeron 2.5 GHz home computer running Windows XP:

...

Total 56 sec. (real time), chksum= 3f99

Very interesting. How much L2 cache does this machine have?

pb · April 27, 2004 1:22AM

Quote:

Originally posted by chych

Of course this test is inherently flawed... compilers used, optimizations used (or not, such as AltiVec), etc etc etc.

Of course it is inherently flawed. The win32 binaries are compiled with some Intel compiler (I don't remember the version now; running "strings" on the binary will reveal it). But the source code is here. Could anyone recompile for Windows with gcc3? That way we eliminate (or at least minimize) the compiler effect.

pb · April 27, 2004 1:27AM

Quote:

Originally posted by chych

18 sec on a Dual 1.25 G4 (only uses one processor though).

46 sec on a Pentium 3 @ 750 MHz (mobile version?)

Hmmm, perhaps not. I tested it on a desktop Pentium III @ 800 MHz under Windows 2000 and it gave 42 sec total.

pb · April 27, 2004 1:43AM

Quote:

Originally posted by deunan

Intel P4 (Pentium-M) 1.7 GHz

... unfortunately it's not Centrino...

OK, so it is a Pentium 4-M. Let's see: Pentium-M (Centrino) 1.6 GHz I posted at the beginning: 13 sec; Pentium 4-M 1.7 GHz: 21 sec. So, clock for clock, the Pentium-M is about twice as fast as a Pentium 4-M in this test.
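The clock-for-clock figure can be checked by multiplying each run time by the clock speed, which gives the billions of cycles each chip spent on the job. A quick sketch using only the two results quoted in this post (the function name is made up for illustration):

```python
def clock_normalized(seconds, ghz):
    """Seconds x GHz = billions of clock cycles spent on the run."""
    return seconds * ghz


pentium_m = clock_normalized(13, 1.6)    # Pentium-M (Centrino) 1.6 GHz
pentium_4m = clock_normalized(21, 1.7)   # Pentium 4-M 1.7 GHz
ratio = pentium_4m / pentium_m

print(f"Pentium-M: {pentium_m:.1f}  Pentium 4-M: {pentium_4m:.1f}  ratio: {ratio:.2f}")
# prints: Pentium-M: 20.8  Pentium 4-M: 35.7  ratio: 1.72
```

So the Pentium 4-M needed about 1.7 times as many cycles on this test, which is in the neighborhood of "about twice".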

deunan · April 27, 2004 5:59AM

Quote:

Originally posted by PB

OK, so it is a Pentium 4-M. Let's see: Pentium-M (Centrino) 1.6 GHz I posted at the beginning: 13 sec; Pentium 4-M 1.7 GHz: 21 sec. So, clock for clock, the Pentium-M is about twice as fast as a Pentium 4-M in this test.

Centrinos were meant to have practically double speed & double battery life.

winewise · April 27, 2004 7:07AM

Here is the result on an old Dual 450 with 1 GB of RAM:

calculating 1048576 digits of PI...

AGM iteration, time= 3, chksum= ffffdb4c

precision= 48, time= 6, chksum= fffff18d

precision= 80, time= 8, chksum= ffffe3e4

precision= 176, time= 11, chksum= ffffdd72

precision= 352, time= 13, chksum= ffffd78d

precision= 688, time= 16, chksum= ffffd47c

precision= 1392, time= 19, chksum= ffffd90c

precision= 2784, time= 21, chksum= fffff9ad

precision= 5584, time= 24, chksum= ffffc2dd

precision= 11168, time= 26, chksum= ffffcb7a

precision= 22336, time= 29, chksum= ffffe6b3

precision= 44688, time= 31, chksum= fffff96d

precision= 89408, time= 34, chksum= ffffed75

precision= 178816, time= 36, chksum= fffffbe5

precision= 357648, time= 39, chksum= ffffe715

precision= 715312, time= 41, chksum= fffff393

precision= 1430640, time= 44, chksum= ffffe44b

writing pi.dat...

Total 48 sec. (real time), chksum= 3f99
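The time column in these logs is cumulative, so the cost of each step is easier to compare between machines as differences between consecutive lines. A small sketch that pulls those out of pasted log lines; the regular expression assumes the "time= N" layout shown above, and the function name is made up for illustration:

```python
import re

TIME_FIELD = re.compile(r"time=\s*(\d+)")


def step_times(log_lines):
    """Turn cumulative 'time= N' stamps into seconds spent per step."""
    stamps = []
    for line in log_lines:
        match = TIME_FIELD.search(line)
        if match:
            stamps.append(int(match.group(1)))
    # Difference between each stamp and the one before it (0 at the start).
    return [later - earlier for earlier, later in zip([0] + stamps, stamps)]


sample = [
    "AGM iteration, time= 3, chksum= ffffdb4c",
    "precision= 48, time= 6, chksum= fffff18d",
    "precision= 80, time= 8, chksum= ffffe3e4",
    "precision= 176, time= 11, chksum= ffffdd72",
]
print(step_times(sample))  # prints: [3, 3, 2, 3]
```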

alsorun · April 27, 2004 7:17AM

The L2 cache for the Celeron is only 128K, which is why it is so slow. But I bought the machine for $400.

Quote:

Originally posted by PB

Very interesting. How much L2 cache does this machine have?

pb · April 27, 2004 8:25AM

Quote:

Originally posted by alsoRun

The L2 cache for the Celeron is only 128K, which is why it is so slow. But I bought the machine for $400.

That's fine, but the average person would never expect from the "visible" specifications that a 450 MHz G4 can beat this 2.5 GHz machine by a very measurable margin. By the way, is there any L3 cache in winewise's machine?

si_flippant · April 27, 2004 12:09PM

Quote:

Originally posted by PB

Yes, you were, not the one you thought I thought, the other one...

mmmm, no... but jeez... if it makes you feel better... FFS.
