Post your speeds - Calculate 50 Million Factorials


Comments

  • Reply 61 of 146
    lundylundy Posts: 4,466member
    Quote:

    Originally posted by hawkman

    400Mhz B+W G3



    IBM XL C Version 6.0, XL C++ Version 6.0 for Mac OS X, Beta

    [xxx-Computer:~/desktop] hawk% xlc -O5 -o unthreaded_factorial unthreaded_factorial.c

    [xxx-Computer:~/desktop] hawk% ./unthreaded_factorial

    Start: 1063170913 End: 1063170958

    i= 50000001

    Time=45




    That's looking good.



    Downloading the IBM compiler now.
  • Reply 62 of 146
    eugeneeugene Posts: 8,254member
    G5 binaries built with XLC (-O5 -qtune=g5 -qarch=g5):



    http://www.eugenechan.com/~ceugene/g5factorial.tgz
  • Reply 63 of 146
    lundylundy Posts: 4,466member
    Quote:

    Originally posted by Eugene

    G5 binaries built with XLC (-O5 -qtune=g5 -qarch=g5):



    http://www.eugenechan.com/~ceugene/g5factorial.tgz




    [localhost:~] lundy% /Users/lundy/Desktop/g5factorial/tf-g5-O5

    Creating Thread Number: 0

    Creating Thread Number: 1

    Loop Done; Time=9 secs for thread#:1, Loops=25000000

    Loop Done; Time=9 secs for thread#:0, Loops=25000000





    It doesn't crash on my G4, with -O3 or -O5.



    I found no compiler option in the reference manual pages of the XLC for 64-bitness. Enabling -qlonglong makes no difference either.



    Hmm.
  • Reply 64 of 146
    gabidgabid Posts: 477member
    Quote:

    Originally posted by Eugene

    G5 binaries built with XLC (-O5 -qtune=g5 -qarch=g5):



    http://www.eugenechan.com/~ceugene/g5factorial.tgz




    Sorry to harass the UNIX masters again, but do I need the IBM compiler to run these? When I try to (and I don't think I need to "chmod 755" since it looks like the files are executable) with "./" I get the message:



    Quote:

    dyld: ./utf-g5-O5 can't open library: /opt/ibmcmp/lib/libxlsmp.dylib (No such file or directory, errno = 2)

    Trace/BPT trap



    C'mon, let's see if we can get a G5 down to 9 secs !
  • Reply 65 of 146
    One point of confusion on my part:



    The IBM XLC manual says that "cc" is an alternate invocation command for "xlc".



    How do we know if we are invoking the IBM XLC or the GNU compiler when both are resident on the same computer?
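
One way to settle which compiler actually built a binary is to ask the preprocessor. This is only a sketch: it assumes IBM's compiler predefines `__IBMC__` (or `__xlC__`) and that gcc predefines `__GNUC__`, and the function names `compiler_name` and `report_compiler` are made up for illustration. The IBM check comes first in case XL C also sets `__GNUC__` for compatibility.

```c
#include <assert.h>
#include <stdio.h>

/* Returns a label for the compiler that built this file.
   Macro names are assumptions about each vendor's predefined macros. */
const char *compiler_name(void)
{
#if defined(__IBMC__) || defined(__xlC__)
    return "IBM XL C";
#elif defined(__GNUC__)
    return "GNU C (or a gcc-compatible compiler)";
#else
    return "unknown compiler";
#endif
}

/* Prints the label to the given stream. */
void report_compiler(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "Built with: %s\n", compiler_name());
}
```

Calling `report_compiler(stdout)` from `main` and building once with `cc` and once with `xlc` should show whether the two commands really map to different compilers.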
  • Reply 66 of 146
    lundylundy Posts: 4,466member
    Quote:

    Originally posted by Gabid

    Sorry to harass the UNIX masters again, but do I need the IBM compiler to run these? When I try to (and I don't think I need to "chmod 755" since it looks like the files are executable) with "./" I get the message:







    C'mon, let's see if we can get a G5 down to 9 secs !




    I booted back into Jagwyre to be able to use the IBM compiler, since it requires gcc 3.3 (or so it says). Since the binary runs on my G4 (and I specified a 970 cross-compile), I suspect that the 64-bit code isn't in there.



    I should be able to boot back into Panther and take a look with Shark.
  • Reply 67 of 146
    Hmm, here at work I ran this code using the Visual C++ compiler.



    I'm getting: 14 sec.







    Seems too good for a Thinkpad PIII 800MHz, running Win98, 192MB RAM.



    I think I might have shortened the math by changing the "long long" variables to just "long". I had to do that because VC++ told me that "long long" was illegal.



    That has to be it. Right?
  • Reply 68 of 146
    Quote:

    Originally posted by Transcendental Octothorpe

    I think I might have shortened the math by changing the "long long" variables to just "long". I had to do that because VC++ told me that "long long" was illegal.



    Yes, that will speed it up, but it won't accurately calculate the factorial, as the result's too large for a 32-bit "long"...



    -- Mark
  • Reply 69 of 146
    Soooo...



    Anyone know how to create a 64-bit integer in VC++?
  • Reply 70 of 146
    addisonaddison Posts: 1,185member
    This is a great thread, but the more I read, the more I am convinced that the G5 isn't performing anywhere near its potential. Something isn't quite right with the way it is running applications; it must be either the compiler or the OS, or both.



    I am sure that when Panther is released we are going to see a series of performance boosts that will move the G5s further from the G4s. I know that Panther is approaching GM, but from what we can see I believe that there are going to be a LOT of point increases just to help the G5. I expect that all the applications we use will also need a re-compile to take advantage of those optimisations, and that means it is likely to be a couple of years before the G5 is really on song for most of us.
  • Reply 71 of 146
    gabidgabid Posts: 477member
    Quote:

    Originally posted by lundy

    I booted back into Jagwyre to be able to use the IBM compiler, since it requires gcc 3.3 (or so it says). Since the binary runs on my G4 (and I specified a 970 cross-compile), I suspect that the 64-bit code isn't in there.



    I should be able to boot back into Panther and take a look with Shark.




    All of this excitement for a lousy 9 seconds ! Of course, I really hope that someone can pull off a G5-optimised version. I'm very curious if it will run faster.



    Knowing nothing about the technical side of things, maybe this all suggests that optimising for the G5 isn't all that easy or reliable as of yet.
  • Reply 72 of 146
    yevgenyyevgeny Posts: 1,148member
    Quote:

    Originally posted by Transcendental Octothorpe

    Soooo...



    Anyone know how to create a 64-bit integer in VC++?




    __int64 is the name of the type; yes, that is two underscores before int64. Welcome to hell.
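
For anyone building the same source with both Visual C++ and gcc/xlc, a small typedef can hide the spelling difference. This is a sketch, not the benchmark's actual code: the names `i64`, `I64_FMT` and `format_i64` are invented here, and it assumes the Microsoft runtime wants `%I64d` in `printf`-style calls where other compilers take `%lld`.

```c
#include <assert.h>
#include <stdio.h>

#if defined(_MSC_VER)
typedef __int64 i64;          /* Visual C++ spelling of a 64-bit integer */
#define I64_FMT "%I64d"       /* assumed format for the Microsoft runtime */
#else
typedef long long i64;        /* gcc, xlc and other C99-style compilers */
#define I64_FMT "%lld"
#endif

/* Writes v in decimal into buf and returns the number of characters written.
   buf must hold at least 21 bytes: sign, 19 digits and the terminating NUL. */
int format_i64(char *buf, i64 v)
{
    assert(buf != NULL);
    return sprintf(buf, I64_FMT, v);
}
```

With this in place the rest of the program can declare its factorial variables as `i64` and compile unchanged on either side.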
  • Reply 73 of 146
    yevgenyyevgeny Posts: 1,148member
    Quote:

    Originally posted by Addison

    This is a great thread, but the more I read, the more I am convinced that the G5 isn't performing anywhere near its potential. Something isn't quite right with the way it is running applications; it must be either the compiler or the OS, or both.



    This is a VERY basic benchmark. It only tests how quickly the CPU can go through a loop and do some basic math. In the real world, the ability to transfer info from main memory to the CPU is the real bottleneck, and the G5 will kick rear in that area. Besides, the G5 will scale much better than the G4 in terms of CPU speed.



    Besides, we don't have any dual G5 scores yet.
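
To make "a loop and some basic math" concrete, here is a rough sketch of what a benchmark of this kind might look like. It is not the source used in this thread: the name `time_factorials`, its parameters, and the choice of which factorial to compute are all assumptions. It times with `time()`, so the result only has one-second resolution, much like the Start/End values posted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Recursive factorial on a 64-bit type; exact only for value <= 20. */
static unsigned long long factorial(unsigned long long value)
{
    return value <= 1 ? 1 : value * factorial(value - 1);
}

/* Computes n! `loops` times and returns the elapsed wall-clock seconds.
   If `last` is not NULL, the final result is stored there. */
double time_factorials(long loops, unsigned n, unsigned long long *last)
{
    volatile unsigned long long sink = 0;  /* keeps the loop from being optimised away */
    time_t start, end;
    long i;

    assert(n <= 20);                       /* 21! does not fit in 64 bits */
    start = time(NULL);
    for (i = 0; i < loops; i++)
        sink = factorial(n);
    end = time(NULL);

    if (last != NULL)
        *last = sink;
    return difftime(end, start);
}
```

Everything here stays in registers and cache, which is why a loop like this says little about memory bandwidth.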
  • Reply 74 of 146
    Quote:

    Originally posted by Addison

    This is a great thread, but the more I read, the more I am convinced that the G5 isn't performing anywhere near its potential.



    Why do you say that? It's faster per-cycle than the G4 on this very simple benchmark, which doesn't take advantage of its greatly improved bandwidth.
  • Reply 75 of 146
    Quote:

    Originally posted by Yevgeny

    __int64 is the name of the type; yes, that is two underscores before int64. Welcome to hell.



    Ahh, thank you, kind sir.



    With that change, I am now getting 32 seconds.



    Which still seems pretty good, for an 800MHz PIII.
  • Reply 76 of 146
    eugeneeugene Posts: 8,254member
    Quote:

    Originally posted by Gabid

    Sorry to harass the UNIX masters again, but do I need the IBM compiler to run these? When I try to (and I don't think I need to "chmod 755" since it looks like the files are executable) with "./" I get the message



    Sorry, should be fixed now.
  • Reply 77 of 146
    gabidgabid Posts: 477member
    Quote:

    Originally posted by Eugene

    Sorry, should be fixed now.



    Thank you. I'll be home in an hour and a half or so and I'll try running it then.
  • Reply 78 of 146
    yevgenyyevgeny Posts: 1,148member
    Quote:

    Originally posted by Transcendental Octothorpe

    Ahh, thank you, kind sir.



    With that change, I am now getting 32 seconds.



    Which still seems pretty good, for an 800MHz PIII.




    Actually, that is kind of fast considering that the 2.8GHz P4 was finishing in something like 20 seconds.
  • Reply 79 of 146
    lundylundy Posts: 4,466member
    Quote:

    Originally posted by mark_wilkins

    Yes, that will speed it up, but it won't accurately calculate the factorial, as the result's too large for a 32-bit "long"...



    -- Mark




    Yep. That's why it was written for a long long result and argument.





    16! = 20922789888000

    2 ^ 32 = 4294967296

    2 ^ 64 = 18446744073709551616
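
A quick way to see the effect is to compute the same factorial in both widths and compare. This is a sketch with made-up function names (`factorial64`, `factorial32`), using fixed-width types from `<stdint.h>` rather than the `long long` and `long` of the benchmark source, and a loop rather than recursion.

```c
#include <assert.h>
#include <stdint.h>

/* Exact for n <= 20; 21! no longer fits in an unsigned 64-bit integer. */
uint64_t factorial64(unsigned n)
{
    uint64_t result = 1;
    unsigned i;

    assert(n <= 20);
    for (i = 2; i <= n; i++)
        result *= i;
    return result;
}

/* Same loop in 32-bit arithmetic: the product silently wraps modulo 2^32,
   so the answer is only correct up to 12! (13! is already above 2^32). */
uint32_t factorial32(unsigned n)
{
    uint32_t result = 1;
    unsigned i;

    for (i = 2; i <= n; i++)
        result *= (uint32_t)i;
    return result;
}
```

For 16 the 32-bit version returns only the low 32 bits of the true value, which is how a "long" build can finish faster and still be wrong.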



    OK, now with Shark I looked at the assembly code that was generated by Xcode (gcc 3.3, -mcpu=970 -mtune=970 -O3 -mpowerpc64)



    Code:




    factorial.c statement 24, "return (value * factorial (value - 1))":

            0x1c84  subic   r4,r31,1
     3.3%   0x1c88  addme   r3,r30
            0x1c8c  bl      $-56 <factorial>
    10.9%   0x1c90  mullw   r5,r31,r3
    10.9%   0x1c94  mulhwu  r0,r31,r4
     5.3%   0x1c98  mullw   r3,r4,r30
     3.7%   0x1c9c  mullw   r10,r31,r4
     0.0%   0x1ca0  add     r4,r5,r0
            0x1ca4  add     r9,r4,r3
     5.1%   0x1ca8  mr      r3,r9
            0x1cac  mr      r4,r10









    The "mullw" opcode is Multiply Low Word, a 32-bit multiply!!
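
The three mullw instructions plus one mulhwu in the listing above look like the usual way of building a 64-bit product out of 32-bit halves. Here is that pattern written out in C as a sketch; the function names are invented, and the mapping of each line to an opcode is my reading of the listing, not something the compiler documents. True 64-bit code would presumably use a single mulld (Multiply Low Doubleword) instead.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of a * b, computed with 32-bit multiplies only. */
uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo     = a_lo * b_lo;                               /* like mullw: low word of lo*lo */
    uint32_t carry  = (uint32_t)(((uint64_t)a_lo * b_lo) >> 32); /* like mulhwu: high word of lo*lo */
    uint32_t cross1 = a_lo * b_hi;                               /* like mullw */
    uint32_t cross2 = a_hi * b_lo;                               /* like mullw */
    uint32_t hi     = carry + cross1 + cross2;                   /* the two adds */

    return ((uint64_t)hi << 32) | lo;
}

/* Sanity check: the composed product must match the native 64-bit multiply. */
void check_mul64(uint64_t a, uint64_t b)
{
    assert(mul64_via_32(a, b) == a * b);
}
```

The a_hi * b_hi term is dropped because it only affects bits above 64, so four multiplies are enough for a "long long" result.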





    There's a "-fast" switch documented in yesterday's Updater note. I'm off to try that.
  • Reply 80 of 146
    stoostoo Posts: 1,490member
    Quote:

    Ahh, thank you, kind sir.



    Daarrr! I keep forgetting that long is the same size as int on x86 (4 bytes).



    Using __int64 I get a much more respectable 20s from my Duron 1300, still using Visual Studio 98.



    Which seems a bit too fast.