Post your speeds - Calculate 50 Million Factorials


Comments

  • Reply 121 of 146
    lundylundy Posts: 4,466member
    Quote:

    Originally posted by Programmer

    Okay guys, try this.



    Won't compile for me with the default flags and Xcode.



    cd /Users/lundy/Programming/vectors
    /Users/lundy/Programming/vectors/main.c:7: error: parse error before "fixed64"
    /Users/lundy/Programming/vectors/main.c:8: error: syntax error before '{' token
    /Users/lundy/Programming/vectors/main.c:11: error: parse error before ':' token
    /Users/lundy/Programming/vectors/main.c:12: warning: type defaults to `int' in declaration of `fixed64'
    /Users/lundy/Programming/vectors/main.c:12: error: parse error before '&' token
    /Users/lundy/Programming/vectors/main.c:13: error: parse error before '&' token
    /Users/lundy/Programming/vectors/main.c:14: error: parse error before '&' token
    /Users/lundy/Programming/vectors/main.c:15: error: parse error before '&' token
    /Users/lundy/Programming/vectors/main.c:16: error: parse error before '&' token
    /Users/lundy/Programming/vectors/main.c:17: error: parse error before ':' token
    /Users/lundy/Programming/vectors/main.c:21: error: parse error before "operator"
    /Users/lundy/Programming/vectors/main.c: In function `TransformVectors':
    /Users/lundy/Programming/vectors/main.c:37: error: `for' loop initial declaration used outside C99 mode
    /Users/lundy/Programming/vectors/main.c: In function `main':
    /Users/lundy/Programming/vectors/main.c:58: error: `for' loop initial declaration used outside C99 mode
    /Users/lundy/Programming/vectors/main.c:70: error: `for' loop initial declaration used outside C99 mode
    /Users/lundy/Programming/vectors/main.c:81: warning: int format, long int arg (arg 2)

  • Reply 122 of 146
    Quote:

    Originally posted by lundy

    Won't compile for me with the default flags and Xcode.





    Make sure you give it the extension ".cpp" -- it is C++ code, not C code.
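The error log is consistent with C++-only syntax being fed to the C compiler: the "parse error before ':'" lines look like class access specifiers, the '&' errors like reference parameters, the "operator" error like operator overloading, and the `for' errors are loop-scoped declarations that C only accepts in C99 mode. Programmer's fixed64 type isn't reproduced in this thread, so the following is a hypothetical sketch (a 48.16 fixed-point format; the class internals, AsDouble and AccumulateDemo are invented for illustration) that uses those same constructs. Saved as a .cpp file it compiles; saved as a .c file it would fail in much the same way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit fixed-point type (48 integer bits, 16 fraction bits),
// written only to illustrate C++-only syntax. It is NOT the fixed64 from
// the original post, which is not reproduced in this thread.
class fixed64
{
public:                                     // access specifier: C++ only
    fixed64() : raw(0) {}
    fixed64(double d) : raw(static_cast<std::int64_t>(d * 65536.0)) {}

    double ToDouble() const { return static_cast<double>(raw) / 65536.0; }

    // Reference parameters and operator overloading: C++ only.
    fixed64 operator*(const fixed64& rhs) const
    {
        // The raw product carries 32 fraction bits; shift back down to 16.
        // Can overflow for large magnitudes; fine for small demo values.
        return FromRaw((raw * rhs.raw) >> 16);
    }
    fixed64& operator+=(const fixed64& rhs) { raw += rhs.raw; return *this; }

private:
    std::int64_t raw;
    static fixed64 FromRaw(std::int64_t r) { fixed64 f; f.raw = r; return f; }
};

// Overloads so the demo works whichever typedef is selected below.
inline double AsDouble(double d) { return d; }
inline double AsDouble(const fixed64& f) { return f.ToDouble(); }

// The benchmark picks its element type with a typedef; swap in
// float or double here to compare.
typedef fixed64 numerictype;

// Hypothetical helper: accumulates 1.5 * 2.0 four times.
double AccumulateDemo()
{
    numerictype sum = 0.0;
    for (int i = 0; i < 4; ++i)             // loop-scoped declaration (C99/C++)
        sum += numerictype(1.5) * numerictype(2.0);
    return AsDouble(sum);
}
```

With the fixed64 typedef, AccumulateDemo() returns 12.0; switching the typedef to float or double gives the same value, since every intermediate here is exactly representable.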
  • Reply 123 of 146
    lundylundy Posts: 4,466member
    Quote:

    Originally posted by Programmer

    Make sure you give it the extension ".cpp" -- it is C++ code, not C code.





    D'oh!
  • Reply 124 of 146
    Somebody want to try this on a G5, please? Remember, save as a .cpp file.
  • Reply 125 of 146
    Quote:

    Originally posted by Programmer

    Somebody want to try this on a G5, please? Remember, save as a .cpp file.



    On a dual G5 I got a result of 46. On a 500 MHz G3 PowerBook I got a result of 425. I optimized for speed.
  • Reply 126 of 146
    Quote:

    Originally posted by Tidris

    On a dual G5 I got a result of 46. On a 500 MHz G3 PowerBook I got a result of 425. I optimized for speed.



    Ok, here is a more detailed report.



    fixed64 typedef: 48

    float typedef: 47

    double typedef: 47 (sometimes 46)



    That was on a dual G5.
  • Reply 127 of 146
    Hmmm, that's interesting... can somebody send me their G5 so I can run a profile on the code and see why it isn't significantly faster than my G4 running at half the clock rate? Thanks.
  • Reply 128 of 146
    lundylundy Posts: 4,466member
    Quote:

    Originally posted by Programmer

    Somebody want to try this on a G5, please? Remember, save as a .cpp file.



    The first segment of the Xcode disk image is corrupt - at least for me it crashes Disk Utility. Another dude confirmed this.



    Don't want to go back to Project Builder and the only way to get gcc 3.3 on Panther is to install Xcode.
  • Reply 129 of 146
    Quote:

    Originally posted by Tidris

    Ok, here is a more detailed report.



    fixed64 typedef: 48

    float typedef: 47

    double typedef: 47 (sometimes 46)



    That was on a dual G5.




    The floating point numbers improved somewhat by turning off profiling and using -O3 instead of -fast:



    float typedef: 42

    double typedef: 40



    I am using OSX 10.2.7, ProjectBuilder 2.1, gcc-3.3, in case that matters.
  • Reply 130 of 146
    How about testing it with the XL C++ compiler?
  • Reply 131 of 146
    Quote:

    Originally posted by Zapchud

    How about testing it with the XL C++ compiler?



    I have been experimenting with that this morning but I must be doing something wrong because the result is worse than with gcc-3.3. For example, for the double typedef the best I can get with xlc is 45 versus 40 with gcc-3.3.
  • Reply 132 of 146
    lundylundy Posts: 4,466member
    Quote:

    Originally posted by Tidris

    I have been experimenting with that this morning but I must be doing something wrong because the result is worse than with gcc-3.3. For example, for the double typedef the best I can get with xlc is 45 versus 40 with gcc-3.3.



    59 seconds on the dual G5 (EDIT: with "double").

    42 seconds with G5 optimization.

    50 seconds with long long ints.



    I made a custom Build Style in Xcode called G5-Optimized, but Xcode shows this on the detail line:



    Building target "vectors" with build style "G5-Optimized" (optimization:level "size", debug-symbols:on)





    Optimization level:"size"???



    Anybody else getting anything different? Xcode is really not that easy to get a handle on. A simple menu with the choices would be easier, for chrissakes.
  • Reply 133 of 146
    cubedudecubedude Posts: 1,556member
    Quote:

    ryan% ./unthreaded_factorial

    Start: 1064696504 End: 1064696580



    i= 50000001

    Time=76



    Quote:

    ryan% ./threaded_factorial

    Creating Thread Number: 0

    Creating Thread Number: 1

    Loop Done; Time=89 secs for thread#:0, Loops=25000000

    Loop Done; Time=89 secs for thread#:1, Loops=25000000



    G4 Cube 450
  • Reply 134 of 146
    Quote:

    Originally posted by Tidris

    The floating point numbers improved somewhat by turning off profiling and using -O3 instead of -fast:



    float typedef: 42

    double typedef: 40



    I am using OSX 10.2.7, ProjectBuilder 2.1, gcc-3.3, in case that matters.




    I was experimenting with this again tonight and I found that if I change num_vectors from 4096 to 4094 or 4098, the results become:



    fixed64 typedef: 28 seconds

    float typedef: 12 seconds

    double typedef: 12 seconds



    I used the -fast option for gcc-3.3. That is very weird!



    Edit:



    Another way to get similarly fast results is to leave num_vectors at 4096 but make the va and vb arrays 4098 entries long.



    Edit:



    Changing the declaration of m, va, vb to be as follows also does the trick:



    numerictype va[num_vectors][4];

    numerictype m[4][4] = {{0.1,0.2,0.3,0.0},{0.7,0.8,0.9,0.0},{0.4,0.5,0.6, 0.0},{0.3,0.1,0.6,1.0}};

    numerictype vb[num_vectors][4];
  • Reply 135 of 146
    Quote:

    Originally posted by Tidris

    I was experimenting with this again tonight and I found that if I change num_vectors from 4096 to 4094 or 4098, the results become:



    fixed64 typedef: 28 seconds

    float typedef: 12 seconds

    double typedef: 12 seconds



    I used the -fast option for gcc-3.3. That is very weird!




    Here are additional double typedef results for other num_vectors values:



    1023 vectors: 3 seconds

    1024 vectors: 3 seconds

    1025 vectors: 3 seconds



    2046 vectors: 6 seconds

    2047 vectors: 8 seconds

    2048 vectors: 24 seconds

    2049 vectors: 8 seconds

    2050 vectors: 7 seconds



    This smells a lot like a compiler bug...
  • Reply 136 of 146
    Quote:

    Originally posted by Tidris

    This smells a lot like a compiler bug...



    Or a limitation of the G5's L1 cache "way-ness" (associativity). I deliberately sized those arrays to overflow the L1 cache by about 2x, so I don't want to shrink them. Let's leave the number of array entries unchanged but use your version with the re-ordered data arrays, which seems to avoid the problem on the G5. It would be interesting if somebody could use the CHUD tools to verify the source of the problem, however.
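To make the associativity idea concrete, here is a sketch of the set-index arithmetic. Two things are assumptions for illustration, not facts from the thread: the cache geometry (32 KB, 2-way set-associative, 128-byte lines, which is the G5's L1 data cache as I understand it), and a layout where va starts on a way-size boundary with vb placed right after it in declaration order (the real layout is up to the compiler and linker). The hypothetical helper counts how many index pairs va[i], vb[i] land in the same cache set. This is arithmetic supporting a hypothesis, not a verified diagnosis.

```cpp
#include <cassert>
#include <cstddef>

// Assumed L1 data cache geometry; adjust for other processors.
const std::size_t kCacheBytes = 32 * 1024;
const std::size_t kWays       = 2;
const std::size_t kLineBytes  = 128;
const std::size_t kSets       = kCacheBytes / (kWays * kLineBytes);  // 128 sets

// Each vector is numerictype[4]; with double that is 32 bytes.
const std::size_t kVectorBytes = 4 * sizeof(double);

// Which cache set an address maps to.
std::size_t SetIndex(std::size_t address)
{
    return (address / kLineBytes) % kSets;
}

// Hypothetical layout: va starts at address 0 and vb starts right after
// va plus gapBytes of other data (e.g. the 4x4 matrix m declared between
// them). Returns how many i have va[i] and vb[i] in the same cache set.
std::size_t CountSameSetPairs(std::size_t numVectors, std::size_t gapBytes)
{
    const std::size_t vbBase = numVectors * kVectorBytes + gapBytes;
    std::size_t same = 0;
    for (std::size_t i = 0; i < numVectors; ++i)
    {
        std::size_t a = i * kVectorBytes;           // address of va[i]
        std::size_t b = vbBase + i * kVectorBytes;  // address of vb[i]
        if (SetIndex(a) == SetIndex(b))
            ++same;
    }
    return same;
}
```

Under these assumptions CountSameSetPairs(4096, 0) returns 4096 (each array is exactly 128 KB, a multiple of the 16 KB way size, so every pair shares a set), CountSameSetPairs(4098, 0) returns 2050, and CountSameSetPairs(4096, 128), which mimics putting the 128-byte double m between va and vb, returns 0. The same arithmetic also gives full collisions for 1024 vectors, which ran fast, so set conflicts alone may not be the whole story.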



    My 1 GHz dual MDD G4 does this test (regardless of declaration order) in approximately:



    fixed64 372

    float 56

    double 59



    The numbers you posted for your G5 are pretty much exactly what I would have expected for a 2 GHz G5. The fixed64 number is especially interesting since it shows a 13.3x speedup thanks to the 970 being a 64-bit processor. The float numbers reflect the doubled clock rate and twice as many FPUs, plus a bit more from better FPU resources.
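Working the ratios from the timings posted in this thread (Programmer's 1 GHz G4 over Tidris's G5 with the re-arranged arrays):

$$
\frac{372\,\mathrm{s}}{28\,\mathrm{s}} \approx 13.3, \qquad
\frac{56\,\mathrm{s}}{12\,\mathrm{s}} \approx 4.7 \approx \underbrace{2}_{\text{clock}} \times \underbrace{2}_{\text{FPUs}} \times 1.17
$$

The leftover factor of roughly 1.17 on the float test is the part attributed to better FPU resources.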



    That your fiddling with the code had such a huge impact demonstrates how fragile performance on high-speed processors can be, and why code should be profiled.
  • Reply 137 of 146
    Quote:

    Originally posted by Tidris

    I was experimenting with this again tonight and I found that if I change num_vectors from 4096 to 4094 or 4098, the results become:



    fixed64 typedef: 28 seconds

    float typedef: 12 seconds

    double typedef: 12 seconds



    I used the -fast option for gcc-3.3. That is very weird!





    If you liked those numbers, you'll like these even more:



    float typedef: 7 seconds

    double typedef: 7 seconds



    I got those with num_vectors at 4096 by changing TransformVectors() to be as follows:



    void TransformVectors (unsigned int count, numerictype m[4][4], numerictype in[][4], numerictype out[][4])
    {
        numerictype m00=m[0][0], m01=m[0][1], m02=m[0][2], m03=m[0][3];
        numerictype m10=m[1][0], m11=m[1][1], m12=m[1][2], m13=m[1][3];
        numerictype m20=m[2][0], m21=m[2][1], m22=m[2][2], m23=m[2][3];
        numerictype m30=m[3][0], m31=m[3][1], m32=m[3][2], m33=m[3][3];

        for (unsigned int i = 0; i < count; ++i)
        {
    #if 0
            // Old slow way.
            out[i][0] = m00*in[i][0]+m01*in[i][1]+m02*in[i][2]+m03*in[i][3];
            out[i][1] = m10*in[i][0]+m11*in[i][1]+m12*in[i][2]+m13*in[i][3];
            out[i][2] = m20*in[i][0]+m21*in[i][1]+m22*in[i][2]+m23*in[i][3];
            out[i][3] = m30*in[i][0]+m31*in[i][1]+m32*in[i][2]+m33*in[i][3];
    #else
            // New fast way.
            numerictype out0 = m00*in[i][0]+m01*in[i][1]+m02*in[i][2]+m03*in[i][3];
            numerictype out1 = m10*in[i][0]+m11*in[i][1]+m12*in[i][2]+m13*in[i][3];
            numerictype out2 = m20*in[i][0]+m21*in[i][1]+m22*in[i][2]+m23*in[i][3];
            numerictype out3 = m30*in[i][0]+m31*in[i][1]+m32*in[i][2]+m33*in[i][3];
            out[i][0] = out0;
            out[i][1] = out1;
            out[i][2] = out2;
            out[i][3] = out3;
    #endif
        }
    }



    The idea behind the change is to avoid intermixing accesses to the input and output arrays.
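A different way to get a similar effect, swapped in here rather than taken from Tidris's post, is to tell the compiler outright that the arrays don't overlap. The sketch below uses `__restrict__`, a GCC/Clang extension in C++ (C99 spells it `restrict`). Whether gcc 3.3 actually exploited it on the G4/G5 is untested here; numerictype is assumed to be double and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

typedef double numerictype;   // assumed; the thread also tried float and fixed64

// Hypothetical restrict-qualified variant of TransformVectors.
// __restrict__ promises that m, in and out never overlap, so a store to
// out cannot change values already loaded from in or m, and the compiler
// may keep them in registers. m and in are only read.
// Passing overlapping arrays to this function is undefined behavior.
void TransformVectorsRestrict(std::size_t count,
                              numerictype (*__restrict__ m)[4],
                              numerictype (*__restrict__ in)[4],
                              numerictype (*__restrict__ out)[4])
{
    for (std::size_t i = 0; i < count; ++i)
        for (int r = 0; r < 4; ++r)
            out[i][r] = m[r][0] * in[i][0] + m[r][1] * in[i][1]
                      + m[r][2] * in[i][2] + m[r][3] * in[i][3];
}
```

The trade-off is that the no-overlap promise moves from the code's structure to the caller: an in-place call that happens to work today is still undefined behavior.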
  • Reply 138 of 146
    lundylundy Posts: 4,466member
    Here is the multithreaded version of the original code. I just put a thread wrapper around the whole shebang.



    http://www.johnnylundy.com/MPvectors.cpp.zip





    I get 26 seconds on the dual G5:



    Creating Thread Number: 0

    Creating Thread Number: 1

    Loop Done; Time = 26 secs for thread#: 0, Loops = 50000



    Loop Done; Time = 26 secs for thread#: 1, Loops = 50000





    vectors has exited with status 0.
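"A thread wrapper around the whole shebang" suggests each thread runs the complete benchmark on its own data. MPvectors.cpp isn't reproduced here, so the following is only a sketch of that structure using C++11 std::thread (a 2003 build would have used pthreads); RunBenchmark, RunOnThreads, the pass count and the per-thread arrays are hypothetical. The matrix values are the ones from Tidris's re-ordered declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

typedef double numerictype;   // assumed element type

// Hypothetical stand-in for the benchmark body: transform numVectors
// vectors by a fixed matrix `passes` times and return a checksum.
// Each call owns its arrays, so threads share nothing and need no locks.
numerictype RunBenchmark(std::size_t numVectors, int passes)
{
    const numerictype m[4][4] = {{0.1, 0.2, 0.3, 0.0}, {0.7, 0.8, 0.9, 0.0},
                                 {0.4, 0.5, 0.6, 0.0}, {0.3, 0.1, 0.6, 1.0}};
    std::vector<numerictype> in(numVectors * 4, 1.0), out(numVectors * 4, 0.0);

    for (int p = 0; p < passes; ++p)
        for (std::size_t i = 0; i < numVectors; ++i)
            for (int r = 0; r < 4; ++r)
                out[i * 4 + r] = m[r][0] * in[i * 4 + 0] + m[r][1] * in[i * 4 + 1]
                               + m[r][2] * in[i * 4 + 2] + m[r][3] * in[i * 4 + 3];

    numerictype sum = 0;
    for (std::size_t k = 0; k < out.size(); ++k)
        sum += out[k];
    return sum;
}

// Run the whole benchmark once per thread and collect each checksum.
std::vector<numerictype> RunOnThreads(int threadCount, std::size_t numVectors, int passes)
{
    std::vector<numerictype> results(static_cast<std::size_t>(threadCount), 0.0);
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
        threads.push_back(std::thread([&results, t, numVectors, passes]() {
            results[t] = RunBenchmark(numVectors, passes);  // each thread writes its own slot
        }));
    for (std::size_t t = 0; t < threads.size(); ++t)
        threads[t].join();
    return results;
}
```

Calling RunOnThreads(2, 4096, passes) starts two threads that each do the full job, which mirrors the two "Loop Done" lines in the output above.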
  • Reply 139 of 146
    Quote:

    Originally posted by Tidris

    If you liked those numbers, you'll like these even more:



    float typedef: 7 seconds

    double typedef: 7 seconds

    ...

    The idea behind the change is to avoid intermixing accesses to the input and output arrays.




    Wow, that's pretty good. Somebody want to run this on an Intel and an AMD so we can have a non-Mac frame of reference.



    Strange that the intermixing has such an effect, especially since it's all in cache. I wonder if it is simply a matter of writing results to registers rather than forcing the compiler to write back to memory to avoid potential aliasing problems with the following math.



    EDIT: It helps the G4 substantially too -- from 59 down to 38.
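The aliasing guess can be made concrete. TransformVectors receives in and out as plain pointers, so the compiler must allow for a caller passing the same array for both. In that case the two orderings aren't equivalent: the old loop stores out[i][0] and then reads in[i][0] again for the next row, so an in-place call sees partly updated input, while the new loop reads all of in[i] before storing anything. The compiler therefore can't turn one into the other on its own. The sketch below uses hypothetical function names and assumes numerictype is double; the second function loads in[i] into locals, a slight variation on Tidris's version with the same read-before-write order.

```cpp
#include <cassert>

typedef double numerictype;   // assumed element type

// "Old slow way": each store to out[i][r] might overwrite in[i][*] if the
// arrays alias, so later rows must reload in[i][*] from memory.
void TransformInterleaved(unsigned count, numerictype m[4][4],
                          numerictype in[][4], numerictype out[][4])
{
    for (unsigned i = 0; i < count; ++i)
        for (int r = 0; r < 4; ++r)
            out[i][r] = m[r][0] * in[i][0] + m[r][1] * in[i][1]
                      + m[r][2] * in[i][2] + m[r][3] * in[i][3];
}

// "New fast way": read all of in[i] first, then compute and store.
void TransformLoadsFirst(unsigned count, numerictype m[4][4],
                         numerictype in[][4], numerictype out[][4])
{
    for (unsigned i = 0; i < count; ++i)
    {
        numerictype x = in[i][0], y = in[i][1], z = in[i][2], w = in[i][3];
        for (int r = 0; r < 4; ++r)
            out[i][r] = m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] * w;
    }
}
```

With a matrix that swaps the first two components, both functions turn {1, 2, 3, 4} into {2, 1, 3, 4} when in and out are separate arrays. Called in place, TransformInterleaved produces {2, 2, 3, 4} while TransformLoadsFirst still produces {2, 1, 3, 4}.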
  • Reply 140 of 146
    lundylundy Posts: 4,466member
    Quote:

    Originally posted by Programmer

    Wow, that's pretty good. Somebody want to run this on an Intel and an AMD so we can have a non-Mac frame of reference.



    Strange that the intermixing has such an effect, especially since it's all in cache. I wonder if it is simply a matter of writing results to registers rather than forcing the compiler to write back to memory to avoid potential aliasing problems with the following math.



    EDIT: It helps the G4 substantially too -- from 59 down to 38.




    I'm slogging through the Programming Environments Manual for the PowerPC family - very very fascinating stuff.



    Little-endian is supported by a mode bit.



    But I can't seem to get the Xcode debugger (really gdb) to show me the product of C=A*B, where all three are long long ints and the compiler flags are set for the G5.



    Is it normal for gcc to give a warning on a source statement



    static long long int A=0xFFFFFFFFFFFFFFFF;



    that the constant is too big for a long int? Well duh, it's not a long int.
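The warning is expected under gcc 3.3's default rules. An unsuffixed integer constant gets its type from its value alone, without looking at the variable it initializes, and in the pre-C99 C and C++98 modes long long isn't among the candidate types, so the largest choice is unsigned long and gcc reports that the constant is too large for "long". An LL or ULL suffix gives the literal a 64-bit type up front. Note also that 0xFFFFFFFFFFFFFFFF doesn't fit in a signed long long; if the intent is "all bits set", an unsigned type (or -1 for the signed case) says so directly. The names below are hypothetical, and the static_assert needs a C++11 compiler (drop it on older ones).

```cpp
#include <cassert>
#include <climits>

// Assumes a 64-bit long long, as on the G4/G5 compilers discussed here.
static_assert(sizeof(long long) * CHAR_BIT == 64, "expects 64-bit long long");

// With a suffix the literal has a 64-bit type from the start, so there is
// nothing to warn about.
static const unsigned long long kAllOnes = 0xFFFFFFFFFFFFFFFFULL;   // every bit set
static const long long kMinusOne = -1LL;                            // all bits set in two's complement
static const long long kMaxSigned = 0x7FFFFFFFFFFFFFFFLL;           // largest signed 64-bit value
```

Converting kMinusOne to unsigned long long yields the same value as kAllOnes, since signed-to-unsigned conversion is defined modulo 2^64.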