New IBM Power PC 1GHz chip


Comments

  • Reply 41 of 60
    stimulistimuli Posts: 564member
    Ah So! I stand corrected. I always thought SETI/RC5 were heavy on vector calcs. Then again, I have no real idea what vector calcs are.



    edit: Waaaaiiiitasec... I think I know how I got so lost and confused... I'm thinking of the actual integer units, not the AltiVec integer extensions. I've seen some ugly Linux benchmarks comparing 800 MHz G4s versus 700 MHz PIIIs and a 1.2 GHz Athlon.



    The G4 got beaten like a rented mule, which says a lot about PPC optimization under gcc.



    [ 02-06-2002: Message edited by: stimuli ]
  • Reply 42 of 60
    programmerprogrammer Posts: 3,458member
    Vector calcs are just operations which are done on a set of values as a unit... in AltiVec's case the registers are 128 bits wide, and it can deal with them as 4 x 32, 8 x 16, or 16 x 8. The 32-bit values can be either integer or floating point. If you write code for the AltiVec unit (or manage to find an auto-vectorizing compiler), then the G4 rocks. Unfortunately for the G4, the SPECmark code is all straight C and no customization for vector units is allowed. The G4 isn't particularly fast at non-vector calculations compared to the Athlon, and only about 10-25% faster than most Pentiums on a per-clock basis. At vector calculations, however, it does quite well... whether they are floating point or integer. Plus vector code is easier to write for the AltiVec unit. On the PC you have to code it in assembler and decide between the MMX, SSE, SSE2, and 3DNow! instruction sets. What a mess, and none of them are even remotely as nice as AltiVec.
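
    To make "operations done on a set of values as a unit" a bit more concrete, here is a rough Python sketch of one 128-bit register treated as 4 x 32, 8 x 16, or 16 x 8 lanes, with one add applied to every lane. The names `split_lanes`, `join_lanes`, and `vector_add` are made up for illustration and are not AltiVec intrinsics; this only models the arithmetic (wraparound adds, no carry between lanes), not how the hardware does it.

```python
REGISTER_BITS = 128  # register width quoted in the post above


def split_lanes(value, lane_bits):
    """Split a 128-bit integer into lanes, most significant lane first."""
    count = REGISTER_BITS // lane_bits
    mask = (1 << lane_bits) - 1
    return [(value >> (lane_bits * (count - 1 - i))) & mask for i in range(count)]


def join_lanes(lanes, lane_bits):
    """Pack a list of lane values back into a single integer."""
    mask = (1 << lane_bits) - 1
    value = 0
    for lane in lanes:
        value = (value << lane_bits) | (lane & mask)
    return value


def vector_add(a, b, lane_bits):
    """Add two registers lane by lane; each lane wraps modulo 2**lane_bits."""
    mask = (1 << lane_bits) - 1
    pairs = zip(split_lanes(a, lane_bits), split_lanes(b, lane_bits))
    return join_lanes([(x + y) & mask for x, y in pairs], lane_bits)


# Four 32-bit adds expressed as a single "vector" operation.
a = join_lanes([1, 2, 3, 4], 32)
b = join_lanes([10, 20, 30, 40], 32)
result = split_lanes(vector_add(a, b, 32), 32)
```

    The same `vector_add` call with `lane_bits=8` would perform sixteen independent 8-bit adds, which is why pixel-style data benefits so much.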
  • Reply 43 of 60
    stoostoo Posts: 1,490member
    AFAIK, the 603e did support MP, and was used in BeBoxes.
  • Reply 44 of 60
    powerdocpowerdoc Posts: 8,123member
    [quote]Originally posted by stimuli:

    <strong>Actually, powerdoc, that's 256 Kilobits, not Bytes. So you're looking at (theoretically) 30+ GB (Bytes)/sec throughput at 1 GHz. Nothing to sneeze at.



    Programmer, is that the latest G4s, or also the 7400/7410?</strong><hr></blockquote>

    The real 750FX has a 256-bit data path, not a 256,000-bit data path (kilo equal 1024 to be more precise). You see why I sneeze.
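
    For what it's worth, the arithmetic behind the "30+ GB/sec" figure can be checked in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes one full-width transfer per clock at the full 1 GHz, which nothing in this thread confirms, and `peak_bandwidth_bytes_per_sec` is a hypothetical helper name.

```python
def peak_bandwidth_bytes_per_sec(bus_width_bits, clock_hz, transfers_per_clock=1):
    """Theoretical peak: bytes moved per transfer times transfers per second."""
    return bus_width_bits // 8 * clock_hz * transfers_per_clock


# 256-bit data path at 1 GHz (assumed: one transfer per clock).
peak = peak_bandwidth_bytes_per_sec(256, 1_000_000_000)

print(peak / 10**9, "GB/s (decimal)")
print(peak / 2**30, "GiB/s (binary)")
```

    With the typo taken literally (256,000 bits), the same formula gives a figure a thousand times larger, which is the joke.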
  • Reply 45 of 60
    telomartelomar Posts: 1,804member
    [quote]Originally posted by powerdoc:

    <strong>

    (kilo equal 1024 to be more precise)</strong><hr></blockquote>



    Actually kilo has a defined standard definition of 10^3 and was just misused when initially applied to the computing industry. If you are saying kilo equals 1024 then you are getting it wrong at least as far as standards go.



    Initially there were no standard prefixes, so they just used the closest they could find, the SI prefixes, which are metric and defined as 10^x, not 2^z. As it happens this doesn't matter much with small numbers, but with larger ones the error gets quite large. As a result, standards for the binary prefixes were introduced a few years back.



    For reference, the standards define 1024 as kibi (KIlo BInary); it just isn't widely adopted in practice. Of course, those are still the precise definitions.



    Edit: Just fixing the quote like I said I would :eek:



    [ 02-07-2002: Message edited by: Telomar ]
  • Reply 46 of 60
    stimulistimuli Posts: 564member
    Actually, Telomar, he's pointing out my typo, ie: 256 bit data bus, not 256Kb.



    Doh!
  • Reply 47 of 60
    [quote]Originally posted by stimuli:

    <strong>Actually, powerdoc, that's 256 Kilobits, not Bytes. So you're looking at (theoretically) 30+ GB (Bytes)/sec throughput at 1 GHz. Nothing to sneeze at.</strong><hr></blockquote>



    Keep in mind that the PPC744x and PPC745x model G4s also have 256 bit wide cache interfaces, so it's not really an advantage for the G3.



    Bye,

    RazzFazz
  • Reply 48 of 60
    ccr65ccr65 Posts: 59member
    [quote] The G3 is based off of the 603e, which cannot be set up as an MP system. <hr></blockquote>



    Actually, I seem to remember talking to an Amiga user about 3 years ago who was talking about his multiprocessor G3 machine. It was my understanding that the limitation regarding MP and the G3 applied to specific OSes, and that there were software workarounds in BeOS and AmigaOS to allow MP.
  • Reply 49 of 60
    amorphamorph Posts: 7,112member
    [quote]Originally posted by CCR65:

    <strong>



    Actually, I seem to remember talking to an Amiga user about 3 years ago who was talking about his multiprocessor G3 machine. It was my understanding that the limitation regarding MP and the G3 applied to specific OSes, and that there were software workarounds in BeOS and AmigaOS to allow MP.</strong><hr></blockquote>



    They're not software workarounds, they're hardware.



    One of the biggest issues in SMP systems is coherency.



    All modern processors cache data so that they don't have to go to (slow, high-latency) RAM every time they want something to work on. Now, say you have two processors A and B, and a value N stored in memory. A loads N and stashes it in cache to perform some operations on it. B loads N, caches it, changes the value of N and writes it out to memory (and into its own cache). A's copy of N is now "stale," and so A is going to start churning out incorrect results.



    This is a Bad Thing. Bad Things are Wrong.



    So, in order to make SMP work, somebody has to manage the overhead of ensuring that all the processors are working on current data. If a processor "can do SMP," what that means is that all (or enough) of the logic required to do that is on board the CPU itself: the 7400, for example, supported all the states necessary to keep the caches of up to 8 of its kind current, so all that's required, essentially, is to get (up to) 8 7400s onto a motherboard and talking to each other. They'll take care of the rest.



    The 750 (like the 603 and 604) does not support all the necessary states: It supports 3 out of 5. That doesn't mean that it's impossible to arrange it into an SMP configuration, it just means that there has to be some external logic on the motherboard keeping the CPU caches coherent, because the 750s aren't clever enough to do it themselves.
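
    The stale-copy problem with A, B, and N described above can be shown with a toy Python model. This is a deliberately simplified sketch, not how any PowerPC actually works: the `Cpu` class, the write-through stores, and the "bus" that invalidates other caches are all assumptions made up for illustration.

```python
class Cpu:
    """Toy processor with a private cache in front of shared memory."""

    def __init__(self, memory, bus=None):
        self.memory = memory
        self.cache = {}
        self.bus = bus  # list of CPUs that snoop each other's stores, or None
        if bus is not None:
            bus.append(self)

    def load(self, addr):
        if addr not in self.cache:  # cache miss: go to (slow) memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def store(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value  # write-through, to keep the toy simple
        if self.bus is not None:
            for other in self.bus:
                if other is not self:
                    other.cache.pop(addr, None)  # invalidate the stale copy


def value_seen_by_a(coherent):
    memory = {"N": 1}
    bus = [] if coherent else None
    a, b = Cpu(memory, bus), Cpu(memory, bus)
    a.load("N")       # A caches N
    b.load("N")       # B caches N
    b.store("N", 2)   # B changes N
    return a.load("N")
```

    Without the invalidation step A keeps working on the old value; with it, A's next load misses and picks up the new one. Whether that logic lives on the CPU or in external hardware is the distinction being made above.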



    [edit: You'd think English was my tenth language. Yeesh. -Amorph]



    [ 02-07-2002: Message edited by: Amorph ]
  • Reply 50 of 60
    powerdocpowerdoc Posts: 8,123member
    [quote]Originally posted by stimuli:

    <strong>Actually, Telomar, he's pointing out my typo, ie: 256 bit data bus, not 256Kb.



    Doh! </strong><hr></blockquote>



    Yes, I just found this error (or typo) funny.
  • Reply 51 of 60
    [quote]Originally posted by discstickers:

    <strong>



    Yup, and thats why the G5 will kick every ass it can find. </strong><hr></blockquote>



    Drat, I've only kicked 5 asses!



    Oh, I had "offensive content filtering" on!



    **click**



    Ah, much better!
  • Reply 52 of 60
    telomartelomar Posts: 1,804member
    stimuli, I wasn't correcting him on the kilobit error, which was fine. I was correcting him on his (kilo equal 1024 to be more precise), which is just plain wrong. To be precise is to say kilo equals 1000.



    I probably should have just deleted the rest of the quote *goes back and does just that*
  • Reply 53 of 60
    eugeneeugene Posts: 8,254member
    To reaffirm what Telomar said...



    kilo = 10^3 = 1000

    kibi = 2^10 = 1024



    mega = 10^6 = 1000000

    mebi = 2^20 = 1048576



    giga = 10^9 = 1000000000

    gibi = 2^30 = 1073741824
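
    Telomar's point that the error grows with the larger prefixes is easy to check. A short Python sketch (the helper name `binary_vs_decimal_gap` is just for illustration):

```python
def binary_vs_decimal_gap(n):
    """Relative excess of the n-th binary prefix (2**(10n)) over the SI one (10**(3n))."""
    return 2 ** (10 * n) / 10 ** (3 * n) - 1


for n, (si, binary) in enumerate([("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi")], start=1):
    print(f"{binary} vs {si}: {binary_vs_decimal_gap(n):.2%}")
```

    The gap is roughly 2.4% at kilo/kibi, 4.9% at mega/mebi, and 7.4% at giga/gibi.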
  • Reply 54 of 60
    programmerprogrammer Posts: 3,458member
    [quote]Originally posted by Amorph:

    <strong>

    The 750 (like the 603 and 604) does not support all the necessary states: It supports 3 out of 5. That doesn't mean that it's impossible to arrange it into an SMP configuration, it just means that there has to be some external logic on the motherboard keeping the CPU caches coherent, because the 750s aren't clever enough to do it themselves.</strong><hr></blockquote>



    The 604 actually supported 4 of the states (MESI), and the 7400 added a 5th (MESRI, I think). The "R" refers to a mode where one CPU can send the data it has directly to the requesting CPU without updating memory, which has a huge impact on performance. The 604 didn't have the "R" and thus the processor with the requested data would stop the requestor, update memory, and then allow the requestor to read it back from memory.



    For the G3 there is another alternative -- the OS could arrange things so that the processors never need to touch the same memory except in a very controlled fashion (i.e. operating system code that is written specifically to deal with these issues, generally by doing a lot of cache flushing or read/writes to uncacheable memory). I don't know which approach Be and Amiga took.



    [ 02-07-2002: Message edited by: Programmer ]
  • Reply 55 of 60
    [quote]Originally posted by mslee:

    <strong>The display layer of quartz is vector-based, so there is some benefit from having AltiVec there. </strong><hr></blockquote>



    If I understand you correctly, you're right, but for the wrong reasons :-)



    The "vector" in vector processing is not the same as the "vector" in vector graphics. In simple terms, what it means is the ability to apply the same instruction to multiple pieces of data at the same time, which is why it makes things like RGB-CMYK conversions fly, as well as things like the transparency effects in Quartz.
  • Reply 56 of 60
    [quote]Originally posted by Eugene:

    <strong>stimuli, what do you think RC5 is? RC5 is pure integer. If only people would take the effort to write AltiVec code into their apps...</strong><hr></blockquote>



    There's writing AltiVec code, and there's writing good AltiVec code. You can recompile apps fairly easily to take advantage of AltiVec, but to actually achieve good results, you need to get deep down and dirty in the code. That's one of the reasons why the applications that take best advantage of AltiVec are Apple ones - Apple knows the importance of hardcore coding for AltiVec.



    As for the putative 1GHz G3... when will people learn that IBM is great at preannouncing product that doesn't ship for a long time?



    The fastest shipping G3 is 700MHz. The fastest shipping G4 is 1GHz. Case closed.
  • Reply 57 of 60
    [quote]Originally posted by Eugene:

    <strong>To reaffirm what Telomar said...



    kilo = 10^3 = 1000

    kibi = 2^10 = 1024

    (...)

    </strong><hr></blockquote>



    So, does anyone actually care?



    I mean, have you ever seen the "*bi" prefix anywhere in real world use?



    Bye,

    RazzFazz
  • Reply 58 of 60
    telomartelomar Posts: 1,804member
    [quote]Originally posted by RazzFazz:

    <strong>



    So, does anyone actually care?



    I mean, have you ever seen the "*bi" prefix anywhere in real world use?



    Bye,

    RazzFazz</strong><hr></blockquote>



    *shrug* Doesn't matter if you care or not, but if you want to be precise you have to use the correct prefixes. And yeah, I have seen them used, just never by or for the general public.
  • Reply 59 of 60
    amorphamorph Posts: 7,112member
    [quote]Originally posted by Programmer:

    <strong>The 604 actually supported 4 of the states (MESI)</strong><hr></blockquote>



    If you read my sentence carefully, I didn't say how many the 604 supports. You're right, it supports MESI.



    [quote]<strong>and the 7400 added a 5th (MESRI, I think).</strong><hr></blockquote>



    I've usually seen it written as "MERSI." Easier to remember that way.



    [quote]<strong>The "R" refers to a mode where one CPU can send the data it has directly to the requesting CPU without updating memory, which has a huge impact on performance. The 604 didn't have the "R" and thus the processor with the requested data would stop the requestor, update memory, and then allow the requestor to read it back from memory.</strong><hr></blockquote>



    Thanks.



    On this subject, do you know of a resource that lays all this out? I only know enough to speak in very broad terms (e.g.: my previous post) and I'll be damned if I can find a document that lays out what all five states are and how they're significant.



    I learned quite a bit about how slick the MPX bus is, though.



    [quote]<strong>For the G3 there is another alternative -- the OS could arrange things so that the processors never need to touch the same memory except in a very controlled fashion (i.e. operating system code that is written specifically to deal with these issues, generally by doing a lot of cache flushing or read/writes to uncacheable memory). I don't know which approach Be and Amiga took.</strong><hr></blockquote>



    BeOS has a famously bizarre threading architecture, so I'd guess that they used that trick.



    [ 02-07-2002: Message edited by: Amorph ]
  • Reply 60 of 60
    Well I'm nitpicking, but you said:



    <strong> [quote]

    The 750 (like the 603 and 604) does not support all the necessary states: It supports 3 out of 5.

    </strong><hr></blockquote>



    Which (to me) means the 750 is like the 604. Since you also said that the 750 supports 3 out of 5, the two statements together imply that the 604 also supports 3 out of 5. So I did read your statement carefully, and it was incorrect.





    I don't have a resource that describes the MERSI stuff (you're right, it is referred to that way, I had actually typed it both ways but couldn't remember which was correct). From memory:



    "M" = modified. This memory state represents a line in the cache which has been changed from what its value in memory was, and it can only be held by one processor at a time on a given cache line. If somebody else asks for this cache line then it must be written to memory first.



    "E" = exclusive. This memory state represents a line in the cache that is only in this processor's cache, but has not been modified. If this cache line is modified then nobody else needs to give up their copy of it.



    "S" = shared. This memory state represents a line in the cache that is in at least one other cache at the same time. If it is modified locally the other caches must release it or update their copy. If it is modified elsewhere it must either be updated or released.



    "I" = invalid. This memory state represents an invalid line in cache -- i.e. unused and available to be filled. Cache lines usually get this way when they are shared and somebody else announces they are going to modify their copy of it.



    "R" = reserved (I think?). This memory state means the cache line is owned by the local cache and if you want to get it you have to ask the owner. This allows a processor to modify a cache line, but pass the new contents directly to a processor that wants to read it. If two processors want to modify the same cache line then you get into an ugly situation where ownership is passed back and forth repeatedly ("M" is actually worse because it gets sent back to memory and re-fetched each time).



    Hopefully I've remembered this stuff reasonably accurately. There are a bunch of other details about how it works (bus snooping, etc) but that's it in a nutshell.
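
    As a rough illustration of the four states I'm more sure of (M, E, S, I), here is a small Python sketch of how one cache line's state might change. It follows the textbook MESI protocol as I understand it, not any specific PowerPC manual, and it leaves out "R" entirely. The event names and the `next_state` function are made up for the example.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def next_state(state, event, others_have_copy=False):
    """Next state of one cache line in this cache, for a simplified MESI."""
    if event == "local_read":
        if state is State.I:
            # Miss: shared if another cache holds the line, else exclusive.
            return State.S if others_have_copy else State.E
        return state
    if event == "local_write":
        # Any other copies must be released; ours becomes modified.
        return State.M
    if event == "remote_read":
        # Someone else wants the line; a modified copy is written back first.
        return State.I if state is State.I else State.S
    if event == "remote_write":
        # Someone else announced a modification: our copy is now invalid.
        return State.I
    raise ValueError(f"unknown event: {event}")
```

    For example, a line that starts invalid, is read with nobody else holding it, and is then written goes I, E, M, and the E to M step needs no bus traffic because nobody else has a copy to give up.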