New 970 Information


Comments

  • Reply 41 of 77
    airsluf Posts: 1,861 member
  • Reply 42 of 77
    [quote]Originally posted by Outsider:

    There have been technical problems in making motherboards that support more than two (or in some cases one) DIMM slots for DDR400. I don't see it being a problem for the processor to be throttled for the moment, with memory bandwidth unable to saturate the bus (the opposite of what we have today). The extra bandwidth can be used for other things like AGP and PCI-X. DDR-II is said to alleviate the problems DDR-I has at high clock rates, but who knows when that will show up.[/quote]





    I think one of the major problems is that DDR400 isn't a JEDEC standard. The old timeline [it could have changed since] was to go straight from DDR333 [PC2700] to the proposed DDR2 spec.



    If I were Apple, I'd hold off and wait until the DDR2 spec is final, and most major RAM makers are building it in volume.



    It isn't like they haven't done that before. Yes, lots of folk will complain, but quite frankly, revisions will come fast and furious, assuming the 970 doesn't run into problems. It always takes a revision to fully balance a system. I expect Apple will stick with DDR333 [prices for the RAM will be down] for the first box containing a 970; then, 6+ months later, we'll see a new box with an updated memory subsystem based on whatever is fast and available in quantity.



    the visigothe



    [ 12-14-2002: Message edited by: visigothe ]
  • Reply 43 of 77
    powerdoc Posts: 8,123 member
    [quote]Originally posted by visigothe:

    revisions will come fast and furious, assuming the 970 doesn't run into problems.[/quote]



    You are right, but there is a good chance that this chip will be ready on time.

    The core is that of the POWER4, which has proved its efficiency even on 0.18-micron SOI (1.3 GHz).

    IBM has already been using 0.13-micron SOI, and will have more than a year of experience with it by the 970's release date.

    The only unknown in this chip is the AltiVec unit. All the other technologies inside are not new. I don't know why, but I am confident in this chip.

  • Reply 44 of 77
    [quote]Originally posted by Outsider:

    There have been technical problems in making motherboards that support more than two (or in some cases one) DIMM slots for DDR400. I don't see it being a problem for the processor to be throttled for the moment, with memory bandwidth unable to saturate the bus (the opposite of what we have today). The extra bandwidth can be used for other things like AGP and PCI-X. DDR-II is said to alleviate the problems DDR-I has at high clock rates, but who knows when that will show up.[/quote]



    Yeah, I don't really expect to see Apple use DDR400 -- DDR-II will be a lot more bang for the buck, and a proper standard too. Late next year would be the earliest, IIRC.



    The "extra" bus bandwidth for non-memory operations is a little strange ... most of system has gone to DMA now, specifically so that the processor doesn't need to be interrupted to do the mundane I/O work. This means that data will never cross the 970's bus. The exception might be Quartz drawing across the AGP directly into video card memory, but this doesn't work well unless there is no blending happening across AGP... doable, but the existing code may not be written that way. Feeding 3D draw commands to across the AGP will certainly benefit, but again most of the recent optimizations are trying to avoid having the processor do the work.



    It's not a problem that the bottleneck will no longer be the processor bus; it just means we'll need to pay more attention to the memory subsystem again. We don't really know how fast Apple's current DDR333 implementation is, because MPX can't exercise it fully.
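
    Some rough numbers, since peak bandwidths get tossed around a lot in this thread. These are back-of-the-envelope figures from the publicly quoted specs, so treat the exact values as my assumptions rather than measurements:

    [code]
    /* Back-of-the-envelope peak bandwidths (assumed figures):
     *   MPX bus:  64 bits wide at 167 MHz
     *   DDR333:   64 bits wide at 333 MT/s (PC2700)
     *   970 bus:  two unidirectional 32-bit links at 900 MT/s
     *             (IBM quotes ~6.4 GB/s usable after protocol overhead)
     */
    #include <stdio.h>

    static double gb_per_s(double bits_wide, double mega_transfers)
    {
        return (bits_wide / 8.0) * mega_transfers / 1000.0;  /* GB/s */
    }

    int main(void)
    {
        printf("MPX bus: %.2f GB/s\n", gb_per_s(64, 167));         /* ~1.3 */
        printf("DDR333:  %.2f GB/s\n", gb_per_s(64, 333));         /* ~2.7 */
        printf("970 bus: %.2f GB/s raw\n", 2 * gb_per_s(32, 900)); /* ~7.2 */
        return 0;
    }
    [/code]

    The point being: today MPX (~1.3 GB/s) can't even drain DDR333 (~2.7 GB/s), while the 970's bus comfortably exceeds a single DDR333 channel, which is exactly the reversal Outsider described.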
  • Reply 45 of 77
    Note: I posted this comment earlier in the other thread/mess. I do think the comment is relevant, I just don't know how much.



    I would like to point out that we still don't know that much about the 970. And, correct me if I'm wrong, but the 970 is a processor that is extremely dependent on a GREAT compiler. In that case I think that Apple has chosen well. At least that is where I would want the performance ball to be: in the compiler, not in brute-force number crunching, i.e. lots of heat. I can't wait for Hannibal's next installment. My point about the compiler is that it is easier and quicker to fix than the CPU.



    Ty
  • Reply 46 of 77
    [quote]Originally posted by Brendon:

    I would like to point out that we still don't know that much about the 970. And, correct me if I'm wrong, but the 970 is a processor that is extremely dependent on a GREAT compiler. In that case I think that Apple has chosen well. At least that is where I would want the performance ball to be: in the compiler, not in brute-force number crunching, i.e. lots of heat. I can't wait for Hannibal's next installment. My point about the compiler is that it is easier and quicker to fix than the CPU.

    Ty[/quote]



    I'm not sure that assertion is valid. A processor that's highly dependent on compilers is the Itanium, and a consequence of that compiler dependency is that subsequent Itaniums run code compiled for the previous generation very poorly. In contrast, IBM has explained that the 970 runs existing code well. The chip brings a lot of improvements, but it's still a PPC, and that's the way I want it.



    In addition to its support of new 64-bit solutions, the 970 retains full native support for 32-bit applications. This not only protects 32-bit software investments, but gives those 32-bit applications the same high performance that it extends to 64-bit uses. This native, non-emulated 32-bit support is not limited to application code, which runs unmodified.



    [ 12-14-2002: Message edited by: Big Mac ]
  • Reply 47 of 77
    mr. me Posts: 3,221 member
    [quote]Originally posted by Brendon:

    Note: I posted this comment earlier in the other thread/mess. I do think the comment is relevant, I just don't know how much.

    I would like to point out that we still don't know that much about the 970. And, correct me if I'm wrong, but the 970 is a processor that is extremely dependent on a GREAT compiler....

    Ty[/quote]



    OK, you are wrong. You are confusing the Itanium with the PPC 970. In the case of the Itanium, Intel chose to move the intelligence from the processor to the compiler to achieve maximum performance.
  • Reply 48 of 77
    [quote]Originally posted by Mr. Me:

    OK, you are wrong. You are confusing the Itanium with the PPC 970. In the case of the Itanium, Intel chose to move the intelligence from the processor to the compiler to achieve maximum performance.[/quote]



    I'm not confused. I drew my conclusion from this:



    <strong>"In the preceding section, I went into mind-numbing detail on the ins and outs of group formation and group dispatching in the 970. If you only breezed through the section and thought, "all of that seems like kind of a pain," then you got 90% of the point I wanted to make. Yes, it is indeed a pain, and that pain is the price the 970 pays for having both width and depth at the same time. The 970's big tradeoff is that it needs less logic to support its long pipeline and extremely wide execution core, but in return it has to give up a measure of granularity, flexibility, and control over the dispatch and issuing of its instructions. Depending on the makeup of the instruction stream and how the IOPs end up being arranged, the 970 could possibly end up with quite a few groups that are either mostly empty, partially empty, or stalled waiting for execution resources.



    So while the 970 may be theoretically able to accommodate a whopping 200 instructions in varying stages of fetch, decode, execution and completion, the reality is probably that under most circumstances a decent number of its valuable execution slots will be empty on any given cycle due to dispatch, scheduling, and completion limitations. The 970 makes up for this with the fact that it just has so many available slots that it can afford to waste some on group-related pipeline bubbles. This is quite different from the P4, which aims at a leaner, more efficient design that pays meticulous attention to filling all of its available execution slots and to moving those slots across the CPU at as high a speed as process technology and power dissipation constraints will allow.



    Of course, you should understand that these conclusions are, as I said in the sub-heading above, quite preliminary. You can't really get a full picture of what the 970 offers until you examine its execution core and issue queues. The 970 offers twelve (depending on how you count them) execution units for doing the actual grunt work of executing instructions, and though twelve is a relatively large number, a simple enumeration of execution resources doesn't tell you nearly as much as an examination of how those resources are organized."



    Seems to me that keeping the 970 as full as possible will add more speed than just increasing MHz. Having 200 instructions in flight is great in and of itself, but it's much better if those instructions are productive and not just filler to keep the groups moving through. The Itanium has the same problem, only more so. Don't get me wrong, I think this is a strength for the 970: as the compiler improves, the performance of the 970 improves as well, using more of those 200 slots.
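
    To make that concrete, here is a toy C example of the kind of thing a scheduler does. Nothing here is 970-specific; it's just the generic idea of breaking up a dependency chain so a wide core has independent work to fill its dispatch groups with:

    [code]
    /* Toy illustration: one dependency chain vs. independent work.
     * Generic instruction-level parallelism, nothing 970-specific. */
    #include <stdio.h>

    /* Each add depends on the previous one, so a wide core's extra
     * execution slots sit idle waiting on the chain. */
    double sum_serial(const double *a, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Four independent accumulators give the dispatcher independent
     * instructions to fill its groups with on every cycle.
     * (Caveat: reassociating FP adds can change rounding slightly.) */
    double sum_unrolled(const double *a, int n)
    {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i;
        for (i = 0; i + 3 < n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)   /* leftover elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }

    int main(void)
    {
        double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        printf("%g %g\n", sum_serial(a, 8), sum_unrolled(a, 8)); /* 36 36 */
        return 0;
    }
    [/code]

    Whether the compiler or the hardware does this kind of transformation, somebody has to, or those 200 in-flight slots go to waste.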



    [ 12-15-2002: Message edited by: Brendon ]
  • Reply 49 of 77
    powerdoc Posts: 8,123 member
    Brendon, this quote comes from Ars Technica. Some people doubt the strict neutrality of their analysis.

    However, I think it's difficult even for specialists to guess what the impact of these groups of five instructions will be on the compiler.

    Ars Technica also forgets one thing when comparing the 970 to the P4: the 970 has a shorter pipeline (even if it's longer than the G4's) and the better branch prediction unit (according to their own analysis). Apart from the engineers at IBM, nobody knows what the penalty of grouping five instructions together is.

    I should add that this penalty must be small if this chip really is a single-core POWER4 with AltiVec.

    Some code optimization will be welcome as usual, but less than when going from the G3 to the G4, or from the P3 to the P4.

    Let's say, roughly, that the code optimization needed will be similar to going from a 7400 to a 7455.
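
    Here is some rough arithmetic on that pipeline/branch-prediction trade-off. All the numbers below are placeholders I made up for illustration, not IBM data:

    [code]
    /* Rough arithmetic on pipeline depth vs. branch prediction:
     * average stall per instruction ~= branch frequency x miss rate
     * x refill penalty. All numbers are made-up placeholders. */
    #include <stdio.h>

    int main(void)
    {
        const double branch_freq   = 0.20;  /* fraction of instructions */
        const double refill_cycles = 16.0;  /* hypothetical mispredict cost */
        const double miss_rates[]  = { 0.10, 0.05, 0.025 };

        for (int i = 0; i < 3; i++) {
            double stall = branch_freq * miss_rates[i] * refill_cycles;
            printf("miss rate %4.1f%% -> %.2f stall cycles/instruction\n",
                   miss_rates[i] * 100.0, stall);
        }
        return 0;
    }
    [/code]

    Halving the miss rate buys as much as halving the refill penalty, which is why a deep pipeline with a very good BPU can still keep its average cost small.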



    [ 12-15-2002: Message edited by: Powerdoc ]
  • Reply 50 of 77
    wmf Posts: 1,164 member
    I wonder if we'll get xlc for OS X.
  • Reply 51 of 77
    [quote]Originally posted by Powerdoc:

    Brendon, this quote comes from Ars Technica. Some people doubt the strict neutrality of their analysis.

    However, I think it's difficult even for specialists to guess what the impact of these groups of five instructions will be on the compiler.

    Ars Technica also forgets one thing when comparing the 970 to the P4: the 970 has a shorter pipeline (even if it's longer than the G4's) and the better branch prediction unit (according to their own analysis). Apart from the engineers at IBM, nobody knows what the penalty of grouping five instructions together is.

    I should add that this penalty must be small if this chip really is a single-core POWER4 with AltiVec.

    Some code optimization will be welcome as usual, but less than when going from the G3 to the G4, or from the P3 to the P4.

    Let's say, roughly, that the code optimization needed will be similar to going from a 7400 to a 7455.

    [ 12-15-2002: Message edited by: Powerdoc ][/quote]



    First, thanks for pointing out my error in not giving Ars due reference.



    Second, thanks for clarifying the 7400-to-7455 difference.
  • Reply 52 of 77
    [quote]Originally posted by Powerdoc:

    Some code optimization will be welcome as usual, but less than when going from the G3 to the G4, or from the P3 to the P4.

    Let's say, roughly, that the code optimization needed will be similar to going from a 7400 to a 7455.[/quote]



    Aside from AltiVec, the G3 -> G4 transition didn't require much in the way of changed scheduling. The 7455 changed the G4's pipeline and mix of instruction units, and thus did require a new scheduler. The 970 will probably require about the same level of changes -- and IBM has already had a couple of years to work on their POWER4 instruction scheduler. If they haven't already enhanced GCC in the same way, I expect they will soon.



    The 486 -> Pentium transition was quite significant, and the Pentium -> Pentium II transition was at least as big. Pentium II -> III was no change at all, except for SSE. The Pentium III -> IV transition is arguably the biggest change I've ever seen. Worse yet, the Athlon has different requirements. x86 processors might all run the same code (ignoring MMX, 3DNow!, SSE, SSE2), but making that code run optimally differs significantly across almost all of those processors... so much so that speeding up one slows down the others. That hasn't happened much on the PowerPC: speeding up later chips doesn't usually slow down the older ones significantly.
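
    For the curious, this per-chip scheduling is exposed directly in the compiler flags. A sketch, assuming a GCC 3.x-era toolchain (the x86 spelling of these was later renamed -mtune):

    [code]
    /* The same portable source, scheduled for different pipelines at
     * compile time (GCC 3.x-era flags, shown purely as illustration):
     *
     *   gcc -O2 -mcpu=750       dot.c    # schedule for a G3
     *   gcc -O2 -mcpu=7450      dot.c    # schedule for a 745x G4
     *   gcc -O2 -mcpu=pentium3  dot.c    # x86: tune for the P3
     *   gcc -O2 -mcpu=pentium4  dot.c    # x86: tune for the P4
     *
     * The instruction mix stays the same; only the ordering changes,
     * which is exactly where one chip's optimum can hurt another's. */
    #include <stdio.h>

    double dot(const double *a, const double *b, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }

    int main(void)
    {
        double a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1};
        printf("%g\n", dot(a, b, 4));  /* 20 */
        return 0;
    }
    [/code]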



    Itanium is a whole other ball o' wax. The compiler has to be really good to get decent performance, and each successive core revision is expected to be dramatically different. Good thing for ISVs, since they can sell a new version of their software to you every time you update your hardware.
  • Reply 53 of 77
    nevyn Posts: 360 member
    [quote]Originally posted by wmf:

    I wonder if we'll get xlc for OS X.[/quote]



    I'll pay my share. Should we start a petition/fundraising campaign?



    If you read the GCC lists you'll note that there are a fair number of Apple engineers submitting various bits to mainline GCC. Although GCC has been brought up to reasonable PPC codegen, I haven't heard anyone extol it as a paragon of blazingly fast PPC code.



    xlc (IBM's internal compiler for POWER/ppc) on the other hand...
  • Reply 54 of 77
    powerdoc Posts: 8,123 member
    [quote]Originally posted by Brendon:

    First, thanks for pointing out my error in not giving Ars due reference.

    Second, thanks for clarifying the 7400-to-7455 difference.[/quote]

    Programmer answered this question with his traditional accuracy.



    Deeper pipelining (from 4 stages to 7), an additional integer unit, and a better BPU. As he pointed out, too, the 970 will require the same level of optimization that the 7450/7455 required.



    When I referred to the G3-to-G4 transition, I was referring to AltiVec (the G4 is roughly a G3 with an AltiVec unit and a better FPU).

    The transition from P3 to P4 was much bigger.
  • Reply 55 of 77
    outsider Posts: 6,008 member
    <a href="http://www.siliconstrategies.com/story/OEG20021215S0001"; target="_blank">http://www.siliconstrategies.com/story/OEG20021215S0001</a>;



    According to this article, IBM's 90nm process is ready NOW. I must assume FPGA chips are easier to fab than normal chips.
  • Reply 56 of 77
    vvmp Posts: 63 member
    I'm wondering if this info from news.com applies to the 970 as well. Could this mean the 90-nanometer version will be here earlier than expected?



    <a href="http://news.com.com/2100-1001-977894.html?tag=fd_top"; target="_blank">http://news.com.com/2100-1001-977894.html?tag=fd_top</a>;



    Here's a sample of the article:

    IBM plans to announce Monday that it has passed another milestone on the road toward adopting an improved process for manufacturing semiconductors.

    The company will announce that it has begun work on its first 90-nanometer chip, a "field programmable gate array" processor that Big Blue will manufacture for chip designer Xilinx.





    Moving to a 90-nanometer chipmaking process means smaller transistors, which in turn means more transistors on a chip and better performing chips. IBM's current generation of semiconductors is built using a 130-nanometer manufacturing process--the nanometer measurement refers to the average size of features inside chips, such as transistors and the interconnects that link them.
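
    (The density claim in that last paragraph is just geometry; the idealized arithmetic is below as a quick sketch. Real processes never scale perfectly, so treat it as an upper bound.)

    [code]
    /* Idealized scaling for a 130nm -> 90nm shrink. Real processes
     * never achieve perfect scaling; treat this as an upper bound. */
    #include <stdio.h>

    int main(void)
    {
        const double old_nm = 130.0, new_nm = 90.0;
        double linear  = old_nm / new_nm;  /* ~1.44x per dimension */
        double density = linear * linear;  /* ~2.09x transistors per area */
        printf("linear: %.2fx, ideal density: %.2fx\n", linear, density);
        return 0;
    }
    [/code]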
  • Reply 57 of 77
    outsider Posts: 6,008 member
    CONFIRMED! MOSR says no new PowerMacs at MWNY so there MUST be new PowerMacs at the expo! w00t!
  • Reply 58 of 77
    Interesting articles. I followed a chain of links to this article:



    <a href="http://news.com.com/2100-1001-954456.html"; target="_blank">http://news.com.com/2100-1001-954456.html</a>;



    Where I found this quote from a head Intel researcher that made me laugh:



    "We're heading into a newer area of architectural design. We've run out of ideas and are now forging ahead with completely new ideas."



    I'm sure he didn't mean it literally, but it certainly makes it sound like IBM is in a better position since they have clearly been coming up with new ideas all along.
  • Reply 59 of 77
    [quote]Originally posted by Outsider:

    CONFIRMED! MOSR says no new PowerMacs at MWNY so there MUST be new PowerMacs at the expo! w00t![/quote]



    Are you having a silly moment, Outsider? There haven't been new PowerMacs at a MacWorld for at least a year and a half. They seem to arrive about a month later.
  • Reply 60 of 77
    snoopy Posts: 1,901 member
    [quote]Originally posted by Programmer:

    Where I found this quote from a head Intel researcher that made me laugh:

    "We're heading into a newer area of architectural design. We've run out of ideas and are now forging ahead with completely new ideas."[/quote]



    Could he be saying they are out of ideas for getting better performance from x86-type chips, so they are looking at a change of direction? Just a thought.