Why go all Dualies in the PM line?

Posted in Future Apple Hardware, edited January 2014
An awful lot of people are whining about Apple getting faster CPUs. There's a thread here about future speedbumps. Well, if a 1.0 GHz PPC 7455 is already data starved on the 167 MHz bus between it and the system controller that stands between it and main memory, a 1.25 GHz PPC 7455 will be even more data starved.



So, speed bumps are almost pointless when it comes to the MAJOR performance tests: animation, 2D/3D rendering, DVD rendering, etc. You just get a snappier system for smaller data-processing operations.



As for big jobs such as rendering and the like, your overall system performance is ultimately measured by overall data throughput, which is limited by the weakest link along the data-processing chain. In this case, it's the FSB interface on the PPC 7455 (now 1.3 GB/s thanks to a little over-clocking, supposedly).



Well now, if moving from a 1.0 GHz to a 1.25 GHz CPU yields little improvement (all else being equal FSB-wise), adding a second CPU that feeds from the same bus should make little difference as well, other than the fact that the second CPU can store some instructions and a small quantity of frequently used data in its cache to speed up small jobs. The big jobs are still choked at the bottleneck.



So, what does Apple gain by adding a second processor to each PowerMac? A single 1.0 GHz PPC 7455 can attain a maximum THEORETICAL throughput of about 1.56 GB/s, which is more than the 1.3 GB/s bottleneck!!! So, if Motorola has or can rent some 130 nanometer capacity, we could get another incremental speedbump. But so what!



That is why the Xserve multimedia performance tests showed little if any improvement: the Xserve has the 133 MHz connection (1.05 GB/s?) to the system controller, just as the older PM line did, whereas the newer upper two PMs have the 167 MHz connection at 1.3 GB/s.
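
For what it's worth, both bus figures fall straight out of clock-times-width arithmetic, assuming the 64-bit (8-byte) MPX data bus. A throwaway C sketch of that arithmetic, nothing more:

[code]
#include <stdio.h>

/* Peak FSB bandwidth = bus clock x bus width.
   Assumes the 64-bit (8-byte) MPX data bus; the clocks are the nominal
   133 MHz and 167 MHz figures discussed above. */
int main(void)
{
    const double bus_width_bytes = 8.0;             /* 64-bit data bus */
    const double clocks_mhz[]    = { 133.3, 166.6 };

    for (int i = 0; i < 2; i++) {
        double gb_per_s = clocks_mhz[i] * 1e6 * bus_width_bytes / 1e9;
        printf("%6.1f MHz bus -> %.2f GB/s peak\n", clocks_mhz[i], gb_per_s);
    }
    return 0;   /* prints ~1.07 and ~1.33 GB/s, in the ballpark of 1.05 and 1.3 */
}
[/code]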



The second CPU does improve performance but generally for smaller jobs or some large jobs where the extra L3 cache can minimize the number of reads to main memory for instructions.



What we OBVIOUSLY need is for the CPUs to be able to RECEIVE data at a higher rate. This will NOT be solved with a process improvement/die shrink as discussed in the 'future speed bumps' thread!



Overall performance requires a CPU design change. The president of Motorola Canada was quoted two (?) weeks ago as saying that Motorola has ceased all development in support of Apple CPUs. I presume development refers to DESIGN. So, if this is in fact true, what a bummer!



Where are we going to get a PPC 74XX design change IF Motorola has stopped? IBM? Doubt it, but possible. Apple? Doubt it more, but possible.



It looks like we're stuck with relatively slow overall SYSTEM performance until Apple employs a different CPU from another provider such as IBM's son of Power4 that I like to refer to as "Junior". How far along is IBM?



So, again, what does Apple gain MEANWHILE by going all dualies? It can't substantially improve performance on the big jobs compared to a single PPC 7455 that is already data starved (unless each CPU could be placed on a separate bus and still coordinate with the other). Why increase unit cost for so-so improvements in system performance?



How does Apple contract with Motorola (or IBM, for that matter) for CPUs? Does Apple commit to a fixed number of CPUs during a time period? How variable is it, and under what circumstances can Apple adjust it or stop? Could Apple be required to purchase a certain number of CPUs from Motorola before it could change vendors, either due to contract or simply Apple's cash flow?



Alternatively, could Apple be trying to dramatically increase CPU volume so that Motorola, IBM, or whoever would be more interested in and willing to employ greater resources on Apple's behalf?



I don't think the answer to why Apple went all dualie is just marketing, because a handful of independent performance tests can kill that in an instant.



I don't know the answer. But it might tell us much about where Apple is headed. [Surprised]

Comments

  • Reply 1 of 43
    gfeier Posts: 127 member
    [quote]Originally posted by Eirik Iverson:

    ...

    I don't think the answer to why Apple went all dualie is just marketing, because a handful of independent performance tests can kill that in an instant.[/quote]



    I think that pretty much proves that it is marketing. I guess it just makes people feel a lot better to have 2.5 GHz of CPUs in their box. Always remember that Apple is a corporation first and will only survive as long as there is a demand for its products.
  • Reply 2 of 43
    I have trouble accepting that the decision to go all dualie is mostly a marketing decision. Marketing is mostly about strategy as opposed to tactics, particularly when it comes to product positioning. Let's not confuse marketing with sales, as so many people do.



    Tactics are short-term actions in support of an overall strategy, which generally lasts a while, even in the computer industry. One crafts a strategy, including product positioning, to guide the tactics for the duration. A sound market strategy, and hence a product positioning, should not be readily derailed by sales tactics.



    Well, if Apple registered on Dell's radar, Dell could simply publish an array of independent, real-world performance tests that show Dell's indisputable (?) advantage over Apple's dualies.



    Given that Dell has more than twice the system throughput for data processing because of its FSB advantage, Dell and the others have an advantage they can sustain for many months until Apple gets a re-designed CPU.



    Remember, Apple is trying to "switch" WinDon't users. That requires altering skeptical minds that are receptive to believing negative things about switching to Apple. They'd readily accept Dell publications of independent performance tests.



    Apple publishing things such as 'up to 90% faster on Photoshop' takes huge gonads considering the throughput advantage of the x86 architecture. I suspect the particular filters the PMs excel at do not require sustained throughput (?) or whatever else benefits from high bandwidth between main memory and the CPU(s).



    Anyway, I fear we'll see independent performance tests that still show Apple losing big-time in the majority of different tests. Then Apple is faced with higher-cost units, and I hope you know what that means if they don't sell like hotcakes.



    edit:

    If Apple is just trying to retain its existing base of PM users, then marketing makes sense!



    Eirik



    [ 08-15-2002: Message edited by: Eirik Iverson ]
  • Reply 3 of 43
    Well, regardless of what some may think, the new PowerMacs are selling like hotcakes. The Apple Store near me got in a few of the two low-end models Tuesday morning, and by Wednesday they were almost sold out. There is a big demand for these machines.
  • Reply 4 of 43
    yevgeny Posts: 1,148 member
    AltiVec is what can easily use up all of the available bus bandwidth. Contrary to Apple propaganda, not everything uses AltiVec. I believe that dual 1 GHz CPUs can also use up all the available bandwidth, but that doesn't mean they always do. It depends on what the CPUs are doing in their calculations. So, even though the bus is slow, there are situations where both processors are kept rather busy (editing cached data, etc.).



    Overall, duals across the line is a good idea, and hopefully it will stay that way for some time to come.
  • Reply 5 of 43
    hmurchison Posts: 12,419 member
    Dual-proc-savvy apps certainly show a difference, so real-world results tell us dualies have tangible benefits.
  • Reply 6 of 43
    programmer Posts: 3,457 member
    [quote]Originally posted by Yevgeny:

    AltiVec is what can easily use up all of the available bus bandwidth. Contrary to Apple propaganda, not everything uses AltiVec. I believe that dual 1 GHz CPUs can also use up all the available bandwidth, but that doesn't mean they always do. It depends on what the CPUs are doing in their calculations. So, even though the bus is slow, there are situations where both processors are kept rather busy (editing cached data, etc.).



    Overall, duals across the line is a good idea, and hopefully it will stay that way for some time to come.[/quote]





    Plenty of scalar algorithms are memory-bandwidth bound as well, not just AltiVec algorithms. More than you'd think, because most scalar algorithms haven't had their memory access patterns carefully optimized (this is one of the key AltiVec optimizations, after all).



    No, the reason is that not all tasks are memory bound! Sure, we moan and complain about the lack of memory bandwidth, which limits how fast certain apps will run... but really, those are fairly specific tasks. There is all sorts of other activity in the system, and all sorts of other apps in use that can quite happily run out of the big, fast L2/L3 caches of these machines (500 MHz L3!!). So while one processor is blasting away at maximum bandwidth, the other can be happily doing the dozens of other things that need doing: running the GUI, the file system, the network, your browser, your chat window, and so on.



    People get so focused on the bleeding edge of performance that they forget it typically represents quite a small piece of the software on their computer. Having a second processor means that a lot of this stuff doesn't interrupt the one task you are working on that does need maximum bandwidth.



    There are also some tasks (although rare) which can use all of the power of 2 GHz AltiVec and not be memory bound.
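
    To put the cache point above in code, here's a toy C sketch (the array sizes are my own assumptions, not measurements): the very same loop is fast when its working set fits in L2/L3 and becomes bus-bound when it has to stream from main memory.

    [code]
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Same summation loop, two working sets: one that lives in cache,
       one that streams from RAM. Total work is equal (the small array
       is swept 256x as often), so any time difference is the memory
       hierarchy talking. */
    static double sum_repeatedly(const float *a, size_t n, int passes)
    {
        double s = 0.0;
        for (int p = 0; p < passes; p++)
            for (size_t i = 0; i < n; i++)
                s += a[i];
        return s;
    }

    int main(void)
    {
        size_t small_n = (256u << 10) / sizeof(float);  /* ~256 KB: cache-resident */
        size_t big_n   = (64u << 20) / sizeof(float);   /* ~64 MB: streams from RAM */
        float *small = calloc(small_n, sizeof(float));
        float *big   = calloc(big_n, sizeof(float));
        if (!small || !big) return 1;

        clock_t t0 = clock();
        double s1 = sum_repeatedly(small, small_n, 256);
        clock_t t1 = clock();
        double s2 = sum_repeatedly(big, big_n, 1);
        clock_t t2 = clock();

        printf("cache-resident: %.2f s   streaming: %.2f s   (%g, %g)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, s1, s2);
        free(small);
        free(big);
        return 0;
    }
    [/code]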
  • Reply 7 of 43
    Programmer,



    You did a better job of making a point I tried to make: dualies will make your system snappier, more responsive.



    The 'switch' marketing challenge, however, is showing prospects this snappiness. I think it would be wise of Apple to create new tests, ones others could independently verify, that feature many applications running the way typical users run them, QUANTIFY the snappiness, and compare it to the x86 competition.
  • Reply 8 of 43
    [quote]That is why the Xserve multimedia performance tests showed little if any improvement: the Xserve has the 133 MHz connection (1.05 GB/s?) to the system controller, just as the older PM line did, whereas the newer upper two PMs have the 167 MHz connection at 1.3 GB/s.[/quote]



    Umm, the problem is that these were "multimedia" benchmarks, and the Xserve was never intended to be a multimedia machine. Why would Apple optimize it for FPS frame rates or QuickTime playback when it's meant as a server, most often without even a video card?



    I've yet to see any good "server" benchmarks done with the Xserve. As for dual processors, I work encoding DVDs, and trust me, dual processors make a huge speed difference there. Same for rendering. Faster memory and a faster FSB are great, but not every process on the computer relies on them.



    I hope Apple sticks to its guns this time and leaves the high-end machines dual.
  • Reply 9 of 43
    nitride Posts: 100 member
    Dual CPUs essentially give you twice as much processing capability. It's not a gimmick; it's a performance enhancement.



    Twice as many registers, twice the number of execution units (AltiVec, integer, float), more threads running at once, and twice as much L3 cache (where the CPU does a lot of its work) result in more power for the OS and the apps.



    Going all dual makes sense with a UNIX-based OS that is built for SMP, and with Cocoa, which can thread apps more easily than before (on the Mac).



    A single monster GHz CPU is still limited by its L3 cache (if any), its number of registers, and how many tasks it can perform per clock cycle.



    The G4 still has a much shallower pipeline than the Pentium's, so it can recover more quickly from a stall (or what Apple calls a bubble). And when a stall happens on one CPU, the other CPU can keep working.



    There are REAL benefits to dual CPUs (or more, eventually), as there are with subprocessors for graphics and sound that offload work from the main CPU.



    If all you do is surf the web, download MP3s and play some games then you probably won't realize the full benefit.
  • Reply 10 of 43
    yevgeny Posts: 1,148 member
    [quote]Originally posted by Programmer:

    Plenty of scalar algorithms are memory-bandwidth bound as well, not just AltiVec algorithms. More than you'd think, because most scalar algorithms haven't had their memory access patterns carefully optimized (this is one of the key AltiVec optimizations, after all).



    There are also some tasks (although rare) which can use all of the power of 2 GHz AltiVec and not be memory bound.[/quote]



    My bad. I realize I stated poorly what you said above; we're in agreement. For example, if the data being edited lives solely in the cache, then the bus is not saturated.



    Bus saturation can happen when the processor is just looping through a very large array and the bus must continually fetch new portions of it. No AltiVec code is needed to overwhelm the bus in this case.



    I am sure there are things you can do with AltiVec that do not saturate the bus. In particular, if you take one vector and run several AltiVec functions on it (while the bus is fetching the next vector), then you won't saturate the bus. Of course, it is also easy to saturate the bus with normal AltiVec usage.
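
    Something like this, in AltiVec C intrinsics; just a sketch (the eight-term multiply-add chain is invented for illustration, and it assumes 16-byte-aligned buffers and an AltiVec-capable compiler, e.g. gcc with -maltivec):

    [code]
    #include <altivec.h>
    #include <stddef.h>

    /* Eight vec_madd operations per 16-byte vector loaded, so the ALUs,
       not the bus, set the pace. in/out must be 16-byte aligned and n a
       multiple of 4 for this simple version. */
    void poly_chain(const float *in, float *out, size_t n, vector float c)
    {
        for (size_t i = 0; i + 4 <= n; i += 4) {
            vector float x = vec_ld(0, &in[i]);   /* one 16-byte fetch */
            vector float y = c;
            for (int k = 0; k < 8; k++)
                y = vec_madd(y, x, c);            /* compute-heavy chain */
            vec_st(y, 0, &out[i]);                /* one 16-byte store */
        }
    }
    [/code]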



    The overall point is that on a modern OS, dual processors offer a speed bump that a single processor of the same speed would not see. Apple's decision to put duals in each desktop is a good technical decision, not just clever marketing.
  • Reply 11 of 43
    yevgeny Posts: 1,148 member
    [quote]Originally posted by Nitride:

    Dual CPUs essentially give you twice as much processing capability. It's not a gimmick; it's a performance enhancement.[/quote]



    And this rarely translates into twice the real performance. Twice the potential does not mean twice the performance. That isn't how hardware or software works. I would take a single 4 GHz CPU over two 2 GHz CPUs (all else the same).
  • Reply 12 of 43
    powerdoc Posts: 8,123 member
    [quote]Originally posted by Yevgeny:

    My bad. I realize I stated poorly what you said above; we're in agreement. For example, if the data being edited lives solely in the cache, then the bus is not saturated.



    Bus saturation can happen when the processor is just looping through a very large array and the bus must continually fetch new portions of it. No AltiVec code is needed to overwhelm the bus in this case.



    I am sure there are things you can do with AltiVec that do not saturate the bus. In particular, if you take one vector and run several AltiVec functions on it (while the bus is fetching the next vector), then you won't saturate the bus. Of course, it is also easy to saturate the bus with normal AltiVec usage.



    The overall point is that on a modern OS, dual processors offer a speed bump that a single processor of the same speed would not see. Apple's decision to put duals in each desktop is a good technical decision, not just clever marketing.[/quote]

    Yes, and the Photoshop benchmarks shown by Apple suggest that dualie performance scales rather linearly: a dual 1.25 GHz on the 167 MHz MPX bus is no more memory starved than the previous dual 1 GHz one. If memory starvation were such a great issue, the benchmark of the top-end dualie would be only a little better than the dual 1 GHz dualie's. There is some room left for updating the G4 even with the MPX bus.



    And to reply to the initial question: yes, dualies are really good news, especially under Jaguar. I'd take a dual 867 MHz over any single 933 MHz or even a single 1 GHz PowerMac.
  • Reply 13 of 43
    "big jobs" = some processing job that requires a boat load of data from main memory



    I do not own a dualie now. And I don't recall seeing a single 1 GHz CPU benchmarked against a dual 1 GHz (or any same-clocked CPU against two same-clocked CPUs, all using the same bus speed/throughput).



    [edit: there must be some such benchmarks with 800 MHz 7455s; I'll look.]



    On the old PowerMac, the bottleneck was about 1.05 GB/s; on the new, 1.3 GB/s.



    The maximum theoretical throughput for the following PPC 7455 CPU configs:

               Single      Dual
    800 MHz    1.28 GB/s   2.56 GB/s
    867 MHz    1.39 GB/s   2.77 GB/s
    1000 MHz   1.60 GB/s   3.20 GB/s
    1250 MHz   2.00 GB/s   4.00 GB/s



    As you can see, the current line of PPC 7455s (the last three above) are individually capable of consuming more than the current 1.3 GB/s bottleneck.



    So, on the "big jobs", what can adding another CPU do for you in terms of performance? At most, the second processors L3 cache MAY BE ABLE TO store some useful instructions from whatever application is running the "big job". Also, if you're running other smaller apps/jobs at the same time, you'll get some minor improvement as without the second CPU.



    (Edit) With a 25% faster bus, the new dual 1 GHz may be around 25% faster than the old dual 1 GHz. But the question remains: how much faster will the dual 1.25 GHz be than the new dual 1 GHz? With no extra bus bandwidth, and with the 1 GHz 7455 already capable of single-handedly consuming all 1.3 GB/s from the bus, the dual 1.25 GHz will just hurry up and wait for data faster than the dual 1.0 GHz units. Would you expect it to be much more than 5% faster on the "big jobs"?
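
    Here's a back-of-the-envelope model of that argument in C (the byte and flop counts are invented to represent a memory-heavy "big job", and I'm assuming roughly one flop per cycle per CPU, so treat it as a sketch, not a benchmark):

    [code]
    #include <stdio.h>

    /* A job takes as long as its slowest leg: moving the bytes over the
       FSB or doing the arithmetic. Weakest link wins. */
    static double job_seconds(double bytes, double flops,
                              double bus_gbps, double cpu_gflops)
    {
        double transfer = bytes / (bus_gbps * 1e9);
        double compute  = flops / (cpu_gflops * 1e9);
        return transfer > compute ? transfer : compute;
    }

    int main(void)
    {
        double bytes = 10e9, flops = 5e9;   /* hypothetical memory-heavy job */

        /* dual 1.0 GHz vs dual 1.25 GHz, both on the same 1.3 GB/s bus */
        printf("dual 1.00 GHz: %.2f s\n", job_seconds(bytes, flops, 1.3, 2.0));
        printf("dual 1.25 GHz: %.2f s\n", job_seconds(bytes, flops, 1.3, 2.5));
        return 0;   /* identical times: the bus, not the clock, sets the pace */
    }
    [/code]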



    Let me say again that dualies can make your system appear more "snappy". HAS ANYONE EVER SEEN THIS QUANTIFIED IN SOME KIND OF A PERFORMANCE TEST?



    [ 08-15-2002: Message edited by: Eirik Iverson ]
  • Reply 14 of 43
    yevgeny Posts: 1,148 member
    [quote]Originally posted by Eirik Iverson:

    <strong>"big jobs" = some processing job that requires a boat load of data from main memory



    I do not own a dualie now. And, of the benchmarks that I recall, I don't recall a single 1 GHz CPU benchmarked against a dual 1 GHz CPU (or any same clocked CPU against two same clocked CPU's all using same bus speed/throughput).



    On the old PowerMac, the bottleneck was about 1.05 GB/s, and a new 1.3 GB/s.



    The maximum theoretical computation speed for the following PPC 7455 CPU configs:

    Single Dual

    800MHz 1.28 GBps 2.56 GBps

    867MHz 1.39 GBps 2.77 GBps

    1000MHz 1.60 GBps 3.20 GBps

    1250MHz 2.00 GBps 4.00 GBps



    As you can see, the current line of PPC 7455's (the last three from above) are individually capable of sucking up more than the current 1.3 GBps bottleneck.



    So, on the "big jobs", what can adding another CPU do for you in terms of performance? At most, the second processors L3 cache MAY BE ABLE TO store some useful instructions from whatever application is running the "big job". Also, if you're running other smaller apps/jobs at the same time, you'll get some minor improvement as without the second CPU.



    Let me say again that dualies can make your system appear more "snappy". HAS ANYONE EVER SEEN THIS QUANTIFIED IN SOME KIND OF A PERFORMANCE TEST?</strong><hr></blockquote>





    First, the maximum theoretical performance is not always achieved. To achieve it, you would need to hand-roll your own PPC assembly code and think REALLY hard about it. Compiled code runs slightly slower.



    Second, is this the maximum speed for getting a value from RAM (not cache), adding 1, and putting the value back into RAM? In that case, yes, you would saturate your bus, but many calculations are not like this.



    The reason the large L3 cache makes such a performance difference is that it holds frequently used information. If a program/algorithm uses the same information over and over again, then the bus is really not saturated.



    On "Big Jobs", the second processor might become redundant. It really depends on what the job is doing. If you are fetching lots of data, but also are doing lots of things to the data while you have it, then you don't necessarialy satureate the bus.



    It really does depend on what you are doing. Is there potential to saturate the bus? Certainly. Does this mean the second processor is useless? No. Dual processors are not useless, especially when one processor is doing AltiVec work and the other is just maintaining the OS and running your email app.



    Yes, this has been quantified in performance tests. If it weren't the case, Apple wouldn't do it. This is not a marketing gimmick.



    [ 08-15-2002: Message edited by: Yevgeny ]
  • Reply 15 of 43
    Barefeats just posted very disturbing performance test results. It says:



    "CONCLUSION



    To my surprise and chagrin, the new DDR Power Mac has no apparent performance advantage over the old SDRAM Power Mac running at the same clock speed.



    The 25% faster system bus seems of no help, either. Depressing. Scandalous!"



    This shocks the hell out of me because I expected something up toward a 25% performance bump due to the faster bus. Perhaps the tests did not sufficiently tax main memory? I doubt the difference in L3 cache would make that big of a difference.



    Here's the link:



    Barefeats Performance Tests on the New Dual 1 GHz versus the Old Dual 1 GHz: http://www.barefeats.com/pmddr.html
  • Reply 16 of 43
    hmurchison Posts: 12,419 member
    [quote]Originally posted by Eirik Iverson:

    Barefeats just posted very disturbing performance test results. It says:



    "CONCLUSION



    To my surprise and chagrin, the new DDR Power Mac has no apparent performance advantage over the old SDRAM Power Mac running at the same clock speed.



    The 25% faster system bus seems of no help, either. Depressing. Scandalous!"



    This shocks the hell out of me because I expected something up toward a 25% performance bump due to the faster bus. Perhaps the tests did not sufficiently tax main memory? I doubt the difference in L3 cache would make that big of a difference.



    Here's the link:



    Barefeats Performance Tests on the New Dual 1 GHz versus the Old Dual 1 GHz: http://www.barefeats.com/pmddr.html[/quote]



    Just because the bus is 25% faster does not mean the overall performance improves by that amount. I agree, though... it is a little [Hmmm].



    I want to go to the Promised Land.
  • Reply 17 of 43
    jcg Posts: 777 member
    [quote]Originally posted by Yevgeny:

    And this rarely translates into twice the real performance. Twice the potential does not mean twice the performance. That isn't how hardware or software works. I would take a single 4 GHz CPU over two 2 GHz CPUs (all else the same).[/quote]



    As I understand it, dual processors NEVER give you twice the performance, because some performance is lost managing the threads split between processors. I saw some percentage gains estimated for adding processors a few years ago. I think it was about a 15-20% loss (2 x 1 GHz processors ~ 1.8-1.85 GHz real-world performance), and it gets slightly worse as you add more processors, so the real-world performance of a quad 1 GHz might equal a 2.8-3.2 GHz processor, and that gain is only realized by processes that can be broken into parallel threads. My numbers probably are not accurate, but I believe the concept is.
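
    As a toy formula (the 0.92 per-CPU efficiency factor below is my own assumption, picked only because it lands near the numbers I quoted):

    [code]
    #include <math.h>
    #include <stdio.h>

    /* Effective speed = n CPUs x clock x efficiency^(n-1): each added
       CPU pays a thread-management tax on all of them. Toy model only. */
    int main(void)
    {
        const double ghz = 1.0, eff = 0.92;   /* assumed per-CPU efficiency */
        for (int n = 1; n <= 4; n++)
            printf("%d x %.0f GHz ~ %.2f GHz effective\n",
                   n, ghz, n * ghz * pow(eff, n - 1));
        return 0;   /* 2 CPUs -> ~1.84 GHz, 4 CPUs -> ~3.11 GHz */
    }
    [/code]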
  • Reply 18 of 43
    brussell Posts: 9,812 member
    The duals are not just marketing. They "work" under two conditions:

    1. MP-aware tasks, and

    2. simultaneous tasks (in X) even if they're not MP-aware.



    For example (from Barefeats):

    [two Barefeats benchmark charts of MP-aware tasks; images not preserved]

    Now, look at this Photoshop task that is not MP-aware:

    [Barefeats Photoshop benchmark chart; image not preserved]

    The hype comes from people who say "X will automagically speed up everything you do because the OS is MP-aware!" That's nonsense.
  • Reply 19 of 43
    [quote]Originally posted by BRussell:

    The duals are not just marketing. They "work" under two conditions:

    1. MP-aware tasks, and

    2. simultaneous tasks (in X) even if they're not MP-aware.



    Now, look at this Photoshop task that is not MP-aware:

    [Barefeats Photoshop benchmark chart; image not preserved]

    The hype comes from people who say "X will automagically speed up everything you do because the OS is MP-aware!" That's nonsense.[/quote]



    BRussell, I do not contest the idea that multiprocessors can improve performance. I am arguing that for big, main-memory-intensive jobs, their value will be constrained by the 167 MHz bus bottleneck.



    The tests by Barefeats may NOT be very memory intensive. I haven't looked at the details yet. But I believe the fractal test is NOT main-memory intensive.



    Maybe I'm being unnecessarily defensive, but I've acknowledged your first two points several times in earlier posts in this thread. Again, I'm talking about big jobs that are memory intensive. Those things have real, tangible benchmarks. Snappiness does not.



    As for the Barefeats tests, I haven't looked at the details yet: size of test files, detailed description of algorithm, etc.



    I don't claim to be an EE on these matters. However, I'd like to know in what real-world, QUANTIFIABLE ways the extra CPU helps. Saying it makes your system snappier isn't all that compelling.



    Eirik
  • Reply 20 of 43
    yevgeny Posts: 1,148 member
    [quote]Originally posted by JCG:

    As I understand it, dual processors NEVER give you twice the performance, because some performance is lost managing the threads split between processors. I saw some percentage gains estimated for adding processors a few years ago. I think it was about a 15-20% loss (2 x 1 GHz processors ~ 1.8-1.85 GHz real-world performance), and it gets slightly worse as you add more processors, so the real-world performance of a quad 1 GHz might equal a 2.8-3.2 GHz processor, and that gain is only realized by processes that can be broken into parallel threads. My numbers probably are not accurate, but I believe the concept is.[/quote]



    If the threads do not have to talk to each other, they have enough bandwidth, and their resources can be separated between them (so the resources do not get locked for writes), then you do get double performance. Not all problems can be handled this way.
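
    That "threads that never talk to each other" case is easy to sketch with POSIX threads in C (the array size and contents here are arbitrary): each CPU sums its own half, nothing is shared for writing, and the results are merged once at the end.

    [code]
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Each thread sums a disjoint half of the array: no shared writes,
       no locks, one join at the end -- the near-2x case on a dualie. */
    typedef struct { const double *a; size_t n; double sum; } chunk_t;

    static void *sum_chunk(void *arg)
    {
        chunk_t *c = arg;
        double s = 0.0;
        for (size_t i = 0; i < c->n; i++)
            s += c->a[i];
        c->sum = s;
        return NULL;
    }

    int main(void)
    {
        size_t n = 1u << 20;
        double *a = malloc(n * sizeof *a);
        if (!a) return 1;
        for (size_t i = 0; i < n; i++)
            a[i] = 1.0;

        pthread_t t[2];
        chunk_t c[2] = { { a, n / 2, 0.0 }, { a + n / 2, n - n / 2, 0.0 } };
        for (int i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, sum_chunk, &c[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);

        printf("sum = %.0f\n", c[0].sum + c[1].sum);   /* prints 1048576 */
        free(a);
        return 0;
    }
    [/code]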