Why go all Dualies in the PM line?


Comments

  • Reply 21 of 43
    Yevgeny,



    Would you please give us your opinion as to why the Barefeats performance tests show no significant improvement between the old and the new dual 1 GHz, despite the 25% increase in bus speed?



    Thanks,



    Eirik



    PS. This is really an open question to all, of course.
  • Reply 22 of 43
    programmer Posts: 3,458 member
    It's possible that Apple is pulling a fast one and running the MPX bus asynchronously from the memory bus, and claiming a "system bus speed of 167 MHz" by quoting the memory bus speed. That would be a cheap and sleazy marketing trick, but it would explain how they managed it so quickly (no need for a new processor from Moto, and Moto hasn't announced a new processor yet).



    We'll have to wait and see... either proper technical docs will come out or somebody will run a "real" memory benchmark. The Barefeats results are suspicious because they are so close... the new machines have smaller/faster L3 cache so you'd expect some difference.
  • Reply 23 of 43
    steves Posts: 108 member
    [quote]Originally posted by Brussel:
    The hype comes from people who say "X will automagically speed up everything you do because the OS is MP-aware!" That's nonsense.
    [/quote]





    True, but on a dual machine you could be running a PS filter, playing an MP3, and checking your e-mail without any significant loss of performance or responsiveness. While it's true that not every task will automatically be faster, your SYSTEM will automatically be faster in OS X than the same setup in OS 9. This is the point most people are trying to make when discussing OS X.
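
    (To make that concrete, here's a minimal sketch of what an MP-aware OS can exploit: two completely independent CPU-bound tasks run as separate threads, and a dual-CPU OS X box is free to put each one on its own processor. The workload and loop counts below are made up for illustration; this is plain C++ with pthreads, not anyone's actual benchmark.)

    #include <pthread.h>
    #include <cstdio>

    // One independent task, e.g. an MP3 decode or a filter pass.
    static void* busyWork(void* label)
    {
        volatile double x = 0.0;
        for (long i = 0; i < 200000000; ++i)
            x += i * 0.5;                       // pure CPU work, no shared data
        std::printf("%s done\n", (const char*)label);
        return 0;
    }

    int main()
    {
        pthread_t a, b;
        // Two unrelated workloads: the scheduler may run each thread on its own
        // CPU, so together they finish in roughly the time one takes alone,
        // instead of back to back as on a single-CPU machine.
        pthread_create(&a, 0, busyWork, (void*)"task A");
        pthread_create(&b, 0, busyWork, (void*)"task B");
        pthread_join(a, 0);
        pthread_join(b, 0);
        return 0;
    }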



    [quote]Originally posted by Eirik Iverson:
    BRussell, I do not contest the idea that multiprocessors can improve performance. I am arguing that for big, main memory intensive jobs their value will be constrained by the 167 MHz bus bottleneck.
    [/quote]



    I'll be the first to agree that bandwidth is an issue for performance. I'd also be the first to point out that this issue is way overrated in forums like this. It's almost used as an excuse for poor CPU performance, i.e. "my G4 would be as fast as your P4 if only it ran on a faster bus." This is pure BS! The vast majority of applications are not significantly bandwidth limited. For example, a couple of months ago PC World compared the performance of several P4 systems in the 2 to 2.2 GHz range. Systems with SDR, DDR, and Rambus memory were compared. Across a variety of applications the performance difference was generally less than 10%. I'd argue that Macs, running at lower clock rates, would see even less of a difference in overall speed, especially when you factor in the L3 cache, which, for most apps, basically nullifies the argument for faster buses.



    In short, yes, faster buses are a good thing. However, they're nowhere near the issue people make them out to be. Outside of a synthetic benchmark such as STREAM, I challenge anyone to show that bus speed is the bottleneck it's made out to be in this forum, across a variety of applications, especially with L3 cache in the mix.
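
    (For the curious, a genuinely memory-bound test along the lines SteveS mentions is easy to sketch. The snippet below streams two arrays much larger than any L3 cache through main memory and reports a rough copy bandwidth. The 64 MB array size and the use of gettimeofday are just illustrative choices, not part of STREAM itself.)

    #include <sys/time.h>
    #include <cstdio>

    int main()
    {
        // 8M doubles per array = 64 MB each, far beyond a 1-2 MB L3 cache.
        const long N = 8 * 1024 * 1024;
        double* a = new double[N];
        double* b = new double[N];
        for (long i = 0; i < N; ++i) { a[i] = 1.0; b[i] = 0.0; }  // touch every page first

        timeval t0, t1;
        gettimeofday(&t0, 0);
        for (long i = 0; i < N; ++i)
            b[i] = a[i];                       // one read + one write per element
        gettimeofday(&t1, 0);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        double megabytes = 2.0 * N * sizeof(double) / (1024.0 * 1024.0);
        std::printf("copy: %.1f MB/s (b[0] = %.0f)\n", megabytes / sec, b[0]);

        delete[] a;
        delete[] b;
        return 0;
    }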



    Steve
  • Reply 24 of 43
    kecksy Posts: 1,002 member
    The old dual 1GHz had 2MB L3 cache.

    The new dual 1GHz has 1MB L3 cache.



    Perhaps the smaller L3 cache nullifies the performance advantage of a 167MHz bus.



    This would explain why the old dual 1GHz is faster than the new dual 1GHz in Photoshop.



    Looks like an extra megabyte of L3 cache gives you the same performance boost as a 167MHz bus. Apple really should have put more memory in these machines.



    DDR SDRAM doesn't do squat either. I guess it's just there for marketing purposes.



    What the next PowerMacs need is RapidIO. Either that or 128-bit MPX. Damn you, Motorola!



    [ 08-15-2002: Message edited by: Kecksy ]
  • Reply 25 of 43
    The Barefeats test results appear to bear out SteveS's statement about bus speed being overrated, though I did ask Rob over at Barefeats to run tests on bigger files.



    The largest of them was 30 MB. I'd have guessed that would stress the bus enough to show a difference.



    As for the L3 cache difference, maybe that is what makes the difference. Perhaps the larger cache lets the application keep more of its instructions cached, which leaves the main bus that much freer to carry only data.



    Plus, I imagine an instruction fetch incurs considerably more latency than just another gulp of data.



    I still intend to buy a PowerMac soon, BTW. I want to get into the digital era and make more use of my digital camcorder. I don't want to learn this on a WinDon't machine. I never want to have to **** with the WinDon't registry or confusing I/O settings ever, ever again. Sorry for the tangent; had to say it.



    Eirik
  • Reply 26 of 43
    programmer Posts: 3,458 member
    [quote]Originally posted by Eirik Iverson:
    The Barefeats test results appear to bear out SteveS's statement about bus speed being overrated, though I did ask Rob over at Barefeats to run tests on bigger files.

    The largest of them was 30 MB. I'd have guessed that would stress the bus enough to show a difference.

    As for the L3 cache difference, maybe that is what makes the difference. Perhaps the larger cache lets the application keep more of its instructions cached, which leaves the main bus that much freer to carry only data.

    Plus, I imagine an instruction fetch incurs considerably more latency than just another gulp of data.

    I still intend to buy a PowerMac soon, BTW. I want to get into the digital era and make more use of my digital camcorder. I don't want to learn this on a WinDon't machine. I never want to have to **** with the WinDon't registry or confusing I/O settings ever, ever again. Sorry for the tangent; had to say it.

    Eirik
    [/quote]



    For these tests all of the performance intensive code will fit in the L1 instruction cache, so that shouldn't be a factor at all.



    Increasing the size of the data set probably won't help the results -- he was already using a 10 MB image, so that is well beyond the cache size. I don't know the code that is running on a per-pixel basis in these tests, however, so I can't say whether this machine would be CPU or bandwidth limited. We need a test that is definitely memory bound before passing judgment.
  • Reply 27 of 43
    yevgeny Posts: 1,148 member
    [quote]Originally posted by Eirik Iverson:
    Yevgeny,

    Would you please give us your opinion as to why the Barefeats performance tests show no significant improvement between the old and the new dual 1 GHz, despite the 25% increase in bus speed?
    [/quote]



    I don't know why the new bus shows no speed improvement. Given that all the machines are MP machines, there should be some speed difference; the new machines do have a 167 MHz MPX bus. The fact that there are tests where the new machines are faster seems to indicate that something is up with the testing.
  • Reply 28 of 43
    nevyn Posts: 360 member
    [quote]Originally posted by Eirik Iverson:
    Yevgeny,

    Would you please give us your opinion as to why the Barefeats performance tests show no significant improvement between the old and the new dual 1 GHz, despite the 25% increase in bus speed?
    [/quote]



    Since the result is _exactly_ the same, I'd guess that it isn't a bus-limited operation. If there's enough calculating on each piece of the image that the memory bus never maxes out, it looks to be a CPU-bound operation.



    I don't know exactly what their Photoshop actions are for their benchmark, but that's my guess.



    Is there an easy memory-bandwidth test we could aim BareFeats way?



    [Edited to add the following:]



    Or to put it more simply:

    The CPUs are identical; only the bus is different. Not everything is bus-limited; some things are CPU-limited. Whichever one is the limit determines how fast things go.
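
    (A rough way to apply that: estimate how long the bus needs to move the data versus how long the CPUs need to crunch it; the larger number sets the wall-clock floor. Every figure below -- image size, ops per pixel, peak bus rate -- is an illustrative guess, not a measurement of the Barefeats workload.)

    #include <cstdio>

    int main()
    {
        // All numbers are back-of-envelope guesses, for illustration only.
        double trafficBytes = 30e6 * 4.0;   // a 30 MB image read/written a few times
        double busBytesPerS = 1.3e9;        // rough peak of a 64-bit, 167 MHz bus
        double pixels       = 10e6;         // ~10 million pixels in a 30 MB image
        double opsPerPixel  = 400.0;        // a "heavy" filter, pure guess
        double cpuOpsPerS   = 2.0e9;        // two 1 GHz CPUs, one op per cycle each

        double memSeconds = trafficBytes / busBytesPerS;
        double cpuSeconds = pixels * opsPerPixel / cpuOpsPerS;
        std::printf("bus needs %.2f s, CPUs need %.2f s -> %s bound\n",
                    memSeconds, cpuSeconds,
                    cpuSeconds > memSeconds ? "CPU" : "bus");
        return 0;
    }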



    [ 08-15-2002: Message edited by: Nevyn ]
  • Reply 29 of 43
    yevgeny Posts: 1,148 member
    [quote]Originally posted by Nevyn:
    Is there an easy memory-bandwidth test we could aim BareFeats way?
    [/quote]



    Sure, it looks like this:



    void BigJob()
    {
        // 100 million longs = 400 MB written, far more than any cache holds.
        long* pLong = new long[100000000];
        // start timing code
        for (long i = 0; i < 100000000; i++)
            pLong[i] = i;
        // end timing code
        delete[] pLong;
    }



    Have them load up a copy of Metrowerks, open the "Hello World" sample, and copy/paste the code from this function into it. Find the timing libraries and use cout to output the elapsed time. (Make sure to build in release mode, not debug mode.)



    OR, if they are scared of writing code, they could run any one of a number of commercially available benchmarking programs, instead of some ad hoc test platform.
  • Reply 30 of 43
    brussell Posts: 9,812 member
    [quote]Originally posted by Eirik Iverson:
    BRussell, I do not contest the idea that multiprocessors can improve performance. I am arguing that for big, main memory intensive jobs their value will be constrained by the 167 MHz bus bottleneck.

    The tests by Barefeats may NOT be very memory intensive. I haven't looked at the details yet. But I believe that the fractal test is NOT main memory intensive.

    Maybe I'm being unnecessarily defensive, but I've acknowledged your first two points several times in earlier posts in this thread. Again, I'm talking about big jobs that are memory intensive. These things have real, tangible benchmarks. Snappiness does not.

    As for the Barefeats tests, I haven't looked at the details yet: size of test files, detailed description of the algorithm, etc.

    I don't claim to be an EE on these matters. However, I'd like to know in what real-world QUANTIFIABLE ways the extra CPU benefits. Saying that it makes your system more snappy isn't all that compelling.

    Eirik
    [/quote]

    Well, I'm getting mixed messages from your posts, then. You're saying that you don't trust the people who say "it feels snappier," and I couldn't agree more. But I showed a few empirical examples of where duals do dramatically improve performance.



    So now can you show an example of where duals don't improve performance in an MP-aware app? I'd just like to see some empirical evidence that duals don't help because of bandwidth limitations.



    BTW, could you please provide a link to where someone from Motorola said they had ceased Apple-related development?

    :eek:
  • Reply 31 of 43
    g-news Posts: 1,107 member
    Apple also had dual systems back in the 604 days, when there was NO L3 cache and a measly 50 MHz (sometimes even 45 MHz) 60x bus. Do you think that wasn't saturated too? And what about the Daystar machines with quad 604e processors on a 50 MHz bus?

    But even then there were MP-aware apps, and they DID benefit.

    Today we even have SMP in the OS, so we also benefit in multitasking, which was not the case under 7.5.x-9.2.2 (or only very slightly).



    Seeing PS, SoundJam, and Q3 gain around 90% with duals compared to singles, I can understand Apple quite clearly.



    Plus, it helps the marketing.



    I'll be glad when my DP 1.25 GHz arrives, and I will notice the second CPU, even though both share a 167 MHz bus.



    And yes, I think the future lies with IBM.



    G_News
  • Reply 32 of 43
    There's no direct link from irumors.net so I'm pasting it below:



    "Bye bye Motorola - it was good while it lasted!

    Posted July 30 2002



    Within the next few months, Apple will have ditched Motorola processors for its high end computers. Indeed, after a discussion with Motorola Canada president Frank Maw on July 25, he quite happily told our sources that product development has already ceased on non-embedded PowerPC processors. What does this mean? Quite simply, it means that Motorola have stopped upgrading the processors we use in our Macs.



    Apple have used Motorola G4 processors for one main reason - AltiVec. However, with processor development ceasing, it is not known what will happen. Will they use upcoming IBM processors for new PowerMacs and continue using Motorola for their lower end computers?



    Motorola, of course, will continue to ship current processors if there is a need for them. But Apple wants faster chips, which Motorola cannot provide.



    Apple have been put in a very tough position. Will they be able to get out of this mess without losing sales? We hope so."




    However, there is a follow-up to the article, which I hadn't seen, that waters down their original so-called quote from a Motorola corporate officer:



    "A few things have come to light in the Motorola saga

    Posted July 31 2002



    We would like to expand on what we told you yesterday regarding Motorola and their ceasing chip development. First up, we would like to point you to a link that covers more on the topic. View this here.



    Secondly, we would like to tell you that we do not necessarily agree with what we posted yesterday. This information came from an email rumor submission. After looking over other evidence, we would also like to point out that Motorola will not be fully abandoning the Mac. Frank Maw, however, says that there will be no sure timeline for chip production."






    Well, this more recent report by irumors.net takes much of the wind out of the first one. Supposedly someone from CNET spoke with Frank Maw, but I haven't seen it on CNET. That reference from a board somewhere may have been an error.



    As for the benchmarks, and the rationale for why faster or additional CPUs shouldn't improve performance on big, memory-intensive jobs, I'll refer you to a post or two on Ars Technica later.



    I am looking for benchmarks, BTW. Specifically, I'm looking for a comparison of a single 800MHz PPC 7455 versus a dual 800MHz PPC 7455. This should provide some insight into single versus dual performance, provided that the cache levels are comparable. Different cache levels would just complicate things a bit.
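
    (One way to read any single-vs-dual numbers that do turn up: Amdahl's law says the gain depends on how much of the job actually runs in parallel. Here's a tiny sketch, with made-up parallel fractions, of what a second CPU can buy before any bus effects are counted.)

    #include <cstdio>

    // Amdahl's law for two CPUs: if a fraction p of the work runs in parallel,
    // overall speedup = 1 / ((1 - p) + p / 2).
    int main()
    {
        double fractions[] = { 0.5, 0.8, 0.95 };   // illustrative parallel fractions
        for (int i = 0; i < 3; ++i)
        {
            double p = fractions[i];
            double speedup = 1.0 / ((1.0 - p) + p / 2.0);
            std::printf("parallel fraction %.2f -> %.2fx on a dual\n", p, speedup);
        }
        return 0;
    }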
  • Reply 33 of 43
    This Ars Technica thread has some quality explanations/speculation about the so-called bandwidth bottleneck and questions the value of more or faster CPUs for improving performance on main-memory-intensive applications. It also, BTW, rips the Barefeats tests for some methodology deficiencies.



    ArsTechnica Thread on PM G4 bottleneck & benchmarks: http://arstechnica.infopop.net/OpenTopic/page?a=tpc&s=50009562&f=8
  • Reply 34 of 43
    kuku Posts: 254 member
    I'm tempted to use some unkind slang just about now.



    People who may not even be computer engineers by profession, with no real-world results and not even theoretical results, are trying their hands at bashing duals.



    I've just finished playing with a new dual 1.25 GHz model.



    Apple's standard config at the Apple Store.



    Thanks to the power of Unix and its utilities, I can realistically say that BOTH CPUs are getting a full-tilt workout on even trivial things.



    So, naysayers of SMP: the processors are getting close to that 2.5 GHz factor.



    And even a single 2.5 GHz computer doesn't scale perfectly in real-world results; things happen, depending on the situation, that keep it from producing linear results.



    After playing with the computer, I'm a believer in the dual 2x1.25 GHz being a 2.5 GHz machine. The processors are kept fed well enough, unless you're a prick who plays the "look, for that split second the 2nd CPU is idle" game.



    And I wasn't lax in playing with it; the Apple Store keeps its showcase computers packed with programs.



    ~Kuku



    [ 08-15-2002: Message edited by: Kuku ]
  • Reply 35 of 43
    [quote]Originally posted by Kuku:
    I've just finished playing with a new dual 1.25 GHz model.

    Apple's standard config at the Apple Store.

    Thanks to the power of Unix and its utilities, I can realistically say that BOTH CPUs are getting a full-tilt workout on even trivial things.

    So, naysayers of SMP: the processors are getting close to that 2.5 GHz factor.

    And even a single 2.5 GHz computer doesn't scale perfectly in real-world results; things happen, depending on the situation, that keep it from producing linear results.

    After playing with the computer, I'm a believer in the dual 2x1.25 GHz being a 2.5 GHz machine. The processors are kept fed well enough, unless you're a prick who plays the "look, for that split second the 2nd CPU is idle" game.

    And I wasn't lax in playing with it; the Apple Store keeps its showcase computers packed with programs.

    ~Kuku

    [ 08-15-2002: Message edited by: Kuku ]
    [/quote]



    Kuku,



    This sounds interesting. Could you possibly elaborate on the programs you ran and the files, particularly the file sizes, that you employed?



    I don't think anyone disputes that the dual 1.25 will be very snappy/responsive in casual usage. There's an interesting discussion in a few threads on Ars Technica, with some semiconductor engineers speculating about how the dual 1.25 and dual 1.00 will handle memory-intensive jobs.



    Eirik
  • Reply 36 of 43
    g-news Posts: 1,107 member
    I'll tell you guys in 6 weeks when that baby arrives here.

    "Going to scream, it is."



    G-News
  • Reply 37 of 43
    Perhaps Apple is doing this as a favor to Motorola. Going all dualies means almost twice as many chips sold. As odd as this may sound (Apple helping Motorola despite the mess Motorola put them in), it makes sense. If Apple is using another vendor, *IBM*, for its next-generation chip, Motorola surely knows and might need some incentive to keep producing G4s instead of stopping production and cutting its losses. Apple will still need G4s for the foreseeable future for its PowerBook/iMac lines. This might be a bargaining chip to keep Motorola doing G4 work (the little of it that they are doing) until Apple can switch to a new vendor. Just a thought.
  • Reply 38 of 43
    eliahu Posts: 71 member
    Apple went all dual processors to remain competitive and try to spur sales. Do you know what you get for your money on the PC side right now? Dell has the Dimension 8200 for under $1K. This box boasts a 2GHz P4 with a 400 MHz bus.



    Apple has been stuck with tiny incremental processor speed bumps and they're still using an archaic bus. If they want Wintel users to switch, they had better be offering something more compelling than great style for prices significantly higher than what PC users are accustomed to.



    Apple went all dual because it had no choice. Tower sales have been sad for a while now. The magic words "DDR" and dual processors will get many excited and should improve the bottom line until they can make the hardware on par with PCs.
  • Reply 39 of 43
    hmurchison Posts: 12,425 member
    [quote]This box boasts a 2GHz P4 with a 400 MHz bus.[/quote]



    Geez, that's descriptive. And it's a 100 MHz bus, quad pumped.
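
    (For anyone keeping score, the peak numbers work out roughly like this; both are 64-bit data paths, and these are theoretical peaks rather than sustained figures.)

    #include <cstdio>

    int main()
    {
        // P4 front-side bus: 100 MHz clock, 4 transfers per clock, 8 bytes wide.
        double p4Peak  = 100e6 * 4 * 8;   // 3.2 GB/s
        // G4 MPX bus: 167 MHz clock, 1 transfer per clock, 8 bytes wide.
        double mpxPeak = 167e6 * 1 * 8;   // ~1.34 GB/s
        std::printf("P4 bus peak:  %.2f GB/s\n", p4Peak / 1e9);
        std::printf("MPX bus peak: %.2f GB/s\n", mpxPeak / 1e9);
        return 0;
    }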
  • Reply 40 of 43
    amorph Posts: 7,112 member
    [quote]Originally posted by eliahu:
    Apple went all dual because it had no choice. Tower sales have been sad for a while now. The magic words "DDR" and dual processors will get many excited and should improve the bottom line until they can make the hardware on par with PCs.
    [/quote]



    Do you mean on par with a Dell Dimension in real terms, or in marketing-speak? The Dimension series is designed to look good on a spec sheet and that's about it. There's a reason it costs under $1K.