G5 Speculation Revisited

gamblor · July 10, 2002 12:21PM

[quote] The Maya benchmark isn't very valid. They used an older version on the Mac that didn't even support dual processors versus a newer PC version that does support dual processors. Not exactly scientific there. <hr></blockquote>

Hate to break it to ya, Jeff, but they used the latest versions available for each platform. That's entirely valid.

[quote] The chart for Lightwave doesn't seem to make any sense. It shows the G4 at 100 and the P4 at 161, but looking at the data that was supposedly used to generate the charts shows the following test times. 127, 16, 8, and 10 seconds for the G4 and 127, 8, 4, and 7 seconds for the P4. Not exactly sauteed here is it? <hr></blockquote>

Jeff, go to <a href="http://www.blanos.com/benchmark" target="_blank">the Lightwave benchmark website</a> and see if you can find any instances of a dual gig Powermac scoring much higher than HALF what the top scoring machine gets. (Check the numbers again. The dual Xeon 2400 P4 gets 68 on the raytrace test, not 127, just slightly over half. That's pathetic.)
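For what it's worth, the 161 on the chart does fall out of the single-P4 times quoted above if you assume the chart simply averages the per-test speed ratios. That method is a guess on my part, not something the benchmark page is confirmed to say; the times are the ones from the quote.

```python
# Times in seconds, as quoted above (G4 vs. P4, four Lightwave tests).
g4_times = [127, 16, 8, 10]
p4_times = [127, 8, 4, 7]

# ASSUMPTION: the chart index is the arithmetic mean of per-test speed
# ratios (G4 time / P4 time), scaled so that the G4 sits at 100.
ratios = [g4 / p4 for g4, p4 in zip(g4_times, p4_times)]
p4_index = 100 * sum(ratios) / len(ratios)

print(ratios)
print(round(p4_index))  # 161
```

So one test is a dead heat, two are twice as fast, and one is about 1.4x, which averages out to roughly 1.61x.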

[quote] Cinebench also doesn't make use of dual processors, so you would expect the G4 to fall behind. <hr></blockquote>

Why? PCs come with dual procs as well. Why is that a detriment to the G4, and not PCs? Besides that, it highlights a pretty damn important point-- not all software can take advantage of dual processors.

[quote] They only have scores for an 800 MHz G4 on the Mathematica test. Who knows what a dual 1 GHz would really do? <hr></blockquote>

...and the only scores they've got for the P4 are for a 1700... who knows what a dual 2400 or single 2533 would do? I seriously doubt if the results for PC vs. Mac have changed much since that test was performed.

maverick · July 10, 2002 12:41PM

[quote] Gamblor "Sorry, but the G4 is seriously bandwidth choked" <hr></blockquote>

Yeah, you're right, it is. But that's the bandwidth of its access to the motherboard (the G4 uses a 133 MHz motherboard, while the P4 can use a 200MHz motherboard), but like I said that's only part of the equation. You also talked about the QDB. and you're right, but I was just talking about chip performance. As I said Apple has to develop a happy medium between bus performance, and chip performance to build a more reliable product than the average PC.

[quote] Gamblor "What you've failed to acknowledge is that the P4 has never been offered at a speed as low as the fastest G4 that is available now. In fact, the fastest P4 has a clock speed 2.5+ times what the fastest G4 is available at. To top it all off, a P4 machine running at that speed is going to cost half what Apple charges for the dual gig Powermac." <hr></blockquote>

I was just using this as an example, trying to simplify things for conversation's sake and to keep the math simple. In all honesty the Pentium 4 works about 7 processes per cycle, versus 15 processes per cycle on the new G4 processors.

[quote] Gamblor "Yes, on a cycle-by-cycle basis, the G4 is more efficient, but that's not the point. There are P4 machines available which take any machine from Apple to school, for less money. They make up their lower efficiency per cycle with high clock rates. You started your response by complaining that we weren't considering "the whole equation". " <hr></blockquote>

Yes I did, and you're still not, Apple has to develop a happy medium of bus and chip performance to build a good system. Plus, Apple still doesn't have video and audio cards as fast as the PC's; that will change with time as companies begin to develop more universal products and Apple designs a more efficient bus for the G4. And while you're arguing the point, let's just say that those faster and newer machines that Intel rushed out to compete with Apple and AMD still have many unresolved bugs, some of which may never be fixed. Faster development of faster systems usually means less test time and buggier hardware.

:cool:

[ 07-10-2002: Message edited by: Maverick ]

gamblor · July 10, 2002 3:20PM

[quote] (the G4 uses a 133 MHz motherboard, while the P4 can use a 200MHz motherboard) <hr></blockquote>

No... The P4's bus is a QDR 100MHz bus, 400MHz effective. Or at least that's what Intel says-- actually, it's a 100MHz DDR bus which can carry out a read and a write transaction at the same time; but it doesn't have a 200MHz bus.

[quote] You also talked about the QDB. and you're right, but I was just talking about chip performance. <hr></blockquote>

Chip performance is utterly irrelevant if the chip is starved by a slow bus, and the G4 is DEFINITELY starved by a slow bus.

[quote] As I said Apple has to develop a happy medium between bus performance, and chip performance to build a more reliable product than the average PC <hr></blockquote>

Define this "happy medium". To me, Apple would have achieved a "happy medium" if they had the G4 on a bus that was fast enough to supply it with data.

[quote] In all honesty the Pentium 4 works about 7 processes per cycle, versus 15 processes per cycle on the new G4 processors. <hr></blockquote>

What do you mean by "processes"? Are you talking about how many execution units each chip has? The number of instructions the chip can retire every cycle? What?

And again, "15 processes per cycle" is meaningless if the bus can't provide the processor with that many "processes" to perform each cycle...

[quote] Yes I did, and you're still not, Apple has to develop a happy medium of bus and chip performance to build a good system. <hr></blockquote>

Actually, yes, I am considering the whole equation; it's painfully obvious that the LARGEST deficiency Macs have right now is an anemic bus. The current Macs certainly don't find a "happy medium" between bus speed and processor speed.

lemon bon bon · July 10, 2002 3:29PM

"Actually, yes, I am considering the whole equation; it's painfully obvious that the LARGEST deficiency Macs have right now is an anemic bus. The current Macs certainly don't find a "happy medium" between bus speed and processor speed. "

Gamblor said it.

Lemon Bon Bon

PS. Regarding the potential 'next generation' PPC...

What would people prefer?

Moto G5 on Rio?

OR...

IBM Power4 Desktop variant on Hypertransport?

(ie...Why are Apple sitting on the Hypertransport Jedi Council?)

[ 07-10-2002: Message edited by: Lemon Bon Bon ]

hotboxd · July 10, 2002 4:10PM

I would prefer HT because it has more room to grow than RIO. The RIO architecture maxes out at around 8GB/s, I believe HT has a higher ceiling.

RIO would be fine for the rest of the systems tho (iMac, Powerbook, iBook). I would think that something like the e500 core + OCEAN setup that Motorola has been hyping will go into all Apple systems. Something like that is probably what the next Powermac rev. will contain, and eventually the iMac and Powerbook/iBook would transfer to that architecture while the PM gets the G5 (maybe, who knows about that).

ed m. · July 10, 2002 4:41PM

Here is an interesting article by my friend Dave that takes a look at all the up-and-coming designs... It's worth the read.

<a href="http://www.workingmac.com/igeek/23.wm" target="_blank">http://www.workingmac.com/igeek/23.wm</a>

--

Ed M.

maverick · July 10, 2002 4:47PM

Gamblor,

OK! end of processor discussion. Let's just say that we're arguing 2 different points of the same argument except that I'm talking about the microprocessor, and you are talking about the bus. Intel says that their available busses are 200MHz, 400 MHz, and 533 MHz. Nothing about them only being 100 or 133 MHz. However I already know they're using multiple I/Os for advanced cache pipelining, because if they weren't, and they were actually using 200, 400, or 533 MHz boards, they would burn through your case and desk because of the sheer heat involved. Since you don't seem to understand the concept of processes per cycle, and I don't know it by another name (my professor at school never gave me one), we'll use MIPS. The G4 processes 2.42 MIPS per MHz according to Motorola, and Intel's Pentium 4 only processes at around 1.2 MIPS according to my class in computers (a source I use because Intel's website doesn't list MIPS for its current line of processors). As for floating point calculations, I don't know. In all honesty I don't care. The G4 outperforms the Pentium 4. You on the other hand have this obvious infatuation with wider bandwidth bus technology, and that's great. Maybe next week your thirst will be quenched. I wasn't trying to insult your knowledge, but you seem to be trying to take potshots at mine. You're wrong about the benchmarks you quoted, because the benchmarks aren't equal. If you need that speed after Monday, and Apple doesn't deliver, then switch. Get what you need to do the job. Stop arguing about it. I'll be fine with my G3, and G4 computers. End of conversation, don't reply, because I won't continue with this rhetoric.

[ 07-10-2002: Message edited by: Maverick ]

amorph · July 10, 2002 5:02PM

[quote]Originally posted by Lemon Bon Bon:

"Actually, yes, I am considering the whole equation; it's painfully obvious that the LARGEST deficiency Macs have right now is an anemic bus. The current Macs certainly don't find a "happy medium" between bus speed and processor speed. "

Gamblor said it.

<hr></blockquote>

The funny thing is that pretty much every PowerMac since the 604 bowed in has been bus starved. The last 9600MP had two 350MHz processors with minimal caches trying to sip data through an inefficient 50MHz bus. That's why we now have three levels of cache.

In contrast, the G4 isn't bus-starved because the bus is so slow - MaxBus isn't that bad, really; it holds its own against a lot of DDR busses at the same real clockspeed - it's bus-starved because the G4 is so powerful. Despite that, the situation it's in is a lot like the situation the 604ev was in when it was subsisting in the older PowerMacs - and, at the same time, powering (high-bandwidth) IBM minicomputers that could serve hundreds of users simultaneously.

The real problem is that high bandwidth motherboards have traditionally been prohibitively expensive to implement. That's why HT and RIO are especially exciting: They're bringing high-end capabilities down into the sub $10K space.

gamblor · July 10, 2002 5:45PM

[quote] Gamblor said it. <hr></blockquote>

Are you giving me credit, or blaming me?

[quote] I would prefer HT because it has more room to grow than RIO. The RIO architecture maxes out at around 8GB/s, I believe HT has a higher ceiling. <hr></blockquote>

Well, 8GB/s should be enough for the next week or so, at least.

IIRC, both RIO and HT are slated to go even higher. If memory access is taken off the bus and handled with an on chip controller, then 8GB/s should be plenty fast for what the bus will be used for.

...

[quote] OK! end of processor discussion. <hr></blockquote>

Then why are you continuing?

[quote] Nothing about them only being 100 or 133 MHz. <hr></blockquote>

100 & 133MHz are the base frequencies for the bus. Intel uses QDR to get an effective 400 or 533MHz bus speed. (QDR = Quad Data Rate = up to four bits transferred per data line per clock tick.)
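To put rough numbers on that, here's a back-of-the-envelope sketch. The base clocks and the 4x multiplier are the ones above; the 64-bit (8 byte) data path on both buses is my assumption, and these are theoretical peaks, not measured throughput.

```python
def effective_rate_mhz(base_clock_mhz, transfers_per_clock):
    """Effective transfer rate: base clock times transfers per clock tick."""
    return base_clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(base_clock_mhz, transfers_per_clock, bus_width_bytes=8):
    """Theoretical peak in MB/s. bus_width_bytes=8 is an ASSUMED 64-bit data path."""
    return effective_rate_mhz(base_clock_mhz, transfers_per_clock) * bus_width_bytes


p4_400 = peak_bandwidth_mb_s(100, 4)  # quad data rate on a 100MHz base clock
p4_533 = peak_bandwidth_mb_s(133, 4)  # quad data rate on a 133MHz base clock
g4_sdr = peak_bandwidth_mb_s(133, 1)  # single data rate on a 133MHz base clock

print(effective_rate_mhz(100, 4), p4_400)  # 400 3200
print(effective_rate_mhz(133, 4), p4_533)  # 532 4256
print(effective_rate_mhz(133, 1), g4_sdr)  # 133 1064
```

The "533" comes out as 532 here only because the base clock is really 133.33MHz and I rounded it down.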

[quote] Since you don't seem to understand the concept of processes per cycle, <hr></blockquote>

I've got a degree in computer science, and I've been working as a programmer for over a decade. In that time, I haven't come across the term "processes per cycle". IPC (Instructions Per Cycle), sure, but not "processes per cycle". I simply asked for an explanation of the term.

[quote] The G4 processes 2.42 MIPS per MHz according to Motorola, and Intel's Pentium 4 only processes at around 1.2 MIPS according to my class in computers (a source I use because Intel's website doesn't list MIPS for its current line of processors). <hr></blockquote>

As I said before, it doesn't matter how fast the G4 is if its bus can't supply it with data fast enough to keep busy. Those MIPS figures are theoretical, and don't take into account how long it takes to load/store data across the bus.

Think of it this way: Let's say the G4 can execute 15 processes in a single cycle. If it takes three cycles to load those 15 processes because of the slow bus, then the average number of processes the G4 can execute in a single cycle is actually 3.75 (15 processes spread over four cycles), because the processor spends three cycles idle while the processes are loaded across the bus. Now do you understand the importance of bus speed to the performance of the processor?
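The same arithmetic as a tiny sketch, using the made-up figures from this example (15 "processes" per busy cycle, three idle cycles per load). The function name is just for illustration.

```python
def effective_per_cycle(work_per_busy_cycle, stall_cycles):
    """Average work done per cycle when every busy cycle is preceded by
    stall_cycles idle cycles spent waiting on the bus."""
    total_cycles = stall_cycles + 1  # the idle cycles plus the one busy cycle
    return work_per_busy_cycle / total_cycles


print(effective_per_cycle(15, 3))  # 3.75 -> three cycles waiting on the bus
print(effective_per_cycle(15, 1))  # 7.5  -> one cycle waiting
print(effective_per_cycle(15, 0))  # 15.0 -> the bus keeps up
```

Cut the wait from three cycles to one and the same chip does twice the work, without touching the processor at all.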

[quote] The G4 outperforms the Pentium 4. <hr></blockquote>

Only for processes which are not affected by the anemic bus speed, and at the beginning of the 21st century, those types of processes are few and far between.

[quote] You on the other hand have this obvious infatuation with wider bandwidth bus technology, <hr></blockquote>

I have an infatuation with bus speed because I recognize that it's the one area where the G4 gets its ASS kicked!

...

[quote] In contrast, the G4 isn't bus-starved because the bus is so slow - MaxBus isn't that bad, really; it holds its own against a lot of DDR busses at the same real clockspeed - it's bus-starved because the G4 is so powerful. <hr></blockquote>

ToMAYto, ToMAHto.

The end result is the same-- emasculated machines that don't live up to their full potential.

Yeah, from what I gather the MPX bus is a rockin' bus. It's just too bad it can't be easily retrofitted to use DDR or QDR...

[quote] The real problem is that high bandwidth motherboards have traditionally been prohibitively expensive to implement. That's why HT and RIO are especially exciting: They're bringing high-end capabilities down into the sub $10K space. <hr></blockquote>

Bingo. Amorph gets a cookie.

It's going to be incredibly cool to have a desktop class machine with a full-blown switching fabric in it...

lemon bon bon · July 10, 2002 5:55PM

"I have an infatuation with bus speed because i recognize that its the one area where the G4 gets its ASS kicked!"

He said it again!

Go Gamblor!

Amorph...yeah we know bus speeds have been lagging cpu speeds for a long time and holding back overall system speed. Over the years, the articles I read stated it was cost prohibitive to do much about the bus...easier to ramp the CPU. Either way...that's where most of the focus up until the last year or so seems to have gone. Seems that is about to change in the New Year!

Bandwidth...at last, seems to be becoming more of a focus of late...

The next gen' of bus standards will begin to really let cpus fly? Certainly if a G4.75 is plopped onto a rio board then we'll see just how much a dual 1.5 G4 has in its locker.

However, come next April, I just hope it's a G5 sitting on Rio...or a Power 4 variant on Hypertransport.

Lemon Bon Bon

[ 07-10-2002: Message edited by: Lemon Bon Bon ]

amorph · July 10, 2002 10:10PM

[quote]Originally posted by Lemon Bon Bon:

Amorph...yeah we know bus speeds have been lagging cpu speeds for a long time and holding back overall system speed.<hr></blockquote>

Which means that your hope for a G5 is essentially vain. It's the rest of the system that needs to catch up.

You put a processor equivalent to the 7455 - today's G4, at 1GHz - but with an onboard memory controller hooked up to a HyperTransport fabric, and you'd swear you were on a "G5."

[quote]However, come next April, I just hope it's a G5 sitting on Rio...or a Power 4 variant on Hypertransport.<hr></blockquote>

I don't doubt that we'll have another significant processor upgrade by April. All signs point that way. The real news, though, is that Apple can, and probably will, adopt a motherboard with the bandwidth to do it justice. It won't be a hot chip that spends most of its life twiddling its thumbs, like the 604s and the G4s.

(The G3 doesn't count, because it was designed for a slow bus. But then someone bolted AltiVec on...)

Also, Gamblor's talking real world application use, where there are so many variables that talking about even a 50% boost to bus speed is beside the point. If Lightwave is x86-optimized code ported to Mac, that's probably the problem: naive (i.e., straightforward) code usually runs faster on a PPC than code heavily optimized for x86. The architectures are that different. I seem to recall the NewTek guys getting something like a 30% speed boost just by changing the type of their floating point variables, so I'm guessing that there's a lot of work to be done in that area.

Then of course there's the fact that Excel is dog slow on a Mac, for which there is no hardware explanation whatsoever.

You can throw hardware at bad code and expect some sort of improvement, but then Apple is stuck making much faster machines (therefore, in all likelihood, significantly more expensive) just to make ill-suited and sloppy code run decently.

[ 07-10-2002: Message edited by: Amorph ]

ed m. · July 10, 2002 10:33PM

Amorph, you said it perfectly. The problem seems to stem from the lack of understanding of the PPC architecture in general, AltiVec implementation, and Mac hardware in general. Couple that with the fact that most programmers were first weaned on x86 coding techniques (and are therefore used to them), which don't apply to the PPC. Most code is probably common-base x86 derived and simply recompiled for PPC and optimized (in certain areas) to the best of their understanding in those areas. That's why I listen so closely to what Chris Cox from Adobe has to say about things like this. He's highly proficient in BOTH development environments. Perhaps you should e-mail him and ask him specifics. The truth is that developers spend MORE time and effort optimizing for x86 than they do PPC. One of the things these developers need to start doing is increasing the amount of parallelism within their code. I mentioned that in the first few posts. If they are not, then a good chunk of performance is being completely wasted, and that's regardless of the bus.

--

Ed M.

gamblor · July 10, 2002 10:36PM

[quote](The G3 doesn't count, because it was designed for a slow bus. But then someone bolted AltiVec on...)<hr></blockquote>

Damn the luck...

Actually, the current situation with IBM's 750FX is supreme irony-- IBM has been able to kick the ol' 60x bus up to 200MHz... How sweet it would be if Moto could get the same improvement with MPX.

[quote]If Lightwave is x86-optimized code ported to Mac, that's probably the problem: naive (i.e., straightforward) code usually runs faster on a PPC than code heavily optimized for x86. The architectures are that different. I seem to recall the NewTek guys getting something like a 30% speed boost just by changing the type of their floating point variables, so I'm guessing that there's a lot of work to be done in that area.<hr></blockquote>

With Lightwave 7.0b, Newtek began using Moto's C math library that's optimized for the G4 (not Altivec, just for the FPU; although they did make Altivec enhancements as well). That's when they got the 30% boost (minimum, I might add-- some of the improvements were as high as seven times faster). I'm not exactly sure if there's much more room for improvement, though; the upgrade to 7.5 didn't see too much of a performance boost, if any.

[quote]You can throw hardware at bad code and expect some sort of improvement, but then Apple is stuck making much faster machines (therefore, in all likelihood, significantly more expensive) just to make ill-suited and sloppy code run decently.<hr></blockquote>

Well, to be honest, that's what any chip manufacturer is stuck with... If I had a nickel for every time I've heard a manager say, "Optimize? Let's not waste our time. With the next rev of <insert favorite processor here>, the code will run a lot quicker." True, but kinda misses the point...

jeff leigh · July 11, 2002 12:44PM

[quote] Jeff, go to the Lightwave benchmark website and see if you can find any instances of a dual gig Powermac scoring much higher than HALF what the top scoring machine gets. (Check the numbers again. The dual Xeon 2400 P4 gets 68 on the raytrace test, not 127, just slightly over half. That's pathetic.) <hr></blockquote>

Those numbers I quoted were for a single 2400 P4, not a dual Xeon 2400 P4. And considering we've been talking about how 'expensive' Mac systems are, I'm curious as to how much a dual 2400 Xeon P4 would run ya? I'm betting it's pretty damn expensive. Not in the same league as a single 2400 P4 or dual 1 GHz G4. For that kind of money I would expect more than just slightly twice as fast.

lemon bon bon · July 11, 2002 2:29PM

"Which means that your hope for a G5 is essentially vain. It's the rest of the system that needs to catch up."

Actually, it's Apple/Moto that needs to catch up.

Sorry, that's cheap. But I couldn't resist...

I take your point. A G4 on Rio. Add a couple of extra fpu and I may 'jump'. Why not? You could get away with calling that a G5 in terms of the actual architecture it uses? A bigger change than going from g3 to G3+ ahem, I mean...'G4'.

I just wish they would leap to Rapid Io with the G4 rather than adopt a DDR/Bus mobo solution that's late...been delayed and delayed and obviously had some problems. And with 'power' Macs now rumoured to be coming in August?

What's the point if Rio is just 3 months around the corner by the time the 'new' 'power'Macs ship?

Still it seems to be Apple's new policy of not blowing their hardware wad all in one go since the 'Cube' show...

Lemon Bon Bon

gamblor · July 11, 2002 3:03PM

[quote] Those numbers I quoted were for a single 2400 P4, not a dual Xeon 2400 P4. And considering we've been talking about how 'expensive' Mac systems are, I'm curious as to how much a dual 2400 Xeon P4 would run ya? I'm betting it's pretty damn expensive. <hr></blockquote>

From <a href="http://www.alienware.com" target="_blank">Alienware</a>, it'll run ya about $4500. A dual 2000 runs $3500, and in a couple of the Lightwave benchmarks, runs as fast as the dual 2400 (obviously bus bound). That's a machine that comes with a dual channel Ultra160 SCSI controller on the motherboard, and a GeForce 4 4600. Everything else is more or less equivalent to the Mac (Gig Enet, Firewire, etc.) A dual gig Mac with SCSI and video to match is going to be at least $3500; or, leave it out, but you'd want at LEAST the video upgrade if you're a serious Lightwave (make that any 3D) user.

[quote] Not in the same league as a single 2400 P4 or dual 1 GHz G4. <hr></blockquote>

A dual gig Mac isn't in the same league as a single 2400 P4, which can be had for as little as $1500.

[quote] For that kind of money I would expect more than just slightly twice as fast. <hr></blockquote>

It's only "just slightly twice as fast" on ONE of the tests. It lags the PCs by a factor of 2-2.5 on most of the rest.

You can match the top end Mac's performance for half the price with a PC, or you can nearly double it for the same price, and more than double it if you're willing to go $1000 higher-- and that's with pre-built PCs. We won't talk about how much it would cost to put together your own PC, because that would make the Mac look even worse.

amorph · July 11, 2002 3:14PM

[quote]Originally posted by Gamblor:

With Lightwave 7.0b, Newtek began using Moto's C math library that's optimized for the G4 (not Altivec, just for the FPU; although they did make Altivec enhancements as well). That's when they got the 30% boost (minimum, I might add-- some of the improvements were as high as seven times faster). I'm not exactly sure if there's much more room for improvement, though; the upgrade to 7.5 didn't see too much of a performance boost, if any.<hr></blockquote>

As you yourself pointed out, developers don't usually optimize unless they have to. Linking to another math library and tweaking a few routines to support AltiVec is nice, but I'm sure they could get a lot more out of the machine if they were serious about it. Depending on how much of their codebase is optimized for x86, that might be a daunting amount of work, and they might not have anyone on staff familiar enough with Mac/PPC hardware to really know how to tweak the code.

[quote]Originally posted by Lemon Bon Bon:

I take your point. A G4 on Rio. Add a couple of extra fpu and I may 'jump'. Why not? You could get away with calling that a G5 in terms of the actual architecture it uses? A bigger change than going from g3 to G3+ ahem, I mean...'G4'.<hr></blockquote>

Actually, you didn't take my point: If you took the current G4, and the only changes you made to it were an onboard memory controller and HyperTransport connectivity, and you plopped that processor down in a motherboard set up as an HT fabric, the result would hardly be recognizable. In other words, all you'd have to do is change everything around the CPU itself (including, say, the OS) to get a startling improvement in performance. Add a couple more execution units to the G4 and clock it 50% higher, and as long as it's fed well, you're really flying.

[quote]I just wish they would leap to Rapid Io with the G4 rather than adopt a DDR/Bus mobo solution that's late...been delayed and delayed and obviously had some problems. And with 'power' Macs now rumoured to be coming in August?<hr></blockquote>

Nobody knows what the architecture will be in August - or, for that matter, whether it will appear in August. moki heard a rumor that we're getting a stopgap, but that doesn't mean that we are. For all we know, Apple will unveil a switched-fabric HyperTransport board with twin 1.5GHz G4s and PC2700 DDR RAM. Or not.

[ 07-11-2002: Message edited by: Amorph ]

gamblor · July 11, 2002 4:00PM

[quote] As you yourself pointed out, developers don't usually optimize unless they have to. Linking to another math library and tweaking a few routines to support AltiVec is nice, but I'm sure they could get a lot more out of the machine if they were serious about it. Depending on how much of their codebase is optimized for x86, that might be a daunting amount of work, and they might not have anyone on staff familiar enough with Mac/PPC hardware to really know how to tweak the code. <hr></blockquote>

Don't think that Lightwave is a PC app that has been ported to the Mac. It started out as an Amiga app, and in the past has run on IRIX, Solaris, and, I think, AIX & HPUX, as well as Windows & Mac. Newtek doesn't even use native UIs-- they've got their own buttons, menus, dialog boxes, etc. I'm quite certain they start with a cross platform code base and add platform specific optimizations from there; it's not like they have to undo x86 optimizations before optimizing for PPC.

Besides that, I don't think it was as simple as "linking to another math library"-- I'm certain they did quite a bit of laborious analysis of the code; I can't imagine that you'd get a 3-4x (on average) speed increase just by changing libraries & making minor Altivec tweaks.

The simple fact is, a typical lightwave scene has dozens (possibly hundreds) of textures, thousands of triangles (like tens or hundreds of thousands), and multiple lights; not to mention metanurbs, particle rendering, etc. All that isn't going to fit in the cache. No way around it-- on the Mac, Lightwave's performance is hindered by the bus.

(Interestingly enough, the P4's performance on some of the tests scaled with the clock speed of the processor; that suggests to me that the P4 isn't able to saturate its bus in those tests.)

amorph · July 11, 2002 4:30PM

[quote]Originally posted by Gamblor:

Newtek doesn't even use native UIs-- they've got their own buttons, menus, dialog boxes, etc.<hr></blockquote>

Slowdown #1, albeit on all platforms.

[quote]I'm quite certain they start with a cross platform code base and add platform specific optimizations from there; it's not like they have to undo x86 optimizations before optimizing for PPC.<hr></blockquote>

That's heartening, actually.

[quote]Besides that, I don't think it was as simple as "linking to another math library"-- I'm certain they did quite a bit of laborious analysis of the code; I can't imagine that you'd get a 3-4x (on average) speed increase just by changing libraries & making minor Altivec tweaks.<hr></blockquote>

Code is funny; I've heard of bigger speed increases than that just by rearranging the order in which functions were called. It's easy to get that kind of speedup without exhaustively revising the code, simply by targeting a couple of bottlenecks. That's probably what they did with AltiVec (for those bottlenecks that could be vectorized, anyway). I can't blame them, really. That's what you do.

The bottom line is that it's impossible to tell what the effort was judging by the performance increase.
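A rough way to see how a couple of targeted fixes can produce a 3-4x overall gain is Amdahl's law. The sketch below is only an illustration: the 80% hotspot share and the 10x local speedup are hypothetical numbers, not anything measured on Lightwave.

```python
def overall_speedup(fraction_optimized, local_speedup):
    """Amdahl's law: whole-program speedup when a given fraction of the
    original runtime is made local_speedup times faster."""
    remaining = 1.0 - fraction_optimized
    return 1.0 / (remaining + fraction_optimized / local_speedup)


# HYPOTHETICAL: one bottleneck takes 80% of the runtime and gets 10x faster.
print(round(overall_speedup(0.8, 10.0), 2))  # 3.57

# The untouched 20% caps the gain at 5x no matter how fast the hotspot gets.
print(round(overall_speedup(0.8, 1e9), 2))   # 5.0
```

Which cuts both ways: a big number can come from one small change to a hot loop, and a small number can come from a lot of work on code that was never the bottleneck.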

[quote]The simple fact is, a typical lightwave scene has dozens (possibly hundreds) of textures, thousands of triangles (like tens or hundreds of thousands), and multiple lights; not to mention metanurbs, particle rendering, etc. All that isn't going to fit in the cache. No way around it-- on the Mac, Lightwave's performance is hindered by the bus.<hr></blockquote>

DDR's worst case is having to do lots of little fetches from RAM - at that, it's no faster than SDR. It's also RDRAM's worst case, and it's usually even slower. DDR on the PC should manage 600-1200MB/s, maybe more. MaxBus does 800-1000MB/s. RDRAM is 600-2500MB/s. The high numbers for all those involve streaming relatively large blocks of data, which is not implied by a program that manipulates thousands of relatively small, discrete and compute-intensive bits. So I'm still unsure. There's obviously something holding it back, but without an in-depth examination of the code I'm withholding judgment.
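Here's a toy model of why small fetches hurt so much: every fetch pays a fixed latency before data streams at the peak rate. The 100ns latency and the fetch sizes are made-up illustration values, and the model ignores burst lengths, open pages and overlapping requests, so treat it as a sketch of the shape of the problem rather than a prediction.

```python
def effective_bandwidth_mb_s(fetch_bytes, peak_mb_s, latency_ns):
    """Sustained MB/s when each fetch waits latency_ns, then streams at peak."""
    # bytes / (MB/s) gives microseconds; multiply by 1000 for nanoseconds.
    transfer_ns = fetch_bytes / peak_mb_s * 1000.0
    # bytes per nanosecond is GB/s; multiply by 1000 for MB/s.
    return fetch_bytes / (latency_ns + transfer_ns) * 1000.0


PEAK_MB_S = 1000.0   # roughly the top of the MaxBus range quoted above
LATENCY_NS = 100.0   # ASSUMED per-fetch latency, for illustration only

for size in (32, 256, 4096, 65536):
    rate = effective_bandwidth_mb_s(size, PEAK_MB_S, LATENCY_NS)
    print(size, round(rate))
```

With those assumptions a 32 byte fetch sees only about a quarter of the peak rate, while a 64KB stream gets nearly all of it, and doubling the peak rate barely helps the small fetches because the latency term dominates.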

If it is the bus (which is certainly not out of the question; I'm playing devil's advocate here) and if Apple is going to roll out a motherboard where the processor has an onboard memory controller connected to memory via HyperTransport, that bottleneck should disappear promptly.

[quote](Interestingly enough, the P4's performance on some of the tests scaled with the clock speed of the processor; that suggests to me that the P4 isn't able to saturate its bus in those tests.)<hr></blockquote>

Or, it suggests that the program is compute intensive rather than memory intensive.

gamblor · July 11, 2002 5:48PM

[quote] Slowdown #1, albeit on all platforms. <hr></blockquote>

True, but it doesn't affect the benchmark results.

[quote] The bottom line is that it's impossible to tell what the effort was judging by the performance increase. <hr></blockquote>

True. It's also impossible to tell how much more performance they can get through further optimization. (... And my problem's intractable, so...)

[quote]

DDR's worst case is having to do lots of little fetches from RAM - at that, it's no faster than SDR. It's also RDRAM's worst case, and it's usually even slower. DDR on the PC should manage 600-1200MB/s, maybe more. MaxBus does 800-1000MB/s. RDRAM is 600-2500MB/s. The high numbers for all those involve streaming relatively large blocks of data, which is not implied by a program that manipulates thousands of relatively small, discrete and compute-intensive bits. So I'm still unsure. There's obviously something holding it back, but without an in-depth examination of the code I'm withholding judgment. <hr></blockquote>

When calculating a pixel value in a typical 3D pipeline, there are usually hundreds or thousands of small(er) ops that go into it. It's not like a matrix (or two, or a dozen) has to be inverted for every pixel; all that "grunt work" gets done before rendering even starts... I'm unaware of a benchmark that simply does transformations on triangles without rendering them, but now that I think about it, it might be interesting to see. (Actually, I take that back-- radiosity may require some hairy ops every time the ray bounces. It's been a while since I've put my head inside any rendering code.)

(For some reason, I thought DDR on a PC was getting up to 1700MB/s... Also, do Athlons and P4s have the same efficiency numbers on DDR? Or is that just for the P4?)

[quote] If it is the bus (which is certainly not out of the question; I'm playing devil's advocate here) and if Apple is going to roll out a motherboard where the processor has an onboard memory controller connected to memory via HyperTransport, that bottleneck should disappear promptly. <hr></blockquote>

That would be a thing of beauty.

You get another cookie! What are you going to do with all of them?

[quote] Or, it suggests that the program is compute intensive rather than memory intensive. <hr></blockquote>

<stewie>Oh... Ha, ha, ha, ha.</stewie>

Given the amount of data a 3D pipeline throws around, I seriously doubt it...
