G4 and current I/O

tiramisubomb · May 22, 2002 12:46AM

Numbers are confusing, and block diagrams are easier to understand. I have looked through the Power Mac G4 block diagram in the Apple Dev Notes and noticed a few things that just don't piece together. Perhaps some of you who are more familiar with CPUs can give me better insight.

1. The L3 cache can transfer up to 4GB/sec per G4. However, all the data coming into the G4 for number crunching arrives via the MaxBus at 133 MHz (only 1GB/sec). So how can the backside cache be of any help? I could see a performance benefit for the L3 if a separate bus transferred data to the L3 first.

2. Another thing: the recently released Xserve sports new DDR memory with a peak bandwidth of 2.1GB/sec. But the memory does not communicate directly with the processor; it goes via a bridge (the system controller) through the MaxBus. Since all data goes through the MaxBus, which operates at only 1GB/sec, what's the point of having DDR? I don't see any performance benefit.

3. As for games, PCs usually have better frame rates compared to Macs, even with the same card. I wonder if the faster performance is due to a better system architecture than the Mac's. Since the AGP bus connects to the G4 via the bridge, is it because of the limited bandwidth the MaxBus offers? AGP 4X takes about 1/2 GB/sec of bandwidth. I wonder whether the slower graphics performance is due to insufficient bandwidth in the current MaxBus architecture. How about a PC running with the same 133 MHz bus?

programmer · May 22, 2002 9:31AM

[quote]

1. The L3 cache can transfer up to 4GB/sec per G4. However, all the data coming into the G4 for number crunching arrives via the MaxBus at 133 MHz (only 1GB/sec). So how can the backside cache be of any help? I could see a performance benefit for the L3 if a separate bus transferred data to the L3 first.

<hr></blockquote>

A cache keeps around previously accessed data that got to the CPU by another means. It does this so that if the CPU needs the data again (a frequent occurrence), it can have it back much faster than by going back to the source. The L3 is, therefore, a benefit but not a replacement for faster memory. The "backside" nature of the cache means it has its own bus coming out the "back" of the processor, rather than being hooked to the processor's "frontside bus". They can make it faster than the frontside bus because the cache can be kept physically closer and it is private to the processor.
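To put rough numbers on this, here is a minimal sketch (Python, purely for illustration) of how the average bandwidth changes with the fraction of data the L3 can supply. The 4 GB/sec and 1 GB/sec figures are the ones quoted in the question; the hit rates are assumptions, and the model ignores latency, the on-chip caches, and bus contention.

```python
def effective_bandwidth(hit_rate, cache_gb_s=4.0, bus_gb_s=1.0):
    """Average GB/sec when `hit_rate` of the bytes come from the L3 cache
    and the remainder has to cross the frontside bus.

    The defaults are the peak figures quoted in the thread; the hit rate
    is an assumption supplied by the caller.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Time to move one GB is the weighted sum of the two paths.
    seconds_per_gb = hit_rate / cache_gb_s + (1.0 - hit_rate) / bus_gb_s
    return 1.0 / seconds_per_gb


if __name__ == "__main__":
    for hit_rate in (0.0, 0.5, 0.9, 1.0):
        print(f"hit rate {hit_rate:.0%}: "
              f"{effective_bandwidth(hit_rate):.2f} GB/sec")
```

With no hits you get the bare bus rate, and the average only approaches the L3 rate as the hit rate gets close to 100%, which is why the cache helps without replacing faster memory.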

[quote]

2. Another thing: the recently released Xserve sports new DDR memory with a peak bandwidth of 2.1GB/sec. But the memory does not communicate directly with the processor; it goes via a bridge (the system controller) through the MaxBus. Since all data goes through the MaxBus, which operates at only 1GB/sec, what's the point of having DDR? I don't see any performance benefit.

<hr></blockquote>

All the data to the processor goes through the MPX bus. There are other things in the system, however, which can achieve higher throughput or which can operate simultaneously with the MPX bus. On a server these other I/O devices are typically quite active, and thus having extra bandwidth that the processor(s) can't use is useful. On a desktop machine with an active graphics chip that is constantly reading across the AGP bus it could be useful as well -- Quartz Extreme will be doing this. It would obviously be better to give the processors the full bandwidth, but since they are limited by their MPX bus, Apple has done the best it can with what it has. Hopefully Motorola takes the cuffs off soon.
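The peak figures in the question fall out of simple arithmetic: bus width times clock times transfers per clock. A quick sketch, assuming 64-bit data paths and a nominal 133 MHz clock for both the MPX bus and the DDR memory (the function name is made up for illustration, and MB here means 10^6 bytes):

```python
def peak_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak in MB/sec: bytes per transfer times transfers per second."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


# Assumed parameters: 64-bit data path, nominal 133 MHz clock.
mpx_bus = peak_mb_s(64, 133)        # single data rate, roughly 1 GB/sec
ddr_memory = peak_mb_s(64, 133, 2)  # double data rate, roughly 2.1 GB/sec

if __name__ == "__main__":
    print(f"MPX bus:    {mpx_bus:.0f} MB/sec")
    print(f"DDR memory: {ddr_memory:.0f} MB/sec")
    print(f"Left over for other devices: {ddr_memory - mpx_bus:.0f} MB/sec")
```

These are theoretical peaks only; sustained rates on real hardware are lower.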

[quote]

3. As for games, PCs usually have better frame rates compared to Macs, even with the same card. I wonder if the faster performance is due to a better system architecture than the Mac's. Since the AGP bus connects to the G4 via the bridge, is it because of the limited bandwidth the MaxBus offers? AGP 4X takes about 1/2 GB/sec of bandwidth. I wonder whether the slower graphics performance is due to insufficient bandwidth in the current MaxBus architecture. How about a PC running with the same 133 MHz bus?

<hr></blockquote>

There are a bunch of reasons, and it depends on the particular Mac you are talking about. Before the current generation of PowerMac & G4 there was no write combining to non-cached memory (important for fast graphics cards), and the memory subsystem was relatively slow. With the current machines the SDRAM memory has been pretty well optimized, but you're still going to see issues with less efficient drivers and less well optimized software. The software is written for the x86, and there might be a version optimized for the Athlon as well as one or two versions of the Pentium... it is unlikely that the Mac port will be as well optimized. The return from sales into the Mac market just doesn't justify it. Also, the current Apple-provided gcc compiler on MacOSX has pretty poor code optimization at the moment. They are fixing this with Jaguar, but I doubt it has been much of a factor in Mac games so far.

If you have a well optimized game (for both platforms) running on good graphics drivers on a 1 GHz G4 with MacOS9 or MacOSX.2, and you are comparing it to an x86 running with SDRAM or one of the lesser DDR mobos, and they both have exactly the same graphics chipset with the same VRAM size, running at the same resolution with the same settings, then the Mac will probably come out on top. An Athlon will probably be faster if the game is very heavy on floating point and neither system is memory bound, but if they are each vector optimized (i.e. AltiVec & 3DNow!) the G4 will kill the Athlon. The G4 in these equalized conditions will generally beat a PentiumIII, and really slaughter a PIV (especially if vector optimized). Unfortunately these equalized conditions have probably never occurred in any game produced so far.

Part of the problem is that even the C code can be written in a slightly different way so that the PowerPC compilers do a much better job than the x86 compilers, and vice versa. The industry is divided into two parts: people who write code which isn't optimal on any machine with any compiler, and people who write code which is optimal on the x86. Almost nobody has learned to write code that lends itself better to the PowerPC. I'm hoping that one day the PowerPC guys have compilers that are smarter (note that the x86 compilers aren't smarter, they are just given code that is more amenable to the x86), but there are some issues built into the design of the C/C++ language which compilers just can't get around easily. There are also basic design decisions in software which often highlight a particular machine's inadequacies or fail to take advantage of its superior features... and the Mac will always lose in this department because the software is almost always written on the x86 in the first place to take advantage of its strong points.

Now, of course, the latest PCs have faster memory systems and faster processors so Apple has some catching up to do.

[ 05-22-2002: Message edited by: Programmer ]

*l++ · May 22, 2002 9:36AM

[quote]Originally posted by tiramisubomb:

Numbers are confusing, and block diagrams are easier to understand. I have looked through the Power Mac G4 block diagram in the Apple Dev Notes and noticed a few things that just don't piece together. Perhaps some of you who are more familiar with CPUs can give me better insight.

1. The L3 cache can transfer up to 4GB/sec per G4. However, all the data coming into the G4 for number crunching arrives via the MaxBus at 133 MHz (only 1GB/sec). So how can the backside cache be of any help? I could see a performance benefit for the L3 if a separate bus transferred data to the L3 first.

2. Another thing: the recently released Xserve sports new DDR memory with a peak bandwidth of 2.1GB/sec. But the memory does not communicate directly with the processor; it goes via a bridge (the system controller) through the MaxBus. Since all data goes through the MaxBus, which operates at only 1GB/sec, what's the point of having DDR? I don't see any performance benefit.

3. As for games, PCs usually have better frame rates compared to Macs, even with the same card. I wonder if the faster performance is due to a better system architecture than the Mac's. Since the AGP bus connects to the G4 via the bridge, is it because of the limited bandwidth the MaxBus offers? AGP 4X takes about 1/2 GB/sec of bandwidth. I wonder whether the slower graphics performance is due to insufficient bandwidth in the current MaxBus architecture. How about a PC running with the same 133 MHz bus?<hr></blockquote>

1. Most applications have a small set of data that they use very often and a large set of data that they use infrequently. The small set will therefore travel across the MaxBus once and then stay in the cache, which substantially reduces the amount of data that has to be fetched from main memory across the MaxBus.
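Here is a toy illustration of that effect, not a model of the real G4 caches: a small cache with least-recently-used replacement is fed an access stream in which four "hot" addresses are reused constantly while a new "cold" address shows up once per round. The cache size, the replacement policy, and the access pattern are all assumptions chosen to keep the example short.

```python
from collections import OrderedDict


def count_bus_fetches(accesses, capacity):
    """Count accesses that miss a small LRU cache and must go over the bus."""
    cache = OrderedDict()
    fetches = 0
    for address in accesses:
        if address in cache:
            cache.move_to_end(address)     # hit: mark as most recently used
        else:
            fetches += 1                   # miss: fetch from main memory
            cache[address] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return fetches


def example_stream(rounds=100, hot=(0, 1, 2, 3)):
    """Each round touches every hot address, then one never-reused address."""
    stream = []
    for i in range(rounds):
        stream.extend(hot)
        stream.append(1000 + i)
    return stream


if __name__ == "__main__":
    stream = example_stream()
    print(len(stream), "accesses,",
          count_bus_fetches(stream, capacity=8), "bus fetches with the cache,",
          count_bus_fetches(stream, capacity=0), "without it")
```

In this made-up stream the hot data crosses the bus only on its first use; everything after that is served from the cache, and only the cold data keeps generating bus traffic.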

2. Servers are made for throughput, not the fastest processing. With the processor (1GB/s), independent ATA/100 controllers (up to 200MB/s), Gigabit Ethernet (up to 200MB/s) and possibly a 64-bit 66 MHz PCI card (up to 266MB/s) all accessing memory simultaneously (DMA), faster memory is good for throughput. The advantage would be much smaller in a non-server environment, where most of the memory bandwidth is used by the CPU.
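Adding up those numbers makes the point. The sketch below uses the rough peak figures from this post exactly as given (they are ballpark values, not measurements) and compares the total against the 2.1 GB/sec DDR figure from the question. The single-data-rate comparison figure of about 1064 MB/sec is an assumption (64-bit memory at 133 MHz), and the names are made up for illustration.

```python
# Rough peak demand in MB/sec, taken from the figures in the post above.
DEMAND_MB_S = {
    "processor over MaxBus": 1000,
    "ATA/100 controllers": 200,
    "Gigabit Ethernet": 200,
    "64-bit 66 MHz PCI card": 266,
}


def headroom(memory_peak_mb_s, demand=DEMAND_MB_S):
    """Memory bandwidth left over if every device ran at its peak at once."""
    return memory_peak_mb_s - sum(demand.values())


if __name__ == "__main__":
    print("total peak demand:", sum(DEMAND_MB_S.values()), "MB/sec")
    print("DDR at 2100 MB/sec, headroom:", headroom(2100), "MB/sec")
    # Assumed comparison point: single data rate memory at about 1064 MB/sec.
    print("SDR at 1064 MB/sec, headroom:", headroom(1064), "MB/sec")
```

All devices peaking at once is a worst case, but it shows how the combined demand can exceed what the processor alone can pull across its bus.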

3. I am not sure this is still true from a hardware standpoint. The Apple AGP implementation at one point did not have combined writes (which increase the effective AGP bandwidth), but this is no longer the case. There are also issues if the T&L (Transform and Lighting) is done by the processor, etc... But most new graphics cards really do most of the work. Then it becomes a driver issue in the OpenGL vs DirectX implementations. ATI and NVidia have just poured much more effort into Win32 driver development than they have into the Mac's.

*l++ · May 22, 2002 9:40AM

Well, Programmer said it first.

It just took so damn long to write!

programmer · May 22, 2002 9:51AM

[quote]Originally posted by *l++:

Well, Programmer said it first.

It just took so damn long to write!<hr></blockquote>

Nice to see that we're in agreement though. Yours is sort of an "executive summary" of my novel.

bobthetomato · May 22, 2002 9:58AM

BTW: the question was answered in <a href="http://forums.appleinsider.com/cgi-bin/ultimatebb.cgi?ubb=get_topic&f=1&t=001670&p=5" target="_blank">this thread</a>

(by Programmer and Outsider)

g::masta · May 22, 2002 10:27AM

You can have the fastest DDR RAM, the fastest proc, the fastest everything, but with that system bus sitting at 133 MHz, the bottleneck will always win!

Peace,

G

tiramisubomb · May 22, 2002 11:26AM

Thank you for all your explanations.

tiramisubomb · May 22, 2002 12:04PM

Anyway, I just wonder about the Pentium IV: is the architecture the same? Is all the data connected to the front side bus via a bridge, like on the G4? I know they have higher speed buses and that their Rambus-based RDRAM has around 4GB/sec of bandwidth. But is the chip really that much better?

xype · May 22, 2002 1:19PM

[quote]Originally posted by tiramisubomb:

But is the chip really that much better?<hr></blockquote>

The Pentium IV was described as the biggest fu*k-up of Intel as of late. Its architecture is not based around the idea of good performance but around the idea of scaling to 20 GHz because, frankly, Intel has the idea that GHz sell chips. I think Programmer already pointed out that at the same clock speed every chip beats the Pentium IV (yes, the Pentium III too, by a wide margin; that's why Intel didn't really introduce PIIIs much above 1.1 GHz). It's a "brute force" chip: just pump the GHz up as much as you can. Still, an AMD Athlon clocked at ~1600 MHz (XP1900 iirc) beats the crap out of a 2.0 GHz Pentium IV (or at least performs about the same).

G4 is a good chip. If Motorola did not make the mistakes they did the Macs would still be ahead of the x86 world. At least I think so.

matsu · May 22, 2002 1:51PM

I don't think the P4 is such a bad desktop chip. Most benchmarks show that the 2.2/2.4 GHz and, I'll assume, the just announced 2.53 GHz P4 are faster than anything else out there.

Is it the fastest per clock? Nope. But it doesn't need to be; it was designed to throw A LOT OF CLOCK CYCLES at any computational problem, and for a desktop it works well.

Maybe not the best solution for a room full of servers, or for a laptop, but for 1 and 2 CPU desktop set-ups, it's the fastest consumer chip going, if not the best bang for your buck alternative -- you have to give that one to the Athlon XPs.

amorph · May 22, 2002 2:33PM

If you want to continue the tangent on currently available processor architectures, please do so in General Discussion.

Thanks.
