I don't see the point of benchmarking high-end computers with Quake 3.
An average result above 300 fps means at least 100 fps in the most complex scenes. A good CRT generally has a refresh rate of 85 Hz or 100 Hz, which means the screen is now the limiting factor for Quake 3 (or you should choose a higher resolution).
Talking about 300, 400 or 10,000 frames per second in Quake 3 doesn't tell us anything. All of these computers have enough power to deliver the best possible user experience.
Here are some suggestions for equally useless benchmarks (a sketch of one follows the list):
- the square root of 2, timed in ms
- a script with an addition, a multiplication and a sine
- concatenating twelve letters of the alphabet
- calculating the age of the captain
- giving an opinion on the high level of discussion in AO ...
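For what it's worth, here is a minimal C sketch of what such a pointless micro-benchmark would look like. It is my own illustration, not anything from this thread: it times the square root of 2, a little add/multiply/sine "script", and a twelve-letter concatenation, and any machine discussed here finishes it faster than the timer can resolve.

/* A deliberately useless micro-benchmark, in the spirit of the list above.
   Illustration only - the numbers it prints tell you nothing useful.
   Build with: gcc -std=c99 useless.c -lm */
#include <math.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void) {
    clock_t start = clock();

    volatile double r = sqrt(2.0);                      /* square root of 2 */
    volatile double s = (1.5 + 2.5) * 3.0 + sin(0.5);   /* add, multiply, sine */

    char buf[16] = "";
    for (int i = 0; i < 12; i++) {                      /* twelve letters */
        char letter[2] = { (char)('a' + i), '\0' };
        strcat(buf, letter);
    }

    double ms = 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%f %f %s computed in %.3f ms\n", r, s, buf, ms);
    return 0;
}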
Benchmarks are good if they give information in critical areas, I mean in areas where we would actually appreciate a faster computer.
Quake 3 is a load of crap, it should be scrapped as a benchmark tool.
Even the guys working at my local Apple centre (London) say that 10.2.7 JUST runs on the G5. It wasn't built for speed or for showing off the machine's power as such; it was built just so the G5 would be able to run at all. Wait until Panther comes out, then the benchmarks will be worth looking at.
Why is everyone getting so worked up about this? At least wait until the real operating system for the G5 is out before all the 'doom and gloom'.
I for one am impressed by the Athlon 64, especially at the few prices I viewed for available systems (~$1500 - ~$1800). As a Mac user, am I worried? No. The Athlon is clocked higher, has 1 MB of L2 cache and should perform better in certain tasks. Big deal; I say good for AMD, kudos and all that to the Athlon 64 designers.
Methinks the one who should be worried at the moment is Intel, not Apple.
IBM will be addressing any performance issues with the 970 (i.e. integer performance, and SIMD performance only as compared to the G4+, which is still far superior to MMX, etc.), and Apple will keep optimizing for the 970. The multiprocessor capabilities of the 970, judging by the Xeon comparisons, are the best bar none. (I wonder if IBM's time spent manufacturing the POWER4 architecture may be involved somehow - heh heh.)
I'll say it again: it's not Apple that should worry, it's Intel. Ha, people don't need or want 64-bit operating systems? Maybe, maybe not, but they sure as heck love the speed of these new 64-bit beasts.
Uh, no. The problem is that the Athlon 64 / Opteron are NUMA systems, which work great for massive scalability but run into some bottlenecks on a 1-processor system - and some worse issues on 2P systems.
One of the goals of system designers over the past ten years has been to allow the various components of the system to talk to each other with a minimum of bandwidth competition. Now, when looking at a buzzword list for the Athlon 64, you'll find a lot of terms like HyperTransport indicating that AMD also tried to pursue this when designing the Athlon 64. Take a look at an (older, but still applicable) block diagram:
http://common.ziffdavisinternet.com/...i=23070,00.jpg
Note where the RAM is with respect to the CPU. In an ordinary system (like the single-processor G5), the RAM is just to the south of the CPU, off a dedicated system controller chip. At first glance the Athlon 64 layout doesn't look much different, and indeed it isn't. The type of bottleneck that can occur, however, is when a device (most likely an AGP card) wants to do a DMA transfer from main memory while the CPU wants to stream generated data directly to some other I/O device (disk or Ethernet being the most likely culprits). In a well-designed traditional layout this would not cause a bottleneck, but in this system it would tie up the bandwidth between the CPU and all of its I/O devices. This case is exceedingly uncommon, however, and the Athlon 64 still usually has bandwidth to spare.
Now take a look at the block diagram for a 2P Opteron workstation:
http://common.ziffdavisinternet.com/...i=23072,00.jpg
Note that the system memory is actually split up between the two CPUs. How does this work? This is an example of a Non-Uniform Memory Access architecture; each CPU knows two different ways of getting at memory. It takes a lot of smarts in the operating system to be able to schedule processes effectively to minimize access to another processor's memory.
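To sketch what that means in practice, here is a tiny C example of my own, using Linux's libnuma purely as an illustration (neither of these workstations necessarily ships with it): it pins the process to one node and allocates its working set from that node's local memory, so it never has to reach across the inter-processor link.

/* Sketch of NUMA-aware placement using Linux's libnuma (an assumption here -
   it only illustrates "keep a process's memory on its own node").
   Build with: gcc -O2 numa_demo.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    int node = 0;
    size_t len = 64 * 1024 * 1024;

    /* Pin this process to node 0 and allocate its working set there,
       so its accesses stay off the link to the other processor. */
    numa_run_on_node(node);
    char *buf = numa_alloc_onnode(len, node);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return 1;
    }

    memset(buf, 0, len);   /* touch the pages so they are actually placed */
    printf("allocated %zu bytes on node %d\n", len, node);

    numa_free(buf, len);
    return 0;
}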
I should now explain another piece of terminology: when writing multiprocessor programs, it is very often the case that two processors cache the same segment of memory in their L1 or L2 caches. In order to keep these caches "in sync", cache coherency transactions occur which inform the other processor(s) of the state of that segment of memory.
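As a concrete illustration of how that coherency traffic gets generated (my own toy example, not anything from AMD or Apple), here two threads hammer counters that share a cache line, so every increment can force a coherency transaction; compiling with PAD set to 1 pushes the counters onto separate 64-byte lines and the traffic goes away.

/* Two threads incrementing counters that sit on the same cache line.
   Every write forces cache-coherency traffic between the CPUs; set PAD
   to 1 to give each counter its own line. Sketch for illustration only.
   Build with: gcc -std=c99 -O2 sharing.c -lpthread */
#include <pthread.h>
#include <stdio.h>

#define ITERS 100000000L
#define PAD 0   /* 1 = pad each counter to its own 64-byte cache line */

struct counters {
    volatile long a;
#if PAD
    char pad[64];
#endif
    volatile long b;
} shared;

static void *worker_a(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) shared.a++;
    return NULL;
}

static void *worker_b(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) shared.b++;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker_a, NULL);
    pthread_create(&t2, NULL, worker_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("a=%ld b=%ld\n", shared.a, shared.b);
    return 0;
}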
Now compare that to the pretty diagram for the dual G5. The system controller, here labeled 4, has the ability to simply pass cache coherency transactions along to the other processor (snoop packets, as I believe they're called), occurring, of course, at the G5's full bus rate. Meanwhile, any other I/O device can be accessing memory without interfering. On the Opteron, however, half of main memory is not directly accessible from the I/O devices, and accesses to that part of memory must tie up the bandwidth between the two processors. Scheduling by the OS (which, as I said, is extremely difficult) can help alleviate this problem, but it is bound to occur in any threaded program which operates on chunks of memory and then sends that memory to an I/O device.
So, given that, why would you choose a NUMA design at all? Well, scaling a traditional design to more than 4 processors is insanely difficult, and generally the types of processes running on such systems each access their own isolated portion of memory. But in a two-processor workstation this design simply makes no sense. Is the slightly lower latency to some portion of memory worth the extra bottlenecking? Not to me. Of course there are benchmarks that will still perform quite well in this configuration, but anything that involves actual use of the dual processors as anything more than two nodes in a cluster for an embarrassingly parallel problem will run into these bottlenecks.
(One caveat: the Athlon 64's bus is clocked quite high - often 1.6 GHz. It is half the bit width of the G5's bus, so at 1.6 GHz it provides the same bandwidth as the G5's bus does at 800 MHz.)
Originally posted by Anonymous Karma
Uh, no. The problem is that the Athlon 64 / Opteron are NUMA systems, which work great for massive scalability but run into some bottlenecks on a 1-processor system - and some worse issues on 2P systems.
Your post has the right facts but mostly wrong conclusions. A dual Opteron has four-channel memory (~10GB/s), while Intel and Apple systems only have dual-channel memory (~6GB/s). The dual Opteron smokes in real-world benchmarks; that's all that matters.
(Don't worry folks, I still love the G5. My office has 21 of them on order.)
Originally posted by Krassy
Do you have an idea why applications should get a performance bump with Panther??? I don't get it so far...
Better scheduler, better processor affinity, optimized graphics engine, optimized drivers, greater parallelism in the kernel, G5 optimized code, optimized memory utilization, improvements to virtual memory subsystem, etc etc. MacOS X is a virtual machine and improving the implementation of that machine will improve the performance of applications running in it.
Originally posted by rickag
I for one am impressed by the Athlon64. Especially, the few prices I viewed in systems available(~$1500 - ~$1800)
An Athlon 64 3200+ (2.0 GHz) costs ~$460. First gen chipset based mobos cost ~$150.
I am not impressed at all. It has already run into many of the same limitations the Athlon XP had in terms of scalability...meaning they'll be stuck at 2.0-2.2 GHz for a while yet. Plus, when you can buy a P4 2.4C for $170 + a decent Intel i865PE motherboard for <$100 and overclock it well past 3 GHz on minimal air cooling, what's the better deal?
Originally posted by Eugene
The Xeon isn't faster than the P4 for everything. They have more cache, but are limited to a 533 MHz FSB. In addition, the Xeons Apple measure against are 3.06 GHz whereas the P4s are 3.2 GHz. The 3.2 GHz P4 Extreme will trump everything else, and the 3.4 and 3.6 GHz models after that.
Just look at the available numbers.
A SINGLE 2.2 GHz Athlon 64 FX-51 beats a SINGLE 3.2 GHz P4 by a little.
A SINGLE 3.2 GHz P4 beats a DUAL 2.0 GHz G5 by a lot.
What does this say? I don't know really. I'd like to think it says the game is still highly unoptimized for us Mac users.
Yes and no. Both the P4 and the Athlon 64 had a graphics card with TWICE the memory!! 128MB vs 256MB makes a lot of difference in fps, cuz the graphics card does most of the work.
Why do people even use games to measure processor speed? It measures GPU speed, not CPU speed!
.:BoeManE:.
Originally posted by wmf
Your post has the right facts but mostly wrong conclusions. A dual Opteron has four-channel memory (~10GB/s), while Intel and Apple systems only have dual-channel memory (~6GB/s).
Actually, a dual Opteron just has two memory controllers. As I said, for embarrassingly parallel work the Opteron is nice, but the reason most people buy SMP instead of a cluster of $300 Linux machines is that most problems are not /that/ parallel. Across the bus between the two processors there is only 6.4 GB/s, and that bandwidth, unfortunately enough, is contended between memory transfers and MOESI coherency transactions. I'm sure there are other applications where the Opteron looks nice, but to me that NUMA design just looks like a bottleneck on a 2P system.
Of course perhaps you were talking about playing games with this expensive system.
BoeManE speaks the truth. How come we always get stuck with the shittier cards anyways?
BotMatches in UT2k3 test the CPU instead of the GPU though...and G5s seem to do poorly in those benches...not sure why though.
Originally posted by kim kap sol
BoeManE speaks the truth.
How come we always get stuck with the shittier cards anyways?
BotMatches in UT2k3 test the CPU instead of the GPU though...and G5s seem to do poorly in those benches...not sure why though.
I personally have high hopes for IBM's ability to ramp the clock speed of the current CPUs, as well as its ability to develop next-generation chips.
If Apple ships a dual 3 GHz tower early to mid 2004, it's going to be a big leap in the race to be competitive. Intel and AMD will probably be shipping 3.5 GHz at most in the same timeframe (actual GHz for Intel, PR rating for AMD).
Also worth noting is the Intel P4, I mean Xeon, "Extreme Edition" chip: 2 MB of cache just in time to tinkle on the AMD parade, and Tom's Hardware got it in their head to include a 3.6 GHz version in their benchmarks.
Speaking of Tom's Hardware, some of the Apple-oriented comments are hilarious (not to mention outright wrong).
Originally posted by Programmer
Better scheduler, better processor affinity, optimized graphics engine, optimized drivers, greater parallelism in the kernel, G5 optimized code, optimized memory utilization, improvements to virtual memory subsystem, etc etc. MacOS X is a virtual machine and improving the implementation of that machine will improve the performance of applications running in it.
hehe - I hoped you'd give me an answer like that - sounds very cool
Originally posted by BoeManE
128MB vs 256MB makes a lot of difference in fps, cuz the graphics card does most of the work.
NO. Not for Quake 3 at 1024x768x32. Nice try.
Originally posted by Eugene
NO. Not for Quake 3 at 1024x768x32. Nice try.
Why won't the graphics card make a difference, even at such a low resolution? Since the GPU does most of the work, shouldn't it matter whether the card has 128 or 256 MB of memory?
Of course, the bus and other architectural widgets have something to say, but I would think the graphics card makes the most difference.
.:BoeManE:.
Originally posted by BoeManE
Why won't the graphics card make a difference, even at such a low resolution? Since the GPU does most of the work, shouldn't it matter whether the card has 128 or 256 MB of memory?
Of course, the bus and other architectural widgets have something to say, but I would think the graphics card makes the most difference.
.:BoeManE:.
Because the damned game doesn't fill all the memory at that particular resolution and pixel depth.
Originally posted by Eugene
Because the damned game doesn't fill all the memory at that particular resolution and pixel depth.
At this resolution even 64 MB will be more than sufficient.
Very few games currently take advantage of 256 MB vs 128 MB. None of the benchmarks I have seen show a difference of more than 1%.
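A quick back-of-the-envelope sketch shows why (my own arithmetic, assuming double buffering plus a 32-bit depth buffer; textures come on top of this, but a 1999-era game was designed to fit comfortably in far less than 128 MB):

/* Rough estimate of video memory needed at 1024x768x32 - illustration only. */
#include <stdio.h>

int main(void) {
    long w = 1024, h = 768;
    long bytes_per_pixel = 4;                          /* 32-bit colour */
    long color_buffers = 2 * w * h * bytes_per_pixel;  /* front + back buffer */
    long depth_buffer  = w * h * 4;                    /* 32-bit depth/stencil */
    double mb = (color_buffers + depth_buffer) / (1024.0 * 1024.0);
    printf("framebuffers: ~%.1f MB; the rest is textures and geometry\n", mb);
    return 0;
}

That works out to roughly 9 MB of framebuffer, so the second 128 MB on those cards never even gets touched at this setting.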
Originally posted by Programmer
I don't think Q3 uses dual processors at all, or if it does the work isn't divided very effectively. It also doesn't use AltiVec, and I'm not aware of a version that was G5 optimized. The code was originally architected for the x86 and the Mac version retains some design decisions which do not suit the PowerPC particularly well. I wouldn't hold it up as a typical example of G5 performance.
Actually, Q3 uses dual processors pretty well (180-200 -> 360 FPS on my dual 1 GHz). The newest version also uses AltiVec for rendering the .md3 models, which helps a little - 10% or so.
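For anyone curious what that kind of AltiVec use looks like, here is a minimal C sketch (my own, not id's actual .md3 code) that scales and offsets an array of vertex components four floats at a time with a fused multiply-add:

/* Minimal AltiVec sketch: y[i] = scale * x[i] + offset, four floats per step.
   Not Quake 3's real md3 code - just the style of loop AltiVec speeds up.
   Build with: gcc -std=c99 -maltivec -mabi=altivec altivec_demo.c */
#include <altivec.h>
#include <stdio.h>

#define N 16

int main(void) {
    float x[N] __attribute__((aligned(16)));
    float y[N] __attribute__((aligned(16)));
    for (int i = 0; i < N; i++) x[i] = (float)i;

    vector float scale  = (vector float){2.0f, 2.0f, 2.0f, 2.0f};
    vector float offset = (vector float){1.0f, 1.0f, 1.0f, 1.0f};

    for (int i = 0; i < N; i += 4) {
        vector float v = vec_ld(0, &x[i]);   /* load 4 aligned floats */
        v = vec_madd(v, scale, offset);      /* fused multiply-add */
        vec_st(v, 0, &y[i]);                 /* store 4 floats */
    }

    for (int i = 0; i < N; i++) printf("%.1f ", y[i]);
    printf("\n");
    return 0;
}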
The problem is that, as you say, it's architected for the wrong ISA, and the PPC takes a huge hit when the various effects are turned on. The algorithms are just not optimized compared to their x86 equivalents. Another provable area is the server-client code. Ever set up a server on the Mac? Try loading some people in and see how performance degrades, even with dual processors. Then do exactly the same on a PC. Ouch. Smooth as a woman's ass.
But I agree, Q3 and games in general are not a good way to compare what the processors are really capable of. It's more a benchmark of the porting effort.
Originally posted by Zapchud
But I agree, Q3 and games in general are not a good way to compare what the processors are really capable of. It's more a benchmark of the porting effort.
Very well said.