G5 in Jan - new info


Comments

  • Reply 81 of 141
    applenutapplenut Posts: 5,768member
    damn, this thread brings back memories of the old AI where people actually knew what they were talking about
  • Reply 82 of 141
    [quote]Originally posted by macrumorzz:

    <strong>

    Anyway, as already mentioned by others it doesn't make sense for Apple to move to 64 bits, as only very few apps will take real advantage of the 64 bit registers, and I think that a powerful SIMD unit (like Altivec) is much more efficient in most cases than a 64 bit ALU.

    </strong><hr></blockquote>





    If Apple's going to survive, it's got to capture new ground one beach-head at a time; and if Apple wants the scientific market (bioinformatics, engineering, etc.), it's got to go 64-bit to be taken seriously.
  • Reply 83 of 141
    davegeedavegee Posts: 2,765member
    [quote]Originally posted by Nonsuch:

    <strong>



    "Little Johnny" is not a grown-up working at a real job.



    I'm not making any statements about the original poster's veracity, but I am saying that if Apple is going to the trouble of sending out secure hardware prototypes, they would certainly do so under strict agreements with their testers stipulating that the testers will make no effort to open the cases or otherwise ascertain what's inside them. Some idiot cracking open such a unit would not only get himself into trouble, he'd probably keep any developmental hardware out of his company's hands for a good while after.



    Would I be tempted to peek inside? Absolutely. Would I actually do it? No.</strong><hr></blockquote>



    Not in a physical sense... I was talking about 'peeking' via whatever software methods...



    Dave
  • Reply 84 of 141
    synsyn Posts: 329member
    Moving beyond 64-bit is even less likely, instead there will more likely be a more fundamental shift in the processor design paradigm at some point in the future.



    FWIW:



    That's what I thought too... until a friend of mine at Tubingen University told me one of his teachers lectured them about a 128bit version of the Alpha... He insisted it was 128bit integer, and not, say, a 128bit SIMD unit on top of the rest...



    When one talks about 64bit processors, one thinks about the huge amount of RAM those can handle, and then one thinks "I don't need this on my desktop".



    Yet what we don't think about is this: simply needing to allocate over 4 GB of RAM needs a 64-bit CPU (that is, of course, excluding hacks such as a 36-bit address space etc.). Apple clearly eyes the pro-3D market, and a 64-bit machine, priced along the lines of the current line, would quite definitely induce a buying frenzy in the major studios. And as it's been said before, the sheer "first consumer 64bit machine" status is enough to warrant such a machine.



    Of course this does not say when a G5 chip will ship. It's even quite possible we get a G5, but the 32-bit variation of it.



    But it's fun to speculate nonetheless
  • Reply 85 of 141
    I thought I would take a bit of time out and try to speculate on what there might be in the G5 that would cause it to be a considerably faster chip than the G4. Why not, it's a holiday.



    My suspicion is that the rumours of about 58 million transistors and a very large die may be close to the mark, as Apple would have said to themselves: let's move away from the attitude of embedded device designers for this thing, as it seems to be limiting the performance unacceptably, and allow ourselves a huge area and a lot of power dissipation from the start. The power dissipation and cost can be dealt with in later revisions for portables and low-cost devices (or they can use G4-class devices).



    There are two main ways of making a faster chip, increase the clock frequency, and increase the work done each clock cycle.



    First off, I don't actually know what is the limiting factor on the scaling of the G4, something(s) in the chip is(are) causing it to have problems with high frequencies, and I will just assume this has been mended (at least to a certain extent) in the G5.

    From what I can see, the G5 has a 10-stage pipeline instead of the 7 stages of the latest G4s; this will certainly help speed things up. Otherwise I suspect speed improvements will come mainly from process changes and corrections of errors in other areas.

    These changes require relatively little addition to the chip area: a few extra rename registers to cope with the deeper pipeline and execution units.



    For the increase in work per clock cycle, I suspect the most important element would be an improved memory system. This seems to include an enlarged L2 cache of 512K, which would mean an extra 13 million or so transistors, but little increase in area. I really hope they have an embedded memory controller using DDR RAM, as this would massively increase the bandwidth and reduce the latency (delay) of the memory system, and of itself give a considerable increase in work per clock cycle. If the memory controller could handle dual channels (128 bits wide) this would be killer bandwidth (up to 5 times the maximum of current machines, which appear to have usable bandwidth well below their theoretical limit), which is vital for things like streaming video processing and indeed scientific modelling.

    The other increase in work per clock cycle would presumably come from increasing the number of execution units. As far as the integer units are concerned, there probably isn't much point in putting in more units, as it is unlikely much more instruction-level parallelism can be found, although another unit for address generation, to take full advantage of the new memory system, would maybe be worthwhile.

    Increasing the FPU units would almost certainly provide worthwhile speed-up, as most FPU intensive code is not heavily branched, but looped, and I would expect to see at least one more multiply/add unit in the FPU, and the new memory system would be able to keep it fed with the data it needs.

    As far as Altivec is concerned, rumours that they have had to work hard to get the same per-clock efficiency as the G4 units would suggest that they have not significantly changed the number of execution units here, although I personally would like it if they had a double-precision vector FPU unit to match the Pentium 4's capabilities in this regard.



    At a system level, the inclusion of point to point busses would considerably improve multi-processor performance, and I expect to see at least two system busses, either rapidIO or Hypertransport, although I would opt for Hypertransport as the more likely, especially if they have been able to tap into AMD work on the busses for their upcoming Hammer series chips. (Note, I discount the possibility of AMD fabbing the G5 for Apple, as they currently have no excess capacity at their Dresden plant, and are looking to outsource the production of some of their own devices. Sharing design work on peripherals etc., however, I think quite likely).

    So my guess is two 16-bit HyperTransport 400MHz double-pumped bidirectional busses, or maybe one 32-bit one; this is limited by the number of pins required for each bus (each 16-bit bus uses 103 pins).

    The only problem with this analysis is that it doesn't leave enough pins (if you believe the approx. 550 pin package rumour) for the L3 cache interface, which Motorola have definitely talked about. Personally, I would drop this interface as unnecessary, since there is 512K of level 2 cache, and the onboard memory system would mean a ridiculously large and fast L3 would be required in order to see any significant benefit in performance.



    I'll stop here for now, and put what I think this all means for the performance of the new chip in another post if anybody wants it.



    Note I'm not making any guesses as to when this new chip may be appearing.



    Michael
  • Reply 86 of 141
    Mmicist, thank you for the in-depth analysis. I wonder if the withering away of the Alpha, together with the expected performance of the G5 running Mac OS X, wouldn't leave a perfect window of opportunity for Apple in the server market?
  • Reply 87 of 141
    xypexype Posts: 672member
    [quote]Originally posted by heinzel:

    <strong>Mmicist, thank you for the in-depth analysis. I wonder if the withering away of the Alpha, together with the expected performance of the G5 running Mac OS X, wouldn't leave a perfect window of opportunity for Apple in the server market?</strong><hr></blockquote>



    I think if Apple can produce enough G5 chips it can of course make inroads into the server market, given that the G5 chips scale well and that OS X is scalable as well. If Apple partners up with SGI (as some rumors suggested) I could see Apple taking a lot of the server market. Unless they get greedy...
  • Reply 88 of 141
    [quote]Originally posted by heinzel:

    <strong>Mmicist, thank you for the in-depth analysis. I wonder if the withering away of the Alpha, together with the expected performance of the G5 running Mac OS X, wouldn't leave a perfect window of opportunity for Apple in the server market?</strong><hr></blockquote>



    My pleasure.



    Certainly it would give them a chance in the server market, but I don't know that they would want to get into that fight; it's very vicious at the moment, and they certainly don't want to antagonise IBM too much. The G5 I have postulated, whilst much better than the G4 for servers, would not really be a competitor for the Power4 or new UltraSparcs, as it doesn't really have the internal bandwidth and some other bells and whistles. I suspect it is really very difficult to design a chip suited to both consumer needs and high-end server requirements.

    If the G5 is that nice, however, we may see third party G5 machines running Linux/*bsd as servers to compete with x86 machines.



    Michael
  • Reply 89 of 141
    xypexype Posts: 672member
    [quote]Originally posted by mmicist:

    <strong>If the G5 is that nice, however, we may see third party G5 machines running Linux/*bsd as servers to compete with x86 machines.

    Michael</strong><hr></blockquote>



    Indeed, we may well see some inroads in the cheap-o server market if Apple decides to get into it, since a G5 rackmount server would probably outperform Intel/AMD solutions and would work nicely as a file/network server in companies (actually digital content creation studios and graphic shops spring to mind), and with its 64-bitness it might even be a nice solution for a rendering farmlet.
  • Reply 90 of 141
    [quote]Originally posted by mmicist:

    <strong>

    they certainly don't want to antagonise IBM too much.</strong><hr></blockquote>



    Wouldn't IBM have had at least some hand in the development of the G5 - and isn't it even possible that IBM is making these chips?



    Just wondering. Seems to me like IBM would benefit from having another solid RISC processor to integrate into their products. And it seems to me that the market Apple would be after is the graphics imaging, high end rendering market as opposed to the file server / network server market.



    I may have just repeated you, sorry if I did.



    cheers,



    TM
  • Reply 91 of 141
    [quote]Originally posted by applenut:

    <strong>damn, this thread brings back memories of the old AI where people actually knew what they were talking about </strong><hr></blockquote>







    I agree - I think I've learned more in this thread than I possibly ever have on AI. Thanks to everyone for the intelligent posts. And thanks to the original poster, even if you're a scammer, for being somewhat level headed.



    And if you're for real, thanks for taking the risk to tell us whatever you could... (And if you have anything more you'd like to share, please feel free!)



  • Reply 92 of 141
    Macskull, it is pretty bad you have to put someone down so you can feel better. It proves you are one S.O.B. She did a good job on you. Maybe Apple will include broadband on the motherboard for the G5 server later in the year, or most probably a third-party PCI board for support. Cisco will be using systems with broadband.
  • Reply 93 of 141
    g-newsg-news Posts: 1,107member
    But even a 12x G4 is not a 64bit machine....



    Anyway, here are a few facts for a change, vague, but 100% true, unless Apple goes belly-up:



    -We will see a G5 sometime

    -That G5 will be faster than the G4

    -We will be amazed by MWSF's announcements and releases, no matter what they are; we always have been in the past, so there is no reason why we wouldn't be amazed this time.



    8 days to go, and you will know more, rumoring and guessing is fun, arguing about it is not, take it easy, drink some tea and wait.



    G-news
  • Reply 94 of 141
    Here is some more info that might interest people. Maybe you have wondered what that "pipeline" stuff is. I will try to explain it in a few sentences; of course I will have to simplify everything a bit, so it might not be 100% accurate.



    Every processor instruction must be decoded by the CPU, and this is done in the pipeline (or one of the pipelines). The CPU must identify the instruction, extract register and operand information, fetch the necessary data and much more. This can be done in only a few steps (short pipeline) or in many steps (long pipeline).



    The advantage of a short pipeline is that instructions are ready rather quickly, and it has other advantages as well (in case of a wrong branch prediction all commands in the pipeline are lost, for example, and a short pipeline loses fewer of them). Usually a CPU with a shorter pipeline offers better per-clock performance than a CPU with a longer pipeline.



    On the other hand, a short pipeline is much more complex; the circuits for each stage use many more transistors. Complex transistor circuits mean a lot of heat, and this may cause problems. The heat of the entire CPU die is not the only problem when you design a processor: there may be single circuits on your die that use too many transistors and become very hot (and get toasted if you're not careful).



    That's why the 7-stage G4 can be clocked higher than the original 4-stage G4. I could imagine that Moto had really huge problems with the G4 pipeline as it was too complex, so they could not increase the clock rate (increasing the clock rate also means more heat, unless you reduce the die size and the power consumption of the CPU).



    So one of the most difficult things when designing a CPU is to place your circuits and transistors in the right place on the die, to avoid spots where too much heat is produced. Creating longer pipelines is one way to do so.
  • Reply 95 of 141
    One more thing: the 58 million transistors that have been rumoured could be accurate. Usually I calculate 20 million transistors for 256 KB of L2 cache (although this depends a lot on the L2 design too). If the G5 has 512 KB of L2 cache, that leaves 18 million transistors for all circuits and the L1 cache. Let's assume the G5 has 2x64 KB of L1 cache; that should leave us with less than 8 million transistors for all logic circuits.



    Generally you can double the number of transistors when going from 32 to 64 bits, but the G5 is somewhat special. The AltiVec transistors will not double; on the other hand some more circuits could be implemented (like ocean, RapidIO, ...). This is rather hard to say, as we don't know much about the units the G5 will use, nor about these technologies, but it *could* be correct.
  • Reply 96 of 141
    [quote]Originally posted by The Mactivist:

    <strong>

    And if you're for real, thanks for taking the risk to tell us whatever you could... (And if you have anything more you'd like to share, please feel free!)

    </strong><hr></blockquote>



    I'm not sure what you mean by "for real"... I don't think anyone in the latter half of this thread claims to know anything of what is actually going on in the PowerPC camp; we're just a bunch of technically savvy Mac fans who are hoping Apple has finally managed to claw its way back to the bleeding edge of processor design. It's been there before -- on the day of their introduction the 68040, 601, and 604 were all very competitive. The G3 lost its edge somewhat because it was just a relatively minor enhancement to the 603e, and the G4 was essentially a G3 with AltiVec (still one of the best, if not the best, vector units available). The core of the 7450 has had its pipelines lengthened to allow higher clock rates, and some of its internal buses widened... but it isn't much more superscalar than the G3 was. The PowerPC also hasn't yet taken advantage of the rapid increase in available transistor count.



    The blurb about pipelines posted above is interesting, but keep in mind that the PowerPC has far less instruction "decoding" to do than x86 processors. The PowerPC instruction set was well designed from the start to be easily decodable, whereas the x86 guys have to have a whole bunch of pipeline stages to decode the instructions and figure out how to turn them into actual opcodes for their current core. The flip side of that is that they can change their opcodes at will and still run x86 software because they're translating them anyhow.



    As mmicist said, the G5 rumours so far indicate that they've finally jumped into the deep end of the transistor pool with both feet (58 million is fully within reason -- the GeForce3 has 65 million!). Extending the pipeline length to 10 and using the latest fabrication technologies allows the clock rate increases we've been hearing about. A few extra execution units, even wider busses, more registers, larger tables, etc. can all make for performance improvements, but individually they're all pretty minor increments, which is why designers on a strict transistor budget wouldn't add them... diminishing returns. An embedded designer will add more of something until the major payoff is achieved; a designer for the non-embedded market will add more until he runs out of transistors.



    Intel's move toward what they call "hyper-threading" is interesting. They basically make the processor extremely superscalar, but then allow it to run multiple threads at once. This means the execution units are (hopefully) shared in a very fine-grained manner between multiple threads. Cool notion, as long as your threads don't all want to use the same execution units at the same time. The alternative is to go multi-core, which is mainly multiple processors on the same die. Doubtless, some blend between these two options can be achieved. I think this is exactly the kind of thing the embedded guys aren't likely to do, but which Apple would choose to do if it had control over its own processor design. Perhaps the G6, right?



    I'm not sure what else the G5 could do to run faster -- the majority of instructions already run with 1-clock throughput, and there is a limit to how many instructions you want to have running at once (instructions frequently depend on the results of other instructions, which limits how many you can do all at once without delay). I haven't done enough PowerPC programming recently to have a feel for what the chip's bottlenecks are, aside from the ever-present memory bottleneck. If they can get that one licked (even partly), it'll be a huge leap forward. HyperTransport will help there.



    I glanced over the RapidIO and HyperTransport synopses, and it feels like RapidIO is overly complex for what Apple would want and HyperTransport delivers higher throughputs sooner. I could see, going forward, Apple using HT and the new Intel expansion bus standard. nVidia is onboard with HT as well, so CPU <-> GPU communications via HT would give 3D graphics a real kick in the pants. Right now feeding the GPU is a huge bottleneck, and it's just going to get worse as the programmable shader technology really swings into high gear. I want my lightning-fast AltiVec unit(s) to be able to feed data to the GPU without stalling on the memory bus.



    [ 12-30-2001: Message edited by: Programmer ]
  • Reply 97 of 141
    kidredkidred Posts: 2,402member
    Programmer-

    Over at MacNN someone said the new card in the G4/G5 will be the nForce, an Nvidia card with HT. Would that make a big difference for the 3D market? A graphics card that handles the HT? They said it would be introduced at MWSF.



    So a G4 running at 1.2 GHz or so with DDR and this card -- would that be good enough for now, or make new inroads for Apple?
  • Reply 98 of 141
    tcotco Posts: 87member
    nForce is a motherboard, not a graphics card.
  • Reply 99 of 141
    kidredkidred Posts: 2,402member
    [quote]Originally posted by TCO:

    <strong>nForce is a motherboard, not a graphics card.</strong><hr></blockquote>



    Oh. OK, so what does that mean in terms of its concept with the G4? I'm not a techie, so does that mean there will be two motherboards? Then what's the graphics card to be used? The GeForce 3 or the rumored Elsa card?
  • Reply 100 of 141
    msleemslee Posts: 143member
    Technically, the nForce does have GF2MX200-class integrated graphics....