G5 Speculation Revisited

ed m. · July 7, 2002 8:09PM

Here we go again... Another Macworld is looming, and of course the rumor mills are buzzing with speculation about the fabled "G5". I've been posting a *distilled* collection of what's been rumored around the web, and I'll do so again here. Most of the information is unchanged from when I previously posted it (just with minor updates). Even though we all know that a G5 is unlikely, it sure is fun to talk about it just the same. Here is what I've gathered so far. If anyone has any other interesting tidbits to add, feel free to do so. In fact, e-mail me the info and a supporting link; I'll tuck it away for safekeeping, and when the G5 really does arrive, we can determine just how close the speculation was ;-)

I consolidated all of the speculation/claims from MOSR, The Register and Architosh. I've snooped around quite a bit, and the sites where you're likely to see this information posted include The Mac Observer, osOpinion, Architosh, The Register, MOSR and MacSlash. I also e-mailed it to a few select columnists. You will notice areas of overlap among the sites and the articles that talk about the upcoming G5 and speculate about what their "sources" have told them. Keep in mind that I provide ALL THE URLs that seemed relevant; they've become the basis of *my* speculation. I'd also love to hear some feedback. Anyway, here is a quick rundown of what's been stated on other sites (you can identify the overlap yourselves):

********************************************************************************

The Register's alleged G5 Specifications

European Marketing Communications Manager, Paul Clark, confirmed the existence of the G5 and said that it utilizes:

- Book E

- true 64-bit addressing

- full 32-bit compatibility

- RapidIO support

Other details of the G5 (MPC 85xx)

- 0.13 micron HIP 7.0 process

- 58 million transistors

- die-size of 192mm²

- 575-pin configuration

Other Claims made by the Register's source:

- 256-bit on-chip bus

- 64-bit integer unit

- 64-bit floating-point unit

(Both the integer and floating-point units are said to be four to five times faster than their G4 equivalents.)

- 400 MHz "effective bus speed"

- 128KB L1 cache

- 512KB on-die L2 cache

- support for 2-8MB of external L3 cache

- Ten-stage pipeline.

- Low power consumption

- Low heat dissipation

- Speeds ranging from 800 MHz to 2.6 GHz (in some rare instances)

********************************************************************************

Architosh's claims, based on Motorola information I sent them:

- Extensible Architecture

- New Pipeline

- New bus topology/RapidIO Interconnect Architecture

- 32 & 64-bit products with backward compatibility

- Symmetric Processing Capabilities

- 0.13 micron process with SOI for the initial G5 product

- 800 MHz to 2 GHz+

- "on-chip" memory controller

- 400 MHz bus

- DDR SDRAM (20 GB/s)

- RapidIO

- modular design

******************************************************************

MOSR speculation:

[[[

Some people have speculated that the G5 is going to be a multicored chip. This is absolutely false! The major performance enhancements of the G5 come from a totally revamped ultrascalar core design. The FPU and Integer units were completely redesigned from the ground up. There is a multicored G5 design in the works for future apple servers. ]]]

<a href="http://www.macosrumors.com/" target="_blank">http://www.macosrumors.com/</a>

This article seems odd, because I thought the MPC8540 *is already* multicore. I could be wrong; it may just have been talking about the fact that it's modular in design. You'll have to double-check the data sheets on the MPC8540 that I sent you.

- DDR RAM

- Gigawire

- 400 MHz bus

- No multicore design

- AltiVec

********************************************************************************

Keep in mind that I've been piecing this together for the longest time now. It's still a bit unclear what exactly will comprise the new G5 that Apple will utilize in their pro-desktop machines, but it's always fun to speculate. In this case, there are a number of clues right under our noses... Check them out:

--------------------------------------------------------------------------------------------------

In a nutshell, here is what I pieced together for OSO and a few other sites:

There are a LOT of URLs listed here. Be sure to check them all out!

Start here:

<a href="http://www.osopinion.com/perl/story/14756.html" target="_blank">http://www.osopinion.com/perl/story/14756.html</a>

However, for clarity and thoroughness, I decided to post all of the information here.

Here it is:

********************************************************************************

This is an interesting page on Moto's site. Notice that the first in the G5 family of microprocessors is labeled MPC85xx.

Here is the current roadmap of the PPC:

<a href="http://e-www.motorola.com/webapp/sps/site/overview.jsp?nodeId=03M943030450467M983989030230" target="_blank">http://e-www.motorola.com/webapp/sps/site/overview.jsp?nodeId=03M943030450467M983989030230</a>

Now here is the MPC8540 chip info:

<a href="http://www.motorola.com/SPS/RISC/smartnetworks/products/hostproc/index.htm" target="_blank">http://www.motorola.com/SPS/RISC/smartnetworks/products/hostproc/index.htm</a>

This next page is more specific, since it CLEARLY lists the chip as part of the MPC85xx (i.e., G5) product line. It looks to be a 32/64-bit hybrid processor. Now, I don't believe that this is actually [the] G5, but rather a [cousin] or close relative in the 85xx family. It appears that Moto may have snuck this one in under our noses (but it's no secret anymore). I *do* suspect that the actual "G5" Apple uses will include AltiVec as well as other technologies found on the new MPC8540 chip. Remember that Moto's designs are modular, so the question is: which technologies are likely to make it into the desktop G5s that Apple will be using?

<a href="http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MPC8540&nodeId=01M98655" target="_blank">http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MPC8540&nodeId=01M98655</a>

So, we now have an MPC8540 chip (just go to Moto's site)... Again, this chip is CLEARLY part of the G5 (85xx) family (and it isn't *the* G5), but who really knows what the actual G5 will be? It *might* be dual core. It will likely implement AltiVec along with other technologies present in the now-available MPC8540, such as RapidIO and an integrated memory controller. The memory controller will probably be the most significant feature of the new chip: rather than relying on a "supporting chipset" and a separate controller, the implementation will be directly on the chip! It's been rumored that adding this feature alone is worth up to a 40% performance increase. Both technologies are already present in the MPC8540. Keep in mind that according to Moto's roadmap, the G5 is designated as the MPC85xx... Other rumor sites claim that it's 75xx.

********************************************************************************

I suspect that Apple will soon be introducing G5-based systems. I believe we can get a *hint* as soon as Apple equips their consumer models (i.e., iBook and iMac) with the G4 processor.

[Update]: We've already seen this happen to the iMacs ;-)

As soon as that happens, my guess is that we will see the G5 roll out in their pro-desktop systems, so that there won't be any competition between consumer and pro models. Expect a newly designed motherboard and memory scheme as well (no supporting chips or bus per se, but rather that "integrated memory controller"). Here is what I expect to see based on some gathered information:

- Next-generation AltiVec unit with much faster performance, possibly even a 256-bit variant (AltiVec2?). This is a BIG guess and *based on no actual fact*; again, it's just a bit of guessing and hoping. Most likely it will be a beefed-up version of the current AltiVec, which is by no means a slouch. However, expect the new unit to maintain complete backward compatibility with first-generation AltiVec. They could be speculating about something called "parallel AltiVec".

Hints taken from page 16 of the PDF document found here:

<a href="http://www.navo.hpc.mil/pet/sip/PDF/Germann_SIP_2.pdf" target="_blank">http://www.navo.hpc.mil/pet/sip/PDF/Germann_SIP_2.pdf</a>

If the URL doesn't work, contact me for the PDF. I have it (somewhere).

[email protected]

It's safe to say that Sky Computers is a reliable barometer for future chip releases and they clearly state that Moto is going to be dropping MaxBus in favor of RapidIO. All you have to do is snoop around Sky's website to find the final document. It's there in final release. Otherwise, just check it out here:

<a href="http://www.skycomputers.com/hardware/IntrotoSKY.pdf" target="_blank">www.skycomputers.com/hardware/IntrotoSKY.pdf</a>

[Update]: There will probably be no 256-bit version of AltiVec simply because none is needed. However, it does seem that Apple and Moto might instead choose to push the "bandwidth" argument and improve other parts of the silicon instead. Think "bandwidth bandwidth bandwidth bandwidth..."

- Apple will break the 1 GHz clock speed barrier with the new G5 upon its release (this is a given). Then again, that's a silly prediction, since we already have 1 GHz G4s :-)

- The G5 will be multicore (eventually).

- The G5 will be a true 64-bit processor that runs all 32-bit apps natively, without requiring emulation, just as a 64-bit operating system can run 32-bit applications at native speed if coded correctly. At least, that's what my sources on the Darwin sites tell me. As a matter of fact, I know that a 64-bit OS X already exists. As to how far along it is... well, no one will say exactly. The point is, even if it isn't ready, the G5 will run OS X at native speed anyway. What's more, these developers state that key parts of the OS can be optimized for the 64-bit G5 first, before the rest of the OS is ported, making for a sort of 32-bit/64-bit hybrid OS. It would appear the same can be done for applications; in fact, it's already being done when application developers code for AltiVec.

Also worth noting is that code does *not* generally get faster when migrated from 32 to 64 bits, unless it was doing work for which 32-bit-wide registers were actually too small (e.g., multiprecision arithmetic). However, that's a whole other story...
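To make the multiprecision point concrete, here's a minimal Python sketch (the helper names are made up for illustration, not taken from Darwin or any Apple code) that adds two 256-bit numbers one machine-word "limb" at a time. With 32-bit limbs it takes eight add-with-carry steps; with 64-bit limbs it takes four. Code whose values already fit in 32 bits gets no such reduction, which is why a 64-bit chip doesn't automatically speed it up.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration only; not code from any OS or library.


def to_limbs(value: int, limb_bits: int, n_limbs: int) -> List[int]:
    """Split a non-negative integer into little-endian limbs of limb_bits each."""
    mask = (1 << limb_bits) - 1
    return [(value >> (i * limb_bits)) & mask for i in range(n_limbs)]


def from_limbs(limbs: List[int], limb_bits: int) -> int:
    """Reassemble little-endian limbs into a single integer."""
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs))


def add_multiprecision(a: int, b: int, total_bits: int, limb_bits: int) -> Tuple[int, int]:
    """Add two total_bits-wide numbers one machine-word "limb" at a time.

    Returns (sum modulo 2**total_bits, number of add-with-carry steps).
    Assumes total_bits is a multiple of limb_bits.
    """
    n_limbs = total_bits // limb_bits
    mask = (1 << limb_bits) - 1
    result: List[int] = []
    carry = 0
    for la, lb in zip(to_limbs(a, limb_bits, n_limbs), to_limbs(b, limb_bits, n_limbs)):
        s = la + lb + carry          # one add-with-carry per limb
        result.append(s & mask)      # keep the low limb_bits bits
        carry = s >> limb_bits       # carry (0 or 1) into the next limb
    return from_limbs(result, limb_bits), n_limbs


a = 2**255 + 12345
b = 2**200 + 999
print(add_multiprecision(a, b, 256, 32)[1])  # 8 limb adds with 32-bit registers
print(add_multiprecision(a, b, 256, 64)[1])  # 4 limb adds with 64-bit registers
```

Python's own integers are already arbitrary precision, so this only models the step count a real CPU would perform, not its actual speed.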

Now, regarding the G5 and the so-called new chipset that it will have... here is something interesting to consider, taken from the Register:

[[[ The G5 will sport a 400 MHz frontside bus - like Intel's Pentium 4, though its performance could be limited by whatever memory technology Apple connects to it across the system bus. ... Speaking of which, we hear work is progressing on a new chipset, designed for the G5 ]]]

This was taken from here:

<a href="http://www.theregister.co.uk/content/39/21692.html" target="_blank">http://www.theregister.co.uk/content/39/21692.html</a>

However, in another very recent article, Motorola has gone on the record as saying that future PPCs will have integrated memory controllers right on the CPU. You can read the article here:

<a href="http://www.eetimes.com/story/OEG20010828S0091" target="_blank">http://www.eetimes.com/story/OEG20010828S0091</a>

In any case, it seems that the information that Motorola provided in the eeTimes article turned out to be correct. Here are some key points from that article.

[[[To maximize data bandwidth and reduce memory latency, Motorola Inc. said it will likely integrate a DRAM controller directly onto a future high-end PowerPC processor]]]

So, it looks like Moto will integrate the controller right onto the new G5 CPU itself, instead of designing a completely new chipset as the Register reported. And keep in mind that the MPC8540 already exists and has an integrated controller.

[[[ processors are reaching their limit in how much extra performance they can derive from bigger cache ]]]

[[[In the future, however, the company wants to avoid integrating more cache onto the processor. "It would be a huge cost in die size," ]]]

[[[By doing so, the processor could bypass an external bus and have a direct link to the DRAM. ]]]

This again runs directly against what the Register reported: Moto is looking to bypass any external bus(es) altogether. This would certainly cut down on mobo costs for Apple in terms of integration, size, manufacturing, etc. Again, a custom or "new" chipset designed specifically for the G5 appears questionable; the MPC8540 hints at that.

[[["It makes a lot more sense to add high-speed memory controllers on processors," ... Anytime you have a bus, you have to arbitrate for the bus. Rather than let it go hungry, you could feed the processor as fast as it can be fed." ]]]

[[[Handa cited double-data-rate (DDR) SDRAM running at 266 or 333 MHz as memories that could have a direct pipeline to the processor, but he declined to say specifically what kind of DRAM Motorola would consider. He said RDRAM "is not in the markets we play in, at least not today."]]]

This particular statement completely contradicts the Register's report of the G5 having a 400 MHz bus. If you want my opinion, I give more credence to the eeTimes article, since it quotes individuals directly from Motorola, and since the first chip in the G5 family has this integrated controller. The eeTimes article quotes people who have names, whereas the Register makes vague references to "sources close to Apple..." and "moles". Not only that, but the Register's article seems "cooked up" at times, as if it were someone's "wish list" for a future machine, with every stated rating higher than current Pentium processors. Apple will have a much faster and more advanced system if they build their machines around what the eeTimes article revealed and the technologies present in the MPC8540. Integrating the memory controller right onto the CPU and providing a direct link to memory is far better than designing a custom chipset and a 400 MHz bus. The G5 will probably follow the "System on a Chip" path. Take a look at some of the other technologies already present in the MPC8540! I guess we'll just have to wait and see.
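For a rough sense of the numbers being argued over, peak bandwidth is just bus width times clock times transfers per clock. The sketch below assumes every interface is 64 bits wide and treats the rumored 400 MHz "effective" speed as a transfer rate; both are assumptions, not published G5 specs. Under those assumptions a 133 MHz bus peaks at about 1.06 GB/s, DDR266 at about 2.1 GB/s, DDR333 at about 2.7 GB/s and a 400 MHz bus at 3.2 GB/s. A wide front-side bus can only be fed as fast as the memory behind it (the Register's own caveat), and peak figures say nothing about the latency savings the eeTimes quotes are really about.

```python
# Theoretical peak bandwidth of a parallel bus or memory channel:
#   bytes per second = (width in bits / 8) * clock * transfers per clock
# ASSUMPTION: every interface below is 64 bits wide. Sustained throughput
# is always lower than these peaks.


def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth in MB/s, where 1 MB = 10**6 bytes."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


interfaces = {
    "133 MHz single-data-rate bus": peak_bandwidth_mb_s(64, 133),
    "DDR266 memory (133 MHz, 2 transfers/clock)": peak_bandwidth_mb_s(64, 133, 2),
    "DDR333 memory (166 MHz, 2 transfers/clock)": peak_bandwidth_mb_s(64, 166, 2),
    "400 MHz 'effective' front-side bus": peak_bandwidth_mb_s(64, 400),
}

for name, mb_s in interfaces.items():
    print(f"{name:45s} {mb_s:6.0f} MB/s")
```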

Then there is the rumor that Apple and Nvidia are working together on some sort of "dual-engine graphics accelerator." The URL can be found here:

<a href="http://www.architosh.com/news/2002-04/2002c1-0412-applegraph1.phtml" target="_blank">http://www.architosh.com/news/2002-04/2002c1-0412-applegraph1.phtml</a>

I'm willing to bet that this will turn out to be more of a "graphics subsystem" built right onto the motherboard, acting in tandem with the CPU(s) *and* whatever graphics card(s) are present. In other words, I have a feeling that whatever Apple decides to implement will benefit ALL the desktop models. It will probably be a coprocessing subsystem, so graphics card manufacturers can still build and offer cards for the Mac; I don't think Apple will want to shut out those companies. So, I think a subsystem with an "additive" performance effect is likely. In fact, it might be the way to go. Anyone have any other speculation on this?

Be sure to check out these URLs from Architosh.com:

<a href="http://www.architosh.com/news/2001-11/2001a-1130-appleg5.phtml" target="_blank">http://www.architosh.com/news/2001-11/2001a-1130-appleg5.phtml</a>

<a href="http://www.architosh.com/news/2001-12/2001c-1201-g5moto-info.phtml" target="_blank">http://www.architosh.com/news/2001-12/2001c-1201-g5moto-info.phtml</a>

And The Register article here:

<a href="http://www.theregister.co.uk/content/39/23158.html" target="_blank">http://www.theregister.co.uk/content/39/23158.html</a>

And the current PDF that I found on Motorola's site here:

<a href="http://e-www.motorola.com/collateral/SNDFH1101.pdf" target="_blank">http://e-www.motorola.com/collateral/SNDFH1101.pdf</a>

******************************************************************

All URLs mentioned in my post are listed here:

<a href="http://www.eetimes.com/story/OEG20010828S0091" target="_blank">http://www.eetimes.com/story/OEG20010828S0091</a>

<a href="http://e-www.motorola.com/collateral/SNDFH1101.pdf" target="_blank">http://e-www.motorola.com/collateral/SNDFH1101.pdf</a>

<a href="http://www.theregister.co.uk/content/39/21692.html" target="_blank">http://www.theregister.co.uk/content/39/21692.html</a>

<a href="http://www.architosh.com/news/2001-11/2001a-1130-appleg5.phtml" target="_blank">http://www.architosh.com/news/2001-11/2001a-1130-appleg5.phtml</a>

<a href="http://www.architosh.com/news/2001-12/2001c-1201-g5moto-info.phtml" target="_blank">http://www.architosh.com/news/2001-12/2001c-1201-g5moto-info.phtml</a>

<a href="http://www.theregister.co.uk/content/39/23158.html" target="_blank">http://www.theregister.co.uk/content/39/23158.html</a>

<a href="http://www.navo.hpc.mil/pet/sip/PDF/Germann_SIP_2.pdf" target="_blank">http://www.navo.hpc.mil/pet/sip/PDF/Germann_SIP_2.pdf</a>

<a href="http://www.skycomputers.com/hardware/IntrotoSKY.pdf" target="_blank">www.skycomputers.com/hardware/IntrotoSKY.pdf</a>

<a href="http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MPC8540&nodeId=01M98655" target="_blank">http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MPC8540&nodeId=01M98655</a>

<a href="http://e-www.motorola.com/webapp/sps/site/overview.jsp?nodeId=03M943030450467M983989030230" target="_blank">http://e-www.motorola.com/webapp/sps/site/overview.jsp?nodeId=03M943030450467M983989030230</a>

<a href="http://www.motorola.com/SPS/RISC/smartnetworks/products/hostproc/index.htm" target="_blank">http://www.motorola.com/SPS/RISC/smartnetworks/products/hostproc/index.htm</a>

<a href="http://www.osopinion.com/perl/story/14756.html" target="_blank">http://www.osopinion.com/perl/story/14756.html</a>

<a href="http://www.architosh.com/news/2002-04/2002c1-0412-applegraph1.phtml" target="_blank">http://www.architosh.com/news/2002-04/2002c1-0412-applegraph1.phtml</a>

******************************************************************

Best

---

Ed M. ([email protected])

cobra · July 7, 2002 8:14PM

Uh oh.

ed m. · July 7, 2002 8:19PM

I figured it would keep people reading for a while ;-) But keep in mind that the information I've gathered and submitted to other sites over these past months is backed up with URLs to articles and available facts.

--

Ed

penhead · July 7, 2002 8:53PM

Thanks! Well done putting all that together.

It was a good read, too. I feel so much wiser now.

ed m. · July 7, 2002 9:32PM

penhead:

Well, I left out a lot of particulars pertaining to the bus and 64-bit, but if there is any curiosity, just let me know... BTW, I'm surprised that you read the entire thread and all the associated URLs so quickly lol

--

Ed

daver · July 7, 2002 9:48PM

Wow. Um... yeah. :eek:

jrc · July 7, 2002 10:03PM

Bravo!

obi-dun · July 7, 2002 10:07PM

WoWwwwwwwww <img src="graemlins/lol.gif" border="0" alt="[Laughing]" />

bombthroat · July 7, 2002 10:20PM

I agree with Penhead, that was a good read. I can only hope something from that article will be announced at MWNY. I want to get a new PowerMac and am going to regardless of what's announced; it'd just be nice if it were something MAJOR so I can pat myself on the back for waiting.

Keep the information flowing!

bodhi · July 7, 2002 10:58PM

Does anyone NOT remember how, a year or so back, Motorola made a defining change to the PowerPC roadmap? What would have been the "G5" for Apple (I think part #8500) just disappeared. Then the Microprocessor Forum came and went without a word from Mot about it. Now I look at the Book E area in that Mot PDF linked above, and it's clear that the G5 will be targeted at embedded only. What does this all mean? That may have been around the time that Apple went with this "new" company for the next-gen processors.

junkyard dawg · July 8, 2002 1:10AM

One of those PDFs had a roadmap for Moto's process usage, and it clearly showed Moto migrating from a 180 nm process to a 130 nm process in mid-2002, and to a 100 nm process in early 2003.

WTF?

I don't trust anything put out by Moto. That company is a total mess... we can only hope that Apple has found a different supplier for the G4's successor, and that this new supplier is more reliable than Moto (not very difficult to do).

amorph · July 8, 2002 2:47AM

[quote]Originally posted by Junkyard Dawg:

One of those PDFs had a roadmap for Moto's process usage, and it clearly showed Moto migrating from a 180 nm process to a 130 nm process in mid-2002, and to a 100 nm process in early 2003.

WTF?<hr></blockquote>

Well, it's not quite mid-2002, and they've had a 130nm process up and running for months now. CPU production is not out of the question.

As for the 100nm process in 2003, well, I haven't seen any signs that they're behind schedule on that, either. We'll know for sure in early 2003.

pfy · July 8, 2002 7:45AM

First let me say hello to everybody since I have just decided to register.

Now on to the topic at hand. I have to say it is very refreshing that someone actually tries to bring some order into the G5 rumor jungle and does not just add to the confusion with some made-up conversation their cousin had with an Apple janitor at some Cupertino bar...

Anyway, nice article, Ed M.

So let me just add a couple of words to what you said about AltiVec2 (or whatever)

[quote] There will probably be no 256-bit version of AltiVec simply because none is needed. However, it does seem that Apple and Moto might instead choose to push the "bandwidth" argument and improve other parts of the silicon instead. Think "bandwidth bandwidth bandwidth bandwidth..."

<hr></blockquote>

I very much disagree with you that no 256-bit version of AltiVec is needed. After all, floating-point performance on the current G4 for 64-bit floats is completely inadequate, and unless Moto finally puts some effort into designing a good FPU, this can only be alleviated if AltiVec support for 4 x 64-bit floats is added (currently only 4 x 32-bit is supported, hence 128-bit registers).
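pfy's arithmetic in a few lines: the number of elements a vector register holds is its width divided by the element width. AltiVec's 128-bit registers hold four single-precision floats, and AltiVec has no double-precision arithmetic at all; even if it did, a 128-bit register would fit only two doubles (as Intel's SSE2 does), so 4-way double precision implies 256-bit registers. The 64-bit rows below are hypothetical configurations, not announced hardware, and the function names are illustrative only.

```python
# Lanes per vector register = register width / element width (both in bits).
# The double-precision rows are HYPOTHETICAL; current AltiVec has no doubles.


def lanes(register_bits: int, element_bits: int) -> int:
    """How many elements of element_bits fit in one register_bits-wide register."""
    return register_bits // element_bits


def vector_ops_needed(n_elements: int, register_bits: int, element_bits: int) -> int:
    """Vector instructions for one elementwise op over n_elements (ceiling division)."""
    k = lanes(register_bits, element_bits)
    return (n_elements + k - 1) // k


configs = [
    (128, 32, "128-bit register, single precision (today's AltiVec)"),
    (128, 64, "128-bit register, double precision (hypothetical)"),
    (256, 64, "256-bit register, double precision (hypothetical)"),
]
for reg_bits, elem_bits, label in configs:
    print(f"{label}: {lanes(reg_bits, elem_bits)} lanes, "
          f"{vector_ops_needed(1000, reg_bits, elem_bits)} ops per 1000 elements")
```

Doubling the register width halves the instruction count for doubles, but only if the load/store path and memory bandwidth can keep the wider unit fed.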

vmx · July 8, 2002 9:38AM

[quote]Originally posted by Amorph (posted 8 July):



Well, it's not quite mid-2002<hr></blockquote>

What part of 7/12 is "not quite mid"?

lemon bon bon · July 8, 2002 3:15PM

"What part of 7/12 is "not quite mid"? "

I think he's talking about Seybold

To Ed M:

Quite simply the best G5 thread ever. Superb.

Lemon Bon Bon

powerdoc · July 8, 2002 3:51PM

Let's do a vote about the G5

a. The PPC chips will not evolve further than the G4 (in other words, there will be no G5), and Apple will therefore have to choose another family of chips.

b. The G5 will be made by Mot.

c. The G5, or whatever it will be called, will be made by IBM.

d. Apple will design it and it will be made elsewhere.

e. AMD will make it.

I vote for C : IBM

maverick · July 8, 2002 4:31PM

I don't think it will be AMD; they would have to retool. Motorola is running their processors as embedded, since they need a programming module or an additional programmable DSP to operate due to their Digital DNA initiative. It's possible they have a chip "sight unseen" and won't divulge any info until Apple gives the go-ahead. In this debate my money would be on IBM, since they are using the G5 in their custom servers and could easily retool and process chips for Apple. I say "C".

<img src="graemlins/smokin.gif" border="0" alt="[Chilling]" />

[ 07-08-2002: Message edited by: Maverick ]

macjedai · July 8, 2002 10:14PM

[quote]Originally posted by powerdoc:

Let's do a vote about the G5

a. The PPC chips will not evolve further than the G4 (in other words, there will be no G5), and Apple will therefore have to choose another family of chips.

b. The G5 will be made by Mot.

c. The G5, or whatever it will be called, will be made by IBM.

d. Apple will design it and it will be made elsewhere.

e. AMD will make it.

I vote for C : IBM

<hr></blockquote>

Not that I have any "inside info" because of where I work... but I have a feeling that it is IBM as well. (I'm remembering what Steve did to ATI after their slip-up and applying it to Mot and the G4 fiasco; he just had to wait until the contract was "up".)

I'm hope'n!

davelee · July 9, 2002 7:32AM

Look at what IBM just sent as part of their 'PowerPC interest':

[quote]

PowerPC: High Performance Embedded Processors of Choice

A report by Andrew Allison, Independent Industry Analyst

Market shares by architecture in the embedded processor market are in a state of flux as the market evolves into very low power and very high-performance segments. The high-performance segment is growing rapidly, there are only two architectures in serious contention for it, and PowerPC® presents a compelling case as the architecture of choice. If PowerPC prevails in the high-performance embedded processor market, it will overtake the MIPS and SH architectures to become the second highest volume architecture overall.<hr></blockquote>

You can read the full report at:

<a href="http://www-3.ibm.com/chips/techlib/techlib.nsf/techdocs/9971B999A0648BBF87256BE900533CD6" target="_blank">PowerPC</a>

Hope the link works.

Nice evidence for IBM yielding support for PPC in server, desktop and embedded markets.

Look for the clues.

[ 07-09-2002: Message edited by: DaveLee ]

ed m. · July 9, 2002 8:43AM

PFY wrote:

[[[ Anyway nice article Ed M. ]]]

Thanks. I'm still snooping around and will post more tidbits as they become available. However, everyone should feel free to use the information that I've already collected and see if they can piece together more of the puzzle. It will be interesting to see how close we can get to the actual silicon.

In reply to:

[[[ So let me just add a couple of words to what you said about AltiVec2 (or whatever)

quote: There will probably be no 256-bit version of AltiVec simply because none is needed. However, it does seem that Apple and Moto might instead choose to push the "bandwidth" argument and improve other parts of the silicon instead. Think "bandwidth bandwidth bandwidth bandwidth..."

I very much disagree with you that no 256-bit version of AltiVec is needed. After all, floating point performance on the current G4 for 64-bit floats is completely inadequate and unless Moto finally puts some effort in designing a good FPU, this can only be alleviated if AltiVec support for 4 x 64-bit floats is added (currently only 4 x 32-bit is supported, hence 128-bit registers). ]]]

OK, funny you should mention that, because a source of mine said pretty much the same thing. What he didn't say is that it was *needed*; he did say it would be a way to implement it, though. I'm sure it would be nice, but what about the other tradeoffs? What about recoding? And do programmers really *want* double precision within the vector unit? I'm not so certain that they do. Intel doesn't use it for any performance advantage, as we'll soon see. Oddly, my source said something quite similar to what you stated with respect to double-precision calcs within the vector unit. He states:

[[[The only way this would be worthwhile would be to double the width of the vector register so that we get 4x parallelism for double precision FP arithmetic. ]]]

So, your comment holds value. Who knows, maybe we WILL see some type of implementation that utilizes this approach. Probably not directly on or within the vector unit. (Perhaps on another chip or cluster/bank of chips residing on the motherboard?) I continue to hear that Apple is working with Nvidia on some sort of solution that will be part of the motherboard and provide gobs of FP work potential. And it's my understanding that the components and technologies used in the PPC architecture are completely modular. That means they can be taken from chip to chip. We'll have to wait to see how it pans out though...

While we are on it, perhaps I should take this time to provide a little insight into the current G4s as I understand them (and hopefully squash some myths). For the sake of simplicity, and to provide a speedy reply to the forum, I'll provide a somewhat hacked-together rant that will attempt to clear some things up.

One thing is that the G4 is nowhere near as "register-starved" as its x86 counterparts. I can see a long-winded discussion (with a lot of tangents) brewing, so bear with me and feel free to correct anything I've misunderstood... ;-)

It is my understanding that a 128-bit vector unit is quite sufficient, and to top it off you'd want to keep it exactly the way it is (for now). If there were an implementation [now], it might be the "parallel AltiVec" solution, where two or more *complete* AltiVec units function in tandem on a single die, or perhaps on other chips (yes, I'm guessing). The reason we aren't seeing the speed in the G4 is simply that developers aren't completely familiar with the technology (read: they're familiar with x86, and people don't usually like change); not only that, they really aren't as familiar with Mac hardware in general as they should be. Of course, they'll tell you otherwise. Also keep in mind that I'm not an expert, but I do read, and I do listen to what the experts discuss and what they conclude. In any case, all the info provided in my posts is as I myself understand it. So, with that out of the way, back to the rant...

One thing to remember is that the *numbers* aren't what's important. Take the G4's bus... As with any bus, it's the THROUGHPUT that counts! Macs have the highest throughput with respect to their bus. Period. In case you don't believe me and require direct, positive proof, just check out this tidbit posted by Chris Cox (Adobe) on a message forum:

[[[133 MHz bus means nothing, except in comparison to other PowerPC chips with different bus speed. The throughput (bandwidth) of the bus is what matters.

The Athlon XP has a 200 MHz double clocked bus and uses DDR DRAM -- but can only move 700 MB/s (on a good day).

The P4 has a fast bus and dual channel RDRAM and can move 1500 MB/s -- but only on very simple operations like memcpy, memset, memcmp. For complex operations it's not much better than PC133 (where it only moves about 600 MB/s).

The PowerMac G4 has a lowly 133 MHz bus, and moves 1085 MB/s, sustained over large (32 Meg) buffers. For less optimized code it can sustain 930 MB/s (again over large buffers).

Why is the slowest bus moving so much memory? Better bus design, better DRAM controller, better cache design, and several other details that only serious solderheads would understand.

Oh, and clockspeeds -- again, the number doesn't matter. Throughput (in this case computation completion) matters. That's why a P4 at 1500 MHz is usually slower than a P3 at 1000 MHz, and an Athlon XP at 1.6 GHz runs circles around a P4 at 2.0 GHz. The PowerPC has a lower clockspeed - sure. But it also has lower latencies, larger caches, more functional units, more pipelining, better cache control, and lots of other things that make it competitive with Intel chips at over twice the clockspeed.

If you're talking about double precision floating point - then the G4 has a LOT of advantages over Intel and AMD chips. My own render code is double precision, and it's about 40% faster on the G4 than on a P4 with twice the clockspeed.

And integer calculations... Well, that's nicely demonstrated with Photoshop or the distributed.net client -- all integer, and they're faster even without the vector code.

About the only disadvantage I've found on the G4 is the ATA66 supplied on the motherboard. But I normally replace that with an Ultra160 card anyway.

Yes, AltiVec is a specialized thing -- you have to have parallelism in your code, and you have to be working with integer data or single precision floating point data to use it. But that covers an awful lot.

Yes, there are some algorithms that don't vectorize (like median or euclidean distance metrics). But rendering code normally has lots of parallel opportunities.

BTW - the description of AltiVec (and generally any chip feature) changes for each press release, depending on the target audience (embedded, performance, desktop, etc.).

Without looking at any of their actual code, I can't tell you exactly why they're getting such poor performance on the Macintosh. But similar code in other applications is performing much better. And generally I find that if you run the same code, or code with a similar effort given to optimizing [specifically] for each platform, then the G4 wins.

Maybe when I get done with Photoshop 7, your programmers, engineers and I should sit down for a beer or 12....]]] -Chris Cox
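Cox's point about clock speed can be made concrete with the textbook relation CPU time = instructions / (IPC × clock rate), where IPC is instructions completed per cycle. The sketch below is a hypothetical illustration: the instruction count and both IPC values are made-up numbers chosen only to show how a lower-clocked chip that does more per cycle can finish first. They are not measurements of any real P3, P4 or G4.

```python
def cpu_time_seconds(instructions, ipc, clock_hz):
    """Classic 'iron law': time = instructions / (IPC * clock rate)."""
    return instructions / (ipc * clock_hz)

# Hypothetical workload and IPC values, for illustration only.
WORKLOAD = 1_000_000_000  # instructions

deep_pipeline = cpu_time_seconds(WORKLOAD, ipc=0.6, clock_hz=1.5e9)
short_pipeline = cpu_time_seconds(WORKLOAD, ipc=1.0, clock_hz=1.0e9)

print(f"1.5 GHz, IPC 0.6: {deep_pipeline:.3f} s")
print(f"1.0 GHz, IPC 1.0: {short_pipeline:.3f} s")
```

The higher clock only wins if IPC doesn't drop by more than the clock ratio.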

Anyway, the point is that the Mac's bus is not slow. As a matter of fact, it's much more efficient and is being utilized nearly to its full capacity. You can't get that on PCs.
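Sustained throughput figures like Cox's come from timing bulk copies over buffers far larger than the cache. The sketch below shows the idea in Python; it is not Cox's benchmark. It assumes a 32 MB buffer (larger than the caches of the machines discussed here), counts only the bytes copied (some benchmarks count reads plus writes), and treats MB as 10^6 bytes, so its numbers on a current machine will not be comparable to the 2002 figures quoted above.

```python
import time

def sustained_copy_mb_per_s(size_bytes=32 * 1024 * 1024, repeats=5):
    """Time repeated bulk copies of a large buffer and return MB/s copied.

    Only the bytes copied are counted (not reads plus writes), and
    MB means 10**6 bytes.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    dst[:] = src  # warm-up pass so first-touch page faults aren't timed
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # same-size slice assignment: one bulk copy per pass
    elapsed = time.perf_counter() - start
    return size_bytes * repeats / elapsed / 1e6

rate = sustained_copy_mb_per_s()
print(f"sustained copy: {rate:.0f} MB/s")
```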

Now, if you want to focus on real-world benchmarks (instead of misleading numbers and marketing hype), perhaps more people should look at the SETI client. Or better yet, hop on over to "www.distributed.net" and have a gander at the RC5 client. RIGHT NOW the G4 Macs are BY FAR the fastest machines out there, with an ENORMOUS lead over ANYTHING ELSE. Here are the numbers:

Record id: 8331
CPU name: PowerPC G4 (MPC7455)
MHz: 1000 x 2
CPUs: 2
OS: MacOS X 10.1.2
Client (current build): 2.8015 RC5
MegaKeys/second: 21.13
keys/second per processor: 10,564,827 (on average)
Actual speed: 21,129,654 keys/second <---- Incredibly fast! (higher number means faster)

The numbers are equally impressive for the OGR client.

Now, find me a PC with anything that close, in either dual- or single-processor configurations. Here are the latest:

RC5 client:

Pentium III/1275 3,617,050 keys/second/cpu

Athlon/1739 6,207,015 keys/second/cpu

Pentium 4/2556 3,644,544 keys/second/cpu

They are pathetic compared to the Mac. In case you want to see the numbers for yourself, you can find them here:

http://n0cgi.distributed.net/speed/

Everyone, take a look at where the AMD, P3, P4 and Itanium stand. They are incredibly slow. These are machines with much higher clock-speed ratings than the Mac, yet the Mac wins out.
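Because the clock speeds differ so much, it helps to normalize. The sketch below divides each per-CPU key rate by its clock, using only the figures posted above. It assumes the number after each PC's name (for example "/1275") is the clock speed in MHz, and that the G4's 21,129,654 keys/second total is split evenly across its two 1000 MHz CPUs.

```python
# Figures as posted above; the number after each PC's name is taken to be MHz.
G4_TOTAL_KEYS_PER_S = 21_129_654
G4_CPUS = 2
G4_MHZ = 1000

PC_RESULTS = {  # name: (MHz, keys/second per CPU)
    "Pentium III": (1275, 3_617_050),
    "Athlon": (1739, 6_207_015),
    "Pentium 4": (2556, 3_644_544),
}

def keys_per_mhz(keys_per_s, mhz):
    """Normalize a per-CPU key rate by clock speed."""
    return keys_per_s / mhz

g4_per_cpu = G4_TOTAL_KEYS_PER_S / G4_CPUS
rates = {"PowerPC G4": keys_per_mhz(g4_per_cpu, G4_MHZ)}
for name, (mhz, keys) in PC_RESULTS.items():
    rates[name] = keys_per_mhz(keys, mhz)

for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {rate:8.0f} keys/s/MHz")
```

With the posted figures this works out to about 10,565 keys/s/MHz for each G4 CPU, against roughly 3,569 for the Athlon, 2,837 for the Pentium III and 1,426 for the Pentium 4.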

The point about the 128-bit AltiVec is this: there is no other SIMD architecture that could possibly yield such a dramatic effect on RC5, no matter how much brainpower was invested into programming for it. And that's on the seemingly slothful 133 MHz bus! Please, forget the numbers associated with ratings; they do not show how the bus is actually being used. To reiterate, the MHz associated with the bus only shows "how many clocks"... It does not denote the amount of work being done on, between and through those cycles.
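The RC5 result makes more sense once you look at what the RC5 inner loop does: 32-bit XORs, adds, and rotates whose count comes from the data itself (A = ((A xor B) <<< B) + S). The sketch below models four 32-bit lanes of a 128-bit register in plain Python, one candidate key per lane, each with its own expanded-key word, so four keys advance in lockstep. As far as I know, AltiVec's vrlw instruction rotates every 32-bit element by its own count, while MMX/SSE/SSE2 have no packed rotate and shift all elements by the same count, which is the usual explanation for the G4's lead. The lane model and the function names are illustrative, not the distributed.net client's code.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left; like RC5, only the low 5 bits of n count."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def rc5_half_round(a_lanes, b_lanes, s_lanes):
    """A = ((A xor B) <<< B) + S, applied to four independent lanes.

    Each list position stands in for one 32-bit element of a 128-bit vector
    register (one candidate key per lane). Every lane rotates by its own,
    data-dependent amount.
    """
    return [
        (rotl32(a ^ b, b) + s) & MASK32
        for a, b, s in zip(a_lanes, b_lanes, s_lanes)
    ]

# Four hypothetical lane states, just to show per-lane rotate counts.
print(rc5_half_round([0, 0, 0, 0], [1, 2, 3, 4], [0, 0, 0, 0]))  # [2, 8, 24, 64]
```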

Another myth I often hear resembles the following:

"It would help performance a whole lot if they made a separate version of G5, from 64-bit code. No use of a 64-bit processor when it runs 32-bit code."

This is wrong. Making a separate version only complicates things. That is part of the problem the PC realm is experiencing: endless combinations and variations, all with particular and subtle differences in requirements, none of which can be guaranteed to work together in perfect harmony in every combination. The result is a huge guessing game.

What is wrong with a 64-bit processor that runs 32-bit code natively? The key word is *natively*: no performance hit. Get it? Suddenly, developers have more options. They can either gradually migrate to 64-bit, or they might opt to code only particular sections of the app to take advantage of the 64-bit processor, similar to how AltiVec is already being utilized on the current G4s (and AltiVec is 128-bit). The app would be a hybrid application. People are *completely* wrong to assume that going to 64-bit automatically translates into faster or more efficient apps/code.

In many cases, migrating to 64-bit could actually be *slower*. Why? It's fairly simple if you take the time to look at what's really going on. It isn't so obvious to people who have fallen for the "numbers game" that Intel (with processors) and Micro$oft (with version numbers) have mastered in their marketing campaigns. Here is where all the assumptions (by less informed people) started:

During the transition from 16 to 32 bits, applications and systems *did* gain speed. This was because many of the 16-bit machines often had to work with 32-bit numbers; a range from 0 to 65535 was simply not enough for common data. In that case there is a *significant* difference between processing two 16-bit chunks and a single 32-bit chunk. However, here is the kicker (and something many people don't realize): there are very few common applications that really need more than 32-bit-wide data, or more than 32 bits of address space. Period. In these cases, moving an app to 64-bit code would be a huge waste of time with little return, and in many cases it would even hurt performance. Because so few apps need wider data, there are *very few* kludges currently at work that process several chunks of 32-bit data. Going to 64-bit will only speed up *those* few kludges, while every pointer variable everywhere will consume twice as much memory! In Apple's case, a single 64-bit CPU that executes 32-bit code natively is a much smarter approach, since it gives the developers of the *common* apps ample time to migrate their 32-bit apps to 64-bit; for those apps, there won't be a meaningful performance difference after migration anyway. So why the push toward 64-bit? We'll look at why Apple is moving to 64-bit. It probably isn't entirely for the reasons you might expect. More on that later...
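Both halves of that argument fit in a few lines. The sketch below (plain Python standing in for machine-level behaviour) first adds two 32-bit numbers the way a 16-bit CPU has to, with two adds and a carry, and then computes the size of a hypothetical linked-list node (a 32-bit value plus a next pointer) assuming natural alignment, the usual rule in common C ABIs. The node and the one-million-pointer array are illustrative choices, not measurements of any particular app.

```python
def add32_with_16bit_ops(x, y):
    """Add two 32-bit numbers using only 16-bit adds: low halves first,
    then high halves plus the carry (what a 16-bit CPU must do)."""
    lo = (x & 0xFFFF) + (y & 0xFFFF)
    carry = lo >> 16
    hi = ((x >> 16) + (y >> 16) + carry) & 0xFFFF
    return (hi << 16) | (lo & 0xFFFF)

def struct_size(field_sizes):
    """Size of a C-style struct, assuming each field is aligned to its own
    size and the struct is padded to a multiple of its largest field."""
    offset = 0
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # padding before the field
        offset += size
    largest = max(field_sizes)
    return (offset + largest - 1) // largest * largest  # tail padding

print(hex(add32_with_16bit_ops(0x0001FFFF, 0x00000001)))  # 0x20000

for ptr in (4, 8):
    node = struct_size([4, ptr])  # hypothetical node: int32 value + next pointer
    array_mb = 1_000_000 * ptr / 1e6  # one million pointers
    print(f"{ptr * 8}-bit pointers: node {node} bytes, pointer array {array_mb:.0f} MB")
```

The 32-bit value doesn't get any faster with 8-byte pointers; the node simply doubles from 8 to 16 bytes because of the wider pointer and its alignment padding.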

As for the developers who could use the extra horsepower: guess what? The 64-bit CPU that is running 32-bit code natively is right there, just as happy to run 64-bit code. This means that developers will have vastly more flexibility when coding their apps. The key areas of the big "number-cruncher" apps can be moved to 64-bit, where 64-bit addressing would be an enormous boost. Here is how a trusted friend and PPC/AltiVec programmer explains it:

He expanded on this comment taken from a Darwin development board:

<snip> PPC uses a 16 bit offset from a register to determine the load/store address. The instructions don't change, nor do their operands when you go from 32 to 64 bit. Only the data in the registers differs, which would be controlled by whether you loaded a pointer using a load word or load double word. .... cont. <end snip>

Reply:

[[[Yes, the instruction format is identical. Yet there is a price for going to 64 bit: immediate values (constants embedded in machine instructions) are 16 bits wide. Filling a 32 bit register takes two instructions; filling a 64 bit register with an arbitrary bit pattern requires five instructions (synthesize two 32 bit values in four instructions, then combine them with a fifth).]]] - (Anonymous source)
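The five-instruction sequence described in the reply can be checked with a small model. The sketch below is a hypothetical helper, not part of any toolchain: it prints the two lis/ori pairs and the rldimi merge for a given constant, then emulates those instructions on 64-bit integers to confirm the result. The register choices (r3, r4) are arbitrary, and a real compiler will use shorter sequences when the constant allows it (a small value needs just one li).

```python
MASK64 = (1 << 64) - 1

def lis(imm16):
    """lis rD,imm: imm << 16, sign-extended to 64 bits on a 64-bit PowerPC."""
    v = (imm16 & 0xFFFF) << 16
    if v & 0x80000000:
        v |= 0xFFFFFFFF00000000
    return v & MASK64

def rotl64(x, n):
    """Rotate a 64-bit value left by n bits (0 < n < 64)."""
    return ((x << n) | (x >> (64 - n))) & MASK64

def synthesize_imm64(value):
    """Return (listing, result) for loading an arbitrary 64-bit constant.

    Two-register scheme from the reply: build each 32-bit half with a
    lis/ori pair (four instructions), then merge them with rldimi (a fifth).
    """
    value &= MASK64
    hi32, lo32 = value >> 32, value & 0xFFFFFFFF
    listing = [
        f"lis    r4, 0x{hi32 >> 16:04x}",
        f"ori    r4, r4, 0x{hi32 & 0xFFFF:04x}",
        f"lis    r3, 0x{lo32 >> 16:04x}",
        f"ori    r3, r3, 0x{lo32 & 0xFFFF:04x}",
        "rldimi r3, r4, 32, 0",
    ]
    r4 = lis(hi32 >> 16) | (hi32 & 0xFFFF)  # ori zero-extends its immediate
    r3 = lis(lo32 >> 16) | (lo32 & 0xFFFF)
    # rldimi r3,r4,32,0: rotate r4 left 32 and insert it into r3's high word
    high_mask = 0xFFFFFFFF00000000
    r3 = (rotl64(r4, 32) & high_mask) | (r3 & ~high_mask & MASK64)
    return listing, r3

listing, value = synthesize_imm64(0x123456789ABCDEF0)
print("\n".join(listing))
print(hex(value))  # 0x123456789abcdef0
```

The low-word lis sign-extends into the high word, which is harmless here because rldimi overwrites that half.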

By now you should understand why going from 32-bit to 64-bit doesn't automatically translate into more performance. Likewise, I'm sure you will see the added flexibility and advantages developers will have when coding for a 64-bit processor that can run 32-bit code natively as well as true 64-bit code. And do keep in mind that OSX-64 will likely be able to run 32-bit apps natively as well. This is vastly different from Intel's approach in ITANIC, which relies on emulation, and M$'s approach with Win-64 doesn't seem to be any better. Anyway, 64-bit addressing won't help those few apps directly unless you've got gobs of RAM, possibly 2 GB or more. Now, the real reason for Apple moving to 64-bit...

Hmmm... It goes along this line of reasoning: in the very near future there will be desktops with more than 4 GB of RAM, and 32-bit pointers can't address that much. In other words, going to 64 bits today is more of a forward-looking move. There are a few applications in existence that really benefit from a 64-bit machine, but those are server-type, "big iron" stuff. For example, the AltaVista search engine used to run its database on an Alpha machine with 16 GB of RAM, which allowed it to service most requests without ever going to disk.

However, the sheer size of the problems and datasets being processed keeps increasing steadily. Some people claim that "typical" applications occupy 1.0 to 1.5 more address bits per year. If you just look at the default amount of memory in simple, entry-level PCs, you can draw similar conclusions. Just do the math: current off-the-shelf offerings range from 128 MB to 512 MB, so at the high end only three doublings are left before we reach the 4 GB limit of 32-bit machines. More addressing is needed, and 64-bit will provide it.
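The "three doublings" arithmetic can be spelled out. The sketch below treats each extra address bit as one doubling, applies the post's 1.0 to 1.5 bits-per-year range, and uses MB = 2^20 bytes; the function name is just illustrative.

```python
import math

FOUR_GB = 2 ** 32  # bytes reachable with a 32-bit address
MB = 2 ** 20

def doublings_left(ram_bytes, limit=FOUR_GB):
    """How many times a RAM size can double before hitting the limit."""
    return math.log2(limit / ram_bytes)

for ram_mb in (128, 256, 512):
    d = doublings_left(ram_mb * MB)
    # One extra address bit = one doubling; the post cites 1.0-1.5 bits/year.
    print(f"{ram_mb} MB: {d:.0f} doublings left, roughly {d / 1.5:.1f} to {d / 1.0:.1f} years")
```

At the 512 MB high end, three doublings at 1.0 to 1.5 bits per year means the 4 GB wall arrives in roughly two to three years.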

On the desktop, video processing could end up being *the* application that drives the transition to 64 bit, not because it would make a huge difference in speed, but because it is much more convenient to handle files above 4GB on a 64 bit machine. I know this information to be accurate because I've contacted 3 different legitimate sources and all of them concur.
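Here is why big files are awkward with 32-bit offsets. The sketch below shows what happens to a byte position when it is squeezed into an unsigned 32-bit variable: anything past 4 GB silently wraps around. The 5 GB capture file is a hypothetical example. A signed 32-bit offset, used by some older APIs, tops out even earlier, at 2 GB.

```python
GiB = 2 ** 30

def as_uint32(offset):
    """The value a byte offset becomes when stored in an unsigned 32-bit variable."""
    return offset & 0xFFFFFFFF

capture_size = 5 * GiB  # hypothetical 5 GB video file
print(as_uint32(capture_size) // GiB)  # 1: the 5 GB position wraps around to 1 GB
```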

Now to discuss AltiVec and the performance (and myths) of the G4.

With respect to 3D (and I really don't want to get off-topic here, but...), how many of you are familiar with the old excuse:

"Um, but you see, it's not *our* fault the speed is underwhelming; there are just some things that AltiVec simply cannot be used for".

How often have we all heard this? The fact is that the "some things" always turned out to be ONE thing, or a few specific things: "Our apps require double precision, and AltiVec cannot be used in any way to perform double precision calculations."

Again, consumers were feeling disappointed and annoyed at Apple. As usual, I snooped around and found some interesting tidbits that many people fail to notice, then checked them for accuracy and validity by asking some legitimate sources. Many users and marketing types absolutely swear by the "quality" of renders that double-precision calculations produce. I notice that these claims fail to mention any threshold with respect to the limits of human vision. There is a point where the human eye, no matter how good your vision, will not be able to discern any increase in resolution even if it were there. And since we are talking about full-motion animated 3D scenes, many tricks can be played on the eye.
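To put the vision-threshold argument in numbers, the sketch below rounds a few coordinates to single precision and converts the rounding error into pixels. The scene (1000 world units across a 1920-pixel-wide image) is a hypothetical choice. It only measures the error of *storing* one value in single precision; errors that pile up over long chains of arithmetic are a separate matter.

```python
import struct

def to_float32(x):
    """Round a Python float (a C double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

WORLD_WIDTH = 1000.0   # hypothetical scene width in world units
IMAGE_WIDTH_PX = 1920  # hypothetical output width in pixels

def storage_error_px(coord):
    """On-screen error from storing one coordinate in single precision."""
    return abs(to_float32(coord) - coord) * IMAGE_WIDTH_PX / WORLD_WIDTH

worst = max(storage_error_px(c) for c in (0.1, 123.456, 777.777, 999.999))
print(f"worst error: {worst:.1e} pixels")
```

For coordinates up to 1000 units the error stays well below a thousandth of a pixel in this setup.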

From what I've discovered, it's reasonable to believe that you don't need double precision for 3D unless you are really, really sloppy with your algorithms. Double precision calcs are usually employed because you can get away with a lot more slop. Here is a small rant about this endless nonsense about double precision in the vector unit. I obtained the info from a trusted source, a Ph.D. and AltiVec programmer, and decided to cut and paste it so I could reply more quickly to this forum discussion.

[[[Q: Is an updated double precision-centric AltiVec unit the way to go? A: No.

This is why:

The vector registers have room for four single precision floats to fit in each one. So for single precision, you can do four calculations at a time with a single AltiVec instruction. AltiVec is fast because you can do multiple things in parallel this way.

Most AltiVec single precision floating point code is 3-4 times faster than the usual scalar single precision floating point code for this reason. The reason that it is more often only three times faster and not the full four times faster (as would be predicted by the parallelism in the vector register I just mentioned) is that there is some additional overhead for making sure that the floats are in the right place in a vector register, that you don't have to deal with in the scalar registers. (There is only one way to put a floating point value in a scalar register.)

Double precision floating point values are twice as big (take up twice as many bytes) as single precision floating point values. That means you can only cram two of them into the vector register instead of four. If our experience with single precision floating point translates to double precision floating point, then the best you could hope to get by having double precision in AltiVec is a (3 to 4)/2 = 1.5 to 2 times speedup.

Is that enough to justify massive new hardware on Motorola's or Apple's part? In my opinion no. This is especially true when one notes that using the extra silicon to instead add a second or third scalar FPU could probably do a better job of getting you a full 2x or 3x speed up, and the beauty part of this is that it would require absolutely no recoding for AltiVec. In other words, it would be completely backwards compatible with code written for older machines, give *instant speedups everywhere* and require no developer retraining whatsoever. This would be a good thing.

Even if you still think that SIMD with only two-way parallelism is better than two scalar FPUs, you must also consider that double precision is a lot more complicated than single precision. There is no guarantee that pipeline lengths would not be a lot longer. If they were, that 1.5x speed increase might evaporate, and quickly.

Yes, Intel has SSE2, which has two doubles in a SIMD unit. Yes, it is faster -- for Intel. It makes sense for Intel for a bunch of reasons that have to do with shortcomings in the Pentium architecture and nothing to do with actual advantages with double precision in SIMD.

To begin with, Intel does not have a separate SIMD unit like PowerPC does. If you want to use MMX/SSE/SSE2 on a Pentium, you have to shut down the FPU. That is very expensive to do. As a workaround, Intel has added double precision to its SIMD so that people can do double precision math without having to restart the FPU. You can tell this is what they had in mind because they have a bunch of instructions in SSE2 that only operate on one of the two doubles in the vector. They are in effect using their vector engine as a scalar processing unit to avoid having to switch between the two. Their compilers will even recompile your scalar code to use the vector engine in this way because they avoid the switch penalty.

Okay, so Intel has double precision in their vector unit, and despite what I have said, you still think that is absolutely wonderful. But do they really have a double precision vector unit? The answer is not so clear. Their vector unit actually does calculations on the two doubles in the vector in a similar "one at a time" fashion to the way an ordinary scalar unit would. For this reason they can only get one vector FP op through [every two cycles]. AltiVec has no such limitation.

AltiVec can push through one vector FP op per cycle, doing four floating point operations simultaneously (up to 20 in flight concurrently). AltiVec also has a MAF core, which in many cases does two FP operations per instruction. This is the reason why despite large differences in clock frequency, AltiVec can meet and often beat the performance of Intel's vector engine.

The other big dividend that they get from double precision SIMD is the fact that they can get two doubles into one register. When you only have eight registers this is a big deal! [PowerPC has 32 registers for each of scalar floating point and AltiVec!] In 90% of the cases, we programmers don't need more space in there and the registers the PPC provides are just fine.

Simply put, (from a developers position) we just don't need double precision in the vector engine, and we wouldn't derive much benefit from it if we had it. The worst thing that could possibly happen for Mac developers is that we get it, because that would mean that the silicon could not be used to make some other part of the processor faster and more efficient, and a lot of code would need to be rewritten for little to no performance benefit. It wouldn't be a logical tradeoff.

The only way this would be worthwhile would be to double the width of the vector register so that we get 4x parallelism for double precision FP arithmetic.

And with respect to 3D apps *requiring* double precision...

Most 3D rendering apps do not NEED double precision everywhere. They just need it in a few places, and often (if they really decide to look) they may find that there are more robust single-precision algorithms out there that would be just as good. In the end they should be using those algorithms anyway, because the speed benefits of SIMD are twice as large for single precision as they are for double precision.

Apps like that can get a lot more mileage out of the PowerPC if they just increase the amount of parallelism as much as possible in their data processing. Don't just take one square root at a time; do four, etc. And this isn't even taking into account multiprocessing, or even AltiVec for that matter. The scalar units alone, by virtue of their pipelines, are capable of doing three to five operations simultaneously! However, if you don't give them 3-5 things to do at every given moment, this power goes unused. Unfortunately, this can be noticed in quite a few Mac applications already on the market where performance doesn't seem to be as solid as it should be. What is baffling is why many Mac developers aren't taking advantage of this power. What it boils down to is that most of these apps just do one thing at a time (for the most part), and in turn are wasting 60-80% of the CPU cycles. That's a lot of waste. What's nice is that the AltiVec unit is also pipelined, so it is important to do a lot in parallel there too. The only problem is that developers actually have to make a conscious effort to use the processor the way it was designed to be used. ]]] - (Anonymous source)
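Three pieces of arithmetic from that rant can be written down. The sketch below is a hedged model, not anyone's benchmark, and all function names are hypothetical. `simd_speedup` reproduces the (3 to 4)/2 estimate by treating the observed 3-4x single-precision gain as a 0.75-1.0 efficiency per lane; it assumes that efficiency carries over to doubles, which the rant itself doubts. `peak_flops_per_cycle` plugs in the rant's own throughput claims for AltiVec (four lanes, multiply-add counted as two operations, one op per cycle) and SSE2 (two lanes, one op every two cycles); note that this compares single precision with double, so it is not like-for-like. The two summing functions show the "do four at a time" idea: four independent accumulators split one long chain of dependent adds into four, which is what lets a pipelined FPU keep several adds in flight. Python won't show that speedup itself; the point is the shape of the loop.

```python
def lanes(register_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits

def simd_speedup(register_bits, element_bits, efficiency):
    """Rough speedup over scalar code: lanes times the fraction of peak
    left after the overhead of getting data into the right lanes."""
    return lanes(register_bits, element_bits) * efficiency

def peak_flops_per_cycle(lanes_per_op, flops_per_lane, ops_per_cycle):
    """Peak floating point operations completed per clock cycle."""
    return lanes_per_op * flops_per_lane * ops_per_cycle

def sum_one_chain(values):
    """One accumulator: every add must wait for the previous one."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_four_chains(values):
    """Four independent accumulators, so four adds can be in flight at once;
    the partial sums are combined at the end."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for i, v in enumerate(values):
        acc[i % 4] += v
    return (acc[0] + acc[1]) + (acc[2] + acc[3])

for eff in (0.75, 1.0):
    print(f"efficiency {eff}: double/128-bit {simd_speedup(128, 64, eff):.1f}x, "
          f"double/256-bit {simd_speedup(256, 64, eff):.1f}x")

print("AltiVec peak flops/cycle:", peak_flops_per_cycle(4, 2, 1))  # 8
print("SSE2 peak flops/cycle:", peak_flops_per_cycle(2, 1, 0.5))   # 1.0

values = [float(i) for i in range(1000)]
print(sum_one_chain(values), sum_four_chains(values))  # 499500.0 499500.0
```

The 256-bit line corresponds to the "double the width of the vector register" case. With general floating point data the four-accumulator version adds in a different order, so its result can differ from the single chain in the last bits.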

Anyway, I hope that cleared up a few things that have been on the minds of some Mac users. Again, I'm not an expert, but do happen to research, read and ask questions in an attempt to gain a better understanding of what's really going on.

Best

--

Ed M.

lemon bon bon · July 9, 2002 9:15AM

Intriguing. Are you saying even 'big' developers, perhaps even Adobe, aren't optimising apps like Photoshop to do 'several' things at once and are instead lazily allowing the minimum amount of work to be done?

Instead of a focus on AltiVec for the next processor (bar a few tweaks), your links/info indicate that the G5's integer and FPU units are redesigned from the ground up. If the G5 can combine some brute force with the G4's potential to do 'more' things per cycle, then it could be the 'processor to end all processors'. (A quote from a Moto' rep in a Macworld news article a while back.)

Will it start at 2 GHz? Keep digging... I'm curious to know if Moto' have indeed 'lost the contract' in light of the G4 500 MHz debacle.

Good thread. Excellent reading. Fascinating.

Being educated.

Lemon Bon Bon
