Apple, IBM and Magma - Powermacs sooner

overtoasty · November 5, 2002 9:07PM

[quote]Originally posted by Junkyard Dawg:

That's a cool little homage to the Nutcracker in 2112 overture, right at the end. Rockin' album! Thanks for reminding me of it, I'm playing it now...

[ 11-04-2002: Message edited by: Junkyard Dawg ]<hr></blockquote>

Dah, nah-nah-nahh, nah-nah-nahh, nah-nah-nahh nahh, nahh nah-nah-nahh, nah-nah-nahh, nah-nah nahh-nahh, nahh nah-nah-nahh nah-nah nahh! nah-nah nah nah-nah-nahh nah-nah-nahh nahh! nah-;

Dah, nah-nah-nahh, nah-nah-nahh, nah-nah-nahh nahh, nahh nah-nah-nahh, nah-nah-nahh, nah-nah nahh-nahh, nahh nah-nah-nahh nah-nah nahh! nah-nah nah nah-nah-nahh nah-nah-nahh nahh! nah-;

cowerd · November 5, 2002 9:12PM

[quote]From IBM's 10/16 8k:

"The new 300 millimeter plant is in test production and on schedule. We have

orders that would fully load the facility well into midyear of 2003. From what

we can tell about our competition, we may be the only manufacturer capable of

130 and 90 nanometer technologies." <hr></blockquote>

That's not what people working in the fab tell me.

david m · November 6, 2002 1:25AM

Wild VLIW Speculation:

[quote]Originally posted by wmf:

Claim 1 appears to be a small variation on clock gating. The VLIW stuff sure doesn't make any sense, thought.<hr></blockquote>

Could it be that Apple hopes to optimise the way instructions are packed when they go across the bus? I'm thinking in terms of the 970's need to pack instructions in groups of 5. If Apple was able to identify common instruction groups they could "compress" the instruction stream by using a short "code word" to represent the instruction pack. If they found enough of them then they could get a considerable spead-up across the bus. They would need to unpack the instructions somewhere near the chip though (I assume the original packing would be done by the compiler).

The advantage of such a system (if it worked at all!) would be the "speed up" of slowish memory bus from RAM.

Maybe such a system explains the rumours of "Apple PI" -- Packed Instructions -- and maybe even the rumour about DSP in the bridge chip: not so much DSP as instruction unpacking.

Gosh is that a thin limb I see...

ps. I was MemeTransport and WakaWakaWaka but haven't been able to log in under those names.

[ 11-06-2002: Message edited by: David M ]

rickag · November 6, 2002 8:02AM

[quote]Originally posted by cowerd:



That's not what people working in the fab tell me.<hr></blockquote>

What are they telling you?

producer · November 6, 2002 8:41AM

"ps. I was MemeTransport and WakaWakaWaka but haven't been able to log in under those names."

and also Dorsal M

david m · November 6, 2002 9:23AM

[quote]Originally posted by Producer:

...and also Dorsal M

<hr></blockquote>

Nah, my speculation is strictly ventral!

junkyard dawg · November 6, 2002 12:50PM

[quote]Originally posted by cowerd:



That's not what people working in the fab tell me.<hr></blockquote>

It's what people working ON the fab have told me.

programmer · November 6, 2002 4:09PM

[quote]Originally posted by David M:

Wild VLIW Speculation:

Could it be that Apple hopes to optimise the way instructions are packed when they go across the bus? I'm thinking in terms of the 970's need to pack instructions in groups of 5. If Apple was able to identify common instruction groups they could "compress" the instruction stream by using a short "code word" to represent the instruction pack. If they found enough of them then they could get a considerable spead-up across the bus. They would need to unpack the instructions somewhere near the chip though (I assume the original packing would be done by the compiler).

The advantage of such a system (if it worked at all!) would be the "speed up" of slowish memory bus from RAM.

Maybe such a system explains the rumours of "Apple PI" -- Packed Instructions -- and maybe even the rumour about DSP in the bridge chip: not so much DSP as instruction unpacking.

Gosh is that a thin limb I see...

ps. I was MemeTransport and WakaWakaWaka but haven't been able to log in under those names.

<hr></blockquote>

Code size very very rarely causes a bandwidth problem -- its much easier to generate a lot of data than a lot of code. The core (i.e. important part) of most algorithms fit nicely into the L1 instruction cache while they execute, unles their data which can be huge. On desktop systems code compression shouldn't be a very high priority.

xaqtly · November 6, 2002 5:43PM

Can I just add that while I can't really contribute to this thread because most of it is over my head, I really enjoy reading it! Keep it up.

costique · November 10, 2002 9:37AM

[quote]Originally posted by Programmer:



Code size very very rarely causes a bandwidth problem -- its much easier to generate a lot of data than a lot of code. The core (i.e. important part) of most algorithms fit nicely into the L1 instruction cache while they execute, unles their data which can be huge. On desktop systems code compression shouldn't be a very high priority.<hr></blockquote>

As far as I could understand, David meant that a 970 loses a great deal of its efficiency if there are empty slots in groups of instructions to be executed. If all of these groups are filled up the CPU hits a peak of performance. It would be very good, I think. The question here is how much processing power it takes to filter the whole instruction stream and decide what to do with every particular instruction. I don't believe it's possible with current hardware. Am I right?

tomb of the unknown · November 10, 2002 2:07PM

[quote]Originally posted by David M:

[QB]I'm thinking in terms of the 970's need to pack instructions in groups of 5.<hr></blockquote>

Not at all. What you are referring to is how the 970 pipelines OPs. There's no benefit to "packing" instructions before they reach the unit. This would mean losing the advantage of one-to-one instructions to OPs the 970 currently enjoys. (For the most part. Some instructions must first be "cracked" into two OPs, while a smaller number of "millicode" instructions are decoded into more than two OPs.)

The whole idea in the 970 is to avoid VLIWs.

Apple is just protecting their intellectual property. When you patent something that someone could patent as an "improved" version just because they saw a way to do it with VLIWs, then you could lose the benefit of having the patent.

tomb of the unknown · November 10, 2002 2:21PM

[quote]Originally posted by moki:

In any event, sure, Apple has had prototypes of 970-based machines for a while... that's how it works.<hr></blockquote>

No, that's not how it works. Clients rarely if ever see prototypes. Samples yes, prototypes no.

[quote]Prototypes always exist long before products do, the same way novels exist in unfinished form long before you can buy 'em at Borders.<hr></blockquote>

Prototypes exist before samples do. They are usually just composed of one small portion of the circuitry of the chip. They're used to test design concepts and to model behavior in new process sizes. As such, they might have incomplete circuits, at 13 nm say, with 18 nm components hooked up just to provide support for whatever it is you want to test. In no way, would any of this be of interest to anyone other than the design engineers who are basically hoping their theories of how the physics will work can be proven in the prototype. (If not, it usually means they'll have to redesign around results obtained from the prototype.)

Prototypes don't model the CPU itself, so they aren't useful to the client. You can't determine latencies or timings from them, so they're not helpful in designing chipsets, or what have you, to support the CPU.

Samples, on the other hand, may allow you test such things, but samples are much closer to finished product.

Think of prototypes as chapter outlines. They don't tell you enough to know what the book is going to be like -- unlike a rough draft which gives you some hint of the book's potential. A sample is like a rough draft. There may be some fine-tuning yet to do, but overall, you have some hint of the potential there.

[ 11-10-2002: Message edited by: Tomb of the Unknown ]

merlion · November 10, 2002 8:27PM

[quote]Originally posted by Junkyard Dawg:

Interesting!

Is that Rush, 2112, in your sig? I really dug that album back in high school.<hr></blockquote>

Every time I crack open a Heineken I have to sing part of a line from 2112 - "...hold that Red Star proudly high in hand!" <img src="graemlins/smokin.gif" border="0" alt="[Chilling]" />

[ 11-10-2002: Message edited by: Merlion ]

moki · November 11, 2002 4:50AM

[quote]Originally posted by Tomb of the Unknown:



No, that's not how it works. Clients rarely if ever see prototypes. Samples yes, prototypes no.

quote:<hr></blockquote>

I was speaking of prototype Apple boxes, not prototypes of the CPUs from IBM.

programmer · November 11, 2002 10:20AM

[quote]Originally posted by costique:

As far as I could understand, David meant that a 970 loses a great deal of its efficiency if there are empty slots in groups of instructions to be executed. If all of these groups are filled up the CPU hits a peak of performance. It would be very good, I think. The question here is how much processing power it takes to filter the whole instruction stream and decide what to do with every particular instruction. I don't believe it's possible with current hardware. Am I right?<hr></blockquote>

Most instruction streams are already carefully "packed" in this sense -- its known as "instruction scheduling" and is performed by virtually all compilers nowadays. GCC and CodeWarrior will be updated to understand the particular details of the 970's grouping mechanism and then all code compiled with the new compilers will run faster. Its worth noting that all processors have all sorts of odd behaviours which affect how the optimal instruction streams get generated, and most of them aren't documented nearly as clearly as this grouping mechanism in the POWER4/970.

I have to wonder if Metrowerks will update their compiler for the 970 -- they are owned by Motorola, after all.

rhumgod · November 11, 2002 10:57AM

[quote]Originally posted by moki:

I was speaking of prototype Apple boxes, not prototypes of the CPUs from IBM.<hr></blockquote>

Prototype boxes essentially mean a near-complete system, as far as I can tell. That being the case, if people have had these prototypes for some time now, wouldn't you assume that they would be nearing completion within, say 2 to 4 months time?

junkyard dawg · November 11, 2002 12:17PM

I think it's time to make it official:

CONFIRMED: PPC 970-based Powermacs to debut early 2003!

rhumgod · November 11, 2002 12:57PM

editted...

[ 11-11-2002: Message edited by: Rhumgod ]

gar · November 11, 2002 3:04PM

[quote]Originally posted by Rhumgod:



Prototype boxes essentially mean a near-complete system, as far as I can tell. That being the case, if people have had these prototypes for some time now, wouldn't you assume that they would be nearing completion within, say 2 to 4 months time?<hr></blockquote>

why assuming 2 to 4 months, if they have these boxes since october/november 2001 (first rumors of a "G5", or were these boxes containing the cancelled motorola 7500 processor, coinsidentally running 1.6 Ghz) why not, as excepted, second half of 2003? or sad, but also assumable, early 2004?

rhumgod · November 11, 2002 4:31PM

[quote]Originally posted by gar:

why assuming 2 to 4 months<hr></blockquote>

I think the boxes from last year were a different cpu entirely (ie; Motorola's since cancelled G5). I am assuming that 2-4 months, on top of the already 6 months or so that the 'new' boxes have been prototyping (again, allegedly). I am just arriving at a conclusion of 8-10 months, based on product rollouts of previous updates from Apple, and extending them a bit for new system board logic and bus interconnects, and so forth. I think it makes sense unless there were serious problems discovered.

Also, the move from 68k to ppc 601 took a little over a year for IBM to build a PowerPC that was delivered to Apple in September, 1992 and the Piltdown Man was released on March 14, 1994 - however, a rewrite was needed in the jump from 68k code to ppc code. This conversion should be a fairly simple recompile, and occur much faster, as many developers have suggested and have done already. Check penguinppc64.org for latest 64-bit builds of linux on IBM power-based systems.

[ 11-11-2002: Message edited by: Rhumgod ]

Apple, IBM and Magma - Powermacs sooner

Comments