what happened to all the G6 plans?

hmurchison · February 6, 2005 2:20AM

Quote:

Just up late, sipping a Grant's Perfect Porter and speculatin' away.

Fine Washington Beer.

brendon · February 6, 2005 11:35AM

Quote:

Originally posted by Programmer

And I'm telling you that you are thinking about it backwards. A system-on-chip does not bring its features to one of its components, components bring their features to a system.

OK I will take a few stabs at this and try to further the conversation and hopefully not derail any currently meaningful strings.

OK I don't know how instructions to a GPU are dispatched. Do they currently go to the CPU and then get routed to the GPU? Or do the two of these components work closely? I can see that in this SOC frame the SOC uses other components much more autonomously. They talk, but not much more. The system chip routes all GPU calls to the GPU, the general processing is conducted in the CPU, and the results are then shared back to the system through the communication channels. The SOC handles all the workload of the reporting: where to put the results in memory, and where and how to fetch the next set of data and instructions. It appears that the way the 9XX series works, it would be much better for them to have a memory controller on board, for faster access. So basically the way for the components to talk to the CPU is through the on-chip rapid IO channels. It appears that this allows for much more flexibility in memory design, in that the interaction is through the IO channels and the SOC does not care what configuration the memory has as long as it can keep the pipe filled when the SOC needs instructions or has data to store. In this configuration, would the SOC care if there was or wasn't a GPU, other than to know it would not have to do all of the work? The same goes for other support chips. I guess in an SOC the support chips like a GPU work for, not as much with, the CPU. All access to the CPUs is through the IO to the SOC; the CPUs are not working with the other components, the other components are working with the system chip.

Just trying to scratch the surface of the backwards statement above.

programmer · February 6, 2005 11:59AM

Hey look, Amorph's got his thinking cap on. That beer must be good stuff.

Quote:

Originally posted by Amorph

Some of the more far-out claims of performance might actually be plausible if you put the same big caveat on them that you'd put on GPU FLOPS numbers: Namely, that you get mind-boggling performance on a narrow set of solutions, and anything from dismal performance to a hardware fault on anything else. GPU floating-point performance is not comparable to CPU floating point performance, because CPUs are generalist and GPUs are not.

Look at something like the Pentium4 or PowerPC 970 -- on some pieces of code it really flies, on other things you get absolutely dismal performance. Take a piece of code written and heavily optimized for a 68000 in 1985 and recompile it for a current processor. Sure it'll run a heck of a lot faster, but if you then rewrite and re-optimize it for the new processor you will see a tremendous performance improvement. So calling a CPU a generalist isn't entirely correct. Its ISA allows it to be used in a general fashion, but a very large set of those uses will run slowly (or at least far below the chip's peak potential).

Quote:

The other advantage of having narrowly purposed processing units is that you can pipeline them deeply and make them go very, very fast. If they don't have to handle many conditionals (i.e., if they mainly plow unconditionally through large amounts of data, like filters and streams) then they can reach near-maximal efficiency doing so.

Your description covers the Pentium4 nicely. Unfortunately the P4 brings with it a whole lot of baggage created over the last 20 years of Intel chip evolution.

Quote:

But this arrangement only works if your system requirements are known and fixed, if the bottlenecks and performance bugbears have been identified, and if processing units are available to efficiently accelerate the bits you need accelerated.

This part of your argument I don't agree with.

programmer · February 6, 2005 12:14PM

Quote:

Originally posted by Brendon

OK I don't know how instructions to a GPU are dispatched. Do they currently go to the CPU and then get routed to the GPU?

Typically a GPU has a few memory mapped control locations, but most of the data is passed via memory buffers. The driver essentially writes a pointer into a special hardware location that tells the GPU to start reading from that location, and then the GPU decodes the data stored there. This is a gross over-simplification, but it gets the idea across.

It is up to the CPU to ensure that the data at the location is meaningful to the GPU. This can be done by using the CPU to fill the buffer with the right data, to read data straight from disk, or even to use the GPU to fill in the buffer (this is often done by rendering an image to a texture and then using the texture in a later render).
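
To make that concrete, here is a minimal Python sketch of the handoff. It is a toy model, not any real driver or GPU interface: the opcodes, the `ToyGPU` class, and `driver_submit` are all made up for illustration, a plain list stands in for system memory, and a method call stands in for the memory-mapped register write.

```python
# Toy model of the dispatch path: the driver fills a buffer in shared memory,
# then writes the buffer's address to a "start" register. Everything here
# (opcodes, class, register behaviour) is hypothetical.

OP_CLEAR, OP_DRAW, OP_END = 1, 2, 3


class ToyGPU:
    def __init__(self, memory):
        self.memory = memory  # shared "system memory", a flat list of ints
        self.log = []         # what the GPU decoded, kept for inspection

    def write_start_register(self, address):
        """Stand-in for the memory-mapped 'start reading here' location."""
        pc = address
        while True:
            op = self.memory[pc]
            if op == OP_END:
                break
            if op == OP_CLEAR:
                self.log.append(("clear", self.memory[pc + 1]))
                pc += 2
            elif op == OP_DRAW:
                self.log.append(("draw", self.memory[pc + 1], self.memory[pc + 2]))
                pc += 3
            else:
                # The buffer was not meaningful to the GPU.
                raise ValueError(f"cannot decode opcode {op} at address {pc}")


def driver_submit(gpu, memory, base, commands):
    """CPU side: fill the buffer with valid commands, then hand over a pointer."""
    addr = base
    for words in commands:
        memory[addr:addr + len(words)] = words
        addr += len(words)
    memory[addr] = OP_END
    gpu.write_start_register(base)


memory = [0] * 64
gpu = ToyGPU(memory)
driver_submit(gpu, memory, 16, [(OP_CLEAR, 0), (OP_DRAW, 0, 3)])
print(gpu.log)
```

In this sketch the only thing the CPU passes directly is the address 16; everything else travels through the shared buffer, and a buffer the CPU did not fill properly fails at decode time.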

As for SoC, here is a useful diagram for a Motorola PowerPC SoC:

This is a Motorola microcontroller, so it is heavily oriented toward I/O. This chip has more I/O capabilities than most entire Macintoshes. That is not what Apple would want -- they can afford another chip to handle I/O in a Mac, so they'd probably replace much of that I/O. Compare this to what was described in the ISSCC program.

For the GPU, remember how huge the GPUs currently are and the need to make them interchangeable. Putting them on the main processor is probably not a good idea if you want a competitive GPU.

brendon · February 6, 2005 1:00PM

Quote:

Originally posted by Programmer

This is a gross over-simplification, but it gets the idea across.

As for SoC, here is a useful diagram for a Motorola PowerPC SoC:

This is a Motorola microcontroller, so it is heavily oriented toward I/O. This chip has more I/O capabilities than most entire Macintoshes. That is not what Apple would want -- they can afford another chip to handle I/O in a Mac, so they'd probably replace much of that I/O. Compare this to what was described in the ISSCC program.

For the GPU, remember how huge the GPUs currently are and the need to make them interchangeable. Putting them on the main processor is probably not a good idea if you want a competitive GPU.

Over-simplification or not, thanks for the help just the same.

The diagram is useful, again thanks.

OK Apple and others would not want the GPU on the SoC, which is what I thought, but did not communicate clearly. What they may want is a special unit that is dedicated to interacting with standard GPUs, removing this burden from the CPU. Other support chips may be dealt with the same way. A special unit may only interact with a standard sound board. Each extra unit on the SoC could be made to do a special task very well and yet, if done properly, still stay very general or standard. Like BASF, the extra unit does not make the graphics, it makes the graphics chip work better, because it is designed to do this one task very well. Or maybe a better example: encoding and decoding. A special unit that does these tasks very well for a variety of formats and data types. Again the CPU does not do this, because this part of the system is built to do that. So it is like Altivec on steroids, but not narrowly focused, broadly focused. So as not to put all functions on the chip, just dedicated units that can make interacting with sound boards and GPUs easier and better. Now if the units can be programmed, leading to further flexibility...

ompus · February 6, 2005 10:17PM

Quote:

Originally posted by Programmer

And I'm telling you that you are thinking about it backwards. A system-on-chip does not bring its features to one of its components, components bring their features to a system.

So is CELL's primary feature its ability to bring those components, and hence features, on board?

Quote:

A CELL Processor is a multi-core chip consisting of a 64b Power architecture processor, multiple streaming processors, a flexible IO interface, and a memory interface controller. This SoC is implemented in 90nm SOI technology. The chip is designed with a high degree of modularity and reuse to maximize the custom circuit content and achieve a high-frequency clock-rate.

Whatever a Cell Processor is, its "high degree of modularity" and the ability to reuse circuit content suggest that it is intended to be easily customized.

amorph · February 6, 2005 11:54PM

Quote:

Originally posted by Programmer

Hey look, Amorph's got his thinking cap on. That beer must be good stuff.

It is. I highly recommend it.

Warning: This post was written without the benefit of said brew.

Quote:

Look at something like the Pentium4 or PowerPC 970 -- on some pieces of code it really flies, on other things you get absolutely dismal performance. Take a piece of code written and heavily optimized for a 68000 in 1985 and recompile it for a current processor. Sure it'll run a heck of a lot faster, but if you then rewrite and re-optimize it for the new processor you will see a tremendous performance improvement. So calling a CPU a generalist isn't entirely correct. Its ISA allows it to be used in a general fashion, but a very large set of those uses will run slowly (or at least far below the chip's peak potential).

Well, no, that's just the difference between a generalist processor and a magical processor built by elves.

Badly written or poorly suited code can choke anything. What I mean by generalist is that, like the 970 and the P4, they're designed to solve as many problems as possible as well as is practicable for whatever sort of code the engineers foresaw dealing with. Obviously, if someone decides to throw some code optimized for their old VAX into the mix, all bets are off.

More precisely, I meant generalist as opposed to specialist hardware like the shader engines in GPUs, or the pixel pipeline in the old SGI workstations, etc.

Quote:

This part of your argument I don't agree with.

Hmm. Then, combined with your above answer, I suppose the hint is that I'm binding the hardware and the software a little too close together? Or assuming that too fine a degree of hardware specialization is necessary...

Or I just need a beer.

Either way, I suppose we'll know in a few hours.

programmer · February 7, 2005 12:13AM

Quote:

Originally posted by Amorph

Or I just need a beer.

Definitely the beer.

costique · February 7, 2005 5:01AM

Interesting, isn't it, that the self-same session 10.2 (page 36) features IBM, Sony and Toshiba people at once? Its title, 'The Design and Implementation of a First-Generation CELL Processor', also suggests we will soon know what the heck it's all about. And, I suspect, the implementation word means that the chips themselves are not very far away. Am I right, Programmer? And will we soon be able to discuss the software side of this, i.e. kernel and driver support?

ompus · February 7, 2005 1:41PM

Here are some quotes I pulled off the web about the IBM, Sony, Toshiba Cell ...

Richard Doherty, director of the computer industry analysts Envisioneering, in San Francisco, says:

"At first blush I think it's safe to say that it will be 10 to 20 times faster than the fastest graphics cards and processor... it is going to revolutionise computer science for entertainment and business."

Apparently the Cell has 8 different "Synergistic Processor Elements." These SPEs are SIMD-based.

The prototype Cell chip is 221 sq. mm, integrates 234 million transistors, and is fabricated with 90 nanometer SOI technology by IBM. Using "FlexIO" and "eXtended Data RAM" for chip interconnects, the Cell can theoretically move 100 Gbytes per second of total bandwidth.

Please excuse any mistakes. I don't understand the technology, and I've already seen contradictory statements from the media.

amorph · February 7, 2005 1:48PM

Quote:

Originally posted by Programmer

Definitely the beer.

Faugh! Hoist on my own pétard!

I thought the CPU (or PE, in the Cell terminology) would delegate and play traffic cop, the way they always have on IBM machines, but I was led astray by Hannibal (it's his fault, I tell you! Where's my beer!), and figured the control would be done in software.

When will I learn not to second guess myself? Now I'm even more wrong than I would have been. Dammit.

Well, it's about what I expected, even if I didn't nail it. A nice idea, full of potential, which Apple could use a variant of. Just have Core* JIT compile itself some apulets.

aphelion · February 7, 2005 6:50PM

Quote:

Originally posted by Programmer

Nope, I'm not saying anything of the sort. If they did, however, that information alone is not sufficient to say they share anything (besides bullet points in the feature list) with Cell.

Well it would appear that what the G5 and Cell do share is a 64 bit PPC core with VMX (a.k.a. "Altivec"), additionally the Cell has some Power5 derived features such as SMT.

programmer · February 7, 2005 8:55PM

Quote:

Originally posted by Aphelion

Well it would appear that what the G5 and Cell do share is a 64 bit PPC core with VMX (a.k.a. "Altivec"), additionally the Cell has some Power5 derived features such as SMT.

Features do not the same core make.

Of course if IBM has some technology pieces lying around they are likely to re-use and re-combine them to come up with the new processor they want. Remember that this thing is designed for low cost, low power and very high clock rates... on the same process we get the 970FX. This core is not the same one as you find in the POWER5 servers, but I'd bet that they didn't throw it all out and start from scratch.

Amorph: in their press release they do talk about the Power core playing traffic cop... but that doesn't mean it's not a reasonably "Power"ful core in its own right. You did well, young Jedi.
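
One way to picture the traffic-cop arrangement is the toy Python sketch below. It is an illustration under loud assumptions, not a description of how Cell is actually programmed: `control_core` and `spe_kernel` are hypothetical names, ordinary threads stand in for the streaming processors, and the figure of 8 units is simply the one quoted earlier in the thread.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SPES = 8  # count quoted earlier in the thread; an assumption for this toy


def spe_kernel(chunk):
    """Streaming unit stand-in: the same arithmetic on every element, no branching."""
    return [x * 2 + 1 for x in chunk]


def control_core(data, chunk_size=4):
    """Traffic cop stand-in: split the work, hand it out, reassemble in order."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=NUM_SPES) as pool:
        # map() returns results in the same order the chunks were submitted
        results = list(pool.map(spe_kernel, chunks))
    return [value for chunk in results for value in chunk]


print(control_core(list(range(10))))
```

The control core here does no arithmetic of its own; it only decides who gets which chunk and stitches the answers back together.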

aphelion · February 7, 2005 9:14PM

Quote:

Originally posted by Programmer

Features do not the same core make...

The PowerPC core in the Cell seems to have additional features compared to the current G5, the next revision of the IBM 970xx will, in my opinion, share these features, it is the sharing of these advanced features (and processes) that I have been pointing out all along.

programmer · February 7, 2005 10:30PM

Quote:

Originally posted by Aphelion

The PowerPC core in the Cell seems to have additional features compared to the current G5, the next revision of the IBM 970xx will, in my opinion, share these features, it is the sharing of these advanced features (and processes) that I have been pointing out all along.

Core features yes (e.g. SMT), but not the SoC features (unless the next chip is, in fact, a Cell chip).

hiro · February 7, 2005 11:38PM

Quote:

Originally posted by Aphelion

The PowerPC core in the Cell seems to have additional features compared to the current G5, the next revision of the IBM 970xx will, in my opinion, share these features, it is the sharing of these advanced features (and processes) that I have been pointing out all along.

Some additional, some missing. SMT yes. But no out-of-order processing on Cell, which misses the 970's huge plus for a lot of the code out in the wild right now. The PPC core will do exceptionally well on properly/carefully scheduled code, and spend a lot of time waiting on the majority of sloppy real-world code. Hmm, kind of sounds like something designed for carefully hand-crafted game console code and high-end embedded applications.
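
A rough way to see why instruction order matters so much on an in-order core is the toy Python model below. It assumes a single-issue machine, a made-up 4-cycle load latency and 1-cycle add, and hypothetical instruction names; it does not model the actual Cell or 970 pipelines.

```python
def in_order_cycles(program):
    """Cycles to finish `program` on a toy single-issue, in-order core.

    Each entry is (name, names_it_depends_on, latency_in_cycles). An
    instruction cannot issue before the one ahead of it has issued, nor
    before all of its inputs are ready.
    """
    ready = {}       # name -> cycle at which its result becomes available
    next_issue = 0   # earliest cycle the next instruction may issue
    for name, deps, latency in program:
        issue = max([next_issue] + [ready[d] for d in deps])
        ready[name] = issue + latency
        next_issue = issue + 1
    return max(ready.values())


LOAD, ADD = 4, 1  # assumed latencies, for illustration only

# "Sloppy" order: each add sits right behind the load it depends on.
sloppy = [("a", [], LOAD), ("b", ["a"], ADD),
          ("c", [], LOAD), ("d", ["c"], ADD)]

# Hand-scheduled order: both loads issue first so their latencies overlap.
scheduled = [("a", [], LOAD), ("c", [], LOAD),
             ("b", ["a"], ADD), ("d", ["c"], ADD)]

print(in_order_cycles(sloppy), in_order_cycles(scheduled))  # prints "10 6"
```

Same four instructions, same results; only the order differs. An out-of-order core can find the second ordering on its own at run time, while an in-order core depends on the programmer or compiler to supply it.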

I would think this might make for an interesting Asymmetric Multi Processing implementation with a full big brother G5 as one core and a Cell replacing the second core and GPU, but doing it as a full fledged processor, not a satellite unit.

programmer · February 8, 2005 9:59PM

Quote:

Originally posted by Hiro

I would think this might make for an interesting Asymmetric Multi Processing implementation with a full big brother G5 as one core and a Cell replacing the second core and GPU, but doing it as a full fledged processor, not a satellite unit.

Unfortunately this chip's memory interface and bandwidth requirements make that unlikely. I'd be more interested in an alternate Cell chip with a Power core that includes the POWER4/5-style 5-way group dispatch mechanism, OoOE implementation, and strong scalar FPUs. They might have to clock it at half the speed of the rest of the chip, but even at 2+ GHz it would be an impressive piece of hardware... mated to the Cell's SPEs and high speed buses.

Time will tell.
