New Powermacs to use Cell Processor?


Comments

  • Reply 181 of 220
    snoopy Posts: 1,901 member
    Quote:

    Originally posted by Programmer





    . . . Successful product design projects usually do not directly involve potential customers in the design process. They consider (or ask) what the customer needs, but the project itself is focused and as streamlined as possible. Failure to do that usually results in a project failure. . .







    Sorry if I wasn't clear. I have not been advocating customer involvement in the design of Cell, but just getting information to customers, so they can begin work on concepts and product design using Cell. (You are right about "Too many cooks in the kitchen...") Customers will need samples eventually, but since a whole lot of design takes place before samples are needed, STI would want to prevent customers from getting committed to other technology. Receiving information too late could delay Cell's introduction in final products by a whole generation. That's product generation, not Biblical generation.



    EDIT: Oops, I didn't see the post from wizard69 when I was writing the above. I think there could have been a place for customer involvement in Cell's design, but it would have been in the early stages and no different from how STI might have consulted with many experts before proceeding too far down the design path.



  • Reply 182 of 220
    snoopy Posts: 1,901 member
    Quote:

    Originally posted by wizard69



    . . . Cell doesn't have anyone to duke it out with. I suspect STI's biggest problems will be production ramp and code development. . .







    I'm hoping Cell will eventually let the PPC compete with x86 for the personal computer market. I'm allowed to dream big, right?



    Not knowing all that is involved with Cell code development, it seems like Apple's one big software job is to update OS X for Cell's SPEs and thereby accelerate basic OS services. Most applications don't need changing, and those that could benefit from SPEs can be changed later, when developers get around to it.
  • Reply 183 of 220
    amorph Posts: 7,112 member
    Quote:

    Originally posted by wizard69

    Doesn't this sort of fly in the face of what Cell accomplished, or what AIM accomplished with AltiVec? Apple was heavily involved in the development of AltiVec and could be involved heavily in the development of Cell.



    An Apple employee was the project lead. But the nuts and bolts work was done by Motorola and IBM engineers hunkered down in Somerset.



    Quote:

    In the general business sense, though, your statement about customers not being involved in the design process is just wrong. It doesn't matter if it is a ball bearing unit for a car, a wing for an airplane, or a PC chip: success needs customer involvement.



    At what level? Do you have the customer leaning over an engineer at a CAD station, telling him where to place the traces? Or do you work out the spec with the customer, leave the implementation to the engineers, and then repeat the spec-implement process until the result pleases the customer? Programmer is simply arguing that you leave the customer out of the implementation phase.



    Quote:

    Yes, all these fabs do indicate that a huge number of Cell-based processors will soon be on the market. This clearly indicates a wide take-up of the design.



    Or, it clearly indicates a narrow take-up into mass-produced products. In terms of manufacturing capacity, 1,000,000 units shipped per month in one product will be indistinguishable from 1,000 units shipped per month in each of 1,000 products.



    Quote:

    The evidence is clear that Apple knew about the 970 well before it was publicly announced at ISSCC and it was a year after that when we got good hardware.



    Apple is on record as asking IBM to build (what became) the 970. Of course they knew about it from the inception.



    If they weren't one of the customers who approached IBM at about the same time, looking for what would become Cell, then they could have found out at any later date.
  • Reply 184 of 220
    OK, for the sake of discussion, say Apple is going to use the Cell in their product lines. What would they call the product? Power Mac Cell? PowerBook Cell? Power Mac G6? PowerBook G6?



    Lastly, what are the chances Apple will use THIS chip in an Apple product vs. a FUTURE Cell? 10%? 50%? 90%?
  • Reply 185 of 220
    Quote:

    Originally posted by DHagan4755

    OK, for the sake of discussion, say Apple is going to use the Cell in their product lines. What would they call the product? Power Mac Cell? PowerBook Cell? Power Mac G6? PowerBook G6?



    Lastly, what are the chances Apple will use THIS chip in an Apple product vs. a FUTURE Cell? 10%? 50%? 90%?




    Put me down for the 50% mark, because Apple could easily use this as a coprocessor, but they may ask IBM for a more VMX-like SIMD since this route would allow for the easiest transition for Apple and the folks who program for the Mac. My guess is more the latter than the former.
  • Reply 186 of 220
    Quote:

    Originally posted by Brendon

    Clarification: If Apple continues to ship the pro computers with two chips and IBM puts two cores on each chip, that would be four processors, each with two FPUs and two VMX units, for a grand total of 8 FPUs and 8 VMX units.



    If you get 2 970MP chips, then I get 2 Cell chips... and mine can talk to each other with something like 60 GB/sec of bandwidth, plus an aggregate 50 GB/sec or so of main memory bandwidth. And 20 threads of execution, yadda yadda yadda.



    Quote:

    As you point out, the "Cores" (image, audio, video, data) of Tiger are designed for work across many cores, and Apple will have four cores to spread the work across as well as the GPU. Not saying that Cell is not all that but just saying that the 970 ain't dead either. June will be an interesting month and we will know much more then.



    My guess is that 2005 will see the ultimate development of the 970, and starting in 2006 IBM will be flogging Cell to everyone.







    Quote:

    From Wizard69:

    Doesn't this sort of fly in the face of what Cell accomplished, or what AIM accomplished with AltiVec? Apple was heavily involved in the development of AltiVec and could be involved heavily in the development of Cell.



    There is no "A" in STI.



    Quote:

    This simply isn't modern business practice.



    Sure it is, especially with something like Cell where clearly Sony was the major motivator. The members of STI drive a radical new design forward, each with their own requirements and plans. Part of that, for IBM at least, is likely to be how they can sell it to other Power customers as part of the Power strategy/ecosystem/whatever. When it is at a point that samples can be produced and results can be demonstrated then you invite over your other potential customers and try to sell it on its merits. For somebody like Apple they could have been shown a few months before the ISSCC announcements. Before that they probably had bubbles on their internal road maps showing a potential product that would be great if it panned out as expected. Really, however, Apple couldn't do much about it before now anyhow except keep architecting their system services to support "additional computational hardware resources".



    Quote:

    Yes, but let's not confuse the two: one is a second processor and the other is the result of SMT. The two shouldn't be compared until we get a handle on how well IBM's SMT works on Cell. Plus we have the possibility that the cores on the MP might be threaded.



    Remember that 90% of compute resources are typically consumed by 10% (or less) of the code. The other 90% of the code is usually scalar crap that has to run reasonably fast... but frankly it does just fine even on a 1 GHz G4, and one thread on a 4+ GHz PPE will be plenty fast. The great (and horrible) thing about this crappy scalar code is that half (or more) of the time the processor is stalled on branches or other things, so your other SMT thread can pick up the slack.



    The 10% where performance really really matters is usually on high volume computation that is (on the Mac) most often handled with the VMX vector unit. The Cell's SPEs are vector units and vector algorithms can usually be ported fairly easily between different kinds of vector units -- the hard part is figuring out the vector algorithm in the first place. Given that IBM has designed both the VMX and SPE instruction sets, it's likely a good bet that they aren't hugely dissimilar... and if GCC is auto-vectorizing for both then it should cover the differences internally. And apparently the SPEs do double precision as well, so things that have to be in the FPUs on the 970 can be done on the SPEs in the Cell.
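
    For illustration (a minimal sketch, and an assumption on my part about how far GCC's auto-vectorizer goes), this is the kind of plain C loop that could in principle be compiled for either the VMX unit or an SPE without touching the source:

    Code:

    /* Hypothetical example -- nothing here is VMX- or SPE-specific.
     * The idea is only that an auto-vectorizing compiler (e.g. GCC with
     * -O3, plus -maltivec when targeting a 970) could map the same
     * source onto either vector ISA, because the iterations are
     * independent and operate on contiguous arrays. */
    void scale_and_add(float *dst, const float *a, const float *b,
                       float k, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = k * a[i] + b[i];
    }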



    So the result is that the Cell runs the scalar Power code about as well as a current generation PowerPC, plus at the same time it runs the really expensive vector calcs 8-16 times as fast (8 SPE cores at double the current clock rates). Unless you are bandwidth limited, of course, but then the Cell still has you beat by quite a bit.



    No, the Cell won't be faster than the 970 on every single piece of code you throw at it.... but it'll absolutely kill it on most of the code you actually care about.
  • Reply 187 of 220
    wizard69 Posts: 13,377 member
    Quote:

    Originally posted by Amorph

    An Apple employee was the project lead. But the nuts and bolts work was done by Motorola and IBM engineers hunkered down in Somerset.



    Yes, and I bet that lead was deeply involved with those engineers at Somerset.



    Quote:



    At what level? Do you have the customer leaning over an engineer at a CAD station, telling him where to place the traces? Or do you work out the spec with the customer, leave the implementation to the engineers, and then repeat the spec-implement process until the result pleases the customer? Programmer is simply arguing that you leave the customer out of the implementation phase.



    Maybe not at that level all the time, but do understand that my experiences are with manufacturing equipment. For core equipment the engineers would be heavily involved with the vendor. You simply would not leave the heart of your machine, and your income stream, exposed to any problems.



    In the case of the 970 I cannot see the project succeeding without Apple involved. If we accept that Apple did the bridge ASIC, there would have been a huge amount of interaction right there. Further, considering Apple's deep interest in vector support, I wouldn't be surprised to find a few Apple engineers involved there.



    Quote:





    Or, it clearly indicates a narrow take up into mass-produced products. In terms of manufacturing capacity, 1,000,000 units shipped per month in one product will be indistinguishable from 1,000 units shipped per month in each of 1,000 products.



    I fully expect strong take-up into mass-produced products, especially if it is as easy as STI has indicated to spin variants of the processor. This clearly is one of STI's goals; we will probably be bombarded with "Cell Inside" stickers.



    Quote:





    Apple is on record as asking IBM to build (what became) the 970. Of course they knew about from the inception.



    Thus implying the heavy involvement of Apple.



    Quote:



    If they weren't one of the customers who approached IBM at about the same time, looking for what would become Cell, then they could have found out at any later date.



    I would suspect that Apple has been tuned into Cell, or at least the PPE technology, for a very long time. One of the reasons here is the very blunt way Apple has squashed rumors about a 970-based laptop. The other is the nature of Apple's arrangement with IBM, which is basically one of a single supplier and a single user, which frankly isn't good for anybody.



    Dave
  • Reply 188 of 220
    wizard69 Posts: 13,377 member
    Quote:

    Originally posted by Programmer

    My guess is that 2005 will see the ultimate development of the 970, and starting in 2006 IBM will be flogging Cell to everyone.



    There is no "A" in STI.



    And just how could Apple publicly get involved in Cell before the issue of compatibility could be addressed? Even now, Apple announcing the switch to Cell would cause a lot of confusion in the non-technical world. Many would see it as similar to the transition from 68K to PPC, even though it is pretty obvious to those with a technical background that it isn't an issue.



    Quote:





    Sure it is, especially with something like Cell where clearly Sony was the major motivator. The members of STI drive a radical new design forward, each with their own requirements and plans.



    Yes and that involves being involved in the design of the processor.



    Quote:

    Part of that, for IBM at least, is likely to be how they can sell it to other Power customers as part of the Power strategy/ecosystem/whatever. When it is at a point that samples can be produced and results can be demonstrated then you invite over your other potential customers and try to sell it on its merits.



    I rather think it is more a case of IBM having this great little processor in the PPE, one that drinks few amps, that they have attempted to sell. Sony says, well, that is nice, but what we really need is lots of SIMD with this type of functionality - can we work together on this?



    IBM gets back with Sony explaining to them that they already have a silent partner that may like to get involved but for undisclosed reasons can't be made public. Sony likewise says that they would like to expand manufacturing capability and involve Japanese producers. So we end up with STI as the public face of Cell.



    Meanwhile the PPE is almost ready for the silent partner, but then that partner realizes that some of Cell's capability could be leveraged to huge advantage by them. They thus delay the *(&^*Book so that they can play in this new high-performance field.



    As is common in the industry, the designers of Cell want to strut their stuff and do so at ISSCC. The silent partner doesn't want to be exposed, so they talk all day and night about the SPEs and ignore the PPE. Meanwhile the silent partner produces a bump of a *(&^*Book that is almost useless but silences the raving lunatics that they have as customers.



    Quote:

    For somebody like Apple they could have been shown a few months before the ISSCC announcements. Before that they probably had bubbles on their internal road maps showing a potential product that would be great if it panned out as expected. Really, however, Apple couldn't do much about it before now anyhow except keep architecting their system services to support "additional computational hardware resources".



    So you do think Apple is in the loop, as it appears we read the same thing. Maybe we disagree on timing. I do suspect that Apple has known about the PPE for as long as the 970 has been around. Further, I suspect that Apple never had any intention of producing a 970-based laptop.

    Quote:





    Remember that 90% of compute resources are typically consumed by 10% (or less) of the code. The other 90% of the code is usually scalar crap that has to run reasonably fast... but frankly it does just fine even on a 1 GHz G4, and one thread on a 4+ GHz PPE will be plenty fast. The great (and horrible) thing about this crappy scalar code is that half (or more) of the time the processor is stalled on branches or other things, so your other SMT thread can pick up the slack.



    So you like the thought of Cell and its PPE as much as I do!

    Quote:



    The 10% where performance really really matters is usually on high volume computation that is (on the Mac) most often handled with the VMX vector unit. The Cell's SPEs are vector units and vector algorithms can usually be ported fairly easily between different kinds of vector units -- the hard part is figuring out the vector algorithm in the first place. Given that IBM has designed both the VMX and SPE instruction sets, it's likely a good bet that they aren't hugely dissimilar... and if GCC is auto-vectorizing for both then it should cover the differences internally. And apparently the SPEs do double precision as well, so things that have to be in the FPUs on the 970 can be done on the SPEs in the Cell.



    Well, here is what is really interesting to me: just how similar are the SPEs to a standard PPC? There are the previously stated changes to the register file, but the question in my mind is whether AltiVec code could be run unmodified on the SPEs. You certainly won't take advantage of the improvements, but some code could end up executing on SPEs very easily.

    Quote:



    So the result is that the Cell runs the scalar Power code about as well as a current generation PowerPC, plus at the same time it runs the really expensive vector calcs 8-16 times as fast (8 SPE cores at double the current clock rates). Unless you are bandwidth limited, of course, but then the Cell still has you beat by quite a bit.



    No, the Cell won't be faster than the 970 on every single piece of code you throw at it.... but it'll absolutely kill it on most of the code you actually care about.



    Well, this all depends on what sort of code you care about and just how flexible the SPEs are. This may not be the hardware to run database software on, for example. Or it could be very good hardware for database-type work, if the SPEs have that capability. I could see Apple taking a two-pronged approach with processors until the market is happy with what they see happening with Cell. Yes, for some markets Cell will run away at incredible speed.
  • Reply 189 of 220
    I'm sure I missed some crucial element in this discussion, but what I've read seems to indicate that the Cell will be PowerPC controlled, thus giving me the impression that the best use for Cell would not be as the primary CPU but as a secondary processor or processing bus dedicated to nearly limitless audio and video processing.



    So in my view, it seems more likely that Apple might be looking at various uses for a CELL CARD to handle dedicated audio and video and more, thus replacing the need for PCI, PCI-X AND PCI-Express.



    So Apple could safely release a dual-core PowerPC now with provisions on the motherboard to accept CELL as a future plug-in.



    Does any of this theory hold water?



  • Reply 190 of 220
    Quote:

    Originally posted by FallenFromTheTree

    I'm sure I missed some crucial element in this discussion, but what I've read seems to indicate that the Cell will be PowerPC controlled, thus giving me the impression that the best use for Cell would not be as the primary CPU but as a secondary processor or processing bus dedicated to nearly limitless audio and video processing.



    So in my view, it seems more likely that Apple might be looking at various uses for a CELL CARD to handle dedicated audio and video and more, thus replacing the need for PCI, PCI-X AND PCI-Express.



    So Apple could safely release a dual-core PowerPC now with provisions on the motherboard to accept CELL as a future plug-in.



    Does any of this theory hold water?







    That has been my thought, but I am just a speculator. How about it, Wiz69 and Programmer, what do you think? To me, in this scenario Apple gets it all: a great processor for the Cores of Tiger, access to much easier porting and better-running games, as well as something for the Science Core.
  • Reply 191 of 220
    Quote:

    Originally posted by FallenFromTheTree

    I'm sure I missed some crucial element in this discussion, but what I've read seems to indicate that the Cell will be PowerPC controlled, thus giving me the impression that the best use for Cell would not be as the primary CPU but as a secondary processor or processing bus dedicated to nearly limitless audio and video processing.



    So in my view, it seems more likely that Apple might be looking at various uses for a CELL CARD to handle dedicated audio and video and more, thus replacing the need for PCI, PCI-X AND PCI-Express.



    So Apple could safely release a dual-core PowerPC now with provisions on the motherboard to accept CELL as a future plug-in.



    Does any of this theory hold water?







    No.



    What you are missing is that the Power(PC) is on the Cell chip. What people mean when saying that is that the Power core that is part of the chip is controlling the other cores. It is the ring leader. This has nothing to do with putting the Cell in as a co-processor -- that would actually kill performance because any time you try communicating between chips you have to slow down and take a big latency hit (which really hurts at 4+ GHz).
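
    If it helps, here is a rough sketch of that "ring leader" arrangement. This is just the control model, using plain pthreads on one chip as a stand-in; it is not real Cell code (the actual PPE-to-SPE mechanism involves starting SPE programs and DMAing their data, and isn't public yet):

    Code:

    #include <pthread.h>
    #include <stdio.h>

    /* A self-contained job the control core hands to a worker core. */
    typedef struct { const float *in; float *out; int n; } job_t;

    static void *worker(void *arg)              /* plays the role of an SPE */
    {
        job_t *job = arg;
        for (int i = 0; i < job->n; i++)
            job->out[i] = job->in[i] * 2.0f;    /* the "expensive" kernel */
        return NULL;
    }

    int main(void)
    {
        float in[8] = {1, 2, 3, 4, 5, 6, 7, 8}, out[8];
        job_t job = { in, out, 8 };
        pthread_t t;

        pthread_create(&t, NULL, worker, &job); /* ring leader dispatches... */
        pthread_join(&t, NULL);                 /* ...and collects the result */
        printf("out[0] = %f\n", out[0]);
        return 0;
    }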
  • Reply 192 of 220
    Oh well it sounded cool.



    Suddenly a 24 month Apple lease program is sounding very attractive.
  • Reply 193 of 220
    Quote:

    Originally posted by wizard69

    I rather think it is more a case of IBM having this great little processor in the PPE, one that drinks few amps, that they have attempted to sell. Sony says, well, that is nice, but what we really need is lots of SIMD with this type of functionality - can we work together on this?



    Yes, except that IBM also had the system-on-chip expertise and ISA design expertise. So hopefully the SPEs aren't the disaster of an ISA that the PS2 vector units are (which is part of why there aren't any C/C++ compilers for the PS2's vector units).



    Quote:

    IBM gets back with Sony explaining to them that they already have a silent partner that may like to get involved but for undisclosed reasons can't be made public.



    This I doubt. They probably didn't even tell Sony they had other customers for this new Power core -- it is a product that IBM created to license to anybody interested. It is very likely they have another big customer for the core... and it's not Apple. Apple might be interested, but without the rest of the Cell attached it is much less compelling.



    Remember that IBM said explicitly that the 970FX would be a laptop chip. That may have changed since 90nm didn't pan out the way everyone thought, but that doesn't change the fact that they were planning for it to scale down in terms of power.



    Quote:

    So you do think Apple is in the loop, as it appears we read the same thing. Maybe we disagree on timing. I do suspect that Apple has known about the PPE for as long as the 970 has been around. Further, I suspect that Apple never had any intention of producing a 970-based laptop.



    "In the loop" is very different than being "involved in design". What Apple needed to contribute to the processor could be done between an Apple and an IBM engineer in about 5 minutes of verbal conversation. IBM didn't consult Apple on the design of the 970's core -- they already had it from the POWER4. Apple said "we need VMX", so IBM tacked it on. The bus design was the one point where Apple may have had more influence because they've designed the only 970 northbridge we know of. In the case of Cell, however, that is provided by RamBus.



    Quote:

    So you like the thought of Cell and its PPE as much as I do!



    Probably more. Have I said otherwise? My point is merely that Apple wasn't involved in the chip's development, not that they aren't a likely customer.



    Quote:

    Well, here is what is really interesting to me: just how similar are the SPEs to a standard PPC? There are the previously stated changes to the register file, but the question in my mind is whether AltiVec code could be run unmodified on the SPEs. You certainly won't take advantage of the improvements, but some code could end up executing on SPEs very easily.



    http://www.realworldtech.com/page.cf...WT021005084318



    This article and the one at ArsTechnica should tell you what you need to know. It seems to me that many people that read these articles miss half their content, so go back and read it again... it is packed with useful information.



    Quote:

    Well, this all depends on what sort of code you care about and just how flexible the SPEs are. This may not be the hardware to run database software on, for example. Or it could be very good hardware for database-type work, if the SPEs have that capability. I could see Apple taking a two-pronged approach with processors until the market is happy with what they see happening with Cell. Yes, for some markets Cell will run away at incredible speed.



    These things are vector processors. It is probably going to be possible to run scalar code on them (IBM said that they were working with "open source compiler writers", i.e. GCC, about providing compilers for SPE), but that doesn't mean it'll run fast.
  • Reply 194 of 220
    This is a vague memory, but isn't it said somewhere in the ArsTech paper that the SPE draws its origins from a PPC601 core, except with functionality bent toward high performance SIMD operations? I remember reading this somewhere. Like a 601 with all the integer logic torn out and replaced with hardcore floating point plumbing?
  • Reply 195 of 220
    Quote:

    Originally posted by Randycat99

    This is a vague memory, but isn't it said somewhere in the ArsTech paper that the SPE draws its origins from a PPC601 core, except with functionality bent toward high performance SIMD operations? I remember reading this somewhere. Like a 601 with all the integer logic torn out and replaced with hardcore floating point plumbing?



    No, he was just drawing an analogy. It is a completely new piece of hardware.









    That's how bad rumors get started.
  • Reply 196 of 220
    snoopy Posts: 1,901 member
    Quote:

    Originally posted by Programmer







    . . . And apparently the SPEs do double precision as well, so things that have to be in the FPUs on the 970 can be done on the SPEs in the Cell. . .







    This is what makes the most sense to me. With 8 SPEs doing double precision, it seems wasteful to have 2 FPUs. It's my impression that the FPUs in a typical system sit idle much of the time, but when they are needed they work like crazy. If the SPEs do replace the FPUs, does the OS handle the conversion? That is, if an FPU instruction is issued, does the OS convert this to equivalent SPE instructions? My computer science ignorance is showing here.



    There seems to be some confusion about this matter. The Ars Technica article shows two FPUs in the PPE. However, the chip photo in the Real World Technologies article shows a PPE that appears to be only twice the size of an SPE, neglecting caches; not big enough for 2 FPUs, it would seem. Also, I didn't see the FPU mentioned. Adding to the confusion, this article mentions that the SPEs only do single precision, with other limitations.



    Maybe we have to wait to get the facts?
  • Reply 197 of 220
    Quote:

    Originally posted by snoopy

    This is what makes the most sense to me. With 8 SPEs doing double precision, it seems wasteful to have 2 FPUs. It's my impression that the FPUs in a typical system sit idle much of the time, but when they are needed they work like crazy. If the SPEs do replace the FPUs, does the OS handle the conversion? That is, if an FPU instruction is issued, does the OS convert this to equivalent SPE instructions? My computer science ignorance is showing here.



    No, the SPEs are separate cores. The cost of communicating between cores is usually quite high -- and far more than the cost of sharing data between instructions within a single core. A lot of people don't get this, but it is really quite simple... a core is a set of registers and pipelined execution units that are working on a single stream of instructions. These days the processors take 2-5 instructions per clock cycle and feed them into the pipelines. Those instructions have very high-speed access to that core's pool of registers, and the kinds of instructions are mixed together. Imagine a single-threaded processor executing a stream of integer (I), floating point (F), vector (V), load/store (L, S), and branch (B) instructions. It might look like this:



    ... LIIS LFIS LVII IFFS BFLI VSLI LBLS ...



    Where each group is sent in a single clock cycle into the core to modify that core's registers (I've envisioned a 4-way dispatch here). The results of one instruction are sent to a register and can be used by a later instruction, and how long that takes is a function of the processor design and the kind of instruction -- primarily the length of the pipeline handling the instruction.



    If you have multiple cores they typically communicate by writing values out to the bus, and depending on the system it may have to go all the way to main memory, directly across a shared FSB, or to a shared L2 cache. In the case of the Cell it has to go across the shared on chip bus (EIB, I think they call it?). Access to these things takes anywhere from 10-1000 cycles depending on the hardware, and then the core you are sending the data to has to read it.



    You can visualize multiple cores as multiple streams of instructions (here is a 4-way and a couple of 2-way dispatch cores):



    ... LIIS LFIS LVII IFFS BFLI VSLI LBLS LIBS SIFF VILB SIVL LSIF SSIL ...



    ... BL IF VL IF BB IS LL LI FV VV SI FI FF II IF FS ...



    ... LL II SS LL FF IF IF IF IF IB SS SS SS SS II SL IF ...



    ... IL IL IL VV VV VV VV VV VV VV IS IS IS IS IS IS ...



    Sending data to the right takes 1-10 cycles, and sending data up or down takes 10-1000 cycles and requires L and S instructions to do it. It should be clear that trying to redirect one kind of instruction to another core would slow the whole show down enormously. Even worse, if you are trying to do this on existing code then the processor would have to recognize an instruction it couldn't deal with and take special action... and this process is very expensive and usually involves throwing out all the instructions currently in the pipeline and executing another set that knows how to send the information to the other core.



    Back in the days of 10-50 MHz processors they used to have some execution units on external chips (e.g. the MMU on the 68020 or the FPU on the 68030). That was possible because the chips were so slow relative to the speeds of the connections between the chips. These days, in the multi-GHz era, even the time to send signals between cores on the same chip is too high compared to how fast the cores are grinding through instructions.
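
    To put rough numbers on that (these are assumed round figures, not measurements), the back-of-envelope arithmetic looks like this:

    Code:

    #include <stdio.h>

    /* Assumed round numbers: a 68030-era system vs. a 4 GHz, 4-wide core.
     * The point is how many issue slots a core gives up while waiting
     * for one round trip to an external (or even on-chip) unit. */
    int main(void)
    {
        double clock_hz[]   = { 25e6, 4e9 };
        double latency_ns[] = { 200.0, 50.0 };  /* assumed round-trip latency */
        int    width[]      = { 1, 4 };         /* instructions per cycle */

        for (int i = 0; i < 2; i++) {
            double cycles = latency_ns[i] * 1e-9 * clock_hz[i];
            printf("%4.0f MHz core: ~%4.0f cycles, ~%4.0f issue slots lost per hop\n",
                   clock_hz[i] / 1e6, cycles, cycles * width[i]);
        }
        return 0;
    }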



    Quote:

    There seems to be some confusion about this matter. The Ars Technica article shows two FPUs in the PPE. However, the chip photo in the Real World Technologies article shows a PPE that appears to be only twice the size of an SPE, neglecting caches; not big enough for 2 FPUs, it would seem. Also, I didn't see the FPU mentioned. Adding to the confusion, this article mentions that the SPEs only do single precision, with other limitations.



    Maybe we have to wait to get the facts?




    I'd be surprised if the PPE had two FPUs -- it is designed for low power and small size so that they can get as many SPEs on the chip as possible. Also, since it is a dual-issue chip it doesn't make sense for both of the instructions issued to be FPU instructions (that would happen quite rarely). Hopefully they just put one good FPU into the PPE.



    As for the SPE confusion -- RWT says that the double precision is much slower than the single precision. This makes me wonder if they are doing some kind of emulation to achieve the higher precision. Apple has some routines for doing quad precision 128-bit floating point operations in the VMX unit... I wonder if this is along the same lines?
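
    For the curious, here is the general idea behind that kind of emulation. This is a generic "double-float" sketch of my own, not Apple's VMX routines and not necessarily what the SPE hardware does: keep each value as an unevaluated sum of two lower-precision numbers and track the rounding error explicitly.

    Code:

    /* Sketch only.  A value is stored as hi + lo, where lo holds the
     * rounding error that hi alone cannot represent.  two_sum() is
     * Knuth's error-free addition; building on it buys extra precision
     * out of single-precision hardware at the cost of several extra ops. */
    typedef struct { float hi, lo; } dfloat;

    static dfloat two_sum(float a, float b)
    {
        float s   = a + b;
        float bb  = s - a;                      /* portion of b absorbed into s */
        float err = (a - (s - bb)) + (b - bb);  /* exact rounding error of a + b */
        return (dfloat){ s, err };
    }

    static dfloat df_add(dfloat x, float y)     /* add a plain float into hi/lo */
    {
        dfloat s = two_sum(x.hi, y);
        s.lo += x.lo;
        return two_sum(s.hi, s.lo);
    }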
  • Reply 198 of 220
    imiloa Posts: 187 member
    Quote:

    Originally posted by Programmer

    These days, in the multi-GHz era, even the time to send signals between cores on the same chip is too high compared to how fast the cores are grinding through instructions.



    i know you're right on this. but just for the sake of discussion...



    i haven't been following the specifics closely, but as i understand it, OS X's rendering technology is evolving to offload vector tasks to the GPU, right? one of the advantages of PCI-express is the bandwidth both to and *from* the GPU, allowing the OS to leverage the GPU's vector logic for non-graphics tasks.



    well, the bandwidth from CPU to GPU, even over PCIexpress is going to be significantly slower than what could be arranged on a daughter card, right?



    so if apple is already investing in the GPU concept, wouldn't daughter-card coprocessing be worthwhile to explore?



    all this said, i completely agree that bundling everything on one die is the way to go. just seems like inefficient coprocessing is already underway.
  • Reply 199 of 220
    hiro Posts: 2,663 member
    Quote:

    Originally posted by snoopy

    This is what makes the most sense to me. With 8 SPEs doing double precision, it seems wasteful to have 2 FPUs. It's my impression that the FPUs in a typical system sit idle much of the time, but when they are needed they work like crazy. If the SPEs do replace the FPUs, does the OS handle the conversion? That is, if an FPU instruction is issued, does the OS convert this to equivalent SPE instructions? My computer science ignorance is showing here.



    There seems to be some confusion about this matter. The Ars Technica article shows two FPUs in the PPE. However, the chip photo in the Real World Technologies article shows a PPE that appears to be only twice the size of an SPE, neglecting caches; not big enough for 2 FPUs, it would seem. Also, I didn't see the FPU mentioned. Adding to the confusion, this article mentions that the SPEs only do single precision, with other limitations.



    Maybe we have to wait to get the facts?




    The SPEs have the capability to do double precision the old-fashioned way: multiple single-precision ops. DP work on an SPE takes about a 10x speed hit for that conversion. Sure, you can do it, but it ain't blindingly fast, just another current-generation high-end processor. The design was a specific tradeoff by Sony to do the things common in media really fast, while sacrificing the high-end scientific processing niche uber-capability. The right call for their target market and for control of design costs.
  • Reply 200 of 220
    Quote:

    Originally posted by imiloa

    i know you're right on this. but just for the sake of discussion...



    i haven't been following the specifics closely, but as i understand it, OS X's rendering technology is evolving to offload vector tasks to the GPU, right? one of the advantages of PCI-express is the bandwidth both to and *from* the GPU, allowing the OS to leverage the GPU's vector logic for non-graphics tasks.



    well, the bandwidth from CPU to GPU, even over PCIexpress is going to be significantly slower than what could be arranged on a daughter card, right?



    so if apple is already investing in the GPU concept, wouldn't daughter-card coprocessing be worthwhile to explore?



    all this said, i completely agree that bundling everything on one die is the way to go. just seems like inefficient coprocessing is already underway.




    Don't misunderstand -- I was answering a specific question, which was "can we offload a functional unit of the CPU (i.e. FPU or VMX) to SPEs"? The answer to that question is "no, because the FPU and VMX units (and their instructions) are too tightly coupled to the Power core (and its instruction stream)".



    The GPU, on the other hand, is quite separate and isn't used to process a Power instruction stream. The tasks handed to it are also usually, by their nature, asynchronous to the computations running on the CPU (i.e. take this vertex and texture data and go fill in a frame buffer). Other kinds of computation (e.g. XGrid) are designed to take chunks of computation and send them far, far away (i.e. across a network) to be computed by somebody else and send the results back when done.



    It is possible to put a Cell on a motherboard or add-in board as a co-processor... I just don't think it is nearly as interesting as making all the processors in the system Cells. Sending a computation to an on-chip SPE is going to be much more efficient than sending it to another Cell, or across a LAN, or even across the Internet. The tight coupling of the Power core to 8 SPEs via the very very fast EIB on-chip bus is a big part of what makes the Cell so compelling. If Cell turns into "The Next Big Thing" then the x86 boys will have to take this route, at least until Intel and/or AMD jump on board with IBM (if IBM is interested in helping x86 live... I doubt it). Systems already using a Power(PC) have the huge advantage of this being built right into the Cell. Why waste 60W on a 970 when you could put another Cell in there and get the PPE, 8 SPEs, memory controller, and high speed I/O instead?
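
    As a toy model of why the distance matters (all of these numbers are guesses for illustration, not measured Cell, PCI, or XGrid figures), compare how much of the time is spent computing versus dispatching for a fixed-size chunk of work:

    Code:

    #include <stdio.h>

    /* Guessed dispatch overheads for an on-chip SPE, an add-in
     * co-processor card, and a network node a la XGrid.  The useful
     * work per chunk is fixed, so the farther away the worker sits,
     * the bigger the chunk has to be to break even. */
    int main(void)
    {
        const char *where[]  = { "on-chip SPE      ", "co-processor card", "network node     " };
        double dispatch_us[] = { 0.05, 5.0, 50000.0 };
        double work_us       = 100.0;           /* useful compute per chunk */

        for (int i = 0; i < 3; i++) {
            double busy = work_us / (work_us + dispatch_us[i]);
            printf("%s: %5.1f%% of the time spent computing\n", where[i], 100.0 * busy);
        }
        return 0;
    }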