For all you guys who said Intel's roadmap is more predictable...

Comments

  • Reply 21 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    Who's going to have enough market to support a $20 billion fab for 22 nm in 2012+? The only one I can think of is the US Federal government.



    Yeah, right -- in between funding wars and hurricane repairs. I think Bill Gates has more chance of funding it personally than the US gov. Besides, it's not even clear that sub-45nm devices can actually work and work reliably. Heck, even 45nm is going to be dicey.



    Quote:

    AMD's K10 is delayed or dead



    New high-end cores are insanely expensive and difficult to develop; it should surprise no one that projects are failing. I've said it before, and I'll say it again: Cell is the harbinger of the future. Many simple cores on one chip running at a high speed. Gives better yields, allows higher clock rates, and delivers better performance for software designed to be concurrent.
  • Reply 22 of 63
    tht Posts: 5,452 member
    Quote:

    Originally posted by Programmer

    Yeah, right -- in between funding wars and hurricane repairs. I think Bill Gates has more chance of funding it personally than the US gov. Besides, it's not even clear that sub-45nm devices can actually work and work reliably. Heck, even 45nm is going to be dicey.



    Yes. True. It's going to be very dicey from now on. But I have to imagine that sub-45 nm CMOS manufacturing will be considered a capital asset and therefore will receive government subsidies when the CEOs come calling on their local Congresspeople.



    Quote:

    New high-end cores are insanely expensive and difficult to develop; it should surprise no one that projects are failing. I've said it before, and I'll say it again: Cell is the harbinger of the future. Many simple cores on one chip running at a high speed. Gives better yields, allows higher clock rates, and delivers better performance for software designed to be concurrent.



    CMP, yes of course. But Cell really isn't a harbinger of that. Simplified cores, but many more of them than in contemporary CMP? I don't think so, yet. Software is inherently lazy, and complicated cores offer a big payoff. Cores will likely stay complicated with OOOE, "higher order" instruction execution (FMADDs, micro/macro ops fusion), and all the other complicated stuff I have no idea about.
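
    To make the FMADD point concrete, here's a tiny illustration of my own (not from any particular core's manual): a fused multiply-add computes a*b + c with a single rounding step, which is cheaper in hardware than running separate multiply and add operations back to back, and slightly more accurate. C99's fma() maps onto an FMADD-style instruction where the hardware has one.

```c
/* Fused multiply-add vs. separate multiply and add.
 * fma() performs a*b + c with one rounding step; where the hardware has an
 * FMADD-style instruction, the compiler/libm can use it directly. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double a = 1.0000001, b = 1.0000001, c = -1.0000002;
    double separate = a * b + c;    /* rounds after the multiply, then adds */
    double fused    = fma(a, b, c); /* single rounding at the very end      */
    printf("separate: %.20g\n", separate);
    printf("fused:    %.20g\n", fused);  /* the two usually differ slightly */
    return 0;
}
```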
  • Reply 23 of 63
    sybaritic
    In terms of real world user experience (for those of us who will be working with 1080p video, for instance), what sort of speed gains are we likely to see in the next three to five years? Assuming that the 45 nm process isn't a complete train wreck ...
  • Reply 24 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by Sybaritic

    In terms of real world user experience (for those of us who will be working with 1080p video, for instance), what sort of speed gains are we likely to see in the next three to five years? Assuming that the 45 nm process isn't a complete train wreck ...



    It will be highly dependent on what you're doing with the machine, how well your software vendor has taken advantage of multiple cores and AltiVec/SSE, how memory-bound their algorithms are, what (currently non-existent) microprocessor internal architecture you are using, and what happens in terms of memory bandwidth. There are way too many variables to make a good prediction about what the impact of having 10-50 cores and a billion transistors on your chip will be.
  • Reply 25 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    Cores will likely stay complicated with OOOE, "higher order" instruction execution (FMADDs, micro/macro ops fusion), and all the other complicated stuff I have no idea about.



    We'll see... the game changed at 90nm, and bigger changes are coming at smaller sizes. Things which really alter the equation. There are degrees of core complexity as well -- the current Pentium 4 is, what, 150 million transistors? The 970 core is about 55 million. The Pentium3 was ~25 million? And those are all OoOE cores. The 604e and Pentium were ~10 million, IIRC. With a billion transistors you could fit something like 100 Pentiums.







    EDIT: I just checked and I was wrong about the 604e/Pentium -- they are only in the 5 million transistor range. So you could stretch their pipelines out quite a bit, give them a fair bit of local memory, and still have 50 of them in a billion transistors (making a Cell-style architecture).
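
    Just to spell out the budgeting (using the rough per-core transistor counts from this thread, not official figures), a trivial sketch:

```c
/* Back-of-envelope core budgeting for a ~1-billion-transistor die.
 * The per-core figures are the rough numbers quoted above, nothing official. */
#include <stdio.h>

int main(void) {
    const long budget = 1000L * 1000 * 1000;            /* ~1e9 transistors */
    struct { const char *core; long transistors; } cores[] = {
        { "604e/Pentium-class (bare)",          5L   * 1000 * 1000 },
        { "stretched pipeline + local memory",  20L  * 1000 * 1000 },
        { "970-class OoOE core",                55L  * 1000 * 1000 },
        { "Pentium 4-class core",               150L * 1000 * 1000 },
    };
    for (unsigned i = 0; i < sizeof cores / sizeof cores[0]; i++)
        printf("%-36s -> roughly %ld cores per die\n",
               cores[i].core, budget / cores[i].transistors);
    return 0;
}
```

    With ~20M transistors per beefed-up simple core you land on the 50-cores-per-billion figure above; at 970- or Pentium 4-class complexity you're down to roughly 18 or 6.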

  • Reply 26 of 63
    tht Posts: 5,452 member
    Quote:

    Originally posted by Programmer

    We'll see... the game changed at 90nm, and bigger changes are coming at smaller sizes. Things which really alter the equation. There are degrees of core complexity as well -- the current Pentium 4 is, what, 150 million transistors? The 970 core is about 55 million. The Pentium3 was ~25 million? And those are all OoOE cores. The 604e and Pentium were ~10 million, IIRC. With a billion transistors you could fit something like 100 Pentiums.



    Yes, you could do that. But I would still bet that the processor with a billion transistors will do certain things 100 times faster than 100 Pentiums on the same die. Certain things that are very important to the user.



    It's the same question as before. The software mountain to climb for optimizing multithreaded apps is much more costly than developing hardware to accelerate sloppy code.
  • Reply 27 of 63
    cubist Posts: 954 member
    But a die with, say, 100 cores could have 20% of them lasered out if they were defective, allowing for a much higher yield (a back-of-envelope sketch of the yield math follows at the end of this post). Plus, the simpler cores are easier to design and deliver, with fewer 'errata'. Compiler technology is not sitting still, and programmers are learning to deliver multithreaded code.



    Programmer's Cell approach is much more forward-looking. We can deliver systems-on-a-chip with a common architecture and a streamlined instruction set. As Apple used to advertise, "Simplicity is the ultimate sophistication".
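
    To put a rough number on the yield argument, here's a back-of-envelope model of my own (independent defects and made-up per-core yields, so purely illustrative): a monolithic design needs essentially the whole die to be good, while a 100-core die with spares ships as long as, say, 80 cores test good.

```c
/* Toy yield model: compare a die that needs all 100 core-sized regions good
 * against one that ships if at least 80 of 100 cores test good.
 * Assumes independent defects -- an illustration, not real fab data. */
#include <stdio.h>
#include <math.h>

static double binom(int n, int k) {               /* C(n,k) as a double */
    double r = 1.0;
    for (int i = 1; i <= k; i++)
        r *= (double)(n - k + i) / i;
    return r;
}

int main(void) {
    const int cores = 100, needed = 80;
    for (double y = 0.80; y <= 0.96; y += 0.05) {  /* per-core yield     */
        double monolithic = pow(y, cores);         /* all 100 must work  */
        double redundant = 0.0;                    /* at least 80 work   */
        for (int k = needed; k <= cores; k++)
            redundant += binom(cores, k) * pow(y, k) * pow(1.0 - y, cores - k);
        printf("per-core yield %.2f: all-good die %.2e, with spares %.3f\n",
               y, monolithic, redundant);
    }
    return 0;
}
```

    At 90% per-core yield the all-or-nothing die is essentially unmanufacturable (0.9^100 is about 0.003%), while the die with spare cores still ships almost every time -- which is exactly the laser-out-the-bad-cores argument.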
  • Reply 28 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    Yes, you could do that. But I would still bet that the processor with a billion transistors will do certain things 100 times faster than 100 Pentiums on the same die. Certain things that are very important to the user.



    You'll lose that bet. Look at the relative performance of IA-64 and lesser cores -- the difference is much, much smaller than the ratio between the transistor counts. And a billion-transistor single-processor chip will not run at a higher clock speed; in fact, it'll probably run at less than 2 GHz. Having a single processor implies the need to communicate across the whole die, whereas tiny cores only communicate within their (very small) own core. The shorter lines of communication mean hugely lower power levels, and much higher potential clock rates. That's why Cell has been clocked as high as 5.6 GHz on 90 nm, and this gets more extreme at 65 and 45 nm. Furthermore, nobody can design a huge processor that is that complex, but we can replicate a simple core 100 times (and test it to prove that it works). And what does a processor designer do with that many transistors, besides a whole lot of cache? Wider superscalar and deeper pipelines have already passed the point of useful gains. SIMD has already reaped most of its benefit at lower transistor counts. Bigger buffers don't make much of a marginal difference. SMT/HyperThreading... well, that's pretty close to more cores.



    We are seeing very significant diminishing returns on increased processor complexity.



    Quote:

    It's the same question as before. The software mountain to climb for optimizing multithreaded apps is much more costly than developing hardware to accelerate sloppy code.



    You let us software guys worry about that. We've passed the inflection point in the curves.
  • Reply 29 of 63
    pbg4 dude Posts: 1,611 member
    At this point, with multiple cores becoming more prevalent down to the consumer level (Athlon X2, Pentium D), $2B spent on compiler development would net more of a gain than $2B spent on CPU/fab development. At least it seems that way to me.



    Besides, if we want computers to advance in areas like facial recognition, etc., parallel processing will need to be utilized. You can look at someone's face and instantly recognize whether you know them or not. A single/dual/quad processor can't work in this parallel fashion; it has to examine the whole image bit by bit. If you had a thousand processors that could 'see' the whole picture, new software avenues would open up for exploration.
  • Reply 30 of 63
    tht Posts: 5,452 member
    Quote:

    Originally posted by Programmer

    You'll lose that bet. Look at the relative performance of IA-64 and lesser cores -- the difference is much, much smaller than the ratio between the transistor counts. And a billion-transistor single-processor chip will not run at a higher clock speed; in fact, it'll probably run at less than 2 GHz. Having a single processor implies the need to communicate across the whole die, whereas tiny cores only communicate within their (very small) own core.



    Take a step back. The 1-billion-transistor single-core processor vs. the 1-billion-transistor 100-core processor is an exercise in hyperbole. You only get 32 with Cell SPEs. I had in mind 4 complicated cores for that 1 billion transistors, not one. Not to mention that an ~8-stage-pipeline, 6-wide-issue, in-order CPU was not on my mind either.



    The basic proposition we're discussing is that many simplified cores, such as those used in Cell, deliver better performance than a few complicated cores (970MP, etc.). Yeah, I think there is a big payoff for a complicated core, based on what we've talked about before. Multithreaded app design is hard, only a few threads will actually do compute-intensive work (<4), and legacy software will not run well on these simplified cores. All of that will push CMP chips with more complicated cores to be successful, rather than the Cell model.



    Quote:

    We are seeing very significant diminishing returns on increased processor complexity.



    I can agree that OOOE, superscalar and deep pipelining have reached or are reaching their limits. I don't agree that there aren't other techniques available, possibly hardware-based media encoding instructions and higher order math instructions.



    Quote:

    You let us software guys worry about that. We've passed the inflection point in the curves.



    I'll wait for those Cell/PS3 games that outclass x86/PC games, then.
  • Reply 31 of 63
    tht Posts: 5,452 member
    Quote:

    Originally posted by PBG4 Dude

    At this point, with multiple cores becoming more prevalent down to the consumer level (Athlon X2, Pentium D), $2B spent on compiler development would net more of a gain than $2B spent on CPU/fab development. At least it seems that way to me.



    Oh, Intel has probably spent more than $2B on Itanium compiler development, and probably just as much if not more on Pentium compiler development.



    But no, fab development is the number one asset for a semiconductor manufacturer, and that's where investment money should go.



    Quote:

    Besides, if we want computers to advance in areas like facial recognition, etc., parallel processing will need to be utilized. You can look at someone's face and instantly recognize whether you know them or not. A single/dual/quad processor can't work in this parallel fashion; it has to examine the whole image bit by bit. If you had a thousand processors that could 'see' the whole picture, new software avenues would open up for exploration.



    Those one thousand processors still see the whole image bit by bit.
  • Reply 32 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    Take a step back. The 1-billion-transistor single-core processor vs. the 1-billion-transistor 100-core processor is an exercise in hyperbole. You only get 32 with Cell SPEs. I had in mind 4 complicated cores for that 1 billion transistors, not one. Not to mention that an ~8-stage-pipeline, 6-wide-issue, in-order CPU was not on my mind either.



    Fair enough, but I think the "ideal" balance is more like 20 x 50M or 40 x 25M. Something like 20-30 stage 2-4 issue (possibly OoOE) cores (with SIMD) clocked at 4+ GHz. Possibly with local memory (Cell style) instead of cache.



    The specialized circuits/instructions you mention are already appearing, but they are typically additional functional units attached to the on-chip SoC bus. No need to complicate the core(s) by building these things into them. Modularity is desirable from a hardware design and fab point of view.



    Quote:

    Multithreaded app design is hard, only a few threads will actually do compute-intensive work (<4)



    The genius of the Cell model is that it will make your statement incorrect.
  • Reply 33 of 63
    powerdoc Posts: 8,123 member
    I don't think that the Cell is a desktop chip.

    For me, the future of architecture is a 10-20 stage, multi-pipelined, multicore architecture with large caches, fast interconnects, and a great SIMD unit. This chip will also have to deal with the bus bottleneck. Currently, this core does not exist.



    The Cell is a specialised architecture requiring severe software optimisation. It has an astounding level of performance on paper, but I'm waiting to see this chip on current applications (as it is a console chip, I may wait for a long time).
  • Reply 34 of 63
    Quote:

    Originally posted by Lemon Bon Bon

    Mandatory reading for all those PPC fan boys.



    Lemon Bon Bon




    However, when one goes up, another must come down. I think people will keep on bashing the clumsy x86 even after Apple's surrender, maybe using a more efficient Linux-PPC system.

    Personally, I hope they can keep the G5 iMac for a few years; it is a nice machine and I use it as a semi-laptop. Have any of you tried to run an iMac off a battery? I think it is possible.

    I don't worry much about gaming. Cell will destroy any x86 on that side. Maybe Microsoft will light up the future of the PPC line.
  • Reply 35 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by Powerdoc

    I don't think that the Cell is a desktop chip.

    For me, the future of architecture is a 10-20 stage, multi-pipelined, multicore architecture with large caches, fast interconnects, and a great SIMD unit. This chip will also have to deal with the bus bottleneck. Currently, this core does not exist.



    The Cell is a specialised architecture requiring severe software optimisation. It has an astounding level of performance on paper, but I'm waiting to see this chip on current applications (as it is a console chip, I may wait for a long time).




    You do realize that, aside from the large caches, you basically described the Cell, right? And I suggest that large caches are exactly what we do not want in a heavily multi-processor machine. Multi-threaded programs are hard to write and optimize, in large part, because of the cached shared memory model. Cell adopts an explicit model so that main memory access is controlled explicitly by software (a rough sketch of what that looks like follows at the end of this post). The leading objection to this model is that it doesn't work with existing software, but then (almost) everyone agrees that the current model for multi-threading is hugely problematic and error-prone, so perhaps this is actually a very good idea in the long run.



    And drop the idea that the Cell is only a console chip. The first Cell is, yes, but the Cell is an architecture and if IBM has its way, no doubt there will be future variations. Beefier Power cores, more processors, more I/O ports, different memory controllers, etc. are all possible (and the other way too --scaled down instead of up).
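
    Since the explicit model can sound abstract, here's a rough sketch of its shape in plain C. The dma_get/dma_put helpers are stand-ins of my own (simulated with memcpy so the sketch runs anywhere), not the actual Cell SDK calls, but the structure -- pull a chunk into a private local store, compute out of it, push the result back -- is the point.

```c
/* Sketch of Cell-style explicit memory management, in portable C.
 * dma_get/dma_put stand in for the real DMA intrinsics; here they are
 * simulated with memcpy so the example compiles and runs on any machine. */
#include <stdio.h>
#include <string.h>

#define CHUNK 256                   /* elements per local-store buffer     */
static float local_in[CHUNK];       /* "local store": private to this core */
static float local_out[CHUNK];

/* Stand-ins for the DMA engine (assumptions, not the actual Cell SDK). */
static void dma_get(void *local, const void *main_mem, size_t bytes) {
    memcpy(local, main_mem, bytes); /* real hardware: async, tagged transfer */
}
static void dma_put(void *main_mem, const void *local, size_t bytes) {
    memcpy(main_mem, local, bytes);
}

/* Stream over main memory one chunk at a time: pull in, compute, push out. */
static void process_stream(const float *src, float *dst, size_t n) {
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t count = (n - base < CHUNK) ? n - base : CHUNK;
        dma_get(local_in, src + base, count * sizeof(float));   /* 1. pull */
        for (size_t i = 0; i < count; i++)                      /* 2. work */
            local_out[i] = local_in[i] * 2.0f;
        dma_put(dst + base, local_out, count * sizeof(float));  /* 3. push */
    }
}

int main(void) {
    float src[1000], dst[1000];
    for (int i = 0; i < 1000; i++) src[i] = (float)i;
    process_stream(src, dst, 1000);
    printf("dst[999] = %.1f\n", dst[999]);    /* expect 1998.0 */
    return 0;
}
```

    On the real hardware the transfers are asynchronous and tagged, so the fetch of the next chunk can overlap the compute on the current one. The key difference from a cached shared-memory model is that all of the memory traffic is visible and scheduled in the code, rather than inferred by coherence hardware.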
  • Reply 36 of 63
    What changes are coming from this development in the industry, where CPU manufacturers can depend less and less on process shrinks and added megahertz for performance gains?



    I've read a lot of people slamming x86, for instance, calling it crufty, aged, and inelegant, and saying that development resources would be better spent elsewhere, to paraphrase John Siracusa.



    Will this development in the industry eventually force such a change, and a total redesign of the fundamentals inside our future CPUs?
  • Reply 37 of 63
    powerdoc Posts: 8,123 member
    Quote:

    Originally posted by Programmer

    You do realize that, aside from the large caches, you basically described the Cell, right? And I suggest that large caches are exactly what we do not want in a heavily multi-processor machine. Multi-threaded programs are hard to write and optimize, in large part, because of the cached shared memory model. Cell adopts an explicit model so that main memory access is controlled explicitly by software. The leading objection to this model is that it doesn't work with existing software, but then (almost) everyone agrees that the current model for multi-threading is hugely problematic and error-prone, so perhaps this is actually a very good idea in the long run.



    And drop the idea that the Cell is only a console chip. The first Cell is, yes, but the Cell is an architecture and if IBM has its way, no doubt there will be future variations. Beefier Power cores, more processors, more I/O ports, different memory controllers, etc. are all possible (and the other way too --scaled down instead of up).




    Perhaps large caches are a nightmare, but for today's software, large caches lead to better performance.

    The Cell architecture is perhaps better, but it requires a complete rewrite of all software.
  • Reply 38 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by Powerdoc

    Perhaps large caches are a nightmare, but for today's software, large caches lead to better performance.

    The Cell architecture is perhaps better, but it requires a complete rewrite of all software.




    Most of today's software doesn't leverage multi-core as it is, so the difference isn't as big as you make it out to be. It would be straightforward for IBM to build a Cell with 2 Power cores in addition to all the vector processors, and that would run all the dual-core-oriented software just fine. It has been repeatedly demonstrated, however, that that kind of software does not scale well to more processors. Refactoring the design to the Cell model will scale further, and will likely scale across distributed networks of machines as well (the Cell chip is essentially a distributed network on a chip). When chips with 10-100 processors arrive, it is the software that has been reformatted for that model which will succeed.
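
    To illustrate the refactoring point with a toy of my own (nothing IBM-specific): software written as one or two big threads tops out at one or two cores, while the same work expressed as a pool of small independent items scales with whatever core count the chip happens to have. A minimal pthreads sketch, using an atomic counter as the work queue:

```c
/* Toy illustration of "refactor into many independent chunks": the worker
 * count can track the core count because no item depends on another. */
#include <pthread.h>
#include <stdio.h>
#include <stdatomic.h>

#define N_ITEMS   1000000
#define N_WORKERS 8                    /* bump this to match the core count */

static double results[N_ITEMS];
static atomic_int next_item;           /* the "queue" is just an index */

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int i = atomic_fetch_add(&next_item, 1);   /* grab the next item */
        if (i >= N_ITEMS)
            break;
        results[i] = (double)i * (double)i;        /* independent work   */
    }
    return NULL;
}

int main(void) {
    pthread_t tid[N_WORKERS];
    for (int w = 0; w < N_WORKERS; w++)
        pthread_create(&tid[w], NULL, worker, NULL);
    for (int w = 0; w < N_WORKERS; w++)
        pthread_join(tid[w], NULL);
    printf("results[12345] = %.0f\n", results[12345]);   /* 152399025 */
    return 0;
}
```

    Because each item is independent, N_WORKERS can simply track the hardware, and the same decomposition maps naturally onto SPE-style cores with local stores, or even onto separate machines -- which is the scaling argument above.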
  • Reply 39 of 63
    powerdoc Posts: 8,123 member
    Quote:

    Originally posted by Programmer

    Most of today's software doesn't leverage multi-core as it is, so the difference isn't as big as you make it out to be. It would be straightforward for IBM to build a Cell with 2 Power cores in addition to all the vector processors, and that would run all the dual-core-oriented software just fine. It has been repeatedly demonstrated, however, that that kind of software does not scale well to more processors. Refactoring the design to the Cell model will scale further, and will likely scale across distributed networks of machines as well (the Cell chip is essentially a distributed network on a chip). When chips with 10-100 processors arrive, it is the software that has been reformatted for that model which will succeed.



    Your arguments seem convincing, but why did Apple not choose the Cell architecture? If my memory is correct, Apple considered this choice.
  • Reply 40 of 63
    Knowing next to nothing about coding and the Cell-versus-Intel debate, I nonetheless suspect that Apple looked into the near future and saw that Intel could come through with low-heat chips for laptops, indefinitely. Programmer's model clearly leaves open the possibility that Apple might someday "return" to the nascent IBM approach (à la Cell), perhaps at a time when Apple's presumed greater market share will make the software optimization process less onerous and more financially attractive to developers. In other words, ten or fifteen years down the line, who can say?



    But returning to real world speed advances in the next five years or so, I understand Programmer's reluctance to make any hard predictions - particularly considering all of the variables involved - but I can't help but enjoy the sound of "having 10-50 cores and a billion transistors on your chip," whatever speed that brings! As the U.S. president shortsightedly said, "Bring it on."



    Merci, Programmer.