For all you guys who said Intel's roadmap is more predictable...

Comments

  • Reply 21 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    Who's going to have enough market to support a $20 billion fab for 22 nm in 2012+? The only one I can think of is the US Federal government.



    Yeah, right -- in between funding wars and hurricane repairs. I think Bill Gates has more chance of funding it personally than the US gov. Besides, it's not even clear that sub-45nm devices can actually work and work reliably. Heck, even 45nm is going to be dicey.



    Quote:

    AMD's K10 is delayed or dead



    New high-end cores are insanely expensive and difficult to develop; it should surprise no one that projects are failing. I've said it before, and I'll say it again: Cell is the harbinger of the future. Many simple cores on one chip running at a high speed. Gives better yields, allows higher clock rates, and delivers better performance for software designed to be concurrent.
  • Reply 22 of 63
    tht Posts: 5,452 member
    Quote:

    Originally posted by Programmer

    Yeah, right -- in between funding wars and hurricane repairs. I think Bill Gates has more chance of funding it personally than the US gov. Besides, it's not even clear that sub-45nm devices can actually work and work reliably. Heck, even 45nm is going to be dicey.



    Yes. True. It's going to be very dicey from now on. But I have to imagine that sub-45 nm CMOS manufacturing will be considered a capital asset and therefore will receive government subsidies when the CEOs come calling on their local Congresspeople.



    Quote:

    New high-end cores are insanely expensive and difficult to develop; it should surprise no one that projects are failing. I've said it before, and I'll say it again: Cell is the harbinger of the future. Many simple cores on one chip running at a high speed. Gives better yields, allows higher clock rates, and delivers better performance for software designed to be concurrent.



    CMP, yes of course. But Cell really isn't a harbinger of that. Simplified cores, but many more of them than in contemporary CMP? I don't think so, yet. Software is inherently lazy, and complicated cores offer a big payoff. Cores will likely stay complicated with OOOE, "higher order" instruction execution (FMADDs, micro/macro ops fusion), and all the other complicated stuff I have no idea about.
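
    To make the FMADD point concrete, here's a tiny illustration of my own (not from any particular core's manual): a fused multiply-add computes a*b + c with a single rounding step, which is cheaper in hardware than running separate multiply and add operations back to back, and slightly more accurate. C99's fma() maps onto an FMADD-style instruction where the hardware has one.

```c
/* Fused multiply-add vs. separate multiply and add.
 * fma() performs a*b + c with one rounding step; where the hardware has an
 * FMADD-style instruction, the compiler/libm can use it directly. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double a = 1.0000001, b = 1.0000001, c = -1.0000002;
    double separate = a * b + c;    /* rounds after the multiply, then adds */
    double fused    = fma(a, b, c); /* single rounding at the very end      */
    printf("separate: %.20g\n", separate);
    printf("fused:    %.20g\n", fused);  /* the two usually differ slightly */
    return 0;
}
```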
  • Reply 23 of 63
    sybaritic
    In terms of real world user experience (for those of us who will be working with 1080p video, for instance), what sort of speed gains are we likely to see in the next three to five years? Assuming that the 45 nm process isn't a complete train wreck ...
  • Reply 24 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by Sybaritic

    In terms of real world user experience (for those of us who will be working with 1080p video, for instance), what sort of speed gains are we likely to see in the next three to five years? Assuming that the 45 nm process isn't a complete train wreck ...



    It will be highly dependent on what you're doing with the machine, how well your software vendor has taken advantage of multiple cores and AltiVec/SSE, how memory-bound their algorithms are, what (currently non-existent) microprocessor internal architecture you are using, and what happens in terms of memory bandwidth. There are way too many variables to make a good prediction about what the impact of having 10-50 cores and a billion transistors on your chip will be.
  • Reply 25 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    Cores will likely stay complicated with OOOE, "higher order" instruction execution (FMADDs, micro/macro ops fusion), and all the other complicated stuff I have no idea about.



    We'll see... the game changed at 90nm, and bigger changes are coming at smaller sizes. Things which really alter the equation. There are degrees of core complexity as well -- the current Pentium 4 is, what, 150 million transistors? The 970 core is about 55 million. The Pentium3 was ~25 million? And those are all OoOE cores. The 604e and Pentium were ~10 million, IIRC. With a billion transistors you could fit something like 100 Pentiums.







    EDIT: I just checked and I was wrong about the 604e/Pentium -- they are only in the 5 million transistor range. So you could stretch their pipelines out quite a bit, give them a fair bit of local memory, and still have 50 of them in a billion transistors (making a Cell-style architecture).
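
    Just to spell out the budgeting (using the rough per-core transistor counts from this thread, not official figures), a trivial sketch:

```c
/* Back-of-envelope core budgeting for a ~1-billion-transistor die.
 * The per-core figures are the rough numbers quoted above, nothing official. */
#include <stdio.h>

int main(void) {
    const long budget = 1000L * 1000 * 1000;            /* ~1e9 transistors */
    struct { const char *core; long transistors; } cores[] = {
        { "604e/Pentium-class (bare)",          5L   * 1000 * 1000 },
        { "stretched pipeline + local memory",  20L  * 1000 * 1000 },
        { "970-class OoOE core",                55L  * 1000 * 1000 },
        { "Pentium 4-class core",               150L * 1000 * 1000 },
    };
    for (unsigned i = 0; i < sizeof cores / sizeof cores[0]; i++)
        printf("%-36s -> roughly %ld cores per die\n",
               cores[i].core, budget / cores[i].transistors);
    return 0;
}
```

    With ~20M transistors per beefed-up simple core you land on the 50-cores-per-billion figure above; at 970- or Pentium 4-class complexity you're down to roughly 18 or 6.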

  • Reply 26 of 63
    tht Posts: 5,452 member
    Quote:

    Originally posted by Programmer

    We'll see... the game changed at 90nm, and bigger changes are coming at smaller sizes. Things which really alter the equation. There are degrees of core complexity as well -- the current Pentium 4 is, what, 150 million transistors? The 970 core is about 55 million. The Pentium3 was ~25 million? And those are all OoOE cores. The 604e and Pentium were ~10 million, IIRC. With a billion transistors you could fit something like 100 Pentiums.



    Yes, you could do that. But I would still bet that the processor with a billion transistors will do certain things 100 times faster than 100 Pentiums on the same die. Certain things that are very important to the user.



    It's the same question as before. The software mountain to climb for optimizing multithreaded apps is much more costly than developing hardware to accelerate sloppy code.
  • Reply 27 of 63
    cubist Posts: 954 member
    But a die with, say, 100 cores could have 20% of them lasered out if they were defective, allowing for a much higher yield (a back-of-envelope sketch of the yield math follows at the end of this post). Plus, the simpler cores are easier to design and deliver, with fewer 'errata'. Compiler technology is not sitting still, and programmers are learning to deliver multithreaded code.



    Programmer's Cell approach is much more forward-looking. We can deliver systems-on-a-chip with a common architecture and a streamlined instruction set. As Apple used to advertise, "Simplicity is the ultimate sophistication".
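
    To put a rough number on the yield argument, here's a back-of-envelope model of my own (independent defects and made-up per-core yields, so purely illustrative): a monolithic design needs essentially the whole die to be good, while a 100-core die with spares ships as long as, say, 80 cores test good.

```c
/* Toy yield model: compare a die that needs all 100 core-sized regions good
 * against one that ships if at least 80 of 100 cores test good.
 * Assumes independent defects -- an illustration, not real fab data. */
#include <stdio.h>
#include <math.h>

static double binom(int n, int k) {               /* C(n,k) as a double */
    double r = 1.0;
    for (int i = 1; i <= k; i++)
        r *= (double)(n - k + i) / i;
    return r;
}

int main(void) {
    const int cores = 100, needed = 80;
    for (double y = 0.80; y <= 0.96; y += 0.05) {  /* per-core yield     */
        double monolithic = pow(y, cores);         /* all 100 must work  */
        double redundant = 0.0;                    /* at least 80 work   */
        for (int k = needed; k <= cores; k++)
            redundant += binom(cores, k) * pow(y, k) * pow(1.0 - y, cores - k);
        printf("per-core yield %.2f: all-good die %.2e, with spares %.3f\n",
               y, monolithic, redundant);
    }
    return 0;
}
```

    At 90% per-core yield the all-or-nothing die is essentially unmanufacturable (0.9^100 is about 0.003%), while the die with spare cores still ships almost every time -- which is exactly the laser-out-the-bad-cores argument.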
  • Reply 28 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    Yes, you could do that. But I would still bet that the processor with a billion transistors will do certain things 100 times faster than 100 Pentiums on the same die. Certain things that are very important to the user.



    You'll lose that bet. Look at the relative performance of IA-64 and lesser cores -- the difference is much, much smaller than the ratio between the transistor counts. And a billion-transistor single-processor chip will not run at a higher clock speed; in fact, it'll probably run at less than 2 GHz. Having a single processor implies the need to communicate across the whole die, whereas tiny cores only communicate within their (very small) own core. The shorter lines of communication mean hugely lower power levels, and much higher potential clock rates. That's why Cell has been clocked as high as 5.6 GHz on 90 nm, and this gets more extreme at 65 and 45 nm. Furthermore, nobody can design a huge processor that is that complex, but we can replicate a simple core 100 times (and test it to prove that it works). And what does a processor designer do with that many transistors, besides a whole lot of cache? Wider superscalar and deeper pipelines have already passed the point of useful gains. SIMD has already reaped most of its benefit at lower transistor counts. Bigger buffers don't make much of a marginal difference. SMT/HyperThreading... well, that's pretty close to more cores.



    We are seeing very significant diminishing returns on increased processor complexity.



    Quote:

    It's the same question as before. The software mountain to climb for optimizing multithreaded apps is much more costly than developing hardware to accelerate sloppy code.



    You let us software guys worry about that. We've passed the inflection point in the curves.
  • Reply 29 of 63
    pbg4 dude Posts: 1,611 member
    At this point, with multiple cores becoming more prevalent down to the consumer level (Athlon X2, Pentium D), $2B spent on compiler development would net more of a gain than $2B spent on CPU/fab development. At least it seems that way to me.



    Besides, if we want computers to advance in areas like facial recognition, etc., parallel processing will need to be utilized. You can look at someone's face and instantly recognize whether you know them or not. A single/dual/quad processor can't work in this parallel fashion; it has to examine the whole image bit by bit. If you had a thousand processors that could 'see' the whole picture, new software avenues would open up for exploration.
  • Reply 30 of 63
    tht Posts: 5,452 member
    Quote:

    Originally posted by Programmer

    You'll lose that bet. Look at the relative performance of IA-64 and lesser cores -- the difference is much, much smaller than the ratio between the transistor counts. And a billion-transistor single-processor chip will not run at a higher clock speed; in fact, it'll probably run at less than 2 GHz. Having a single processor implies the need to communicate across the whole die, whereas tiny cores only communicate within their (very small) own core.



    Take a step back. The 1-billion-transistor single-core processor vs. the 1-billion-transistor 100-core processor is an exercise in hyperbole. You only get 32 with Cell SPEs. I had in mind 4 complicated cores for that 1 billion transistors, not one. Not to mention that an ~8-stage-pipeline, 6-wide-issue, in-order CPU was not on my mind either.



    The basic proposition we're discussing is that many simplified cores, such as those used in Cell, deliver better performance than a few complicated cores (970MP, etc.). Yeah, I think there is a big payoff for a complicated core, based on what we've talked about before. Multithreaded app design is hard, only a few threads will actually do compute-intensive work (<4), and legacy software will not run well on these simplified cores. All of that will push CMP chips with more complicated cores to be successful, rather than the Cell model.



    Quote:

    We are seeing very significant diminishing returns on increased processor complexity.



    I can agree that OOOE, superscalar and deep pipelining have reached or are reaching their limits. I don't agree that there aren't other techniques available, possibly hardware-based media encoding instructions and higher order math instructions.



    Quote:

    You let us software guys worry about that. We've passed the inflection point in the curves.



    I'll wait for those Cell/PS3 games that outclass x86/PC games, then.
  • Reply 31 of 63
    tht Posts: 5,452 member
    Quote:

    Originally posted by PBG4 Dude

    At this point, with multiple cores becoming more prevalent down to the consumer level (Athlon X2, Pentium D), $2B spent on compiler development would net more of a gain than $2B spent on CPU/fab development. At least it seems that way to me.



    Oh, Intel has probably spent more than $2B on Itanium compiler development, and probably just as much if not more on Pentium compiler development.



    But no, fab development is the number one asset for a semiconductor manufacturer, and that's where investment money should go.



    Quote:

    Besides, if we want computers to advance in areas like facial recognition, etc., parallel processing will need to be utilized. You can look at someone's face and instantly recognize whether you know them or not. A single/dual/quad processor can't work in this parallel fashion; it has to examine the whole image bit by bit. If you had a thousand processors that could 'see' the whole picture, new software avenues would open up for exploration.



    Those one thousand processors still see the whole image bit by bit.
  • Reply 32 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by THT

    Take a step back. The 1-billion-transistor single-core processor vs. the 1-billion-transistor 100-core processor is an exercise in hyperbole. You only get 32 with Cell SPEs. I had in mind 4 complicated cores for that 1 billion transistors, not one. Not to mention that an ~8-stage-pipeline, 6-wide-issue, in-order CPU was not on my mind either.



    Fair enough, but I think the "ideal" balance is more like 20 x 50M or 40 x 25M. Something like 20-30 stage 2-4 issue (possibly OoOE) cores (with SIMD) clocked at 4+ GHz. Possibly with local memory (Cell style) instead of cache.



    The specialized circuits/instructions you mention are already appearing, but they are typically additional functional units attached to the on-chip SoC bus. No need to complicate the core(s) by building these things into them. Modularity is desirable from a hardware design and fab point of view.



    Quote:

    Multithreaded app design is hard, only a few threads will actually do compute-intensive work (<4)



    The genius of the Cell model is that it will make your statement incorrect.
  • Reply 33 of 63
    powerdoc Posts: 8,123 member
    I don't think that the Cell is a desktop chip.

    For me, the future of architecture is a 10-20 stage, multi-pipelined, multicore architecture with large caches, fast interconnects, and a great SIMD unit. This chip will also have to deal with the bus bottleneck. Currently, this core does not exist.



    The Cell is a specialised architecture requiring severe software optimisation. It has an astounding level of performance on paper, but I'm waiting to see this chip on current applications (as it is a console chip, I may wait for a long time).
  • Reply 34 of 63
    Quote:

    Originally posted by Lemon Bon Bon

    Mandatory reading for all those PPC fan boys.



    Lemon Bon Bon




    However, when one goes up, another must come down. I think people will keep on bashing the clumsy x86 even after Apple's surrender, maybe using a more efficient Linux-PPC system.

    Personally, I hope they can keep the G5 iMac for a few years; it is a nice machine and I use it as a semi-laptop. Have any of you tried to run an iMac off a battery? I think it is possible.

    I don't worry much about gaming. Cell will destroy any x86 on that side. Maybe Microsoft will light up the future of the PPC line.
  • Reply 35 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by Powerdoc

    I don't think that the Cell is a desktop chip.

    For me, the future of architecture is a 10-20 stage, multi-pipelined, multicore architecture with large caches, fast interconnects, and a great SIMD unit. This chip will also have to deal with the bus bottleneck. Currently, this core does not exist.



    The Cell is a specialised architecture requiring severe software optimisation. It has an astounding level of performance on paper, but I'm waiting to see this chip on current applications (as it is a console chip, I may wait for a long time).




    You do realize that, aside from the large caches, you basically described the Cell, right? And I suggest that large caches are exactly what we do not want in a heavily multi-processor machine. Multi-threaded programs are hard to write and optimize, in large part, because of the cached shared memory model. Cell adopts an explicit model so that main memory access is controlled explicitly by software (a rough sketch of what that looks like follows at the end of this post). The leading objection to this model is that it doesn't work with existing software, but then (almost) everyone agrees that the current model for multi-threading is hugely problematic and error-prone, so perhaps this is actually a very good idea in the long run.



    And drop the idea that the Cell is only a console chip. The first Cell is, yes, but the Cell is an architecture and if IBM has its way, no doubt there will be future variations. Beefier Power cores, more processors, more I/O ports, different memory controllers, etc. are all possible (and the other way too --scaled down instead of up).
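
    Since the explicit model can sound abstract, here's a rough sketch of its shape in plain C. The dma_get/dma_put helpers are stand-ins of my own (simulated with memcpy so the sketch runs anywhere), not the actual Cell SDK calls, but the structure -- pull a chunk into a private local store, compute out of it, push the result back -- is the point.

```c
/* Sketch of Cell-style explicit memory management, in portable C.
 * dma_get/dma_put stand in for the real DMA intrinsics; here they are
 * simulated with memcpy so the example compiles and runs on any machine. */
#include <stdio.h>
#include <string.h>

#define CHUNK 256                   /* elements per local-store buffer     */
static float local_in[CHUNK];       /* "local store": private to this core */
static float local_out[CHUNK];

/* Stand-ins for the DMA engine (assumptions, not the actual Cell SDK). */
static void dma_get(void *local, const void *main_mem, size_t bytes) {
    memcpy(local, main_mem, bytes); /* real hardware: async, tagged transfer */
}
static void dma_put(void *main_mem, const void *local, size_t bytes) {
    memcpy(main_mem, local, bytes);
}

/* Stream over main memory one chunk at a time: pull in, compute, push out. */
static void process_stream(const float *src, float *dst, size_t n) {
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t count = (n - base < CHUNK) ? n - base : CHUNK;
        dma_get(local_in, src + base, count * sizeof(float));   /* 1. pull */
        for (size_t i = 0; i < count; i++)                      /* 2. work */
            local_out[i] = local_in[i] * 2.0f;
        dma_put(dst + base, local_out, count * sizeof(float));  /* 3. push */
    }
}

int main(void) {
    float src[1000], dst[1000];
    for (int i = 0; i < 1000; i++) src[i] = (float)i;
    process_stream(src, dst, 1000);
    printf("dst[999] = %.1f\n", dst[999]);    /* expect 1998.0 */
    return 0;
}
```

    On the real hardware the transfers are asynchronous and tagged, so the fetch of the next chunk can overlap the compute on the current one. The key difference from a cached shared-memory model is that all of the memory traffic is visible and scheduled in the code, rather than inferred by coherence hardware.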
  • Reply 36 of 63
    What changes are coming from this development in the industry, where CPU manufacturers can depend less and less on process shrinks and added megahertz for performance gains?



    I've read a lot of people slamming x86, for instance, calling it crufty, aged, and inelegant, and saying that development resources would be better spent elsewhere, to paraphrase John Siracusa.



    Will this development in the industry eventually force such a change, and a total redesign of the fundamentals inside our future CPUs?
  • Reply 37 of 63
    powerdoc Posts: 8,123 member
    Quote:

    Originally posted by Programmer

    You do realize that, aside from the large caches, you basically described the Cell, right? And I suggest that large caches are exactly what we do not want in a heavily multi-processor machine. Multi-threaded programs are hard to write and optimize, in large part, because of the cached shared memory model. Cell adopts an explicit model so that main memory access is controlled explicitly by software. The leading objection to this model is that it doesn't work with existing software, but then (almost) everyone agrees that the current model for multi-threading is hugely problematic and error-prone, so perhaps this is actually a very good idea in the long run.



    And drop the idea that the Cell is only a console chip. The first Cell is, yes, but the Cell is an architecture and if IBM has its way, no doubt there will be future variations. Beefier Power cores, more processors, more I/O ports, different memory controllers, etc. are all possible (and the other way too --scaled down instead of up).




    Perhaps large caches are a nightmare, but for today's software, large caches lead to better performance.

    The Cell architecture is perhaps better, but it requires a complete rewrite of all software.
  • Reply 38 of 63
    programmer Posts: 3,458 member
    Quote:

    Originally posted by Powerdoc

    Perhaps large caches are a nightmare, but for today's software, large caches lead to better performance.

    The Cell architecture is perhaps better, but it requires a complete rewrite of all software.




    Most of today's software doesn't leverage multi-core as it is, so the difference isn't as big as you make it out to be. It would be straightforward for IBM to build a Cell with 2 Power cores in addition to all the vector processors, and that would run all the dual-core-oriented software just fine. It has been repeatedly demonstrated, however, that that kind of software does not scale well to more processors. Refactoring the design to the Cell model will scale further, and will likely scale across distributed networks of machines as well (the Cell chip is essentially a distributed network on a chip). When chips with 10-100 processors arrive, it is the software that has been reformatted for that model which will succeed.
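
    To illustrate the refactoring point with a toy of my own (nothing IBM-specific): software written as one or two big threads tops out at one or two cores, while the same work expressed as a pool of small independent items scales with whatever core count the chip happens to have. A minimal pthreads sketch, using an atomic counter as the work queue:

```c
/* Toy illustration of "refactor into many independent chunks": the worker
 * count can track the core count because no item depends on another. */
#include <pthread.h>
#include <stdio.h>
#include <stdatomic.h>

#define N_ITEMS   1000000
#define N_WORKERS 8                    /* bump this to match the core count */

static double results[N_ITEMS];
static atomic_int next_item;           /* the "queue" is just an index */

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int i = atomic_fetch_add(&next_item, 1);   /* grab the next item */
        if (i >= N_ITEMS)
            break;
        results[i] = (double)i * (double)i;        /* independent work   */
    }
    return NULL;
}

int main(void) {
    pthread_t tid[N_WORKERS];
    for (int w = 0; w < N_WORKERS; w++)
        pthread_create(&tid[w], NULL, worker, NULL);
    for (int w = 0; w < N_WORKERS; w++)
        pthread_join(tid[w], NULL);
    printf("results[12345] = %.0f\n", results[12345]);   /* 152399025 */
    return 0;
}
```

    Because each item is independent, N_WORKERS can simply track the hardware, and the same decomposition maps naturally onto SPE-style cores with local stores, or even onto separate machines -- which is the scaling argument above.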
  • Reply 39 of 63
    powerdoc Posts: 8,123 member
    Quote:

    Originally posted by Programmer

    Most of today's software doesn't leverage multi-core as it is, so the difference isn't as big as you make it out to be. It would be straightforward for IBM to build a Cell with 2 Power cores in addition to all the vector processors, and that would run all the dual-core-oriented software just fine. It has been repeatedly demonstrated, however, that that kind of software does not scale well to more processors. Refactoring the design to the Cell model will scale further, and will likely scale across distributed networks of machines as well (the Cell chip is essentially a distributed network on a chip). When chips with 10-100 processors arrive, it is the software that has been reformatted for that model which will succeed.



    Your arguments seem convincing, but why did Apple not choose the Cell architecture? If my memory is correct, Apple considered this choice.
  • Reply 40 of 63
    Knowing next to nothing about coding and the Cell-versus-Intel debate, I nonetheless suspect that Apple looked into the near future and saw that Intel could come through with low-heat chips for laptops, indefinitely. Programmer's model clearly leaves open the possibility that Apple might someday "return" to the nascent IBM approach (à la Cell), perhaps at a time when Apple's presumed greater market share will make the software optimization process less onerous and more financially attractive to developers. In other words, ten or fifteen years down the line, who can say?



    But returning to real world speed advances in the next five years or so, I understand Programmer's reluctance to make any hard predictions - particularly considering all of the variables involved - but I can't help but enjoy the sound of "having 10-50 cores and a billion transistors on your chip," whatever speed that brings! As the U.S. president shortsightedly said, "Bring it on."



    Merci, Programmer.