Comments
Originally posted by Kurt
Programmer, I am not saying you are wrong, but why do you say there is an ideal size, and what is it? I have never heard that before, not that I know any better.
Some time ago I read a lengthy discussion on the topic and it included a proper explanation. I have completely forgotten it, however, so I can't provide any direct evidence. The editor of the Microprocessor Report wrote a piece last week on the 970 and, IIRC, he stated that the 970 was pretty much the "ideal size". It probably has to do with heat density, the number of connections, the number of die per wafer, etc.
Originally posted by Kurt
Programmer, I am not saying you are wrong, but why do you say there is an ideal size, and what is it? I have never heard that before, not that I know any better.
The sort of "high level" explanation is that you design a chip assuming a certain process size - that gates will be so long, wires so wide, etc. The chip, obviously, will run best when everything is as big, long, and wide as the designer intended it to be. When electronic circuits shrink, everything gets closer together, which means that timing issues and current leak and quantum tunneling and magnetic fields and any number of other bugaboos start reducing the ability of the chip to function. (What silicon-on-insulator, copper interconnects, SiLK, and all the other process tweaks you hear about all do, in a nutshell, is make it easier for current to go where it's supposed to go, rather than tunnelling around through random locations, or leaking out of the wires.)
It's possible to overcome these issues by revising the design and/or the process to reduce leakage, fix timing bugs, etc., but as with all design work you get diminishing returns, and eventually trying to fab the CPU on a too-small process becomes not unlike trying to knit lace with yarn. So you start with a clean slate and design with the new process in mind, using finer thread and smaller needles, as it were.
The jump from 180nm to 130nm is particularly hairy by all accounts, and 90nm is fraught with perils as well (although we have it from a Motorola executive that the issues involved in fabbing at 130nm and at 90nm are not so different).
Originally posted by Programmer
Some time ago I read a lengthy discussion on the topic and it included a proper explanation. I have completely forgotten it, however, so I can't provide any direct evidence. The editor of the Microprocessor Report wrote a piece last week on the 970 and, IIRC, he stated that the 970 was pretty much the "ideal size". It probably has to do with heat density, the number of connections, the number of die per wafer, etc.
I'll add on to Programmer's great start with another item:
Pads. Pads connect to the pins that carry data, addresses, and power into and out of the chip. These pads take a lot of room, and of course the more pins on a chip, the more pads.
The pads require a minimum amount of space, and that sets the minimum die area the chip will occupy, no matter how few transistors are on it. Any area of the chip that holds no transistors but is part of the minimum die size needed for the pads is wasted space.
Any transistors that do useful work and are put into that space are essentially "free", minus the design cost. Useful work here could be extra L2 cache, cache tags, a higher-associativity cache design, logic circuits (simple integer units, anyone?), or perhaps another level of cache or another type of private memory for the chip (space for register sets for faster task switching).
So for the die shrink of the 970 (let's call it the 970+), it might make economic sense to increase the cache to reduce the wasted space in the processor.
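To make the pad-limited argument concrete, here is a rough back-of-the-envelope sketch in C. Every number in it is invented purely for illustration (the pad count, pad pitch, logic area, and cache density are not real 970 figures); it only shows how perimeter pads can set a floor on die area and leave "free" room for cache.

/* Back-of-the-envelope sketch of a pad-limited die.  All numbers are
 * invented for illustration; none of them are real 970 figures. */
#include <stdio.h>

int main(void)
{
    double pads             = 576.0;  /* hypothetical signal + power pads    */
    double pad_pitch_mm     = 0.07;   /* invented edge length used per pad   */
    double logic_area_mm2   = 66.0;   /* pretend core + L2 area after shrink */
    double cache_mm2_per_mb = 12.0;   /* invented SRAM density at 90nm       */

    /* The pads ring the perimeter, so they dictate a minimum edge length,
     * and therefore a minimum die area, regardless of transistor count. */
    double min_edge_mm  = (pads * pad_pitch_mm) / 4.0;   /* four sides */
    double min_area_mm2 = min_edge_mm * min_edge_mm;

    double free_mm2 = min_area_mm2 - logic_area_mm2;
    if (free_mm2 > 0.0)
        printf("pad-limited: %.0f mm^2 minimum die, %.0f mm^2 free,"
               " roughly %.1f MB of extra cache\n",
               min_area_mm2, free_mm2, free_mm2 / cache_mm2_per_mb);
    else
        printf("logic-limited: the transistors, not the pads, set the size\n");
    return 0;
}

With these made-up numbers the pads force a die of about 100 mm^2, of which roughly a third could be filled with "free" cache, which is exactly the kind of trade-off described above.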
Originally posted by visigothe
The 970 is here, and here for a while. Don't expect the 980 [whatever it will be called] to arrive until well after the POWER5 chip is up and running in full production status.
Personally, I'm expecting a somewhat smaller gap between the POWER5 and the 980 than there was between the POWER4 and the 970.
-- The POWER5 is targeted at lower-end hardware than the POWER4
-- IBM's engineers now have a bit more hands-on experience at scaling down POWER chips to desktop variants
-- The POWER4 was presumably designed with no particular thought about eventually deriving a desktop version, but the POWER5 ought to be a different kettle of fish
Therefore, we may see the 980 a fair bit sooner than we had dared to hope. If there was a three-year gap between the POWER4 and the 970, I'll guess that there will be just a two-year gap between the POWER5 and the 980.
As a consequence, I don't think we'll see dual-core 970s. We'll get a minor speedbump for the 970s down the track as the current process matures, a new 90nm process, and then a minor speedbump as that process matures. Then it's time for the G6, baby!
8)
Originally posted by boy_analog
Therefore, we may see the 980 a fair bit sooner than we had dared to hope. If there was a three-year gap between the POWER4 and the 970, I'll guess that there will be just a two-year gap between the POWER5 and the 980.
IBM uses a design process which is heavily reliant on advanced circuit layout software. This allows them to put together a design much more efficiently (in terms of manpower) than Intel currently can. It does mean they don't get the advantages of "hand tweaking" the design, but to me this is directly analogous to the shift from writing software in assembly language to writing in a higher level language -- eventually the machines (or the process, in this case) become so fast that what you do at the higher level matters far more than what you do at the low level. Designer productivity means more effort can be spent getting the overall design right, and changes are easier to make, so the R&D cycle can deliver many more large-scale improvements in the design than if you had to build the low-level parts manually and readjusting anything were a huge amount of work. At first there are always the skeptics, or the specialists at hand-coding assembly, but eventually they realize that they've lost and go work on obscure projects or on small pieces of the bigger project where their skills are still appropriate.
The reason I tell you all this is that one thing computer design tools are usually good at is applying a large-scale sweeping change according to some specified set of rules (search & replace on text is a trivial example). If, during the development of the 970, they developed a "transformation" from the POWER4 layout to the 970 layout, then during POWER5 development they might be able to get 90+% of the way to the 980 by pushing a button. If the design is assembled in modules, then plugging in the 970's AltiVec unit might come with just another button click. And then that team of excellent IBM engineers can sit down and focus on optimizing the thing rather than just building it. This is where really great stuff can happen.
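Just to illustrate the idea (and only the idea; this has nothing to do with IBM's actual tools, formats, or floorplans), here is a toy C sketch that treats the design as data and applies a few mechanical rules to derive a desktop part from a server part:

/* Toy sketch: the design is just data, and the "POWER -> 9x0" derivation
 * is a set of mechanical rules applied to it.  Purely illustrative;
 * every block name and number here is invented. */
#include <stdio.h>
#include <string.h>

struct block {
    char   name[16];
    int    instances;
    double area_mm2;   /* invented numbers */
};

int main(void)
{
    /* A pretend server-chip floorplan. */
    struct block server[] = {
        { "core",   2, 60.0 },
        { "l2",     1, 45.0 },
        { "fabric", 1, 30.0 },   /* multi-chip server interconnect */
    };
    int n = sizeof server / sizeof server[0];

    printf("derived desktop part:\n");
    for (int i = 0; i < n; i++) {
        struct block b = server[i];
        if (strcmp(b.name, "core") == 0)
            b.instances = 1;                 /* rule: single core           */
        if (strcmp(b.name, "fabric") == 0)
            strcpy(b.name, "fsb");           /* rule: swap in a system bus  */
        b.area_mm2 *= 0.5;                   /* rule: shrink to new process */
        printf("  %-6s x%d  %5.1f mm^2\n", b.name, b.instances, b.area_mm2);
    }
    /* rule: bolt on the vector unit as a plug-in module */
    printf("  %-6s x%d  %5.1f mm^2\n", "vmx", 1, 8.0);
    return 0;
}

The point is only that once the derivation is expressed as rules like these, rerunning it against a new input (POWER5 instead of POWER4) is cheap.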
Since IBM claims the POWER5 will be on the market by Q2 2004, as pointed out in that other thread here, it might not be impossible that we'll see the 980 in 2004. Before Dec 31st.
Apart from the technical interest and bragging-rights appeal of all this, I find it very comforting to hear that our new desktop CPU supplier du jour is adopting such professional strategies. Not wanting to name names, I'll just say that this is a refreshing change from such things as, um, running dirty fabs to cut costs. But let's not go there!
I believe the new 9x0 is being designed concurrently with the POWER5. Should IBM adopt this rational design approach, the POWER5 and the 9x0 will be variations of the same design. This may reduce the time to market considerably.
The 970 was a redesign and adaptation of the POWER4, and considerable reengineering was necessary. That took time. It seems possible that future POWERx and 9x0 CPUs will be introduced simultaneously.
This technical convergence of CPUs also suggests a marketing and OS convergence.
Heavy-iron servers and workstations for big business from IBM, and light-iron servers, high-performance desktops, and portables for small business and individuals from Apple; all using OS X, with both IBM and Apple systems running compatible software.
This could get interesting.
As for why the VMX unit was done after the fashion of the old G4, it was a time-to-market thing. "What app developers will find is this will still perform pretty well. There wasn't a great advantage to doing it the other way."
But it is something that they're looking at changing. It's not certain that it'll change, but changing it is on the table. (Note that they absolutely could not speak in either specifics or generalities about the future of the chip, so even if it were certain that they'd change it, they'd pretty much have to give the answer that they gave me.)
Perhaps we'll see some slight chip modification during the 970+ stage [process shrink]. At 90nm, I imagine they'd have to re-place [move] some units to new locations to counter any heat-related or latency-related problems they would encounter, so why not add or change some units?
The initial blade server will be based on the PowerPC 970 processor (known internally as the GPUL), which made its debut this month in Apple Computer Inc.'s Power Mac G5 line. A mid-2004 replacement for the blade as well as the ULE products will run on an updated version of that chip, known as the GPUL2.
IBM product finally surfaces--somewhat.
Originally posted by hmurchison
Well, since IBM is adding auto-vectorization to GCC, I expect nothing less than a kickass AltiVec unit with acceleration for double floats. That's all I ask.
You won't get it. As noted in some new Apple documentation, if you want to work on vectors of doubles, just use the scalar FPUs on the 970. They are as fast as double-precision support added to AltiVec would be, and more flexible. Plus, if you feel like it, you can use the AltiVec unit in parallel.
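A minimal sketch of that advice, assuming GCC with AltiVec support on a 970-class machine: keep single-precision work in VMX and leave double-precision work to the scalar FPUs. The function names and the 16-byte-alignment and multiple-of-4 assumptions are mine, purely for illustration.

/* Sketch: floats go through AltiVec, doubles stay on the 970's scalar FPUs.
 * Assumes x and y are 16-byte aligned and n is a multiple of 4 in the
 * VMX version; build with something like gcc -maltivec on PowerPC. */
#include <altivec.h>

/* y[i] = a*x[i] + y[i], four single-precision lanes per iteration. */
void saxpy_vmx(int n, float a, const float *x, float *y)
{
    vector float va = (vector float){ a, a, a, a };
    for (int i = 0; i < n; i += 4) {
        vector float vx = vec_ld(0, x + i);
        vector float vy = vec_ld(0, y + i);
        vy = vec_madd(va, vx, vy);          /* fused multiply-add */
        vec_st(vy, 0, y + i);
    }
}

/* Same operation on doubles: VMX has no double-precision support, so this
 * is a plain scalar loop that the compiler schedules onto the two FPUs,
 * and it can run alongside the AltiVec code above. */
void daxpy_fpu(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

And if the auto-vectorization work hmurchison mentions does land in GCC, the float version could in principle be written as a plain loop and left to the compiler to vectorize.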