First off, let's check out SemiAccurate's ancient report of NVidia's Fermi troubles:
http://www.semiaccurate.com/2010/02/...and-unfixable/
This article is from Feb 2010, but you really ought to read the whole thing. NVidia's problems are much greater than this small excerpt implies.
Quote:
A really rough measure of yield is that for similar products, the yield goes down by the square of the die size. A 200mm^2 chip can be expected to have 1/4 the yield of a similar 100mm^2 chip, and a 50mm^2 chip will have about 4 times the yield of the 100mm^2 part. Chip makers put lots of redundant structures into every design in order to repair some kinds of fabrication errors, but there are limits.
Each redundancy adds to the area of the design, so the base cost for the chip is higher. Semiconductor manufacturing is a series of complex tradeoffs, and the cost for redundant area versus yield is one of the simpler ones. If you plan right, you can make very high yielding chips with only a little extra die area.
If things go well, the cost of the redundant area is less than you would lose by not having it there at all. If things go badly, you get large chips that you can't make at anything close to a viable cost. The AMD K6-III CPU was rumored to be an example of this kind of failure.
Last spring and summer, ATI was not shy about telling people that the lessons learned from the RV740 were fed back into the Evergreen 5000 series chips, and it was a very productive learning experience. One of the deep, dark secrets was that there were via (interconnects between the metal layers on the chip) problems. The other was that the TSMC 40nm transistors were quite variable in transistor construction, specifically in the channel length.
Since Anand talked about both problems in his excellent Evergreen history article, any promises to keep this secret are now a moot point. What ATI did with Evergreen was to put two vias in instead of one. It also changed transistor designs and layout to mitigate the variances. Both of these cost a lot of area, and likely burn a more than negligible amount of energy, but they are necessary.
Nvidia on the other hand did not do their homework at all. In its usual 'bull in a china shop' way, SemiAccurate was told several times that the officially blessed Nvidia solution to the problem was engineering by screaming at people. Needless to say, while cathartic, it does not change chip design or the laws of physics. It doesn't make you friends either.
By the time Nvidia found out about the problems, it was far too late to implement them in Fermi GF100. Unless TSMC pulled off a miracle, the design was basically doomed.
Why? GF100 is about 550mm^2 in size, slightly larger than we reported after tapeout. Nvidia ran into severe yield problems with a 100mm^2 chip, a 3 month delay with a 139mm^2 chip, and had to scrap any larger designs due to a complete inability to manufacture them. Without doing the homework ATI did, it is now trying to make a 550mm^2 part.
Basic math says that the GF100 is a hair under 4 times as large as the G215, and they are somewhat similar chips, so you can expect GF100 yields to be around 1/16th that of the smaller part. G215 is not yielding well, but even if it was at a 99 percent yield, you could expect Fermi GF100 to have single digit percentage yields. Last time we heard hard numbers, the G215 was not yielding that high.
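For anyone who wants to check the article's "basic math", here is a rough sketch in Python. It assumes the article's own rule of thumb (yield falls with the square of die area) and takes the G215 to be the 139mm^2 chip mentioned above, which is my inference from the "hair under 4 times" remark. The function name is made up for illustration, and real yield models are a lot more involved than this.

```python
# Rule of thumb from the quoted article: for similar chips, yield scales
# with the inverse square of die area. This is a crude approximation,
# not a real fab yield model.

def relative_yield(area_mm2, ref_area_mm2, ref_yield):
    """Estimate yield of a chip from a similar reference chip's yield."""
    return ref_yield * (ref_area_mm2 / area_mm2) ** 2

G215_AREA_MM2 = 139.0   # assumption: G215 is the 139mm^2 chip in the quote
GF100_AREA_MM2 = 550.0  # figure given in the quote

area_ratio = GF100_AREA_MM2 / G215_AREA_MM2

# Best case from the quote: pretend G215 yields 99 percent.
gf100_yield = relative_yield(GF100_AREA_MM2, G215_AREA_MM2, 0.99)

print(f"GF100 is {area_ratio:.2f}x the area of G215")
print(f"Estimated GF100 yield at 99% G215 yield: {gf100_yield:.1%}")
```

Under those assumptions the area ratio comes out just under 4 and the estimated GF100 yield lands in the single digits, which matches the article's claim.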
Second, let's see what Fermi does when all 512 of the GTX 480's shader processors are turned on:
http://en.expreview.com/2010/08/09/w...-480/9070.html
Quote:
According to the test, the full spec’ed 512SP GTX 480 just brings a performance improvement of no more than 6% over the 480SP GTX 480.
The power consumption and temperature of GF100 have become NVIDIA’s headache. Even without overclocking, the full-load power consumption reached a horrible 644W, which was 204W [more] than the 480SP model with the same clocks. The full-load temperature was 94℃ even with help of the outstanding Accelero Xtreme Plus cooling solution. But its noise was lower than the reference model.
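To put those two numbers side by side, here is a quick back-of-the-envelope sketch in Python. It assumes the 6% figure is the actual gain (the quote only says "no more than" 6%, so this is the best case) and that both power readings were taken the same way. The quote doesn't say whether 644W is the card alone or the whole test system, and the function name is just for illustration.

```python
# Compare performance per watt of the 512SP part against the 480SP part,
# using only the figures in the quoted Expreview excerpt.

def perf_per_watt_ratio(perf_gain, new_power_w, extra_power_w):
    """Return new perf/watt divided by old perf/watt."""
    old_power_w = new_power_w - extra_power_w
    perf_ratio = 1.0 + perf_gain          # e.g. 1.06 for a 6% gain
    power_ratio = new_power_w / old_power_w
    return perf_ratio / power_ratio

# Assumption: take the full 6% gain, the upper bound given in the quote.
ratio = perf_per_watt_ratio(0.06, 644.0, 204.0)

print(f"512SP perf/watt relative to 480SP: {ratio:.2f}")
```

On those figures the 512SP part draws roughly 46% more power for at most 6% more performance, so performance per watt drops by more than a quarter.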
Third, let's consider that Intel and AMD are integrating their own GPU solutions on-die, choking NVidia out of the IGP market entirely. It's an IGP race Intel can't hope to win, but they persist.
Fourth, let's consider rumours of NVidia developing their own CPU, which is supposedly ARM with a Transmeta-inspired x86 front end, and the (lack of) speed and efficiency that implies:
http://www.semiaccurate.com/2010/08/...idias-x86-cpu/
Quote:
On the technical side, the problem is simple, speed. ARM A9 CPUs are great for phone level applications, and can reach into the current tablet space, but hit a glass ceiling there. If Eagle doubles the performance per MHz and doubles performance per watt, it will basically be on par with the low end of the Atom-class CPUs, and woefully behind the Nano/Bobcat level of performance.
The question: Are the odds of NVidia recovering any time soon good? It appears they are fooked for the next 18 months at least. Their GTS 450 scores significantly lower than equivalent GPUs from their previous lineup. Fermi doesn't have legs, and NV supposedly loses money on each power-hungry, expensive 470 and 480 sold.
Outside of the GTX 460, they don't seem to have appealing products in their portfolio.
And the Radeon 5000 series is so full of win that ATI/AMD could take the next year off and still clean up in the GPU market.
But the Radeon 6000 series debuts in a month...







