
970GX and low power 970s for PowerBooks

post #1 of 44
Thread Starter 
From Think Secret:

Since our July report on Antares, the forthcoming dual-core PowerPC 970MP processor, sources have provided Think Secret with additional notes regarding IBM's PowerPC development and the direction it might take Apple.

The biggest news is that Antares will also be available in a single-core version, code-named AntaresSP, which is expected to be named the PowerPC 970GX. At present, Apple's dual-2.5GHz Power Mac G5 uses the PowerPC 970FX processor. Like Antares, the 970GX will initially come in at speeds around 3GHz and is said to feature 1MB of L2 cache, double what the 970FX processor sports. Like the 970FX, however, the processor will not have any L3 cache.

At present, sources suggest that the 970GX might be ready around the first or second quarter of 2005, while the expected availability of the 970MP remains unknown at this point.

Low-power versions of the PowerPC 970 intended for use in the PowerBook G5 remain in development at speeds between 1.6GHz and 1.8GHz, but little else is known. Apple's current PowerBook line-up hasn't been updated in seven months, however, suggesting a revised model could arrive as soon as Macworld Expo San Francisco in January.

If the PowerBook G5 isn't ready then, sources say Apple may turn to Freescale's PowerPC 7448, which is said to be almost finished. The processor is pin compatible with the PowerPC 7447A used in current PowerBooks, will exceed 1.5GHz, and feature 1MB of L2 cache, double the amount of the 7447A. The PowerPC 7448 will also be manufactured on 90nm silicon-on-insulator process technology, delivering improved power savings over the 130nm 7447A.


Linky
...we have assumed control
post #2 of 44
Interesting.

However, I doubt that this chip will be clocked around 3GHz.
IBM has huge problems producing the 2.5GHz G5; worse, that chip has to be watercooled. Some sources said that without watercooling the 2.5GHz G5 reaches temperatures above 90° Celsius.
post #3 of 44
Maybe they added a few extra pipelines. Any news on 256bit Altivec 2?
post #4 of 44
Quote:
Originally posted by MarcUK
Maybe they added a few extra pipelines. Any news on 256bit Altivec 2?

The problem with 256-bit AltiVec is finding a way to feed that beast. I don't think many memory controllers are able to feed such a thing.
post #5 of 44
Quote:
Originally posted by Powerdoc
Interesting.

However, I doubt that this chip will be clocked around 3GHz.
IBM has huge problems producing the 2.5GHz G5; worse, that chip has to be watercooled. Some sources said that without watercooling the 2.5GHz G5 reaches temperatures above 90° Celsius.

Without any cooling, it would reach well above 90C.

The water-cooling unit is designed rather poorly as a strict CPU cooling device, but that's not its only function...
I can change my sig again!
post #6 of 44
Quote:
Originally posted by Eugene
The water-cooling unit is designed rather poorly as a strict CPU cooling device, but that's not its only function...

Please elaborate!
post #7 of 44
Quote:
Originally posted by MarcUK
Any news on 256bit Altivec 2?

I hope it doesn't exist.

As I just posted in another thread, introducing an AltiVec variation is mostly a bad idea. It fragments the software base (which program supports what?) and adds uncertainty for developers when aiming for their target platform. The transition to all machines having AltiVec took many years, and even still a lot of software that could and should take advantage of it doesn't, because it is not present in all PowerPCs.

Expanding the registers to 256-bit is not a particularly worthwhile exercise, either. First of all, dealing with the granularity of the SIMD registers is always an issue, and making that size larger means it is less likely you can skirt the issue. Second, the context size increases dramatically. Third, the size of the hardware register file becomes enormous (especially if you want to grow the rename pool or mirror the file between units). Fourth, most of the advantage of larger registers can be gained by simply retaining the current size and adding execution units to allow double the instruction throughput... plus this works with existing code and is more flexible in what work it can be doing at the same time. 256-bit registers with double precision support would allow 4-way double vectors, but the hardware cost to support that would be huge.
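
To make the granularity and throughput points concrete, here is a minimal sketch of a typical AltiVec loop (my illustration, not from the post; the function name, the alignment requirement, and the GCC-style vector literal are assumptions). The 128-bit registers fix the stride at four floats, and the scalar tail shows the granularity cost that doubles with 256-bit registers; conversely, a second vector unit can speed this loop up with no ISA change at all.

Code:
#include <altivec.h>   /* build with e.g. gcc -maltivec -mabi=altivec */

/* a[i] += b[i] * s, assuming a and b are 16-byte aligned */
void scale_add(float *a, const float *b, float s, int n)
{
    vector float vs = (vector float){s, s, s, s};
    int i;
    /* 128-bit registers hold exactly 4 floats, so the vector
       loop advances in strides of 4... */
    for (i = 0; i + 4 <= n; i += 4) {
        vector float va = vec_ld(0, &a[i]);
        vector float vb = vec_ld(0, &b[i]);
        va = vec_madd(vb, vs, va);      /* fused multiply-add, 4 lanes */
        vec_st(va, 0, &a[i]);
    }
    /* ...and the remainder falls back to scalar code.  With 256-bit
       registers the stride doubles to 8 and this tail grows. */
    for (; i < n; i++)
        a[i] += b[i] * s;
}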
Providing grist for the rumour mill since 2001.
post #8 of 44
Quote:
Originally posted by Eugene
Without any cooling, it would reach well above 90C.

The water-cooling unit is designed rather poorly as a strict CPU cooling device, but that's not its only function...

Well, you are right: without cooling the chip will simply burn. What I wanted to say is that even cooled, this chip is pretty hot. With the current fabbing process, the only way to have a 3GHz PPC 970FX is to cryogenise it.
post #9 of 44
So tempted to buy a new PowerBook. Should I wait? Or buy the 1.5GHz G4?
The box said win9x or better so i bought a mac.
post #10 of 44
Quote:
Originally posted by Powerdoc
... the only way to have a 3GHz PPC 970FX is to cryogenise it.

Ummn... How about using this:



Quote:
Introduction...
Peltier devices, also known as thermoelectric (TE) modules, are small solid-state devices that function as heat pumps. A "typical" unit is a few millimeters thick by a few millimeters to a few centimeters square. It is a sandwich formed by two ceramic plates with an array of small Bismuth Telluride cubes ("couples") in between. When a DC current is applied, heat is moved from one side of the device to the other - where it must be removed with a heatsink. The "cold" side is commonly used to cool an electronic device such as a microprocessor or a photodetector. If the current is reversed the device makes an excellent heater.

As with any device, TE modules work best when applied properly. They are not meant to serve as room air conditioners. They are best suited to smaller cooling applications, although they are used in applications as large as portable picnic-type coolers. They can be stacked to achieve lower temperatures, although reaching cryogenic temperatures would require great care. They are not very "efficient" and can draw amps of power. This disadvantage is more than offset by the advantages of no moving parts, no Freon refrigerant, no noise, no vibration, very small size, long life, capability of precision temperature control, etc.
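
For a rough feel of the numbers, the standard single-stage TE module relations can be worked out in a few lines; every coefficient below is an illustrative guess, not a real module's datasheet figure. Note that the cold side pumps 40W while the module itself burns 55W, which is why the heatsink side runs so hot.

Code:
#include <stdio.h>

/* Textbook single-stage Peltier model:
   Qc = alpha*I*Tc - 0.5*I^2*R - K*dT   (heat pumped from the cold side)
   P  = alpha*I*dT + I^2*R              (electrical power consumed)     */
int main(void)
{
    double alpha = 0.05;  /* module Seebeck coefficient, V/K (assumed) */
    double R     = 2.0;   /* electrical resistance, ohms (assumed)     */
    double K     = 0.5;   /* thermal conductance, W/K (assumed)        */
    double I     = 5.0;   /* drive current, A                          */
    double Tc    = 300.0; /* cold-side temperature, K                  */
    double dT    = 20.0;  /* hot side minus cold side, K               */

    double Qc = alpha * I * Tc - 0.5 * I * I * R - K * dT;
    double P  = alpha * I * dT + I * I * R;

    printf("pumped %.0f W, consumed %.0f W, heatsink sheds %.0f W\n",
           Qc, P, Qc + P);   /* 40 W, 55 W, 95 W with these guesses */
    return 0;
}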
OSX + Duals, Quads & Octos = World Domination
post #11 of 44
Quote:
Originally posted by Aphelion
Ummn... How about using this:


I have a small Peltier fridge in my office: no noise at all, but beware of touching the back of the fridge: it's insanely hot.
post #12 of 44
Quote:
Originally posted by Programmer
I hope it doesn't exist.

As I just posted in another thread, introducing an AltiVec variation is mostly a bad idea. It fragments the software base (which program supports what?) and adds uncertainty for developers when aiming for their target platform. The transition to all machines having AltiVec took many years, and even still a lot of software that could and should take advantage of it doesn't, because it is not present in all PowerPCs.

Expanding the registers to 256-bit is not a particularly worthwhile exercise, either. First of all, dealing with the granularity of the SIMD registers is always an issue, and making that size larger means it is less likely you can skirt the issue. Second, the context size increases dramatically. Third, the size of the hardware register file becomes enormous (especially if you want to grow the rename pool or mirror the file between units). Fourth, most of the advantage of larger registers can be gained by simply retaining the current size and adding execution units to allow double the instruction throughput... plus this works with existing code and is more flexible in what work it can be doing at the same time. 256-bit registers with double precision support would allow 4-way double vectors, but the hardware cost to support that would be huge.

I hope it does exist!

AltiVec has already changed 3 times: the original 7400, the 7450, and the 970 have different architectures. IF a 256-bit AltiVec was backwards compatible, I see no problem. Intel's SSE has changed 3 times.

4-way 64-bit vectors would be insanely cool for 3D modelling and gaming, even if bandwidth was the limiting factor.

We're talking progress; nothing stands still in computing. We'll have the room at 65nm. It's just up to the programmers to work out how to use it. Glad I'm not a programmer.
post #13 of 44
The drawback to a Peltier cooling system is that it works so well at pumping heat from the "cold side" to the "hot side" that it can cause heat accumulation problems in normal heat sinks. This is why most Peltier-equipped computer systems must have water cooling to remove the excess heat.

Of course the Powermac already has the water cooling needed to take care of this problem. Adding a Peltier system to the top-of-the-line Powermac would be a trivial engineering exercise for Apple.

If Apple wants a 3 GHz 970FX Powermac they can do it anytime they want.
OSX + Duals, Quads & Octos = World Domination
post #14 of 44
Quote:
Originally posted by Powerdoc
Well, you are right: without cooling the chip will simply burn. What I wanted to say is that even cooled, this chip is pretty hot. With the current fabbing process, the only way to have a 3GHz PPC 970FX is to cryogenise it.

You're forgetting that Intel's latest 3.8GHz chip chucks out far more heat than a 970, and is air cooled. Apple does it mostly for the quietness and elegance. 970s could be air cooled if we accepted the noise of a 120mm fan.
post #15 of 44
Quote:
Originally posted by MarcUK
AltiVec has already changed 3 times: the original 7400, the 7450, and the 970 have different architectures. IF a 256-bit AltiVec was backwards compatible, I see no problem. Intel's SSE has changed 3 times.

The implementation changed, but the interface did not. AltiVec is still essentially unchanged from its original definition. There are lots of problems with extending an ISA -- just look at the horrible mess Intel has with its three versions of SSE... that's a perfect example of why it's a bad idea.

Quote:

4-way 64-bit vectors would be insanely cool for 3D modelling and gaming, even if bandwidth was the limiting factor.

We're talking progress; nothing stands still in computing. We'll have the room at 65nm. It's just up to the programmers to work out how to use it. Glad I'm not a programmer.

The 64-bit vectors aren't as "insanely cool" as you seem to think, especially when you consider the cost of what you're giving up by adding this extension.

Progress is important, but random and whimsical progress is just expensive and disruptive. Not everything needs to change, and if you avoid changing the interface (i.e. the ISA) then you avoid alienating your legacy and your developers... which is a good thing.

It's obvious you're not a developer. Software is always trailing behind hardware, and partially because of that, non-standard hardware features have a strong tendency to get ignored. One of Apple's strengths has been its relatively stable hardware platform -- it is far more consistent than the huge breadth of machines in the x86 world. Their market is too small to risk fragmenting it unless the potential reward is massive. Double precision and/or double-width registers is not nearly compelling enough to warrant the cost of development and deployment.

We should probably take any further AV2 discussion into the other thread where it is being discussed.
Providing grist for the rumour mill since 2001.
post #16 of 44
It would seem that AMD has just applied for a patent to "cryogenise" (as Powerdoc coined the term) their future CPUs:

AMD Drives Integrated Cooling into Chips

Quote:
Advanced Micro Devices, one of the world's leading makers of central processing units, has patented a technology that would allow the chipmaker to use so-called Peltier coolers with its future microprocessors for better heat dissipation and more efficient cooling of future chips.

"Various embodiments of a semiconductor-on-insulator substrate incorporating a Peltier effect heat transfer device and methods of fabricating the same are provided. In one aspect, a circuit device is provided that includes an insulating substrate, a semiconductor structure positioned on the insulating substrate and a Peltier effect heat transfer device coupled to the insulating substrate to transfer heat between the semiconductor structure and the insulating substrate," says an abstract description of U.S. Patent number 6,800,933 submitted by AMD.

My take on this AMD patent is that it is for embedding the Peltier cooling element into the chip itself! Powered from the chip's onboard circuitry, it would just need an appropriate (read that as very efficient) heat sink to be implemented by system designers.

The patent further mentions "islands" of discrete circuits on the chip that run hotter than others, with AMD's patented approach being to concentrate Peltier nodes over the problem areas.

To me this seems to be an effective counter to the "hot spots" that have plagued the 970FX and prevented the "legs" that Steve Jobs alluded to when he announced the G5.

IBM and AMD have partnered on development of advanced CPUs, and I'll bet that they (IBM) can use this technology if they wish.

More on Peltier Coolers
OSX + Duals, Quads & Octos = World Domination
post #17 of 44
More on the cryonization of future chips from OSnews:

Quote:
... Another possibility would be to use a technique Intel plan to use for the next Itanium "Montecito," which includes two Peltiers in the heat sink. Peltiers actually consume quite a bit of power themselves, but reducing the CPU temperature reduces transistor leakage; this lowers the power consumed by the CPU itself, allowing boosts in clock frequency which might not otherwise be possible.

Montecito is expected to consume 100 Watts but its heat sink requires a further 75 watts. The end effect is overall power consumption does not change (it may even go up) as part is moved to the heat sink, but the CPU itself does not get so hot when working. AMD have filed a patent on an on-chip Peltier so they're evidently considering similar technology.

I don't know if the 9x0 will be so hot as to require such aggressive cooling but things are heading that way. "Power density" is becoming a problem and will seemingly only get worse in the future. Power density is the heat generated in a specific area; as CPUs get ever smaller the heat is generated in a smaller area and thus the unit becomes progressively more difficult to cool. The 970FX used in Apple's PowerMacs actually uses less power than the previous 970 but liquid cooling was added because of the higher power density.
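
The power density arithmetic in that last paragraph is easy to check; here is a sketch with rough, assumed figures (the die areas are ballpark numbers for a 130nm 970 and a 90nm 970FX, not exact specs):

Code:
#include <stdio.h>

int main(void)
{
    /* ballpark figures only */
    double w_970   = 51.0, area_970   = 1.18;  /* ~118 mm^2 at 130nm */
    double w_970fx = 45.0, area_970fx = 0.66;  /* ~66 mm^2 at 90nm   */

    /* less total power, yet a hotter spot to cool */
    printf("970:   %.0f W/cm^2\n", w_970 / area_970);      /* ~43 */
    printf("970FX: %.0f W/cm^2\n", w_970fx / area_970fx);  /* ~68 */
    return 0;
}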

It was news to me that the "Montecito" was designed to use two Peltier coolers in the heat sink. I think the integrated Peltier just patented by AMD is a better solution.

IP sharing agreements between AMD and IBM could bring this technology to a Mac near you in the future.
OSX + Duals, Quads & Octos = World Domination
post #18 of 44
Quote:
Originally posted by Programmer
The implementation changed, but the interface did not. AltiVec is still essentially unchanged from its original definition. There are lots of problems with extending an ISA -- just look at the horrible mess Intel has with its 3 version of SSE... that's a perfect example of why its a bad idea.

SSE is not a perfect example of anything. On the other hand, the many generations of x86 processors from both Intel and AMD show that it is perfectly possible to extend an ISA and move that ISA forward. This idea that adding new instructions to AltiVec, or adding additional register width, will suddenly cause huge problems for developers is unwarranted. I mean, it is like saying that the move to PPC caused problems for developers. Yeah, maybe a little, but the payoffs have been worthwhile.
Quote:

The 64-bit vectors aren't as "insanely cool" as you seem to think, especially when you consider the cost of what you're giving up by adding this extension.

Again, one has to wonder what is being given up here?
Quote:

Progress is important, but random and whimsical progress is just expensive and distruptive. Not everything needs to change, and if you avoid changing interface (i.e. ISA) then you avoid alienating your legacy and your developers... which is a good thing.

Again we have a big pile of disinformation. Extended properly, nothing would change from the developer's standpoint except the additional capability. Frankly, everything needs to change. Look at it this way: has TI or any of the other DSP suppliers given up on improving DSP chips?


Quote:

Its obvious you're not a developer. Software is always trailing behind hardware, and partially because of that non-standard hardware features have a strong tendency to get ignored. One of Apple's strengths has been its relatively stable hardware platform -- it is far more consistent that the huge breadth of machines in the x86 world. Their market is too small to risk fragmenting it unless the potential reward is massive. Double precision and/or double width registers is not nearly compelling enough to warrant the cost of development and deployment.

One aspect of the supposed frequency scaling problems is that manufacturers have to look at enhancements to their processors to derive additional performance. They cannot ignore ideas with potentially big payoffs just to keep a certain element of the customer base happy. Luddites are all around us; it is a shame that they have infiltrated the programming world.
Quote:
We should probably take any further AV2 discussion into the other thread where it is being discussed.
post #19 of 44
Quote:
Originally posted by wizard69
SSE is not a perfect example of anything. On the other hand, the many generations of x86 processors from both Intel and AMD show that it is perfectly possible to extend an ISA and move that ISA forward. This idea that adding new instructions to AltiVec, or adding additional register width, will suddenly cause huge problems for developers is unwarranted. I mean, it is like saying that the move to PPC caused problems for developers. Yeah, maybe a little, but the payoffs have been worthwhile.

The PPC transition was a huge headache, but the payoff was enormous. The 680x0 had no future, had fallen way behind x86, and the switch brought IBM into the mix. If these advantages hadn't existed, the platform would have died had Apple tried to force the transition onto developers.

You're looking at the extensions to x86 and seeing that, yes, they successfully added instructions. Whoopy-do. I'm not saying that instructions can't be added, I'm saying that they have a negative effect on the software development for the platform. If a developer wants to use the new instructions (and that's why you put them there) then they either have to abandon the installed base (typically not a wise move) or they have to build, test & support two or more versions of the software (a pain even in the easiest case, real agony in the case of carefully crafted streaming vector code). Or you don't use the extensions, which is the normal developer response if the win isn't big enough. If nobody uses the extensions then the hardware developer just wasted a potentially huge amount of chip real-estate on something nobody is using and that could have been used for something that benefits everyone.
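
A sketch of that "two or more versions" burden, assuming the common GCC convention of defining __ALTIVEC__ when vector code generation is enabled; the routine and its names are made up. Every hot spot like this must be written, tested, and benchmarked once per ISA variant, which is exactly the cost an AltiVec 2 would add again:

Code:
#ifdef __ALTIVEC__
#include <altivec.h>

/* vector build: clamp values to 1.0, four floats per iteration
   (assumes p is 16-byte aligned) */
void saturate(float *p, int n)
{
    const vector float hi = (vector float){1.0f, 1.0f, 1.0f, 1.0f};
    int i;
    for (i = 0; i + 4 <= n; i += 4)
        vec_st(vec_min(vec_ld(0, &p[i]), hi), 0, &p[i]);
    for (; i < n; i++)          /* scalar tail */
        if (p[i] > 1.0f) p[i] = 1.0f;
}
#else
/* scalar build, kept alive for every CPU without the extension */
void saturate(float *p, int n)
{
    for (int i = 0; i < n; i++)
        if (p[i] > 1.0f) p[i] = 1.0f;
}
#endif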

Quote:
Frankly, everything needs to change. Look at it this way: has TI or any of the other DSP suppliers given up on improving DSP chips?

What is this obsession with changing everything? Should next year's cars all come with 5 pedals and square steering wheels? Change should happen for a reason and benefit, not because you have some compulsion.

Quote:
One aspect of the supposed frequency scaling problems is that manufacturers have to look at enhancements to their processors to derive additional performance. They cannot ignore ideas with potentially big payoffs just to keep a certain element of the customer base happy.

Absolutely, and I'm not saying they should ignore ideas. Any design change and hardware feature has a real and significant cost, and this must be weighed against the potential benefits. What I am telling you is that the AltiVec2 enhancements commonly suggested (256-bit & double precision) don't come anywhere near justifying their hardware expense and the impact on the platform. Much more effective changes can be made without changing the instruction set -- SMT, IMC, bigger caches, more execution units, more cores, etc etc etc. Adding expensive hardware features forces software developers to change, and it forces support of these features on all future processors of that lineage.
Providing grist for the rumour mill since 2001.
post #20 of 44
Quote:
Originally posted by Programmer
The PPC transition was a huge headache, but the payoff was enormous. The 680x0 had no future, had fallen way behind x86, and the switch brought IBM into the mix. If these advantages hadn't existed, the platform would have died had Apple tried to force the transition onto developers.

You leave one with the impression that AltiVec should not have been developed in the first place. After all, it is an addition to the original PPC ISA.
Quote:

You're looking at the extensions to x86 and seeing that, yes, they successfully added instructions. Whoopy-do. I'm not saying that instructions can't be added, I'm saying that they have a negative effect on the software development for the platform. If a developer wants to use the new instructions (and that's why you put them there) then they either have to abandon the installed base (typically not a wise move) or they have to build, test & support two or more versions of the software (a pain even in the easiest case, real agony in the case of carefully crafted streaming vector code). Or you don't use the extensions, which is the normal developer response if the win isn't big enough. If nobody uses the extensions then the hardware developer just wasted a potentially huge amount of chip real-estate on something nobody is using and that could have been used for something that benefits everyone.

The message being delivered in the above paragraph is just a little too much. I can make a very good argument that developers who did not incorporate new technologies into their code bases soon see that code base go the way of the dino. Your logic implies that a developer should never have developed that streaming vector code in the first place and just stuck with the main ALU. It is unfortunate, but the reality is that a developer has two choices. One is to concentrate on the installed base and eventually have no one to sell to; the other is to keep his software competitive and desirable in the marketplace.

AltiVec has clearly progressed to the point where it benefits everyone. Sure it takes time, just as it took time for developers to make good use of the high-performance FPUs that PPC gave us.

The point is that AltiVec can be extended without impacting the installed base. It would not be a losing proposition any more than the improvements that have been implemented in the FPU.
Quote:
What is this obsession with changing everything? Should next year's cars all come with 5 pedals and square steering wheels? Change should happen for a reason and benefit, not because you have some compulsion.

Good reasons abound, not the least of which is the slowing ability to scale processor performance via the traditional ratcheting of frequency. If you can't get the processor to run significantly faster, then it either has to do more per cycle or do more complex operations per cycle. There is a lot of potential in the AltiVec unit; why not try to improve it? Would you take the same attitude with the main ALU and just have stuck with the same number of execution units that were in the 601? I would hope not; the changes made to the G4 and G5 have moved performance forward, and the same can be done for the vector unit.
Quote:
Absolutely, and I'm not saying they should ignore ideas. Any design change and hardware feature has a real and significant cost, and this must be weighed against the potential benefits. What I am telling you is that the win from the AltiVec2 enhancements commonly suggested (256bit & double precision) is that they don't come anywhere near justifying their hardware expense and the impact on the platform.

Well, this can be debated to no end and, like most things, depends on the software. If we ignore the wider registers and focus on new instructions, I don't see where your concerns about hardware expense are justified.
Quote:
Much more effective changes can be made without changing the instruction set -- SMT, IMC, bigger caches, more execution units, more cores, etc etc etc. Adding expensive hardware features forces software developers to change and it forces support of these features on all future processors of that lineage.

Let's see: three of those items (SMT, more cores, etc.) require that a developer make some pretty significant changes to his code to get any advantage out of them. More execution units are what we are talking about here anyway, so I'm not sure how you can use that on both sides of the argument. Further, some of the items you suggest here are far more expensive, die real estate wise, than enhancements to AltiVec. SMT itself required IBM to implement a fair bit of logic on POWER5; in effect you double the number of hardware registers, so with AltiVec it would be as expensive as 256-bit wide registers.

There is nothing wrong with getting developers to change; without change we would all be staring at a Windows 3.1 screen. As to future processors: many of the "features" you suggest would also end up in all derived processors. So is that really a problem?

dave
post #21 of 44
Quote:
Originally posted by wizard69
You leave one with the impression that AltiVec should not have been developed in the first place. After all, it is an addition to the original PPC ISA.

You would only get that impression if you didn't bother reading what I wrote. AltiVec was an excellent addition because they did it right and they did it once. The cost/benefit of adding AV was very good because Apple invested in it heavily (as have other key developers), and it was well designed and implemented.

Quote:
Your logic implies that a developer should never have developed that streaming vector code in the first place and just stuck with the main ALU. It is unfortunate, but the reality is that a developer has two choices. One is to concentrate on the installed base and eventually have no one to sell to; the other is to keep his software competitive and desirable in the marketplace.

Where do you get that? I thought I was quite clear in saying that developers need to weigh the cost against the benefits. AltiVec has been a substantial performance win from the day it was introduced, and Apple's continuing commitment to it has raised confidence in its longevity (unlike numerous other technologies). As a result more and more developers start using it, and it gains momentum. If there were multiple versions to support (yes, even counting extensions) that process would have to start all over again for each revision.

Quote:
There is a lot of potential in the AltiVec unit; why not try to improve it? Would you take the same attitude with the main ALU and just have stuck with the same number of execution units that were in the 601? I would hope not; the changes made to the G4 and G5 have moved performance forward, and the same can be done for the vector unit.

Because changing the instruction set isn't necessary. The 601 used the original PPC instruction set and it essentially hasn't changed since then, yet we have the massively faster 970. AltiVec has room to improve without changing the ISA.

Quote:
Let's see: three of those items (SMT, more cores, etc.) require that a developer make some pretty significant changes to his code to get any advantage out of them. More execution units are what we are talking about here anyway, so I'm not sure how you can use that on both sides of the argument. Further, some of the items you suggest here are far more expensive, die real estate wise, than enhancements to AltiVec. SMT itself required IBM to implement a fair bit of logic on POWER5; in effect you double the number of hardware registers, so with AltiVec it would be as expensive as 256-bit wide registers.

Again you are just ignoring what I'm writing. SMT/more cores doesn't require anything different than supporting Apple's existing dual processor machines, and even if you don't do that the user still benefits from multiple threads due to OSX. More execution units doesn't change the ISA. And I'm not talking about saving IBM work... you'll note that POWER5 didn't change the ISA either. Adding SMT to the POWER5 sped up existing software. All existing software.
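
For what it's worth, that claim is easy to illustrate: ordinary pthreads code like the sketch below (the names and the toy workload are mine) runs unchanged on a dual G4/G5, a dual-core chip, or an SMT core; the OS decides where the second thread lands.

Code:
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 2
static double partial[NTHREADS];

static void *work(void *arg)
{
    long id = (long)arg;
    double sum = 0.0;
    /* each thread sums an interleaved half of the series */
    for (long i = id; i < 10000000L; i += NTHREADS)
        sum += 1.0 / (1.0 + (double)i);
    partial[id] = sum;
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, work, (void *)i);

    double total = 0.0;
    for (long i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += partial[i];
    }
    printf("total = %f\n", total);  /* same answer on 1 or N hardware threads */
    return 0;
}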
Providing grist for the rumour mill since 2001.
post #22 of 44
Quote:
Originally posted by Programmer
You would only get that impression if you didn't bother reading what I wrote. AltiVec was an excellent addition because they did it right and they did it once. The cost/benefit of adding AV was very good because Apple invested in it heavily (as have other key developers), and it was well designed and implemented.

The problem is I have read what you have written and am bothered by it. It is a very good thing that they did a very good job with the first implementation of AltiVec; I think everybody can agree with that. The problem is that the world doesn't stop after the applause: just as the rest of PPC has been improved, there is an opportunity to do the same with the vector unit.
Quote:

Where do you get that? I thought I was quite clear in saying that developers need to weight the cost against the benefits. AltiVec has been a substantial performance win from the day it was introduced, and as Apple's continuing commitment to it has raised confidence in its longevity (unlike numerous other technologies). As a result more and more developers start using it, and it gains momentum. If there were multiple versions to support (yes, even counting extensions) that process would have to start all over again for each revision.

I guess we will have to disagree here. If you really want to get buy-in with new developers, show them that there is a future in the unit. As it is, with the 970 vector performance didn't really improve much at all clock for clock over the G4.
Quote:

Because changing the instruction set isn't necessary. The 601 used the original PPC instruction set and it essentially hasn't changed since then, yet we have the massively faster 970. AltiVec has room to improve without changing the ISA.

AltiVec certainly can improve, but expanding the instruction set shouldn't be off limits with respect to those improvements. Even the FPU has benefited over time from new instructions; most people have been pleased with that. As long as the programming model for existing instructions doesn't change, I don't really see a problem.
Quote:
Again you are just ignoring what I'm writing. SMT/more cores doesn't require anything different than supporting Apple's existing dual processor machines, and even if you don't do that the user still benefits from multiple threads due to OSX. More execution units doesn't change the ISA. And I'm not talking about saving IBM work... you'll note that POWER5 didn't change the ISA either. Adding SMT to the POWER5 sped up existing software. All existing software.

Well, this is wrong: SMT on POWER5 did not speed up all existing code; IBM, on their web site, reviewed issues where that simply isn't the case. POWER5 is faster due to a number of improvements, not just SMT. SMT can be a performance negative just like in the Intel world; apparently it is much better than the Intel approach, but there are still situations where it just doesn't help.

The point is that AltiVec can be extended to speed up future code without impacting current code. Some of those improvements could be the result of new instructions or data types.

Thanks
Dave
post #23 of 44
I don't understand: if a new AltiVec was fully backwards compatible with the old ISA, why would it be a bad thing if there were new instructions for doubles?

Existing AltiVec code still runs fine, but new code could get 2x performance, or 64-bit vectors.

Where is the AltiVec thread you speak of? I can't find it.
post #24 of 44
The fact that it is lower power means that it will be able to run at faster clock speeds at lower temps. That's the idea, guys! The lower power consumption allows it to go faster without frying the chip.
post #25 of 44
Quote:
Originally posted by MarcUK
I don't understand: if a new AltiVec was fully backwards compatible with the old ISA, why would it be a bad thing if there were new instructions for doubles?

Existing AltiVec code still runs fine, but new code could get 2x performance, or 64-bit vectors.

Where is the AltiVec thread you speak of? I can't find it.

The existing AltiVec code would still run fine on machines that have AltiVec 2. The problem is that if someone wrote code for AltiVec 2, only the processors supporting AV2 would be able to run the program, and the developer would have to write a fallback in AV or scalar code for older processors. It translates to additional work (i.e., investments of time and money) for the AltiVec programmer, as he/she'd have to write more processor-specific code, only to have the new code working for a tiny fraction of the installed base. This can only be worth it if the performance gains are truly huge (as they are with regular AltiVec compared to scalar).

As I'm sure somebody else has said: tweak the implementation of existing AltiVec on the processor for more effectiveness and remove practical limitations. Neither old nor new code will stop working anywhere.

As for SMT or IMC: use the transistors on these technologies instead. Both will earn speed increases. Writing code for SMT will not alienate processors without SMT.
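
The runtime side of that fallback looks something like this on Mac OS X, which exposes AltiVec through the hw.optional.altivec sysctl (a hw.optional.altivec2 key is hypothetical -- it is what a fragmented ISA would force everyone to probe as well):

Code:
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stddef.h>

typedef void (*blend_fn)(float *, const float *, int);

static void blend_scalar(float *d, const float *s, int n)
{
    for (int i = 0; i < n; i++)
        d[i] = 0.5f * (d[i] + s[i]);
}

/* the real version would use vec_madd; a stub keeps the sketch linkable */
static void blend_altivec(float *d, const float *s, int n)
{
    blend_scalar(d, s, n);
}

static int has_feature(const char *key)
{
    int val = 0;
    size_t len = sizeof(val);
    if (sysctlbyname(key, &val, &len, NULL, 0) != 0)
        return 0;                 /* key absent: feature absent */
    return val != 0;
}

/* pick the best implementation once, at startup */
blend_fn choose_blend(void)
{
    if (has_feature("hw.optional.altivec"))
        return blend_altivec;
    return blend_scalar;          /* G3 and older fall back here */
}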
post #26 of 44
Interesting discussion. Perhaps an abstraction of the specifics would be enlightening...

I would hazard to assert that for every technological platform, there is an optimal balance between advancement and compatibility.

If the platform stagnates, users lose, in that they might have been more productive with additional improvements. Yet, at the same time, developers can concentrate on delivering code optimized for a rather homogeneous platform.

If the platform is constantly evolving, users should theoretically have more powerful tools in their hands sooner. But... developers will need to acquire the skills for each variation and tailor code to take advantage of each revision.

With AltiVec, I think that currently we have more to benefit from uniformity and experienced programmers than from an incremental improvement that fragments the platform.
post #27 of 44
Quote:
Originally posted by Zapchud
The existing AltiVec code would still run fine on machines that have AltiVec 2. The problem is that if someone wrote code for AltiVec 2, only the processors supporting AV2 would be able to run the program, and the developer would have to write a fallback in AV or scalar code for older processors.

Unlike some here, I don't see this as an additional burden. Developers already have to do this for PPC without AltiVec, or they phase out support for older processors. It is not an issue.
Quote:

It translates to additional work (i.e., investments of time and money) for the AltiVec programmer, as he/she'd have to write more processor-specific code, only to have the new code working for a tiny fraction of the installed base. This can only be worth it if the performance gains are truly huge (as they are with regular AltiVec compared to scalar).

Any performance gains from an AltiVec 2, no matter what the update implements, will likely be very domain specific. For those applications where additional functionality in the vector unit improves performance, there will be no resistance whatsoever to adopting the new technology. Look at it this way: if whole industries switch to PPC simply because of its performance running certain genomics codes, do you really think leaving behind older processors is of concern? The point is, performance on yesterday's processor isn't a big issue considering where AltiVec is being used extensively.
Quote:

As I'm sure somebody else has said: tweak the implementation of existing AltiVec on the processor for more effectiveness and remove practical limitations. Neither old nor new code will stop working anywhere.

This is as much a part of AltiVec 2 as anything. The reality is that if you are going to add execution units or tweak other things, you might as well consider new operations and data types. Either way, old code isn't a problem.
Quote:
As for SMT or IMC: use the transistors on these technologies instead. Both will earn speed increases. Writing code for SMT will not alienate processors without SMT.

Again I have to disagree: code written specifically for a machine supporting SMT is very likely to alienate processors without SMT. At the very least you will see a huge difference in performance.

In any event, there is a huge surplus of transistors right now in IBM's 970 implementations. One could implement SMT, an integrated memory controller, and a host of other functionality and still not be at the size of an Intel chip. We could argue about finding the right balance, but it is already clear from POWER5 that simply adding SMT will not fill the chip to the same area as one Prescott.

It will be rather sad to have the main core improve continuously and not see any attention paid to the vector side of the chip. On the 970, vector performance is already lagging the G4 in some respects (thankfully other parts of the chip compensate). I just can't see why we have this big resistance to doing better, or if not better, at least to having the same attention paid to it that we see being applied to the other components of the processor.

Thanks
dave
post #28 of 44
Quote:
Originally posted by wizard69
I just can't see why we have this big resistance to doing better, or if not better, at least to having the same attention paid to it that we see being applied to the other components of the processor.

There is absolutely no resistance here 'to doing better'. Do you really characterize this discussion as about that?

This discussion is obviously about the tradeoffs involved with changing or extending the current implementation of AltiVec. Some are arguing that the proposed additions are not worth it at this point in time. Why assume that they want to keep AltiVec the same for all of eternity?
post #29 of 44
Quote:
Originally posted by wizard69
Unlike some here, I don't see this as an additional burden. Developers already have to do this for PPC without AltiVec, or they phase out support for older processors. It is not an issue.

They already have to do it once, if at all. Having to do it twice is an issue.

Quote:
Any performance gains from an AltiVec 2, no matter what the update implements, will likely be very domain specific. For those applications where additional functionality in the vector unit improves performance, there will be no resistance whatsoever to adopting the new technology. Look at it this way: if whole industries switch to PPC simply because of its performance running certain genomics codes, do you really think leaving behind older processors is of concern? The point is, performance on yesterday's processor isn't a big issue considering where AltiVec is being used extensively.

I'm not sure what you're trying to say here.

Quote:
This is as much a part of AltiVec 2 as anything. The reality is that if you are going to add execution units or tweak other things, you might as well consider new operations and data types. Either way, old code isn't a problem.

It has nothing to do with AltiVec 2. Refining and tweaking execution units does not change the programming interface. See the difference between the 7400 and 7450 classes of CPUs.

Quote:
Again I have to disagree: code written specifically for a machine supporting SMT is very likely to alienate processors without SMT. At the very least you will see a huge difference in performance.

Why is it likely to alienate processors without SMT? As Programmer said, creating SMT-optimized code is no different than creating SMP-optimized code. Multithreaded code does work fine on single-threaded processors. The difference is not likely to be huge, but well worth the transistor and development cost.

Quote:
In any event, there is a huge surplus of transistors right now in IBM's 970 implementations. One could implement SMT, an integrated memory controller, and a host of other functionality and still not be at the size of an Intel chip. We could argue about finding the right balance, but it is already clear from POWER5 that simply adding SMT will not fill the chip to the same area as one Prescott.

What's up with having to create a chip as physically large as the Prescott? Add some cache, if you aren't satisfied with the die size.

I'd say implementing the above on the 970 would create much more of a performance increase than Altivec 2 over-all, while having none of the disadvantages.

Quote:
It will be rather sad to have the main core improve continuously and not see any attention paid to the vector side of the chip. On the 970, vector performance is already lagging the G4 in some respects (thankfully other parts of the chip compensate). I just can't see why we have this big resistance to doing better, or if not better, at least to having the same attention paid to it that we see being applied to the other components of the processor.

No one has said or suggested that the vector unit should be forgotten about and left in the dust. The 970 vector unit is lagging the G4's in some respects because it is less refined and tweaked. But you can still have one AltiVec code base that works on both CPUs. You don't need to extend or change the ISA in any way to "fix" the 970 vector unit.
post #30 of 44
Quote:
Originally posted by wizard69
The problem is I have read what you have written and am bothered by it. It is a very good thing that they did a very good job with the first implementation of AltiVec; I think everybody can agree with that. The problem is that the world doesn't stop after the applause: just as the rest of PPC has been improved, there is an opportunity to do the same with the vector unit.

My point is that there is no need to change the ISA to accomplish this. The 970's vector implementation isn't nearly as strong as it could be without changing the ISA. The nature of most vector code tends to mean that throwing more execution units at it will have a nearly linear speed up.
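
A sketch of what "throwing more execution units at it" means in practice (my example; 16-byte alignment and a multiple-of-8 length are assumed for brevity): the two accumulators below carry no dependence on each other, so a core with two vector FPUs can retire both vec_madds each iteration, with no ISA change at all.

Code:
#include <altivec.h>

float dot(const float *a, const float *b, int n)
{
    vector float acc0 = (vector float){0, 0, 0, 0};
    vector float acc1 = acc0;

    /* independent chains: unit 0 can own acc0, unit 1 acc1 */
    for (int i = 0; i < n; i += 8) {
        acc0 = vec_madd(vec_ld(0,  &a[i]), vec_ld(0,  &b[i]), acc0);
        acc1 = vec_madd(vec_ld(16, &a[i]), vec_ld(16, &b[i]), acc1);
    }
    acc0 = vec_add(acc0, acc1);

    /* horizontal sum of the four lanes */
    float out[4] __attribute__((aligned(16)));
    vec_st(acc0, 0, out);
    return out[0] + out[1] + out[2] + out[3];
}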

Quote:
AltiVec certainly can improve, but expanding the instruction set shouldn't be off limits with respect to those improvements. Even the FPU has benefited over time from new instructions; most people have been pleased with that. As long as the programming model for existing instructions doesn't change, I don't really see a problem.

There have been something like 3 instructions added to the FPU since the 601, and they are rarely used. Since all processors since the 601 (except perhaps the 603, I can't remember offhand) have implemented them, developers can, at this point, use them without worrying about compatibility.

Quote:
Well, this is wrong: SMT on POWER5 did not speed up all existing code; IBM, on their web site, reviewed issues where that simply isn't the case. POWER5 is faster due to a number of improvements, not just SMT. SMT can be a performance negative just like in the Intel world; apparently it is much better than the Intel approach, but there are still situations where it just doesn't help.

I phrased that badly... SMT works with all existing code, and in cases where it can help performance the OS can adjust the thread priorities so that it does. Where it doesn't, it can effectively be turned off. This logic is built into AIX, and it uses the POWER5's rather prodigious self-monitoring capabilities.

Even better: SMT, IMC, and more cache can be left out of future processors without impacting the software installed base.

Quote:
The point is that AltiVec can be extended to speed up future code without impacting current code. Some of those improvements could be the result of new instructions or data types.

Yes, it could. My objection is that doubling the register sizes and/or adding double precision support is hugely expensive in terms of transistors, and it provides marginal benefits to most applications (and only those that are re-written to use the new instructions... not likely to happen until the number of machines in the market with these capabilities has reached a level to make it practical). Double precision support is only an improvement over the dual FPUs if you also go to 256-bit registers... a very significant expense for a small fraction of potential applications.

And don't discount the importance of not forcing complexity on all following chips. AltiVec, as it stands, is a fairly hefty investment that Apple is stuck with -- fortunately it has proven to have a substantial payoff, and significant software investment has already been made. Doing the 256-bit registers + double precision math would saddle all of Apple's future chips with this heavy cost. If Apple wants to, for example, do a 4-core SMT processor with really long pipelines and a super high clock rate, they can't if they've tied themselves to an AltiVec unit that is 2-3 times as complex.

Perhaps there is an instruction or two that could be added cheaply, but unless they have some really revolutionary instructions (possible, but unlikely) the potential benefit is hardly something to get excited over (just like those extra FPU instructions were of only passing interest).


Oh, and by the way.
Providing grist for the rumour mill since 2001.
post #31 of 44
Damn, you guys are at each other and I'm not even talking for once. Makes me feel better.
http://www.apple.com/feedback/macpro.html
post #32 of 44
Quote:
Originally posted by onlooker
Damn, you guys are at each other and I'm not even talking for once. Makes me feel better.

We aren't going to let you off that easy; time to weigh in with your perspective!

It is probably worth noting that we are not going to come to an agreement here. Even if they don't expand the instruction set, I think everyone does agree on one thing: we would love to see effort put into ratcheting up vector performance on the 970.

Then again I suppose there is an element that would not want to see better performance.

Dave
post #33 of 44
Quote:
Originally posted by wizard69
Even if they don't expand the instruction set, I think everyone does agree on one thing: we would love to see effort put into ratcheting up vector performance on the 970.

Of course.
post #34 of 44
Quote:
Originally posted by wizard69
Then again I suppose there is an element that would not want to see better performance.

There's always a stick in the mud somewhere.


I am really curious to see what IBM is going to do in the POWER6. If they can't make headway in terms of clock rate, what will they turn to? Adding a vector unit to the POWER family would be interesting, but a bit of an also-ran. Will IBM go out on a limb in their flagship server product and try something radical? What would that look like?

And what's after the next thing in the 970 family? The next thing is apparently going to be a refined 90nm process, larger caches, maybe slightly stretched pipelines, a small clock rate bump, and a version with twin cores. After that, however, is a more interesting question. Will we get SMT? IMC? Or something more radical/unconventional? On-chip I/O perhaps?
Providing grist for the rumour mill since 2001.
post #35 of 44
Quote:
Originally posted by Programmer
There's always a stick in the mud somewhere.


I am really curious to see what IBM is going to do in the POWER6. If they can't make headway in terms of clock rate, what will they turn to? Adding a vector unit to the POWER family would be interesting, but a bit of an also-ran. Will IBM go out on a limb in their flagship server product and try something radical? What would that look like?

Well, I'm not convinced that clock rate growth is completely gone, but there is an obvious need to increase performance through other alternatives. One place to look for this payoff would be additional execution units.

This is beyond AltiVec optimizations and would involve special-purpose execution units that enhance things such as networking and cryptology. I also still believe that enhanced instructions will play a role in the future. After all, if you can't speed the clock rate up and you have gotten as wide as possible with the cores, then the only thing really left is to implement instructions that do more. The FPU is one place such enhancements would pay off; on the other hand, a whole new execution unit (or an adapted vector unit) to do BCD math could pay off for some usages.

It was my understanding at one time that IBM had a POWER variant that was extended to do BCD math. I'm sure they have a number of things up their sleeves. I still believe one of those would be an improved vector component optimised more for scientific applications than signal processing.
Quote:

And what's after the next thing in the 970 family? The next thing is apparently going to be a refined 90nm process, larger caches, maybe slightly stretched pipelines, a small clock rate bump, and a version with twin cores.

For me the question is how soon we will see these. I don't really see all that small of a clock rate bump either; I think a 500MHz gain to 3GHz should be easy on a reoptimized process/core.

For small systems, though, one thing that I would have to think IBM and Apple must seriously be looking at is high-integration devices beyond simply dual core. Here I'm talking about SoC devices, with the driver being higher performance for certain I/O.
Quote:
After that, however, is a more interesting question. Will we get SMT? IMC? Or something more radical/unconventional? On-chip I/O perhaps?

As noted above, I don't see on-chip I/O as being radical at all. The drivers in this area will be higher performance and lower costs, in that order. I do suspect, though, that we will see IBM/Apple grow into this slowly, with possibly an integrated memory controller sitting next to high-speed buses such as HyperTransport or PCI Express. This would actually be a nice machine, with the low-pin-count I/O buses going directly to the I/O chips.

Right now I see the 970's bus as a big drag on low-cost, high-performance machinery. Put the DMA/memory interface on chip along with the I/O bus and you have an avenue to low cost and high performance. The current arrangement with the 970 really doesn't permit that and is not likely to ever be useful in low-power devices.

Of course others would see these sorts of ideas as questionable. They will be implemented in part in the near future, though. Things such as an IMC have payoffs in both performance and power usage, so there is an existing need.

Dave
post #36 of 44
Ironically, if IBM goes for a SIMD unit in the POWER6 it might very well be AltiVec2. What is reasonable for a high-end server with no legacy of vector code is quite different from what is appropriate to a desktop/laptop machine.

The specialized units you describe, along with on-chip I/O, are appealing because their existence can be entirely hidden behind the OS. These things aren't radical in the embedded market, but the desktop market hasn't yet seen them. It'll be interesting to see if Apple moves their chipset IP onto the processor by working closely with IBM's designers.

500 MHz? That's only a 20% increase from today, and I don't consider it particularly dramatic.
Providing grist for the rumour mill since 2001.
post #37 of 44
Programmer,

Please note that SIMD in next-generation microprocessors has a different role from the one it used to have. Present high-performance processors consume too much power in the instruction-sequencing units that manage deep OoOE. To achieve both high IPC and low power consumption, several companies plan to use in-order or simple out-of-order execution pipelines with SIMD. In such processors, the native ISA will be converted to internal SIMD instructions by software or hardware, so there is no need to change the ISA.
IBM's ultra-high-frequency microprocessor research and Intel's PARROT are two examples of such an architecture.
post #38 of 44
Quote:
Originally posted by Programmer
500 MHz? That's only a 20% increase from today, and I don't consider it particularly dramatic.

Well, yeah, it may only be a 20% increase in performance, but that shouldn't be condemned when one considers that 500MHz used to represent the maximum clock rate of whole computers. In other words, that 500MHz is equivalent to the computing performance of a machine that would still be useful today. Such a boost would not go unnoticed by the average user.

If Apple could manage this much of a boost every 6 months I'd be very happy with them. As we have seen, this hasn't happened consistently at all with Apple products. It is the thought that 3GHz is still doable that has me excited about that 20% increase. Sure, a 50% increase would be fantastic, but that doesn't look promising at all.

Dave
post #39 of 44
Quote:
Originally posted by mi0im
Programmer,

Please note that SIMD in next-generation microprocessors has a different role from the one it used to have. Present high-performance processors consume too much power in the instruction-sequencing units that manage deep OoOE. To achieve both high IPC and low power consumption, several companies plan to use in-order or simple out-of-order execution pipelines with SIMD. In such processors, the native ISA will be converted to internal SIMD instructions by software or hardware, so there is no need to change the ISA.
IBM's ultra-high-frequency microprocessor research and Intel's PARROT are two examples of such an architecture.

Thanks for the links; those are interesting papers. I didn't read them in depth, but it's not clear to me how the conversion to SIMD would be accomplished unless it was some form of re-compilation (a la Transmeta -- and vectorizers have rarely been effective to date) or a simple remapping to avoid duplicating execution unit functionality (i.e. they don't become SIMD operations, the extra results are just discarded). Anything more complex would make for a huge decoder, which is expressly what they are attempting to eliminate.

The IBM paper on in-order high-frequency processors brings up some interesting issues. The value of that compared to lower-frequency OOOE super-scalar processors is far from clear, however. All I can tell you is that from my perspective the OOOE processor is much easier to deal with and achieve decent performance with. The in-order high-frequency / high-latency processor can very easily be made to perform very poorly. On carefully crafted code and problems well suited to its design, it can be made to perform very well... but the majority of code does not fall into that camp.


Particularly interesting is the Intel admission that their x86 decoder is a huge power hog.
Providing grist for the rumour mill since 2001.
post #40 of 44
Quote:
Originally posted by wizard69
Well, yeah, it may only be a 20% increase in performance, but that shouldn't be condemned when one considers that 500MHz used to represent the maximum clock rate of whole computers. In other words, that 500MHz is equivalent to the computing performance of a machine that would still be useful today. Such a boost would not go unnoticed by the average user.

If Apple could manage this much of a boost every 6 months I'd be very happy with them. As we have seen, this hasn't happened consistently at all with Apple products. It is the thought that 3GHz is still doable that has me excited about that 20% increase. Sure, a 50% increase would be fantastic, but that doesn't look promising at all.

You misunderstand me -- I think they'll get another 500 MHz or so, and that's it. The extra 20% will be a nice little boost but will deliver noticeably less than a 20% improvement in system performance for most tasks. And it will be water cooled. After that, further clock rate increases will come only from designs like those discussed in the papers that mi0im linked to, which means we'll lose the benefits of OOOE and various other things.

More interesting is sticking to 2.5 GHz and adding a few more cores. The first multi-core chip out of the gate will double the number of cores and net us something like a 60-90% performance improvement at a system level or on software that is multi-threaded. Right now most software (and in particular most benchmarks) is not multi-threaded, but that will change. The majority of things which people are doing that require high performance these days can be parallelized fairly well, and if it can't at least your machine will still operate smoothly while you're running that task at full speed.
Providing grist for the rumour mill since 2001.