My guess is IBM is much closer to having a Power4 replacement than some people think. As someone here said previously, they built the Gekko G3 variant for Nintendo. The Power4 replacement could easily be an Apple only chip like the Gekko. The production on the chip wouldn't follow IBM's Power 'roadmap' or timetables.
Jumping from 32-bit to 64-bit does NOT mean a doubling of speed...
Well, there's been plenty of time for Apple to realize that Mot isn't going anywhere, or at best is unreliable. They've had two years in which to sit down with IBM and design a custom next-generation chip for Macs. That's plenty of time to create a less expensive, smaller, and cooler chip based on the Power4 architecture.
I think we're gonna get 1167MHz G4's from Motor-scro-la and everything from there on out will be from IBM, that's what I think (or what I hope, more specifically). Good riddance.
Comments
<strong>There are just a very few cases where a program would gain speed by being remade into a 64-bit program using more 64-bit data types. Most programs would just be slower; a computer that has to shuffle 64 bits where only 32 are needed wastes more resources.
(please correct me if I am wrong).</strong><hr></blockquote>
I am including a link to a great article called <a href="http://www.ddj.com/documents/s=1038/ddj9301c/9301c.htm" target="_blank">Cray: 64-bit programming in a 32-bit world</a>. It's not too tough, nor detrimental speed-wise, as long as you just don't use the machine word size in code. Quoting the first paragraph, "Compared to 16-bit programming, 32 bits means faster programs, more memory with straightforward addressing, and better processor architecture."
Same can be said for 64-bit programming.
<strong>I am including a link to a great article called <a href="http://www.ddj.com/documents/s=1038/ddj9301c/9301c.htm" target="_blank">Cray: 64-bit programming in a 32-bit world</a>. It's not too tough, nor detrimental speed-wise, as long as you just don't use the machine word size in code. Quoting the first paragraph, "Compared to 16-bit programming, 32 bits means faster programs, more memory with straightforward addressing, and better processor architecture."
Same can be said for 64-bit programming.</strong><hr></blockquote>
Except for memory addresses, which are heavily used in modern C++ code. All of these must be 64 bits if the process has a 64-bit address space.
<strong>My guess is IBM is much closer to having a Power4 replacement than some people think. As someone here said previously, they built the Gekko G3 variant for Nintendo. The Power4 replacement could easily be an Apple only chip like the Gekko. The production on the chip wouldn't follow IBM's Power 'roadmap' or timetables.</strong><hr></blockquote>
I'd look for a POWER4-core mobo from IBM for G4s this time next year. Moto's days are numbered, and anything with AMD would be a last resort.
16 to 32 effectively did, though. With 16 bits, integers could only be in a range of -32,768 to 32,767, but with 32 bits, the range is -2,147,483,648 to 2,147,483,647.
There were plenty of times when 16-bit integers were too small for calculations, but the same can't be said of 32-bit integers; the only applications that would really benefit from 64-bit integers are crypto and some niche engineering programs. If anything, I read a study claiming that 64-bit code on average takes a 5% performance hit. And it makes sense when you think about it, since 64-bit numbers take up twice the space of their 32-bit counterparts.
The only real-world advantage I can see for mainstream 64-bit platforms is the addressing of memory greater than 4 GB. Anyone know how much memory 64-bit registers can accommodate?
<strong>
The only real-world advantage I can see for mainstream 64-bit platforms is the addressing of memory greater than 4 GB. Anyone know how much memory 64-bit registers can accommodate?</strong><hr></blockquote>
16 billion gigabytes.
64-bit address spaces are useful for some sparse data structures -- i.e. it's a huge structure, but a very large amount of the space is never touched and thus doesn't actually need any memory pages at those locations. Pretty rare technique, though.
<strong>
16 billion gigabytes.
</strong><hr></blockquote>
Mmmm... Looks like I'd have to solder on a lot more RAM slots...
<strong>
I am including a link to a great article called <a href="http://www.ddj.com/documents/s=1038/ddj9301c/9301c.htm" target="_blank">Cray: 64-bit programming in a 32-bit world</a>. It's not too tough, nor detrimental speed-wise, as long as you just don't use the machine word size in code. Quoting the first paragraph, "Compared to 16-bit programming, 32 bits means faster programs, more memory with straightforward addressing, and better processor architecture."
Same can be said for 64-bit programming.</strong><hr></blockquote>
[ Note: I had to edit this, I misread your comment! I guess we mostly agree. Sorry! I've let most of the text remain in the hope that it will be of use to someone, now that I have written it. ]
No, it can't. Actually, the "faster" claim is false in most cases; it just happens to be true for the PC's old architecture, its heritage, and the ugly solutions to its shortcomings some 15 years ago, and has therefore been taken as a truth by some. It is sad to see a Cray researcher lend himself to so cheap a trick to score some kind of points for Cray hardware, what kind I don't know. I'm inclined to think it wasn't the researcher himself who wrote that part, since the rest of the article proves that he understands the differences between register sizes.
The rest of the article is quite good though.
I'll try to keep this short:
Different processors have different shortcomings in handling different sizes of data, which the article also mentions. The only thing the article says regarding speed is that it is typically faster (and on some processors only possible with compiler tricks, which means a slowdown) to access data on its natural alignment. Nothing else. This is true for the PPC too. The PPC can handle 8-, 16-, 32-, and soon probably 64-bit data very well with no special treatment or difference in speed, but reading a 64-bit word that does not lie on its natural alignment would require two bus cycles on a 64-bit bus instead of one. It can actually handle larger data sizes too, depending on what you want to do with which of its units: the FPU has been using 64-bit data, and the AltiVec addition 128-bit data, as their basic types all along.
The size of the registers only matters relative to the size of the data you want to handle, that is all. If you want to handle data that is larger than the register size, you have to emulate larger registers with more instructions and registers, which is bad for speed. If we want to handle 64-bit address spaces, which we want more and more often, we want 64-bit registers.
Many programs today are fine with 32-bit integer/fixed-point registers and 64-bit floating-point registers as the PPC has them today, and just use those datatypes even when only 5 or 14 bits or whatever are really required. Shuffling those extra never-used bits just takes bandwidth from the bus. This only happens if the programmer is sloppy, but they (we) most of the time are, and analyzing how many bits would really be needed would require a lot of work and be error prone (with the tools of today, at least).
When switching to a 64-bit architecture, many programs will use 64-bit data where 32 bits are used today, shuffling even more bits that will never be used.
This is probably not a very big deal, but that is why I say that for many programs there is no speed gain in going 64-bit; it can even slow things down.
The PPC is a derivative of the POWER family, which long before the PPC, and maybe from the very beginning, was designed to allow both 32- and 64-bit implementations. The 32-bit implementations just lack half of the register bits, some instructions and instruction modes for handling those extra bits, and a little more glue to emulate a 32-bit implementation (maybe a chip designer would argue about how small the differences are :-). The 64-bit implementations can easily be switched to and from 32-bit mode; the logical differences are really small. On a 64-bit PPC, processes can be 32- or 64-bit, running side by side with no performance hit (except for the instruction that has to be run to switch mode).
Many programs will probably go faster if they are just run in 32-bit mode, and we will probably see a lot of those for a long while.
If one would like to make comparisons (which is often dangerous, so I probably shouldn't), one could compare it to shipping a certain load in a larger lorry. It won't go faster just because the lorry is larger; it will only take up more space on the road, consuming more bandwidth for transporting things. A larger load COULD fit, though, without the work of splitting it up and reassembling it, but if that capacity isn't used, it is just wasted.
I hope it is now clear why a 64-bit PPC will not magically buy you speed; it won't, except in a few special cases and applications.
As mentioned here before, one good thing about 64-bit addressing is that you can have sparse addressing, which allows some nice tricks, but does not necessarily gain any speed.
[ 07-26-2002: Message edited by: jerk ]
<strong>On a 64 bit ppc, processes can be 32 or 64 bit running side by side with no performance hit (except for the instruction that has to be run to switch mode).</strong><hr></blockquote>
Great post! One nitpick: the switch between 32- and 64-bit mode will probably be on a per-process basis, and the mode bit will be carried for free in the context switch -- i.e. zero overhead.
32-bit applications will likely remain the norm for a long time, since there will be 32-bit PPC chips around for a long time and developers will normally write 32-bit code to ensure it runs on all PPC machines. Only in particular situations will developers target just the 64-bit machines. And as alluded to above, even when targeting a 64-bit machine it might be preferable to compile as 32-bit to get better performance.
<strong>Apple has said it's going with the G4 for now. So if an IBM-Power4/5 solution is to be, it will be several years before we see it. Years. Does Apple have the sales numbers to convince IBM to make a custom chip for PowerMacs? Plan on seeing Motorola chips in your Macs for the foreseeable future.</strong><hr></blockquote>
Yes and no. Apple (and Moto) has said that the G4 "has plenty of life left in it". That doesn't necessarily mean that the G4 will be the only processor used. They've had a "low-end"/"high-end" solution pretty consistently for quite a while now....
Although, I do agree that we may very well have a modified G4 until the desktop Power5 becomes available. Early in 2004, I seem to recall reading...
Whoops. Just remembered that statement was made about the G3. That's what I get for reading/posting while drinking my first coffee. <img src="graemlins/oyvey.gif" border="0" alt="[No]" />
The 2-processor comment still stands, tho'....
[ 07-28-2002: Message edited by: taboo ]
<strong>Whoops. Just remembered that statement was made about the G3. That's what I get for reading/posting while drinking my first coffee. </strong><hr></blockquote>
Actually, Motorola said essentially that about the G4, and Apple said it about the G3 (for the iBook). I think IBM has a VelocityEngine-equipped RapidIO G3 on the way that would suit the iBook just fine (look at their roadmap). Just because everybody (including me) thinks the POWER4 will form the basis for the next high-end PowerPC chip doesn't mean they'll suddenly drop all of their existing low-power stuff.
An excellent article on IBM, Altivec, ppc64 and various 32-bit and 64-bit programming notes:
<a href="http://www-106.ibm.com/developerworks/library/l-ppc/" target="_blank">IBM PowerPC assembly</a>
<strong>I think IBM has a VelocityEngine-equipped RapidIO G3 on the way that would suit the iBook just fine (look at their roadmap).</strong><hr></blockquote>
That would be a G4... G3 is not a single chip design but a chip family, and so is the G4.
<strong>
That would be a G4... G3 is not a single chip design but a chip family, and so is the G4.</strong><hr></blockquote>
Okay, substitute "Sahara" for G3 then. IBM might number it in the 7xx series instead of the 7xxx series, which only Motorola has used.
<strong>An excellent article on IBM, Altivec, ppc64 and various 32-bit and 64-bit programming notes:
<a href="http://www-106.ibm.com/developerworks/library/l-ppc/" target="_blank">IBM PowerPC assembly</a></strong><hr></blockquote>
Interesting to note that this paper, written by an IBM guy and on an IBM site, talks about AltiVec programming and refers to AltiVec articles. It also mentions the term "VMX".
Hmmm....
<strong>
Interesting to note that this paper, written by an IBM guy and on an IBM site, talks about AltiVec programming and refers to AltiVec articles. It also mentions the term "VMX".
Hmmm....</strong><hr></blockquote>
More specifically, the article says, "Altivec (also called VMX)"... interesting.
It's great to see that IBM is working with AltiVec on some level, has a new high-tech fab with a process that's more advanced than the one MOS13 uses (the new IBM fab operates at 0.09µ, correct?), and that Apple is talking about "options", etc. The signs seem positive for a change....
Buh bye, Motorola!
[ 08-06-2002: Message edited by: Moogs ]