970 Questions
I've looked at the various information available on the 970, but have a few questions that someone here might be able to answer.
If the 970 is a 64-bit chip, does its VMX unit allow two 64-bit values per vector, or is it still limited to 4 32-bit, 8 16-bit, or 16 8-bit values? Assuming it doesn't, wouldn't it have been easy for IBM to include instructions to support 2 64-bit values? This would give many FFT users double precision, which is sorely lacking from AltiVec. Will the performance of add/subtract in VMX be substantially better than it is in AltiVec?
As I understand it, the size of an int (in bits) is the same as the size of a GPR. Does this mean that an int on the 970 is 64 bits? If not, is a long 64 bits? What is a long long? What is a short? What about floats and doubles? Is the pending introduction of the 970 related to Apple's deprecation of vector signed long and vector unsigned long in OS X 10.2?
Looking for answers....
Comments
Why not put double-precision FP through the standard FPU? I suspect that "double precision lacking in AltiVec" only matters because the G4 could do with more clock speed or more scalar FPU units to challenge the Athlon.
Why does the P4 do this? Any advantages for the P4? Do these translate to 970 advantages?
Double-precision FFT users currently have their code in scalar form. The 970 should run that code up to twice as fast (at the same clock rate) thanks to its 2 FPUs. That would be about as fast as the suggested enhancement to VMX, without having to rewrite the code.
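To make "scalar form" concrete, here is a rough sketch of a radix-2 FFT butterfly written with plain doubles. The names (cplx, butterfly) are made up for illustration and are not taken from any particular FFT library. The point is only that the multiplies and adds on separate lines don't depend on each other, so a CPU with two FPUs has independent work to issue side by side without the source changing.

```c
#include <assert.h>

/* One complex value held as two scalar doubles (no vector types involved).
 * "cplx" and "butterfly" are hypothetical names used only for this sketch. */
typedef struct { double re, im; } cplx;

/* Radix-2 butterfly:
 *   a' = a + w*b
 *   b' = a - w*b
 * where w is the twiddle factor. */
void butterfly(cplx *a, cplx *b, cplx w)
{
    assert(a && b);

    /* Real and imaginary parts of w*b. These two lines are independent
     * of each other, so two scalar FPUs can work on them in parallel. */
    double tr = w.re * b->re - w.im * b->im;
    double ti = w.re * b->im + w.im * b->re;

    /* Save a before overwriting it. */
    double ar = a->re;
    double ai = a->im;

    a->re = ar + tr;
    a->im = ai + ti;
    b->re = ar - tr;
    b->im = ai - ti;
}
```

Whether a given compiler actually schedules this well enough to keep both FPUs busy is a separate question. The sketch only shows that the code itself needs no rewrite.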
An integer register is 64 bits wide on the 970. In the compilers this will probably be represented as a "long long", but I might be wrong about this since Apple hasn't announced their 64-bit OS APIs yet. "int" will likely stay 32 bits, but "long" may go to 64 bits. If it does, then "long long" will likely be promoted to 128 bits.
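If you don't want your code to depend on how this shakes out, one option is to use the C99 fixed-width types from <stdint.h> instead of guessing at "long" or "long long". The sketch below is only an illustration. BITS_OF, check_integer_widths, and mul_wide are hypothetical names, and it assumes a compiler that provides int32_t and int64_t (they are optional in the standard but present on typical platforms). C99 itself only promises that long long is at least 64 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Storage size of a type in bits (hypothetical helper macro). */
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)

/* Sanity-check the widths this code relies on. */
void check_integer_widths(void)
{
    assert(BITS_OF(int32_t) == 32);
    assert(BITS_OF(int64_t) == 64);
    assert(BITS_OF(long long) >= 64);  /* C99 minimum, may be wider */
}

/* Multiply two 32-bit values into a 64-bit result. The casts widen the
 * operands first, so the product cannot overflow regardless of whether
 * "long" is 32 or 64 bits on the target. */
int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

Code written this way behaves the same whether it is built for a 32-bit or a 64-bit target.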
"float" is defined by IEEE to be 32 bits and "double" to be 64 bits -- the PowerPC execution model defines the FPU registers to be the size of a double.
"vector signed long" and "vector unsigned long" may have been eliminated to remove confusion about this issue going forward. They certainly were redundant, since they were the same 4 x 32-bit types as "vector signed int" and "vector unsigned int".
BTW: The P4 only puts double precision through its "vector" unit because the x86 FPU architecture is so incredibly stupid that they can't figure out how to get decent performance out of it. Rather than trying, they added scalar double support to the SSE2 unit, avoiding the whole problem with the FPU register stack and how it is shared with the MMX unit. It also means that AMD needs to implement SSE2 if they want to stay compatible with P4-optimized code.
Quote: In 32-bit mode, PowerPC uses an "ILP32" model where int, long, and void* are 32 bits. In 64-bit mode it uses an "LP64" model where int is 32 bits and long and void* are 64 bits. I'm not sure about long long.
This is actually a function of the C/C++ compiler. Different compilers (or even different settings on the same compiler) could present different size data types.
Quote: This is actually a function of the C/C++ compiler. Different compilers (or even different settings on the same compiler) could present different size data types.
True, but AFAIK there is some kind of standard for C on PowerPC, so all the compilers do it the same way.
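Since the answer depends on the compiler and its settings, the easiest way to settle it for a particular build is to ask the compiler. Here is a small sketch that classifies the data model from sizeof. The function name data_model is made up for this example, and it assumes 8-bit bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Report which common data model the current build uses, based on the
 * sizes of int, long, and void*. "data_model" is a hypothetical name. */
const char *data_model(void)
{
    size_t i = sizeof(int);
    size_t l = sizeof(long);
    size_t p = sizeof(void *);

    assert(CHAR_BIT == 8);  /* the size checks below assume 8-bit bytes */

    if (i == 4 && l == 4 && p == 4) return "ILP32";  /* int, long, pointer all 32 bits */
    if (i == 4 && l == 8 && p == 8) return "LP64";   /* long and pointer 64 bits */
    if (i == 4 && l == 4 && p == 8) return "LLP64";  /* only pointer (and long long) 64 bits */
    return "other";
}
```

Printing the result of data_model() from a test program built once per compiler setting (for example with GCC's -m32 and -m64 options, where the target supports both) shows which model each build uses.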