From what I am hearing, these new Intel Macs are only 32-bit. Why did they not implement a 64-bit instruction set? Doesn't this seem a little backwards since we just came from a 64-bit G5? Hopefully my information is just wrong, but would anyone like to expand on this?
Comments
As much as everybody wants 64 bit we had this discussion last summer when the Intel plans were introduced.
- Do you need to address more than 4 GB of RAM?
- Do you need to do 64-bit INTEGER arithmetic?
If not, 64-bit gains you nothing. It might even slow you down if you are pushing 8-byte pointers around that are only carrying 4-byte values.
Intel expects to launch the Merom core in the third quarter of 2006. Merom will use Intel's Next Generation Microarchitecture, a microarchitecture that is different from today's Pentium M and NetBurst, but features some characteristics of both. While little is known about Merom, Intel has revealed that it is a dual core processor supporting Vanderpool Virtualization Technology and EM64T, which is Intel's version of AMD64.
According to Intel, Merom's design places emphasis on both high performance and low power consumption. On a performance per watt basis, Intel claims Merom will outperform Yonah by a 2-1 margin. Ultra low voltage Merom chips will consume as little as 0.5W of power, enabling ultra portable laptops to have battery lives in the tens of hours.
Merom will serve as the basis for a new desktop core codenamed Conroe.
Emphasis on the performance figures is mine (Yonah is the CPU in the Intel iMac).
Originally posted by lundy
64-bit adds nothing for 99% of users.
- Do you need to address more than 4 GB of RAM?
- Do you need to do 64-bit INTEGER arithmetic?
If not, 64-bit gains you nothing. It might even slow you down if you are pushing 8-byte pointers around that are only carrying 4-byte values.
This isn't true with x86 as you gain 8 extra registers. Then again OS X wouldn't support that anyway so the point is moot.
Originally posted by lundy
64-bit adds nothing for 99% of users.
- Do you need to address more than 4 GB of RAM?
- Do you need to do 64-bit INTEGER arithmetic?
If not, 64-bit gains you nothing. It might even slow you down if you are pushing 8-byte pointers around that are only carrying 4-byte values.
Moving data around 64-bits at a time might be quicker.
Originally posted by Telomar
This isn't true with x86 as you gain 8 extra registers. Then again OS X wouldn't support that anyway so the point is moot.
The number of registers has nothing to do with pointer size. Each individual register has to be bigger to accommodate the standard 64-bit word size. So it takes twice the bandwidth, or twice the time, to move a 64-bit pointer compared with a 32-bit one. Seeing as the bus isn't automagically twice as fast for a 64-bit CPU, that just leaves twice the transit time to transfer the longer data.
Also, an OS doesn't support CPU registers, a compiler does. At runtime, the OS code couldn't care less how many registers a CPU has as it can only use the ones the compiler assigned.
Originally posted by ThinkingDifferent
Moving data around 64-bits at a time might be quicker.
The word size is not the bus width, and doubling the word size has not doubled the bus width, so it takes extra clock strobes to push the same number of words when you move from 32-bit to 64-bit CPUs.
Originally posted by ThinkingDifferent
Moving data around 64-bits at a time might be quicker.
Yeah, except that's not what 64-bit means. It does not mean that you will move 2 32-bit pointers at once; it means you will move 2 32-bit pointers, each in its own (unneeded) 64-bit container.
Each data item still goes in a word or halfword; you don't pack two data items in a word just because the word is twice as large.
E.G.
int a,b,c;
....
...
c = a + b;
in 32 bit PPC
load 4 bytes at location a into lower 32 bits of register 1
load 4 bytes at location b into lower 32 bits of register 2
add registers 1 and 2, result in register 1
store 32 bits of register 1 into 4 bytes at location c
in 64 bit PPC
load 8 bytes at location a into all 64 bits of register 1
load 8 bytes at location b into all 64 bits of register 2
add registers 1 and 2, result in register 1
store 64 bits of register 1 into 8 bytes at location c
So unless the integers a, b, and c are larger than 2^32-1, moving the extra 4 bytes is wasted.
Originally posted by Hiro
The number of registers has nothing to do with pointer size.
In general, but there is a quirk with x86 where 64-bit mode has more registers than 32-bit mode. Thus some programs run a little faster in 64-bit mode even if they don't really need 64-bit. As you said, this may be offset by increased memory traffic in 64-bit mode.
Also, an OS doesn't support CPU registers, a compiler does. At runtime, the OS code couldn't care less how many registers a CPU has as it can only use the ones the compiler assigned.
The OS needs to save and restore registers on context switches, so it definitely cares how many there are.
Originally posted by wmf
In general, but there is a quirk with x86 where 64-bit mode has more registers than 32-bit mode. Thus some programs run a little faster in 64-bit mode even if they don't really need 64-bit. As you said, this may be offset by increased memory traffic in 64-bit mode.
True, there are more registers available in x86 64-bit mode, but you bring up an orthogonal issue to the bus penalties of going 64-bit. The time it takes to populate those registers does not go down because of how many there are. More registers also make CPU context switches take longer, which is exacerbated in 64-bit mode because those extra registers are also twice as big.
The OS needs to save and restore registers on context switches, so it definitely cares how many there are.
This is just so much semantics. Sure, the OS causes the context switch and flushes/repopulates the CPU registers, but none of that is determined at runtime. You either swap processes or swap modes; the steps taken after that were hardcoded by the compiler for that CPU. It is just so much blind execution: no comparisons of attributes, no caring what the execution environment is.
The extra registers really don't have anything to do with the 64-bitness itself either. It was just an opportune time to fix a long-standing x86 bottleneck without breaking ALL the x86 32-bit compilers. As an architecture advance it is nice, with the extra registers allowing the compiler to optimize machine code to force-feed the CPU a bit more effectively.
Moving data around 64-bits at a time might be quicker.
and
This isn't true with x86 as you gain 8 extra registers. Then again OS X wouldn't support that anyway so the point is moot.
Of course OS X compiled for that chip would support the extra registers. It's all about the compiler there! Even though the OS doesn't have runtime clue 1 about the architecture.
It has been quite the roundabout journey of misdirection and false start segues to get right back here to the same point with the same result. Those 2 statements are still completely wrong.
Originally posted by Anders
Intel can't provide fast processors with 64bit. IBM can't provide 64 bit processors that are fast. Apple made the sane choice.
As much as everybody wants 64 bit we had this discussion last summer when the Intel plans were introduced.
And AMD can provide both. Go Apple!
Originally posted by Hiro
Of course OS X compiled for that chip would support the extra registers. It's all about the compiler there! Even though the OS doesn't have runtime clue 1 about the architecture.
It has been quite the roundabout journey of misdirection and false start segues to get right back here to the same point with the same result. Those 2 statements are still completely wrong.
Yes because as everybody knows the compilers of this world do everything
At the end of the day OS X has no support for IA-32e yet and I doubt it will prior to Leopard.
Originally posted by Telomar
Yes because as everybody knows the compilers of this world do everything
It's obvious you don't know what they do and don't do, or the first thing about how they generate CPU-specific code. What about -mtune=cpu-type in GCC? I'm quite sure ICC has an equivalent. Not automagic multi-thread from vapor ultra code, but just good old-fashioned CPU-specific machine code.
At the end of the day OS X has no support for IA-32e yet and I doubt it will prior to Leopard.
At the end of the day, it doesn't matter because Apple hasn't shipped an x86-64 CPU yet.
**C0NF1r|\/|ED** I predict with guaranteed perfect precision Apple will support x86-64 the day they ship with that CPU hardware! **C0NF1r|\/|ED**
And I am a local hero for preventing anyone in a 50 NM radius from being bitten by hyenas last night!! [no matter that there are no hyenas or IA-32e Macs]
[edit: gawd I can't type]