Comments
Originally posted by Programmer
. . . Many functions in an API take a memory location as a parameter. . .
The problem arises when the 64-bit app can specify any of 2^64 locations, but the 32-bit library only knows about 2^32 locations.
Ah yes, the fatal flaw in the scheme. Thanks for pointing it out.
No, the better solution is to give the user "malloc32" and make sure that they follow the rule that says they must use that if using a 32-bit API.
So, it looks like a 64-bit application will simply ask for memory allocation within the 32-bit address space.
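To make the failure mode concrete, here is a minimal sketch of the truncation problem (my own illustration; legacy_api_takes_ptr is a made-up stand-in for a 32-bit-era entry point, not a real API):

    #include <cstdint>
    #include <cstdio>

    // Hypothetical 32-bit-era entry point: it can only carry 32 bits of address.
    void legacy_api_takes_ptr(std::uint32_t addr32) {
        std::printf("legacy API sees address 0x%08x\n", static_cast<unsigned>(addr32));
    }

    int main() {
        int value = 42;
        std::uintptr_t full = reinterpret_cast<std::uintptr_t>(&value);  // 64 bits wide on a 64-bit build

        if (full > UINT32_MAX) {
            // Passing the pointer anyway would silently drop the upper 32 bits,
            // and the library would read or write some unrelated location.
            std::printf("buffer lives above 4 GB; it cannot be handed to the 32-bit API\n");
        } else {
            legacy_api_takes_ptr(static_cast<std::uint32_t>(full));
        }
        return 0;
    }

Allocating from the low 4 GB in the first place, as suggested above, is what makes the else branch the common case.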
I'm just waiting for the 64 bit version of Doom 3 so I can own all your azzez!
Originally posted by Programmer
Three problems here (at least):
1) You don't always know the size of the things you are pointing at, and sometimes they are things that contain other pointers so copying them is non-trivial.
2) Sometimes the location is important for more than just the duration of the function. Example: if you are establishing a communications buffer that both caller and callee will repeatedly access. If you copied it invisibly they wouldn't be using the same buffer.
Yes. So APIs like that will need to have fully-functional 64-bit versions.
3) This would be extremely slow! Apple is putting a lot of emphasis on speed, and doing this would put them under a big disadvantage.
All the more reason to hurry up and provide real 64-bit versions.
No, the better solution is to give the user "malloc32" and make sure that they follow the rule that says they must use that if using a 32-bit API. The API can both check the pointers and mode switch internally.
This is what IBM did when MVS moved from 24-bit addressing to 32-bit addressing. For years you had to remember for each API at which version you could move its parameters "over-the-line".
Once you've exposed malloc32 to developers, it goes into the code (probably by some search-and-replace on malloc) and never comes out again. Three years down the road, all APIs are 64-bit, but a lot of code still uses malloc32. MVS code is full of 24-bit code for just that reason.
My way allows the same application to become more and more 64-bit with each version without recompile.
Originally posted by synp
This is what IBM did when MVS moved from 24-bit addressing to 32-bit addressing. For years you had to remember for each API at which version you could move its parameters "over-the-line".
Once you've exposed malloc32 to developers, it goes into the code (probably by some search-and-replace on malloc) and never comes out again. Three years down the road, all APIs are 64-bit, but a lot of code still uses malloc32. MVS code is full of 24-bit code for just that reason.
My way allows the same application to become more and more 64-bit with each version without recompile.
This isn't as bad as you make it out to be. The API itself can take the 64-bit pointers (converting them down and doing the 32<->64 mode switch), and rather than being called "malloc32" you can name it something more along the lines of "system_safe_malloc" (possibly one per API, like "AllocateWindow", etc.). When you go fully 64-bit you just reimplement "system_safe_malloc" to call the normal "malloc" and everything works fine.
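As a rough sketch of that wrapper idea (the body below is my own guess at the shape, not an actual Apple API; a real version would need an allocator that is guaranteed to stay in the low 4 GB rather than plain malloc):

    #include <cstdint>
    #include <cstdlib>

    // Today: hand back memory the 32-bit system libraries can address, and refuse
    // anything else. Once the libraries go 64-bit, this wrapper can be reimplemented
    // as a straight call to malloc and callers never have to change.
    void* system_safe_malloc(std::size_t size) {
        void* p = std::malloc(size);   // stand-in for a low-4GB allocator
        if (p != nullptr && reinterpret_cast<std::uintptr_t>(p) > UINT32_MAX) {
            std::free(p);              // not reachable from a 32-bit API; refuse
            return nullptr;
        }
        return p;
    }

The point of the indirection is that the range check (and any 32<->64 mode switching) lives in one place instead of being scattered through application code.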
Maybe this isn't the place to put this, but I didn't want to start another thread.
I just read this over at Ars Technica. Hannibal interviewed IBM on the 970. The one thing I thought was interesting was that he said the chip can be run at 3:1, 4:1 and 6:1 CPU-to-bus ratios. I had assumed ratios other than 2:1 would be possible, since Apple indicated that the chips will be at 3 GHz by next year, and it is hard to believe we will see more than a 1 GHz bus in the near future. This leads me to two thoughts/questions.
1. When faster chips come out, what will be the tradeoff between clock speed and front-side bus speed? For example, would a 2.2 GHz machine with a 733 MHz front-side bus be faster than a single 2 GHz machine with a 1 GHz bus? Faster than the current 1.8 GHz machine with the 900 MHz bus? Or would Apple skip that speed and go straight to a 2.4 GHz machine? Several comments in this forum discussed the chips' ability to scale linearly with clock speed; I assume this will change when the multipliers change.
2. Would it make sense to build the consumer lines with lower front-side bus speeds? That might make them cheaper to build and allow for higher GHz speeds on the main chip. (Marketing would love that.) I assume the interface chip generates some heat too, and if it is not clocked too high it may help keep the heat down inside the box. This would make sense for portables and iMacs.
I can only offer an opinion, but I agree with your thought about going to a lower bus speed to save power and use lower-cost RAM. The G5 can also run at lower voltage and clock rates to cut power dissipation. Regarding much higher clock rate G5s, Apple would have no choice but to use a different multiplier; the upper limit of the controller chip can't be exceeded, whatever it happens to be. I wouldn't expect higher clock rate G5s until they are available on the 90 nanometer process, however.
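For the bus ratios asked about above, the arithmetic is just the CPU clock divided by the multiplier; a throwaway sketch (the 2.2 and 2.4 GHz parts are hypothetical speeds taken from the question, not announced products):

    #include <cstdio>

    int main() {
        const double cpu_ghz[] = {1.8, 2.0, 2.2, 2.4};
        const int ratios[] = {2, 3, 4, 6};   // 2:1 plus the ratios mentioned in the interview
        for (double f : cpu_ghz)
            for (int r : ratios)
                std::printf("%.1f GHz CPU at %d:1 -> %4.0f MHz bus\n", f, r, 1000.0 * f / r);
        return 0;
    }

A 2.2 GHz part at 3:1 lands on the 733 MHz bus mentioned above, and 2.4 GHz at 3:1 gives an 800 MHz bus.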
Originally posted by Kurt
Shameless bump of this thread. I thought my last post would get at least one comment.
Does anyone even know if the lower speed CPUs are run at lower voltages and are therefore all the same chip? Haven't seen the chip specs.
Maybe you won't be able to overclock them, because the software controls the fans and thus the cooling system (it seems like that would require some kind of chip/motherboard identifier to stop overclocking).
Originally posted by Bigc
Does anyone even know if the lower speed CPUs are run at lower voltages and are therefore all the same chip? Haven't seen the chip specs. . .
I'm pretty certain the Power Mac G5s are all running at normal voltage, which allows the maximum clock rate for a given chip. As I'm sure you know, individual chips are tested and some will run faster than others.
What determines power dissipation is the clock rate and the supply voltage. The slower a chip runs, the lower the power, so that is one way to cut back on the 42 watts of a 1.8 GHz CPU. Once you cut back on the clock rate, you can also reduce the supply voltage a little, which lowers power dissipation even further. IBM gave the example of a 1.8 GHz 970 at 42 watts: run that chip at 1.2 GHz and cut back the voltage, and it will dissipate 19 watts. That was a while ago, and the exact figures are likely a little different now.
Someone posted information on the fan controllers. Each CPU fan evidently gets the chip temperature from the CPU it is cooling, so if you did overclock it, the fan would run faster. The question is whether it would run fast enough under the highest load.
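Plugging those figures into the usual dynamic-power model (power roughly proportional to frequency times voltage squared) lands close to IBM's number; the 1.3 V and 1.1 V supply values below are my own guesses for illustration, not IBM specs:

    #include <cstdio>

    int main() {
        const double p1 = 42.0;              // watts at 1.8 GHz, from the IBM example
        const double f1 = 1.8, f2 = 1.2;     // clock rates in GHz
        const double v1 = 1.3, v2 = 1.1;     // assumed supply voltages, illustrative only
        const double p2 = p1 * (f2 / f1) * (v2 / v1) * (v2 / v1);
        std::printf("estimated dissipation at %.1f GHz: %.1f W\n", f2, p2);   // ~20 W, near the quoted 19 W
        return 0;
    }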
Originally posted by Kurt
Shameless bump of this thread. I thought my last post would get at least one comment.
Apple has definitely differentiated machines based on bus and memory speeds before. It's very interesting news that a clock multiplier is included in the 970, and it is very, very good news that Apple's first machines ship with the fastest possible bus. It tells us that Apple is taking performance seriously in the PowerMac line. The lower end machines that get the G5 might not be as maxed out, but at least the high end ones are.
Originally posted by snoopy
I'm pretty certain the Power Mac G5s are all running at normal voltage, which allows the maximum clock rate for a given chip. As I'm sure you know, individual chips are tested and some will run faster than others.
What determines power dissipation is the clock rate and the supply voltage. The slower a chip runs, the lower the power, so that is one way to cut back on the 42 watts of a 1.8 GHz CPU. Once you cut back on the clock rate, you can also reduce the supply voltage a little, which lowers power dissipation even further. IBM gave the example of a 1.8 GHz 970 at 42 watts: run that chip at 1.2 GHz and cut back the voltage, and it will dissipate 19 watts. That was a while ago, and the exact figures are likely a little different now.
Someone posted information on the fan controllers. Each CPU fan evidently gets the chip temperature from the CPU it is cooling, so if you did overclock it, the fan would run faster. The question is whether it would run fast enough under the highest load.
Hmmm, I thought you could lower (or raise) the clock rate of any chip by lowering (or raising) the voltage, as in the MDD 867/1 GHz, where the 867 can be made to run at 1 GHz by removing a resistor in the voltage path. So if they are the same chip, then maybe the same could be done here. But I haven't seen any detailed specs for the chip anywhere.
And power consumption on a chip varies approximately with the square of the voltage.
Originally posted by Bigc
Hmmm, I thought you could lower (or raise) the clock rate of any chip by lowering (or raising) the voltage, as in the MDD 867/1 GHz, where the 867 can be made to run at 1 GHz by removing a resistor in the voltage path. So if they are the same chip, then maybe the same could be done here. But I haven't seen any detailed specs for the chip anywhere.
And power consumption on a chip varies approximately with the square of the voltage.
Maybe an expert should comment on this, but there are a couple of items I am fairly sure about. The clock rate is not controlled by the supply voltage; there is a clock circuit of some kind, and it might be crystal controlled. The supply voltage does determine the highest frequency a CPU will run at, however. When the supply voltage is lowered, the CPU's maximum rated speed decreases, so a chip rated for 1.8 GHz might only be rated for, say, 1.2 GHz at its lowest permitted operating voltage.
Also, I do not believe a resistor would be inserted in the supply line to lower the voltage. The supply must be well regulated and not change with current drain, as it would with a voltage-dropping resistor in place. If there is a resistor-capacitor decoupling network in the supply line, the resistance would be very low so that it does not significantly affect the supply voltage. The resistor you speak of might be in the clock circuit.
A good hardware engineer could correct our errors here. I'm pretty sure you are correct that power varies with the square of supply voltage. It likely changes linearly with clock rate.
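Taking that model at face value (square in voltage, linear in clock rate), IBM's two data points imply roughly an 18% lower supply voltage at the slower speed; a back-of-the-envelope check on the quoted figures, not a published spec:

    P ∝ f·V²  =>  V2/V1 = sqrt( (P2/P1) / (f2/f1) ) = sqrt( (19/42) / (1.2/1.8) ) ≈ 0.82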
Originally posted by synp
Once you've exposed malloc32 to developers, it goes into the code (probably by some search-and-replace on malloc) and never comes out again. Three years down the road, all APIs are 64-bit, but a lot of code still uses malloc32. MVS code is full of 24-bit code for just that reason.
My way allows the same application to become more and more 64-bit with each version without recompile.
But the 32-to-64 bit transition is different from all the previous transitions.
There was no clamoring for a longer list of instructions. (4 billion distinct instructions fits no one's definition of RISC.)
There was no strong demand from general profiling: the values in typical code (say, the i in i++) are far more likely to be under 500 than over 4,000,000,000.
The only real pressure was from a desire for more physical memory. And that is from a pretty small subset of 'all' applications.
So, why would Apple ever want to _remove_ malloc32? Take iCal. The only reason it would ever be forcibly made a pure 64-bit app with pure 64-bit libraries etc is if the hardware no longer supported a 32-bit mode, _and_ running it through the software support was a huge penalty.
I don't see that happening. This doesn't feel like another "640K is all anyone would ever need" situation, for a couple of reasons:
1) A full 64-bit address space is one heck of a lot of memory. Even extrapolating from current PC RAM maximums, it'll be decades (and then some) before anyone even comes close.
2) There's some downside to every increase in 'CPU-bitness': bandwidth, program size... In all of the previous transitions there was far more upside potential, enough that the problems it brought were well worth it. Here the fraction of programs that _need_ the increase is very small.
So it may well be worth sticking at the 32-to-64 bit transition for quite a long time. And I really, really like Apple's approach (AFAICT). There won't be any reason for iCal to chew up twice as much memory as it needs (the extra half filled with nothing but zeroes), nor twice as much bandwidth (passing those zeroes around); it can run as a 32-bit app with nigh-unto-zero penalty. And Oracle, a 64-bit app that needs more than 4 GB of RAM, can run at exactly zero penalty.
So unless it is a major PITA to keep the CPU able to run at both 32 bits _and_ 64 bits, I'd stay right here for a long time. (And it really shouldn't be a big deal for future PPCs: it's pretty easy to just do the 64-bit calculation, drop the top 32 bits' worth, and fix the flags. Hmm, rotate's sneakier.)
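The memory-and-bandwidth point above is easy to see with any pointer-heavy structure; a small sketch (the 4-versus-8-byte pointer sizes are the usual ILP32-versus-LP64 ones and will vary by compiler and ABI):

    #include <cstdio>

    // A typical pointer-heavy node: on a 32-bit build the two links cost 8 bytes,
    // on a 64-bit (LP64) build they cost 16, even though the payload is unchanged.
    struct Node {
        Node* prev;
        Node* next;
        int   payload;
    };

    int main() {
        std::printf("sizeof(void*) = %zu, sizeof(Node) = %zu\n",
                    sizeof(void*), sizeof(Node));
        return 0;
    }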
Originally posted by Nevyn
So it may well be worth sticking at the 32-to-64 bit transition for quite a long time. . .
So unless it is a major PITA to keep the CPU able to run at both 32-bits _and_ 64-bits, I'd stay right here for a long time. . .
From everything I've read, the PPC architecture is designed to run both 32 and 64-bit applications basically forever. So, there is no transition to a 64-bit only OS. We are jumping right to an OS that runs both 64 and 32-bit applications. The brief transition, which should last a year or two, is to allow all the code in OS X to catch up with its mission of running both 64 and 32-bit applications. That is what this bridge hardware in the 970 is all about. It will eventually go away.
Originally posted by Nevyn
So, why would Apple ever want to _remove_ malloc32? Take iCal. The only reason it would ever be forcibly made a pure 64-bit app with pure 64-bit libraries etc is if the hardware no longer supported a 32-bit mode, _and_ running it through the software support was a huge penalty.
Judging by the fact that IBM's current mainframe processors support all three modes (24, 31 and 64 bits), I agree that they will probably keep the 32-bit compatibility for a long, long time.
Thinking it over, I don't think there really is a need for malloc32. Just keep malloc 32-bit forever. People like Yevgeny who need 10 GB chunks of memory (or a lot of small chunks of memory) will use the new function "malloc64". That way, old programs needn't be modified, and new programs will use malloc64 just for large chunks that are not passed to system APIs.
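A sketch of the contract that implies (malloc64 is the hypothetical function proposed here, not an existing call; the stand-in body just forwards to malloc so the sketch compiles):

    #include <cstdlib>

    // Hypothetical: plain malloc stays confined to the low 4 GB so old code and
    // 32-bit system APIs keep working; malloc64 may return any address and is
    // used only for big, private buffers that never cross into those APIs.
    extern "C" void* malloc64(std::size_t size) { return std::malloc(size); }

    int main() {
        void* big = malloc64(std::size_t(10) << 30);   // ~10 GB, as in the example above
        std::free(big);                                // null if the request failed; free(nullptr) is fine
        return 0;
    }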
For C++, how about this:
    int *pint = new64 int;
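Since "new64" isn't actual C++ syntax, something like it would more likely arrive as a library template than as a new keyword; a sketch of that shape, reusing the hypothetical malloc64 stand-in from above:

    #include <cstdlib>
    #include <new>
    #include <utility>

    // Stand-in for the hypothetical 64-bit-capable allocator discussed above.
    extern "C" void* malloc64(std::size_t size) { return std::malloc(size); }

    // Library-style replacement for the imagined "new64 T" syntax: allocate
    // from the 64-bit heap, then construct the object in place.
    template <typename T, typename... Args>
    T* new64(Args&&... args) {
        void* raw = malloc64(sizeof(T));
        if (raw == nullptr) throw std::bad_alloc();
        return new (raw) T(std::forward<Args>(args)...);
    }

    int main() {
        int* pint = new64<int>(7);   // plays the role of "int *pint = new64 int;"
        // ... use *pint ...
        std::free(pint);             // int is trivially destructible, so freeing is enough
        return 0;
    }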