Do you have an idea how much overhead is involved in doing _pure_ 64-bit computation on a modern 32-bit processor such as AMD's and Intel's? I'd be interested in hearing how badly the theoretical "twice as fast" 32-bit processor loses to a 64-bit processor on the 64-bit processor's home ground.
First of all, the cost of being a 64-bit processor doesn't come close to the cost of doubling processor performance. I don't think the 970 really gives up much to be 64-bit -- 10%, maybe.
Theoretically: If all you were doing was 100% 64-bit integer math, then I'd guess a 32-bit processor would need roughly 3 times as many instructions and twice as many registers. Depending on the nature of the math and how much you were doing this could be more than 3 times slower because of data dependencies between instructions causing stalls. On the other hand the extra math could be hidden behind other stalls so in some cases it might not cost anything extra at all. There is really no way to predict without looking at the algorithm in question.
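(A rough sketch of where the extra instructions come from, in generic C rather than any compiler's actual output: a 64-bit value has to live in two 32-bit registers, and a single 64-bit add becomes an add, a carry check, and a second add -- and the second add can't start until the carry from the first is known, which is the data-dependency stall mentioned above.)

#include <stdint.h>

/* Hypothetical illustration: a 64-bit integer held as two 32-bit halves,
   the way a 32-bit integer unit has to handle it. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* One 64-bit add emulated with 32-bit operations: add the low halves,
   detect the carry, then add the high halves plus the carry.  Roughly
   three operations (and two registers per value) where a 64-bit ALU
   needs a single instruction. */
static u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo);   /* unsigned wrap-around => carry out    */
    r.hi = a.hi + b.hi + carry;       /* can't issue until the carry is known */
    return r;
}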
In reality, 64-bit integer calculations are relatively rare and thus will have almost no measurable impact on most applications' performance. Any application working on numbers that big should probably consider using double-precision floating point instead (as always there are perfectly reasonable exceptions, but they are exceptional).
Pointer arithmetic is probably a bigger factor: if the same app were using 64-bit pointers instead of 32-bit pointers (and didn't need >4 GB of memory), then the 64-bit version would likely be a percent or two slower. Possibly more in pointer-heavy code, possibly less in pointer-light code. Not a huge difference either way.
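(To picture the pointer cost: take a hypothetical linked-list node -- nothing from any real app -- and its pointer fields double in size when the same source is compiled for a 64-bit address space, so pointer-heavy data eats more cache and memory bandwidth even if the program never goes near 4 GB.)

#include <stdio.h>

/* Hypothetical node type: same source code, but the two pointer fields
   are 4 bytes each in a 32-bit build and 8 bytes each in a 64-bit build. */
struct node {
    struct node *next;
    struct node *prev;
    int          value;
};

int main(void)
{
    /* Typically prints 12 for a 32-bit build and 24 for a 64-bit build
       (the extra 4 bytes in the latter are alignment padding). */
    printf("sizeof(void *)      = %zu\n", sizeof(void *));
    printf("sizeof(struct node) = %zu\n", sizeof(struct node));
    return 0;
}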
I've said it before and I'll say it again: the big difference between a 64-bit and a 32-bit machine (of the same generation) is the memory addressability and the new capabilities it provides, even if you don't have >4 GB of memory. Of course you need a 64-bit OS first. Since nobody builds processors which are identical except for the 32-bit vs. 64-bit capability, there is no way (and no point) to measure the exact "cost" of the extra capabilities.
In fact, the big boosts to the 970's number-crunching capabilities are the native, full-precision square root instruction and a second floating-point unit (the G4 has only one). Neither of these features requires 64-bit anything.
The fact that the 970 is 64 bit is mostly a reflection of IBM's own interest in the CPU for boxes much bigger than Apple will build running "real" operating systems like AIX. It has had other benefits, but given that Apple itself was apparently shocked to its toenails by the Va Tech project, it's clear that they didn't anticipate some of the big ones.
I really don't think it'll take that long for the professional space to find uses for the extra precision, though, especially in FP. The advantages of working at a higher precision than you'll eventually output to are universally known at this point, and there are a lot of people interested in the accuracy afforded by representing color (for example) as a 64 bit float. (Not that they couldn't before, but this is something that can go mainstream now that the dominant creative platform can work with 64 bit floats quickly).
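(A toy illustration of the higher-precision-intermediates point -- this is not any particular imaging pipeline, just a hypothetical average of 8-bit color samples accumulated in a 64-bit double and quantized only at the end, so rounding error doesn't pile up in the intermediate steps.)

#include <math.h>
#include <stdint.h>

/* Hypothetical example: blend n 8-bit channel samples in double precision
   (a 64-bit float) and round back to 8 bits only once, at output time. */
static uint8_t average_channel(const uint8_t *samples, int n)
{
    double sum = 0.0;                    /* high-precision accumulator */
    for (int i = 0; i < n; i++)
        sum += samples[i] / 255.0;       /* work in [0, 1] */

    return (uint8_t)lround((sum / n) * 255.0);   /* quantize once, at the end */
}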
In fact, the big boosts to the 970's number-crunching capabilities are the native, full-precision square root instruction and a second floating-point unit (the G4 has only one). Neither of these features requires 64-bit anything.
There are a couple of less obvious bottlenecks that the G4 had in its floating point unit which are now gone in the G5, and they are responsible for a fair bit of the improvement in performance. The square root is nice, but not universally applicable to all FPU code... having 2 FPUs with far fewer bottlenecks is the big deal.
For the record, I didn't mean to imply that the square root was useful generally; but for the not-insignificant code base that does use it, it's a big boost. Pre-970, code had to choose between fast results and accurate ones.
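(A sketch of the kind of trade-off being described, using the well-known bit-trick reciprocal square root estimate plus one Newton-Raphson refinement -- generic C, not PowerPC code, and not any particular app's shipping math; the point is just that the fast path is only good to a few decimal places, while sqrtf(), or a full-precision hardware square root, is not so limited.)

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fast-but-approximate 1/sqrt(x): a bit-level first guess refined by a
   single Newton-Raphson step.  Good to roughly 0.2%, which is the sort of
   accuracy "fast" square roots settled for before a full-speed,
   full-precision hardware sqrt was available. */
static float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float's bits */
    bits = 0x5f3759df - (bits >> 1);     /* crude initial estimate       */
    float y;
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - half * y * y);    /* one refinement step          */
}

int main(void)
{
    float x = 2.0f;
    printf("fast : %.7f\n", x * fast_rsqrt(x));  /* sqrt(x) = x * 1/sqrt(x) */
    printf("exact: %.7f\n", sqrtf(x));           /* library/hardware sqrt   */
    return 0;
}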
The 32 bit vs. 64 bit issue is interesting, as we haven't heard a great deal from developers as to exactly how/why this will improve applications.
The biggest single marketable issue (as mentioned above by Programmer) is the huge increase in the maximum size of addressable memory. 4 GB of RAM still seems large for most personal use, but academic and professional applications have no problem coming up with the need for more.
On the PC side of the fence, a recent analogy would be the limits of the old FAT16 file system (~2GB) for hard drive storage, which meant that alternatives (such as FAT32 and NTFS) were needed.
So, given that I was wrong (according to Programmer) in assuming that apps would, in general, gravitate towards being 64 bit, and given that subsequent posters seemed to have a hard time pinpointing a real 64 bit advantage, what was the point in making the PPC 970 a 64 bit proc?
Was it solely because that was what IBM wanted for their server line?
Will OS X ever migrate to being a 64 bit OS?
I guess I'm having some trouble in understanding why so much effort was expended for so little (apparent) gain.
Most of the effort in building the 970 wasn't to make it 64-bit, that was just a feature IBM built into it for their customers who need it. Most of the effort went into making it fast. The PPC620 and 630 were 64-bit back in 96-97, but they weren't any faster than the 604/604e. With the 970, IBM didn't bother making a 32-bit version so Apple has no choice but to use the 64-bit version. Fortunately the PPC architecture makes that easy, and MacOS X will enable Apple to support the 64-bit memory addressing features relatively soon.
Don't get me wrong -- 64-bit addressing is very cool and I'd like to write software that takes advantage of it when MacOS X supports it, but your average mainstream piece of software doesn't need it. That's not to say that some of Apple's users aren't going to benefit from it greatly. Apple is likely to get some new users because of it as well (especially in the scientific supercomputing crowd).
64-bit isn't the reason that the 970 is fast, however.
Well, I guess I see the difference between using 64 bit numbers in calculations (unlikely according to previous posts) and addressing 64 bit space (more likely?)
Would it then be true to modify my earlier post to something along the lines of:
for now... most apps don't need to address a 64 bit memory space, but as they become even bigger, many more will. Software companies will move in this direction simply because (a) they can and (b) marketing will require it, to make 64 bit purchases seem justified.
Regarding the 4 GB memory limit: the log scale is deceiving; we are very close if we are now at the 512 MB RAM level. That is only a factor of 8, the same jump as when I went from 4 MB to 32 MB about 10 years ago.
There is no sign of the RAM race slowing down. But is it really 32-bit addressing? Isn't it 36, or was it 48, for memory space addressing?
Would it then be true to modify my earlier post to something along the lines of:
for now... most apps don't need to address a 64 bit memory space, but as they become even bigger, many more will. Software companies will move in this direction simply because (a) they can and (b) marketing will require it, to make 64 bit purchases seem justified.
or not?
If we are talking about a 1000 year time frame, then maybe. Once portable computers become fast enough to rival the human brain and enslave mankind I can imagine most widely used apps requiring 64-bit memory addressing.
[Edit] To put things in perspective, the human brain is believed to have around 100 billion neurons. With 64-bit addressing, a computer trying to simulate a human brain would be able to allocate up to 180 million bytes of memory for each single neuron. If each neuron only requires 1 million bytes of code and data to simulate, the computer would be able to simulate 180 complete human brains simultaneously.
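(For what it's worth, that arithmetic checks out under those assumptions -- the 100 billion neuron count and 1 MB per neuron are of course just the post's hypothetical figures.)

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double address_space  = ldexp(1.0, 64);  /* 2^64 bytes            */
    const double neurons        = 100e9;           /* assumed neuron count  */
    const double bytes_per_cell = 1e6;             /* assumed 1 MB / neuron */

    printf("bytes per neuron: %.0f million\n",
           address_space / neurons / 1e6);                 /* ~184 */
    printf("brains that fit:  %.0f\n",
           address_space / (neurons * bytes_per_cell));    /* ~184 */
    return 0;
}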
There is no sign of the RAM race slowing down. But is it really 32-bit addressing? Isn't it 36, or was it 48, for memory space addressing?
The 32-bit processors can have a 4 GB linear address space. Moto's G4 has support for 36-bit physical addresses, but Apple doesn't support it in their OS (yet, and probably never will). If they did, then the total physical RAM you could have in your machine would be 64 GB, but a single application could still only see 4 GB.
The 970 can have a 64-bit address space, which gives a 16 billion GB linear address space. Its bus can currently only generate 42-bit addresses, however, which means it is "limited" to 4096 GB of physical memory. Changing what the hardware can handle is much easier than changing what all the software can handle, though, so future 64-bit PowerPCs could easily increase the physical addressability limit, and every bit added doubles the physical RAM limit.
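(A quick way to check those figures: each extra address bit doubles the limit, so 2^(bits - 30) gives the size in binary GB. 2^64 bytes comes out to roughly 17 billion binary GB -- the "16 billion GB" above in round numbers.)

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Address widths discussed above: 32-bit linear, the G4's 36-bit
       physical, the 970's 42-bit bus, and a full 64-bit space. */
    const int widths[] = { 32, 36, 42, 64 };

    for (int i = 0; i < 4; i++) {
        double gb = ldexp(1.0, widths[i] - 30);   /* 2^(bits - 30) GB */
        printf("%2d-bit addressing: %.0f GB\n", widths[i], gb);
    }
    return 0;
}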
If we are talking about a 1000 year time frame, then maybe. Once portable computers become fast enough to rival the human brain and enslave mankind I can imagine most widely used apps requiring 64-bit memory addressing.
[Edit] To put things in perspective, the human brain is believed to have around 100 billion neurons. With 64-bit addressing, a computer trying to simulate a human brain would be able to allocate up to 180 million bytes of memory for each single neuron. If each neuron only requires 1 million bytes of code and data to simulate, the computer would be able to simulate 180 complete human brains simultaneously.
Unlikely to happen. Useful memory spaces may soon grow beyond the 32 bit limit. It will be a long time before they grow to require a full 64 bits.
Simply accessing every byte once in a 64 bit memory space using 1 ns RAM would require something like 700 years (if my arithmetic is right).
If, however, you have 100 billion or so simple computers working in parallel then a huge memory might be practical.
Are you taking into account wide memory buses that access many thousands of bytes per cycle? For example, if the memory bus is 1 megabit wide, it would only take 39 hours to go over the entire 64-bit address space with 1 ns RAM. It is also likely computers of the future will use more than 2 logic levels per bit. That would reduce the 39 hour figure by a significant amount.
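(Both figures are easy to check: at one byte per nanosecond a full 2^64-byte sweep works out to roughly 585 years, in the same ballpark as the 700-year estimate, and a hypothetical 1-megabit-wide bus at the same access rate brings it down to about 39 hours.)

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double bytes        = ldexp(1.0, 64);   /* 2^64 bytes */
    const double ns_per_sec   = 1e9;
    const double sec_per_year = 3600.0 * 24.0 * 365.0;

    /* One byte per 1 ns access. */
    printf("1 byte/ns       : about %.0f years\n",
           bytes / ns_per_sec / sec_per_year);              /* ~585 years */

    /* Hypothetical 1-megabit-wide bus: 2^20 bits = 131072 bytes per access. */
    const double bus_bytes = ldexp(1.0, 20) / 8.0;
    printf("1 Mbit-wide bus : about %.1f hours\n",
           bytes / bus_bytes / ns_per_sec / 3600.0);        /* ~39 hours */
    return 0;
}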
Also, it is believed that most neurons in the brain aren't being actively used at any given time, so it wouldn't be necessary to iterate over every neuron in a simulation all the time.
I think enslavement of mankind by machines is quite feasible.
Actually you're right about the timing. Every neuron runs on its own clock. It would actually be ideal to have a million or so chips linked together like IBM's little doo-dad, except on some serious steroids. Like 100k of them.
I would like to take a second to thank Programmer for his posts. It certainly helps to have someone to explain these complex things in such a cogent and elucidative way. (And thanks for having patience with a few of us "non-programmers")
I would like to take another second to thank programmer for every time I have used one of his arguments to smite some brainiac who has no idea what he is talking about.
Where is Intel on SOI? Are they doing SOI in their current offerings?
Intel does not currently implement any form of SOI in their processors. They have successfully demonstrated SOI processes in their research labs. From their published papers, their position is that the current generation of SOI devices, known as partially depleted SOI (PDSOI), does not provide sufficient advantage when scaling below 100 nm to justify the added processing complexity and cost. They have expressed interest in fully depleted SOI (FDSOI) in the future, when that technology matures enough to become manufacturable.
The difference between PDSOI and FDSOI is really a matter of how thin you can make the layer of silicon on top of the insulator. Ideally you would want it to be fully depleted, where the space-charge region formed around the source and drain reaches all the way down to the insulating layer. This would provide dramatically lower junction capacitances, reducing power and speeding up designs even more.